
I have a dataset with missing or wrong values. A module should replace those values so that the dataset length is preserved. For identifying the positions to be replaced I use Position[].

Here is some data

data = {1, 2, 0, 4, , 6, "", 8} (* test set *);
criterList = {0, "", Null} (* selection criteria *);

Outside a Module it works correctly like this

temp = Or @@ Map[u == # &, criterList];
Position[data, u_ /; Evaluate@temp]
{{3}, {5}, {7}}

The positions of values to be replaced are correctly identified.

However, inside a Module the same code doesn't evaluate as above:

posMod[data_, criterList_] := Module[{u, res},
     temp = Or @@ Map[u == # &, criterList];
     Print[temp];
     Print["a)  ", Position[data, u_ /; Evaluate@temp]]   (* Why doesn't it work? *);
     Print["b)  ", Position[data, u_ /; {u == 0 || u == "" || u == Null}]] (* temp evaluated *);
     Print["c)  ", Position[data, u_ /; Evaluate[Or @@ Map[u == # &, criterList]]  ]] (*just copied *);
     ]

posMod[data, criterList]

a) {}
b) {}
c) {{3},{5},{7}}

Why do a) and b) not evaluate like c)?

Kuba
Hargrot
  • This boils down to why Module[{u}, {u, u_, u_ /; u}] returns what it returns. Aside from the main question, you can use Replace[data, Alternatives@@cri -> whatever, {1}] – Kuba Jan 06 '18 at 12:21
  • for (b) use u_ /. u == 0 || u == "" || u == Null instead of u_ /; {u == 0 || u == "" || u == Null}? – kglr Jan 06 '18 at 12:37
  • @Kuba: Thanks for giving the hint for Alternatives. I'd like to replace the value by e.g. the Mean of the nearest non-matching neighbors. – Hargrot Jan 06 '18 at 12:38
  • for (a) Position[data, Alternatives @@ criterList]? – kglr Jan 06 '18 at 12:41
  • use Block instead of Module? – kglr Jan 06 '18 at 12:46

1 Answer

Understanding what is going on

Condition seems to be treated as a scoping construct, so the usual issues with automatic renaming apply.

It is worth reading this topic carefully: Enforcing correct variable bindings and avoiding renamings for conflicting variables in nested scoping constructs

So essentially the u in temp is not the same as the u in u_ /; ...: the first one was localized by Module, while the second one is treated by Module as belonging to an inner scoping construct and is left alone.

Include Print[u : _ /; Evaluate@temp] in the Module to see:

u_ /; u$6121==0 || u$6121=="" || u$6121==Null

Let's take a closer look:

Module[{u, f}, Column@{
   u,
   Module[{u}, u],
   u_ /; u,
   u_ /; f + u
}]
u$8962
u$8963
u_/;u
u$_/;f$8962+u$

1st line shows the scoped u.

2nd line shows that the outer Module left the inner Module's u alone; otherwise we would see u$8962$8963.

3rd line shows that here u is left alone too.

4th line shows that u is renamed to u$, just in case, because another scoped variable (f) appears in the condition's body. This often causes confusion, as shown in the linked topic.

Why do a) and b) not evaluate like c)?

Module[{u}, Column@{
   u_ /; Evaluate@temp,
   u_ /; {u == 0 || u == "" || u == Null},
   u_ /; Evaluate[Or @@ Map[u == # &, criterList]]
}]
u_/;u$6121==0||u$6121==||u$6121==Null
u_/;{u==0||u==||u==Null}
u_/;u==0||u==||u==Null

a) fails because of the issue explained above: temp contains the Module-localized u$6121, which is not the pattern variable u.

b) fails because the condition test results in a List {...}, not True/False.

c) works because u was not renamed by Module, which saw it as internal to Condition.
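Both failure modes can be reproduced without Module. The following is only a minimal sketch: x and y are placeholder symbols (assumed to have no values), with y standing in for the renamed u$6121.

```mathematica
(* Illustrative only: x and y are placeholder symbols, cleared so they have no values. *)
ClearAll[x, y];

(* like b): the test evaluates to {True}, a List, which is not the symbol True *)
listTest = MatchQ[0, x_ /; {x == 0}];

(* the same test without the braces evaluates to True *)
plainTest = MatchQ[0, x_ /; x == 0];

(* like a): the test mentions a symbol other than the pattern variable,
   so y == 0 stays unevaluated and the condition is not True *)
otherSymbol = MatchQ[0, x_ /; y == 0];

{listTest, plainTest, otherSymbol}   (* {False, True, False} *)
```

Condition only succeeds when its test evaluates to exactly True, so both the wrapped List and the unevaluated comparison count as failures.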

What to do?

Replace[data, Alternatives @@ criterList -> whatever, {1}]
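As a sketch of how this could look when wrapped in functions: the names badPositions and replaceBad and the Missing["Invalid"] placeholder are hypothetical, chosen only for illustration. No named pattern variable is involved, so the renaming issue does not arise.

```mathematica
(* badPositions and replaceBad are hypothetical helper names. *)

(* positions of the unwanted values at level 1, ignoring the head *)
badPositions[data_List, criterList_List] :=
  Position[data, Alternatives @@ criterList, {1}, Heads -> False];

(* replace each unwanted value at level 1; the list length is preserved *)
replaceBad[data_List, criterList_List, new_] :=
  Replace[data, Alternatives @@ criterList -> new, {1}];

data = {1, 2, 0, 4, Null, 6, "", 8};   (* test set, empty slot written as Null *)
criterList = {0, "", Null};

pos = badPositions[data, criterList]
(* {{3}, {5}, {7}} *)

cleaned = replaceBad[data, criterList, Missing["Invalid"]]
(* {1, 2, Missing["Invalid"], 4, Missing["Invalid"], 6, Missing["Invalid"], 8} *)
```

Note that pattern matching is structural, so unlike ==, the pattern 0 would not match the real number 0. in the data.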
Kuba