In an answer to a question (as given here: code), the proposed function was meant to subtract a set B from a set A:

removeFrom2[b_List, a_List] := Module[{f, g},
  (f[#] = -#2) & @@@ Tally[a];
  g[x_] /; f[x] < 0 := f[x]++;
  g[_] = True;
  Select[b, g]
]

I cannot understand how that code works. Lines 2 and 3 in particular remain a mystery. Can someone explain what line 2 means? What I understand is that g[x] = f[x]++ if f[x] < 0 (is it a sort of loop?).

BetterEnglish

1 Answer

I'm just going to walk through all of it. If something is too pedantic, skip it.

Module[{f,g}...

creates a scoping construct so the definitions of f and g are local to this code.

Tally[a] produces a list of all the elements in a and a count for each element. For instance, Tally[{a,a,b,c,a,d,d}] would give {{a,3},{b,1},{c,1},{d,2}}.

The strange notation (f[#] = -#2) & @@@ Tally[a] is equivalent to looping over the list returned by Tally[a] and assigning the negated count of element j to f[j]. So in the example above f[a] = -3, f[b] = -1, f[c] = -1, f[d] = -2. You can read about pure functions, slots, and Apply in the docs to make sense of the notation here. (N.B. At first glance it may look as if -#2 should be -(#2+1), but -#2 is the right starting value: a counter that starts at -n reaches 0 after exactly n matches, so exactly n copies are dropped. Starting at -(#2+1) would drop one copy too many.)
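If the operator form is hard to read, the same assignment can be sketched as an explicit loop. The variable names list and pair are mine, chosen for illustration, and the Do formulation is only a rewording of the one-liner, not the original author's code.

```mathematica
list = {a, a, b, c, a, d, d};   (* the example list from the Tally discussion *)
ClearAll[f];

(* Tally[list] gives {{a, 3}, {b, 1}, {c, 1}, {d, 2}} *)
Do[
  f[pair[[1]]] = -pair[[2]],    (* f[element] = -count *)
  {pair, Tally[list]}
];

counters = {f[a], f[b], f[c], f[d]}   (* {-3, -1, -1, -2} *)
```

The @@@ version does the same thing without a named loop variable: each {element, count} pair is fed to the pure function as #1 and #2.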

Now we define the criterion function g. Select will apply this function to the list b and select any element for which g[element] = True. The line

g[x_]/; f[x]< 0 := f[x]++;

says that for any x such that f[x] < 0, g[x] evaluates to f[x]++. (/; is called Condition, if you want to look it up, and can be read as 'such that'.) The := is SetDelayed, meaning the right-hand side is re-evaluated on every call using the current value of f. Note that f[x]++ returns the old value of f[x], a negative integer rather than True, so Select rejects that element, and as a side effect f[x] moves one step closer to 0. The next time g[x] is called, the condition is checked again against the updated counter. Also notice that if we ask for a value of f that hasn't been defined (for instance f[q], using the list a I defined above), we get f[q] back unevaluated, which won't pass the test in the condition.
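A small experiment may make the conditional definition easier to follow. This is a sketch with a hand-set counter (f[a] = -2 is my assumption, standing in for what Tally would produce) and with only the conditional rule defined, so the fallback is deliberately missing.

```mathematica
ClearAll[f, g];
f[a] = -2;                     (* pretend a occurred twice in the list to remove *)
g[x_] /; f[x] < 0 := f[x]++;   (* conditional, delayed definition *)

r1 = g[a]   (* -2: the old value is returned, f[a] is now -1 *)
r2 = g[a]   (* -1: f[a] is now 0 *)
r3 = g[a]   (* condition fails and no other rule exists: stays g[a] *)
r4 = g[q]   (* f[q] < 0 never evaluates to True: stays g[q] *)
```

None of these results is True, which is why Select would skip every one of these elements if this were the only definition of g.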

So what if we've incremented f[i] to the point that f[i] = 0, or the test in the condition fails for some other reason? MMA will look for another definition of g that applies until it runs out of definitions to try (which is what happened when I asked for f[q]). But here we've provided a default definition that applies to any argument, so MMA falls back to g[_] = True. (You can read more about UpValues and DownValues if you're curious how this stuff works.)
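The fallback can be checked the same way. In this sketch the counter f[a] = -1 is set by hand (my assumption, for illustration), and both definitions of g are present.

```mathematica
ClearAll[f, g];
f[a] = -1;                     (* a should be dropped exactly once *)
g[x_] /; f[x] < 0 := f[x]++;   (* tried first while the counter is negative *)
g[_] = True;                   (* catch-all used when the condition fails *)

results = {g[a], g[a], g[q]}   (* {-1, True, True} *)
```

The first call hits the conditional rule and uses up the counter; the second call and the call on the unknown symbol q both fall through to g[_] = True.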

As Select steps through the elements of b, it evaluates g[ b[[i]] ] for each position i. If the counter for that value is negative, it is incremented and the element is skipped, because g returned a number rather than True. If we encounter the same value again, the counter is incremented again, unless it has reached 0. In that case we fall back to the default definition of g, which returns True, and we take that element. Since f is only defined for values in the list a, any value not in a automatically returns True. If a value j appears n times in a, we only start selecting j once we've seen n of them, at which point f[j] = 0 and g is True by default.
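Putting the pieces together, here is the function from the question run on a small input. The two lists are made up for illustration; the comments trace what happens to each element.

```mathematica
removeFrom2[b_List, a_List] := Module[{f, g},
  (f[#] = -#2) & @@@ Tally[a];   (* f[1] = -2, f[3] = -1, f[4] = -1 below *)
  g[x_] /; f[x] < 0 := f[x]++;
  g[_] = True;
  Select[b, g]
]

(* 1: skipped, 1: skipped, 1: kept, 2: kept, 3: skipped, 3: kept *)
result = removeFrom2[{1, 1, 1, 2, 3, 3}, {1, 1, 3, 4}]   (* {1, 2, 3} *)
```

Each element of the second list cancels one matching element of the first, starting from the left, and the 4 is simply ignored because it never shows up in the first list.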


The fact that Select moves element by element keeping track of f without doing some clever parallel task with copies of f behind the scenes is beyond my understanding. @Mr.Wizard usually knows what he's doing though, and it seems to work. Maybe he'd be willing to comment on that bit.
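One way to look at the order in which Select calls its test is to record each element as it is tested. The sketch below uses Sow and Reap for that; it is only an empirical check on a small made-up list, not a statement about how Select is implemented internally.

```mathematica
(* Record every element at the moment the test function sees it *)
{kept, {seen}} = Reap[Select[{3, 1, 2, 1}, (Sow[#]; OddQ[#]) &]];

kept   (* {3, 1, 1} *)
seen   (* {3, 1, 2, 1}: the same order as the input list *)
```

In this experiment the test is applied from left to right, one element at a time, which is the behavior the counter trick relies on.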

N.J.Evans