I would like to understand the implementation that allows MemberQ and FreeQ to be as fast as they are.

I noticed this thanks to this fine answer.

I start with a list of True|False values:

lst = Insert[Table[True, {500000}], False, 499000];

It is not packed:

Developer`PackedArrayQ[lst]

False

I compare timings:

Scan[Identity, lst] ~Do~ {100} // Timing
MemberQ[lst, False] ~Do~ {100} // Timing
FreeQ[lst, False]   ~Do~ {100} // Timing

{4.93, Null}

{0.405, Null}

{0.25, Null}

What allows these functions to be more than an order of magnitude faster than simply Scanning the list?

Mr.Wizard

1 Answer

What you observed seems to be an instance of the general behavior of the pattern-matcher when it is used with what I call "syntactic patterns": patterns that reflect only the rigid structure of an expression, e.g. _f. The speed-up relative to Scan comes from avoiding the main evaluation loop: for FreeQ and MemberQ, the scanning is done entirely inside the pattern-matcher, which is lower-level than the main evaluator.
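
As an illustrative sketch (not taken from the linked answers), the difference can be seen by keeping MemberQ but swapping the literal pattern for a PatternTest, which makes the pattern-matcher call back into the evaluator for the elements it tests. The names tSyntactic and tEvaluated and the repeat count of 20 are assumptions chosen for the example; actual timings will depend on the machine and version.

```mathematica
(* Same test list as in the question: unpacked, with a single False near the end *)
lst = Insert[Table[True, {500000}], False, 499000];

(* Syntactic pattern: the literal False is matched inside the pattern-matcher *)
tSyntactic = First @ AbsoluteTiming[Do[MemberQ[lst, False], {20}]];

(* PatternTest applies a top-level function to the elements it tests,
   so the main evaluator is involved again *)
tEvaluated = First @ AbsoluteTiming[Do[MemberQ[lst, _?(# === False &)], {20}]];

(* Timings in seconds for 20 repetitions each *)
{tSyntactic, tEvaluated}
```

Both calls return True; only the second pattern is non-syntactic, so any gap between the two timings is attributable to the per-element evaluator calls rather than to MemberQ itself.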

In this answer, and also here, there are some examples of this behavior, and further discussion. I think a good rule of thumb is that you gain roughly one and a half orders of magnitude by using syntactic patterns in place of top-level scanning code (pushing all the work into the pattern-matcher), and 2-3 orders of magnitude if you manage to recast the problem as a vectorized numerical problem on packed arrays.
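
A minimal sketch of the three levels, under assumed data: counting zeros in a packed integer array by top-level scanning, by a syntactic pattern, and by a vectorized numerical operation. The helper names countScan, countPattern, and countVectorized are hypothetical, and the speed ratios you measure may differ from the rule of thumb above.

```mathematica
(* Assumed test data: one million machine integers *)
arr = RandomInteger[{0, 9}, 10^6];
Developer`PackedArrayQ[arr]   (* check that the array is packed *)

(* 1. Top-level scanning: the evaluator runs the function for every element *)
countScan[a_] := Module[{c = 0}, Scan[If[# == 0, c++] &, a]; c];

(* 2. Syntactic pattern: the literal 0 is matched by the pattern-matcher *)
countPattern[a_] := Count[a, 0];

(* 3. Vectorized: Unitize maps every nonzero entry to 1,
      so the number of zeros is the length minus the total *)
countVectorized[a_] := Length[a] - Total[Unitize[a]];

(* Each entry is {time in seconds, count}; the three counts should agree *)
{AbsoluteTiming[countScan[arr]],
 AbsoluteTiming[countPattern[arr]],
 AbsoluteTiming[countVectorized[arr]]}
```

The vectorized form only applies because the question can be phrased arithmetically; for general symbolic lists such as the True/False list in the question, the syntactic-pattern form is the one that carries over.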

Dennis Guse
Leonid Shifrin
  • Would you add some examples to this, please? – Mr.Wizard Feb 06 '12 at 22:05
  • @Spartacus I added links to two relevant questions / answers, containing some examples - will this satisfy you? I am afraid that otherwise right now I'd have to just copy those examples and paste them here, which doesn't make much sense IMO. – Leonid Shifrin Feb 06 '12 at 22:12
  • @Spartacus Most functions that do their job "in one go" (Union, Split, etc.) seem to be fast too. – Szabolcs Feb 06 '12 at 22:13
  • @Szabolcs they are only fast with the default comparison functions etc., and that is also because they bypass the main evaluator in that case (not to mention that Union switches to pairwise comparisons when given an explicit comparison function, which leads to a quadratic-time algorithm, so Union, as well as DeleteDuplicates, are bad examples here; Sort and Split are good examples). This is probably what you meant by "in one go" - they don't have to oscillate between the kernel code and high-level evaluations. It is these oscillations that kill the performance. – Leonid Shifrin Feb 06 '12 at 22:17
  • @Leonid, it is good. Szabolcs yes I suppose so. I guess I saw FreeQ / MemberQ (in the basic form) as more general than e.g. DeleteDuplicates (also in the basic form). Perhaps that was a mistake. I would not have posted this question except that it seemed like a slow day, but I did not honestly know the answer beforehand. – Mr.Wizard Feb 06 '12 at 22:26
  • Dear @LeonidShifrin, do patterns with Head specification (as in x_Head) count as what you call "syntactic patterns"? I would have thought that checking for matched head is just like conditionals... (i.e. Does x have the right Head?) – QuantumDot Nov 08 '14 at 18:07
  • also @LeonidShifrin, do I pay a speed-penalty if I use Alternatives compared with Condition? – QuantumDot Nov 08 '14 at 18:11
  • @QuantumDot Yes, _head is a good example of a syntactic pattern. As for Alternatives, it is still a syntactic pattern (as long as the alternative patterns are all syntactic), since it involves only the pattern-matcher and not the evaluator. – Leonid Shifrin Nov 08 '14 at 19:26