Consider a large expression, say a product of 25 distinct symbols

expr=Product[Unique["a"],{i,1,25}];

to which you apply the following replacement rule:

rep={x_ f[y_] /; FreeQ[x, y] -> 0};

Since f does not appear in expr, the replacement rule has no effect. However, it takes about 10 seconds on my computer to evaluate expr/.rep. This is surprisingly long, and it gets longer for larger expressions.

Moreover, I noticed that it takes only a fraction of a second to evaluate

MatchQ[expr,x_ f[y_] /; FreeQ[x, y]]

which returns False, as it should.

Why does the replacement take so long when nothing matches the pattern? How can I speed up the evaluation?

M. Tissier
  • I mean, doesn't MatchQ try to match the entire expression, whereas ReplaceAll has to test every subexpression? – march Feb 04 '16 at 01:46
  • What @march said. This is an apples to orchards comparison. – Daniel Lichtblau Feb 04 '16 at 02:02
  • @Daniel I am getting funny results in 10.1.0. Please have a look at this. – Mr.Wizard Feb 04 '16 at 03:08
  • It's not clear to me why folks want to close this. Looks like a perfectly legitimate question to me. – Leonid Shifrin Feb 04 '16 at 12:53
  • @LeonidShifrin Once upon a time (last night), I took for granted that it made sense the replacement attempt was slow. So I voted to close it. I am rethinking that and might well retract, depending on what I can figure out about this. – Daniel Lichtblau Feb 04 '16 at 15:57
  • Okay, retracted. My original reaction was only partially warranted, and the behavior is verily a perplexion. Except (for me at least) not in the way the post suggests. The real question is Why is it ever fast? (To be continued.) – Daniel Lichtblau Feb 04 '16 at 16:27

2 Answers

There seems to be a bug regarding this, in version 10.1.0 under Windows. For a first evaluation in a fresh kernel I get:

expr = Product[Unique["a"], {i, 1, 25}];

rep = {x_ f[y_] /; FreeQ[x, y] -> 0};

expr /. rep // Short // AbsoluteTiming
{9.64113*10^-6, a15 a16 a17 a18 a19 a20 << 14 >> a35 a36 a37 a38 a39}

But the second time I evaluate the same thing in the session I get:

{13.8596, a40 a41 a42 a43 a44 a45 <<14>> a60 a61 a62 a63 a64}

Somehow the second run takes a million times longer?!

Mr.Wizard

Okay, this has been a bit of a headache. The short answer is that it is all working as expected. Now for too much detail to put into a comment.

To begin, the original question was a bit of a blind, though certainly not by intent. When one considers what has to happen in a match with an Orderless function such as Times, it seems quite plausible that this might be very slow. This is alluded to in Help > Wolfram Documentation refguide page for Orderless, under "Possible Issues":


Pattern matching with orderless functions can lead to a large number of possible cases:

In[1]:= SetAttributes[h, Orderless];
In[2]:= ReplaceList[h[a, b, c], h[x_, y_, z_] :> {x, y, z}]
Out[2]= {{a, b, c}, {a, c, b}, {b, a, c}, {b, c, a}, {c, a, b}, {c, b, a}}

Nonetheless, it is quite clear from what others have posted that sometimes this match is fast and sometimes slow, seemingly on the same input. This confusing flip-flop is due to the fact that what is being tested for a match is changing, due to how Unique works. So the first thing is to get a stable set of examples.
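As a small illustration of why the input keeps changing, the sketch below shows that repeated calls to Unique never return the same symbol. The names s1, s2, p1 and p2 are introduced here for illustration only and are not part of the original post.

```mathematica
(* Each call to Unique["a"] returns a symbol that did not exist before. *)
s1 = Unique["a"];
s2 = Unique["a"];
s1 === s2   (* False: two different symbols *)

(* So re-evaluating the same Product line builds a product of different factors. *)
p1 = Product[Unique["a"], {i, 1, 5}];
p2 = Product[Unique["a"], {i, 1, 5}];
p1 === p2   (* False *)
```

Because the factors differ between runs, the expression handed to the pattern matcher differs as well, even though the input line is textually identical.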

To get this consistent set of behaviors showing both the fast and slow evaluations, I use a different construction of the factors below. I do this for four different products, with the only difference in naming being the first letter of the factors.

n = 22;
expr1 = Product[ToExpression[StringJoin["a", ToString[i]]], {i, n}];
expr2 = Product[ToExpression[StringJoin["b", ToString[i]]], {i, n}];
expr3 = Product[ToExpression[StringJoin["c", ToString[i]]], {i, n}];
expr4 = Product[ToExpression[StringJoin["d", ToString[i]]], {i, n}];
rep = {x_ f[y_] /; FreeQ[x, y] -> 0};
AbsoluteTiming[expr1 /. rep;]
AbsoluteTiming[expr2 /. rep;]
AbsoluteTiming[expr3 /. rep;]
AbsoluteTiming[expr4 /. rep;]

(* Out[359]= {1.482855, Null}

Out[360]= {0.000011, Null}

Out[361]= {8.*10^-6, Null}

Out[362]= {2.291128, Null} *)

This is entirely replicable and consistent: if I repeat the full evaluation I get the same set of fast vs slow evaluations.

So why are some slow and others not? For this we go to one of the deeper corners of documentation, specifically,

tutorial/SomeNotesOnInternalImplementation

Here are two relevant items:

"Each expression contains a special form of hash code that is used both in pattern matching and evaluation."

"A form of hashing that takes account of blanks and other features of patterns is used in pattern matching."

The details of this are outside the scope here. But the upshot is that sometimes this mechanism allows for an early exit, by containing information that entirely rules out a (sub)match. This is why some examples are so fast; they avoid the combinatorial explosion that would otherwise be needed to handle all possible reorderings.

One last detail is that this fast-versus-slow behavior only shows up when dealing with a head that is both Flat and Orderless. Orderless means all reorderings of the pattern have to be considered; Flat means we also have to consider subsequences of the arguments in the thing being matched. A reference here is: tutorial/FlatAndOrderlessFunctions

So the following is across the board fast.

n = 22;
ClearAttributes[g, {Flat, Orderless}]
SetAttributes[g, {Orderless}]
expr1 = Apply[g, 
   Table[ToExpression[StringJoin["a", ToString[i]]], {i, n}]];
expr2 = Apply[g, 
   Table[ToExpression[StringJoin["b", ToString[i]]], {i, n}]];
expr3 = Apply[g, 
   Table[ToExpression[StringJoin["c", ToString[i]]], {i, n}]];
expr4 = Apply[g, 
   Table[ToExpression[StringJoin["d", ToString[i]]], {i, n}]];
rep = {g[x_, f[y_]] /; FreeQ[x, y] -> 0};
AbsoluteTiming[expr1 /. rep;]
AbsoluteTiming[expr2 /. rep;]
AbsoluteTiming[expr3 /. rep;]
AbsoluteTiming[expr4 /. rep;]

(* Out[443]= {0.000026, Null}

Out[444]= {9.*10^-6, Null}

Out[445]= {8.*10^-6, Null}

Out[446]= {9.*10^-6, Null} *)

But this variant has the same behavior as the original example that used head of Times.

n = 22;
ClearAttributes[g, {Flat, Orderless}]
SetAttributes[g, {Flat, Orderless}]
expr1 = Apply[g, 
   Table[ToExpression[StringJoin["a", ToString[i]]], {i, n}]];
expr2 = Apply[g, 
   Table[ToExpression[StringJoin["b", ToString[i]]], {i, n}]];
expr3 = Apply[g, 
   Table[ToExpression[StringJoin["c", ToString[i]]], {i, n}]];
expr4 = Apply[g, 
   Table[ToExpression[StringJoin["d", ToString[i]]], {i, n}]];
rep = {g[x_, f[y_]] /; FreeQ[x, y] -> 0};
AbsoluteTiming[expr1 /. rep;]
AbsoluteTiming[expr2 /. rep;]
AbsoluteTiming[expr3 /. rep;]
AbsoluteTiming[expr4 /. rep;]

(* Out[455]= {1.481003, Null}

Out[456]= {7.*10^-6, Null}

Out[457]= {7.*10^-6, Null}

Out[458]= {2.361662, Null} *)
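The effect of adding Flat on top of Orderless can also be seen on a tiny example. This is only a sketch: ho and hfo are hypothetical heads introduced for illustration, and a, b, c, x, y are assumed to have no definitions.

```mathematica
(* ho and hfo are hypothetical heads used only for this illustration. *)
SetAttributes[ho, Orderless];
SetAttributes[hfo, {Flat, Orderless}];

(* Orderless only: a two-argument pattern cannot match three arguments. *)
ReplaceList[ho[a, b, c], ho[x_, y_] :> {x, y}]   (* {} *)

(* Flat and Orderless: x_ and y_ may each absorb several arguments,
   taken in any order, so there are candidate matches to enumerate. *)
Length[ReplaceList[hfo[a, b, c], hfo[x_, y_] :> {x, y}]]   (* greater than 0 *)
```

With only three arguments the number of candidates is small, but it grows quickly with the number of arguments, which is consistent with the slow timings above for 22 factors.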

Hoping all this is of some use.

Daniel Lichtblau
  • Great, this clarifies the situation. I am still surprised that MatchQ was always fast, but this may be because MatchQ does not test what I think it tests. The workaround I came up with is to perform the MatchQ test and make the replacement only if it returns True. Is that OK? – M. Tissier Feb 05 '16 at 13:00
  • It really depends on the specifics of the expression and on what you want to accomplish in the replacement. If replacing a proper subexpression is a requirement, then whether MatchQ passes or fails on the full expression will not be a correct guide. Also keep in mind that you are working with a head (Times) that has both the Flat and Orderless attributes, and that the latter is going to give a "worst case scenario" for subexpression matching (due to the need to test against all possible reorderings). – Daniel Lichtblau Feb 05 '16 at 18:17
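To make the point of the last two comments concrete, here is a minimal sketch. The expression test and the helper safeReplace are hypothetical names introduced here, and a, b, c, f, x, y are assumed to have no definitions. The FreeQ guard is one possible workaround, not something proposed in the thread; it relies on the fact that this particular rule can never fire when f does not occur in the expression.

```mathematica
rep = {x_ f[y_] /; FreeQ[x, y] -> 0};
test = 1 + a b f[c];

(* MatchQ only tests the whole expression, whose head is Plus. *)
MatchQ[test, x_ f[y_] /; FreeQ[x, y]]   (* False *)

(* ReplaceAll also visits the subexpression a b f[c], which does match. *)
test /. rep   (* 1 *)

(* Possible guard: skip the replacement entirely when f occurs nowhere. *)
safeReplace[e_] := If[FreeQ[e, f], e, e /. rep];
safeReplace[Product[Unique["a"], {i, 1, 25}]];
```

So a False from MatchQ on the full expression does not imply that the replacement would leave it unchanged, whereas FreeQ[e, f] inspects every level of the expression.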