8

While working on a solution to another question, I noticed some performance regression slow performance of SequenceCases and friends. Consider a problem of finding zeroes of a function, given a sorted table of its values:

$Version (* "13.2.0 for Mac OS X x86 (64-bit) (November 18, 2022)" *)

zeros[lst_] := SequenceCases[lst, {a_, b_} /; Sign[a[[2]]] != Sign[b[[2]]]]

For simplicity, instead of finding zeros, we find intervals containing them.

zeros[Table[{t, Sin[t]}, {t, 0, 5, 0.1}]] 
(* {{{0., 0.}, {0.1, 0.0998334}}, {{3.1, 0.0415807}, {3.2, -0.0583741}}} *)

Using SequenceCases with pattern matching results in a very compact code. Unfortunately, this code has some terrible performance characteristics: e.g., asking SequenceCases to process a list of 1000 elements takes considerable time:

zeros[Table[{t, Sin[t]}, {t, 0, 100, 0.1}]] // Length // AbsoluteTiming
(* {1.68121, 32} *)

zeros[Table[{t, Sin[t]}, {t, 0, 200, 0.1}]] // Length // AbsoluteTiming (* {12.2768, 64} *)

Notice that non-linear growth in the execution time: it increases by 7.3x while the input double in size.

Now compare this to a straightforward alternative using a loop, which is almost instantaneous:

fastZeros[lst_] := 
 First[Reap[
    Do[If[Sign[lst[[i, 2]]] != Sign[lst[[i + 1, 2]]], 
      Sow[{lst[[i]], lst[[i + 1]]}]], {i, Length[lst] - 1}]][[2]], {}]

With[{data = Table[{t, Sin[t]}, {t, 0, 5, 0.1}]}, fastZeros[data] == zeros[data]] (* True *)

fastZeros[Table[{t, Sin[t]}, {t, 0, 200, 0.1}]] // Length // AbsoluteTiming (* {0.003667, 64} *)

While I expect to have some speed tradeoff when using more powerful language constructs like pattern matching, it still seems like a severe performance penalty. Similar things were discussed before.

Does anyone has any more charitable explanation of the observed behavior other than "it's a bug that should be reported to Wolfram Research"?

Update. Notice, in particular, that it's not the pattern matching that is slow - it's speedy:

zeros2[lst_] := 
 Cases[Partition[lst, 2, 1], {a_, b_} /; Sign[a[[2]]] != Sign[b[[2]]]]

With[{data = Table[{t, Sin[t]}, {t, 0, 5, 0.1}]}, fastZeros[data] == zeros[data] == zeros2[data]] (* True *)

zeros2[Table[{t, Sin[t]}, {t, 0, 200, 0.1}]] // Length // AbsoluteTiming (* {0.004051, 64} *)

Update. Meanwhile, I've submitted [CASE:4999270].

Victor K.
  • 5,146
  • 3
  • 21
  • 34
  • 3
    Just wanted to point out that this is not a speed regression as far as I can tell. It was (alas) always this slow. – Daniel Lichtblau Jan 25 '23 at 17:56
  • Thanks, @DanielLichtblau - you are right. I shouldn't call it a "regression" which presumes a deterioration of performance. I changed the title to reflect that.

    Rather, it's an observation that a pretty useful function is not optimized for what seems to be a natural use case.

    – Victor K. Jan 25 '23 at 18:02
  • 2
    With SeedRandom[0]; data = RandomInteger[1000, 2000];, Compare SequenceCases[data, {11, x_}] // AbsolutTiming versus SequenceCases[data, {11, x_} /; True] // AbsolutTiming Seems to be a consequence of using Condition – Michael E2 Jan 25 '23 at 18:20
  • Sure - but compare to Cases[RandomInteger[100, {2000, 2}], {11, x_} /; True] // AbsoluteTiming, which is instantaneous. So it's a combination of SequenceCases and Condition that somehow slows things down. – Victor K. Jan 25 '23 at 18:25
  • 2
    (@MichaelE2 bat me to this) I just noticed we have an open bug report to the effect that Condition can severely slow SequenceCases. Also using Alternatives causes a severe speed degradation as compared to two cases without alternatives. – Daniel Lichtblau Jan 25 '23 at 18:25
  • I don't think Cases is comparable to SequenceCases. – Michael E2 Jan 25 '23 at 18:28
  • What I meant to say is that the standalone Condition is pretty fast. – Victor K. Jan 25 '23 at 18:32
  • Jose Martin-Garcia informs me that there is this related MSE thread. – Daniel Lichtblau Jan 25 '23 at 18:33
  • What I mean is that matching sequences is potentially complex, and Condition makes SequenceCases switch to an algorithm with bad time-complexity. Without Condition, SequenceCases analyzes the pattern {11, x_} and has an efficient algorithm it can choose. (All this is guesswork, btw. I assume Condition is taken as an sign of a "complex" pattern that is not worth analyzing.) – Michael E2 Jan 25 '23 at 18:36
  • A reasonable guess, and we are talking about the same thing. – Victor K. Jan 25 '23 at 18:39
  • 1
    This example is now in a suggestion/bug report. – Daniel Lichtblau Jan 26 '23 at 17:46

1 Answers1

4

As of 13.3.1, this particular performance issue is fixed:

enter image description here

Victor K.
  • 5,146
  • 3
  • 21
  • 34