The performance of LengthWhile was improved in v11.1, so the hand-written lengthwhile below is no longer faster.


A friend of mine showed me this example. It is a test comparing LengthWhile to a self-made lengthwhile written in a direct and conventional way:

lengthwhile[x_, t_] := Module[{i = 0, l = Length@x}, While[i < l && t@x[[i + 1]], i++]; i]

lst = RandomInteger[{-2, 2}, {10^4, 10}];
rst1 = LengthWhile[#, # >= 0 &] & /@ lst; // AbsoluteTiming
rst2 = lengthwhile[#, # >= 0 &] & /@ lst; // AbsoluteTiming
rst1 == rst2
(* {3.941000, Null} *)
(* {0.474000, Null} *)
(* True *)

LengthWhile is much slower than the reinvented wheel! Why? Is this simply poor performance in LengthWhile, or am I not using LengthWhile properly?

xzczd

2 Answers

Your test is quite synthetic: you take only the first few elements of each list. If you have a longer sequence of non-negative elements, then the built-in LengthWhile is faster:

lst = RandomInteger[{-1, 30000}, 100000];
rst1 = LengthWhile[lst, # >= 0 &]; // AbsoluteTiming
rst2 = lengthwhile[lst, # >= 0 &]; // AbsoluteTiming
rst1 == rst2
(* {0.096340, Null} *)
(* {0.166603, Null} *)
(* True *)

Update:

Amazingly, the compiled version is considerably faster than LengthWhile.

cLengthWhile = Compile[{{x, _Integer, 1}, {thr, _Integer}}, 
   Module[{i = 0, l = Length@x}, 
    While[i < l && (x[[i + 1]] >= thr), i++]; i], 
   CompilationTarget -> "C", RuntimeAttributes -> {Listable}, 
   RuntimeOptions -> "Speed"];

rst3 = cLengthWhile[lst, 0]; // AbsoluteTiming
rst1 == rst3
(* {0.000138, Null} *)
(* True *)

Update 2:

For your set of short lists, there is also a quite fast uncompiled function:

lengthwhile[x_, t_] := 
 Module[{i = 0, l = Length@x}, While[i < l && t@x[[i + 1]], i++]; i]
lengthWhile2[x_, thr_] := 
 Dimensions[x][[2]] - Total@Unitize@Accumulate[Transpose@UnitStep[x - thr] - 1]

lst = RandomInteger[{-2, 2}, {10^4, 10}];
rst1 = LengthWhile[#, # >= 0 &] & /@ lst; // AbsoluteTiming
rst2 = lengthwhile[#, # >= 0 &] & /@ lst; // AbsoluteTiming
rst3 = lengthWhile2[lst, 0]; // AbsoluteTiming
rst4 = cLengthWhile[lst, 0]; // AbsoluteTiming
rst1 == rst2 == rst3 == rst4
(* {3.990231, Null} *)
(* {0.307152, Null} *)
(* {0.004986, Null} *)
(* {0.001347, Null} *)
(* True *)
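
To see why lengthWhile2 works, here is a minimal sketch that walks through its steps on a tiny matrix. The names demo, pass, fails, seen and hit are hypothetical and used only for illustration; the definition of lengthWhile2 is repeated so the block stands alone.

```mathematica
lengthWhile2[x_, thr_] := 
 Dimensions[x][[2]] - Total@Unitize@Accumulate[Transpose@UnitStep[x - thr] - 1]

(* Three short lists; the expected prefix lengths are 2, 0 and 4 *)
demo = {{1, 2, -1, 3}, {-1, 0, 0, 0}, {0, 1, 2, 3}};

(* 1 where the element passes the test x >= 0, 0 where it fails *)
pass = UnitStep[demo - 0]
(* {{1, 1, 0, 1}, {0, 1, 1, 1}, {1, 1, 1, 1}} *)

(* Transpose so that each column is one list; 0 = pass, -1 = fail *)
fails = Transpose[pass] - 1
(* {{0, -1, 0}, {0, 0, 0}, {-1, 0, 0}, {0, 0, 0}} *)

(* Running sum down each column: stays 0 until the first failure *)
seen = Accumulate[fails]
(* {{0, -1, 0}, {0, -1, 0}, {-1, -1, 0}, {-1, -1, 0}} *)

(* 1 for every position at or after the first failure, then count them *)
hit = Total[Unitize[seen]]
(* {2, 4, 0} *)

(* List length minus that count is the length of the passing prefix *)
lengthWhile2[demo, 0]
(* {2, 0, 4} *)
```

The whole computation uses only vectorized operations on one packed matrix, which is presumably why it avoids the per-list overhead of mapping LengthWhile. Note that it assumes a rectangular integer matrix and a threshold test of the form x >= thr, not an arbitrary predicate.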
ybeltukov

There are several reasons. Firstly the built-in function has some minor overhead to check the arguments and call the appropriate internal function depending on whether the first argument is a list, a sparse array or an association.

Secondly, with a packed array, LengthWhile uses compilation in an attempt to increase performance. There is some overhead in evaluating Compile, which is especially noticeable for your example with many small lists. (Note that if you do lst2 = Developer`FromPackedArray[lst] the built-in LengthWhile is faster than it is on the packed list.)
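
A minimal sketch of the packed versus unpacked comparison mentioned above, assuming the same test data as in the question. The names r1 and r2 are hypothetical, and the timings will depend on the machine and on the version (the behaviour described here predates v11.1), so none are shown.

```mathematica
(* Same data as in the question: 10^4 short integer lists *)
lst = RandomInteger[{-2, 2}, {10^4, 10}];
Developer`PackedArrayQ[lst]

(* Unpack it; the contents are identical, only the storage differs *)
lst2 = Developer`FromPackedArray[lst];
Developer`PackedArrayQ[lst2]

(* Time the built-in on the packed and on the unpacked data *)
r1 = LengthWhile[#, # >= 0 &] & /@ lst; // AbsoluteTiming
r2 = LengthWhile[#, # >= 0 &] & /@ lst2; // AbsoluteTiming

(* Both runs should give the same prefix lengths *)
r1 == r2
```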

Finally, there appears to be a bug in the implementation of the compilation, such that the compiled function calls back to the main evaluator for the predicate function. You can see this by capturing the CompiledFunction from a Trace and examining it with CompilePrint:

Needs["CompiledFunctionTools`"];

CompilePrint @@ Cases[Trace[LengthWhile[lst[[1]], # >= 0 &]], _CompiledFunction, -1, 1]
blah...
7 B2 = MainEvaluate[ Hold[Statistics`TakeWhileDump`predfun$42706][I5]]
blah...

The internal function calling Compile is Statistics`TakeWhileDump`findLastPosition. It appears that the predicate function is not being inlined as we would desire (despite "InlineExternalDefinitions" being used). I'm not sure what the rules are about inlining external definitions, so I'm not sure if this is due to a change in Compile or bad code in Statistics`TakeWhileDump`findLastPosition.
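
The same CompilePrint check can be applied to a small stand-alone case to see whether a given kind of definition gets inlined. This is only a sketch: sqPattern, sqPure, cfPattern and cfPure are hypothetical names, not part of the LengthWhile implementation, and the result may differ between versions, so no expected output is shown.

```mathematica
Needs["CompiledFunctionTools`"];

(* Hypothetical helpers: the same function defined in two ways *)
sqPattern[x_] := x^2;       (* pattern-based definition *)
sqPure = Function[x, x^2];  (* pure function stored as a value *)

cfPattern = Compile[{{x, _Real}}, sqPattern[x], 
   CompilationOptions -> {"InlineExternalDefinitions" -> True}];
cfPure = Compile[{{x, _Real}}, sqPure[x], 
   CompilationOptions -> {"InlineExternalDefinitions" -> True}];

(* True means the compiled code calls back to the main evaluator *)
! StringFreeQ[CompilePrint[cfPattern], "MainEvaluate"]
! StringFreeQ[CompilePrint[cfPure], "MainEvaluate"]
```

If the first test gives True and the second False, that would match the MainEvaluate call seen in the listing above for predfun, which is defined by a pattern.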

Simon Woods
  • There is a line predfun[arg_] := pred[arg]; in findLastPosition. Then Compile is called with predfun. This causes uncompiled evaluation (why?). If I change ...predfun[Compile`GetElement[... to ...pred[Compile`GetElement[... it works as desired. – ybeltukov Oct 07 '14 at 16:30
  • @ybeltukov Just dug out the definition of Statistics`TakeWhileDump`findLastPosition with ?? and modified all the predfun parts; the AbsoluteTiming changed from 3.7s to 2.7s on my computer. – xzczd Oct 08 '14 at 03:59
  • @xzczd What test did you try? Your test has a big overhead due to the compilation. – ybeltukov Oct 08 '14 at 08:48
  • @ybeltukov I tested the code in my question. After deleting the definition of predfun and replacing every predfun with pred, I got a 1-second speedup. – xzczd Oct 08 '14 at 09:42
  • @xzczd You will obtain a bigger speedup for my test. When you apply /@ to a big set of short lists, you compile over and over again. – ybeltukov Oct 08 '14 at 10:48
  • @ybeltukov BTW, it's indeed strange that predfun is defined inside findLastPosition; its only effect is a side effect: function definitions based on pattern matching can't be inlined. (There seems to be no specific post for the issue; this is a related one, and also notice the comments below.) – xzczd Oct 08 '14 at 11:28
  • Response from Wolfram company: ……Thank you for your message and the link of the post. I have filed a report on this performance issue of LengthWhile and thank you for bringing it to our attention.…… – xzczd Oct 10 '14 at 06:39