
I have a dataset of about $4\times 10^6$ 3D coordinates.

From this volume I sequentially select the points whose coordinate along one axis lies in a given range, and then process each subset.

My question: can the Select function be replaced by something faster?

Here is example code, with the time needed for the selection:

SeedRandom[1];

coordinates = RandomReal[10, {4000000, 3}]; // AbsoluteTiming

{0.0989835, Null}

selectedCoordinates = Select[coordinates, #[[1]] > 6 && #[[1]] < 7 & ]; // AbsoluteTiming

{5.88215, Null}

Dimensions[selectedCoordinates]

{400416, 3}
kglr
mrz
  • Pick[coordinates, 6 < # < 7 & /@ coordinates[[All, 1]]] is almost twice as fast as Select[..] – kglr Oct 25 '17 at 11:43
  • You can compile your Select: compiled = Compile[{{coords, _Integer, 2}}, Select[coords, #[[1]] > 6 && #[[1]] < 7 &], CompilationTarget -> "C"]. Then compiled[coordinates] takes 0.2 secs on my machine. – Leonid Shifrin Oct 25 '17 at 11:51
  • Cases[coordinates, {x_, y_, z_} /; x > 6 && y < 7], assuming that you want #[[1]] > 6 && #[[2]] < 7. Otherwise the output would always be {}: no integer can be > 6 and < 7 at the same time ;-). – RMMA Oct 25 '17 at 12:00
  • @RMMA: Thank you for your remark. I changed to RandomReal. – mrz Oct 25 '17 at 14:30

3 Answers

res1 = Select[coordinates, #[[1]] > 6 && #[[1]] < 7 &]; // 
  AbsoluteTiming // First

6.997629

res2 = Select[coordinates, 6 < #[[1]] < 7 &]; // AbsoluteTiming // First

4.676356

res3 = Pick[coordinates, 6 < # < 7 & /@ coordinates[[All, 1]]]; // 
  AbsoluteTiming // First

5.266651

res4 = Pick[coordinates, (1 - UnitStep[# - 7]) (1 - UnitStep[6 - #]) &@
      coordinates[[All, 1]], 1]; // AbsoluteTiming // First

0.353154

res6 = compiled[coordinates]; // AbsoluteTiming // First

0.667676

where

compiled = Compile[{{coords, _Real, 2}}, Select[coords, #[[1]] > 6 && #[[1]] < 7 &]]

is the method suggested in Leonid's comment, with the argument type changed from `_Integer` to `_Real` to match the real-valued data, and without the option `CompilationTarget -> "C"`.

Equal[res1, res2, res3, res4, res6]

True
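The res4 mask can be checked on a handful of values. Since UnitStep[x] is 1 for x >= 0 and 0 otherwise, the factor 1 - UnitStep[x - 7] is 1 exactly when x < 7, and 1 - UnitStep[6 - x] is 1 exactly when x > 6, so their product is 1 only for 6 < x < 7. Because UnitStep and the arithmetic act on the whole first column at once, no pure function is called per row, which is presumably where most of the speedup comes from. A minimal sketch; the list `xs` is made up for illustration:

```mathematica
(* a few test values, including the interval endpoints 6 and 7 *)
xs = {5.5, 6., 6.5, 7., 7.5};

(* UnitStep[x] == 1 for x >= 0, so each factor encodes one strict inequality *)
mask = (1 - UnitStep[xs - 7]) (1 - UnitStep[6 - xs])
(* {0, 0, 1, 0, 0} *)

(* Pick keeps the elements whose mask entry matches the third argument, 1 *)
Pick[xs, mask, 1]
(* {6.5} *)
```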

kglr
  • Thanks so much. The last solution is, in my case, about 35 times faster than Select. – mrz Oct 25 '17 at 14:04
  • A fairer comparison would chain the inequalities for Select as well. And it would be nice to include the comparison for a compiled selector. – Alan Oct 25 '17 at 14:18
  • @mrz, my pleasure, Thank you for the accept. – kglr Oct 25 '17 at 14:25
  • @Alan, I added the variant of Select you suggested. I don't have a C compiler installed, so I cannot include timings for the method suggested by Leonid. Without CompilationTarget -> "C", compiled is slower than the UnitStep-based Pick (res4). – kglr Oct 25 '17 at 14:32
  • Something like Pick[c,UnitStep[c-6,7-c],1] is more compact, but is about 30 times slower than your res4 formulation! The problem is with the multi-dimensional UnitStep. Something like Pick[c, UnitStep[c - 6]*UnitStep[7 - c], 1] seems as fast or slightly faster than the res4 formulation. – KennyColnago Oct 25 '17 at 16:14
  • @KennyColnago, I think we do need the more cumbersome (1 - UnitStep[# - 7]) (1 - UnitStep[6 - #]) &; Pick[c, UnitStep[c - 6]*UnitStep[7 - c], 1] does not give the correct result. – kglr Oct 25 '17 at 17:19
  • Sorry, I was being brief to fit a comment, and used an imprecise notation qualified by "something like". Speaking more precisely, I should have said that (1-UnitStep[#-7])(1-UnitStep[6-#]) is the same as the more direct UnitStep[#-6]*UnitStep[7-#]. – KennyColnago Oct 26 '17 at 04:25
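KennyColnago's direct form can be applied to the first column just like res4. One subtlety: UnitStep[0] is 1, so UnitStep[x - 6] UnitStep[7 - x] selects the closed interval 6 <= x <= 7, while (1 - UnitStep[x - 7]) (1 - UnitStep[6 - x]) selects the open interval 6 < x < 7. The two masks differ only at points lying exactly on 6 or 7, which should be rare with RandomReal data. A sketch, with illustrative names `x`, `strict`, `closed` and `b`:

```mathematica
SeedRandom[1];
coordinates = RandomReal[10, {4000000, 3}];
x = coordinates[[All, 1]];  (* test only the first column *)

(* open interval: 6 < x < 7 *)
strict = Pick[coordinates, (1 - UnitStep[x - 7]) (1 - UnitStep[6 - x]), 1];

(* closed interval: 6 <= x <= 7, because UnitStep[0] == 1 *)
closed = Pick[coordinates, UnitStep[x - 6] UnitStep[7 - x], 1];

(* on the boundary values themselves the two masks disagree *)
b = {6., 6.5, 7.};
{(1 - UnitStep[b - 7]) (1 - UnitStep[6 - b]), UnitStep[b - 6] UnitStep[7 - b]}
(* {{0, 1, 0}, {1, 1, 1}} *)
```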

Using Clip is slightly faster than @kglr's solution:

SeedRandom[1];
coordinates = RandomReal[10, {4000000, 3}];

r1 = Pick[
    coordinates,
    Unitize @ Clip[coordinates[[All,1]], {6, 7}, {0, 0}],
    1
];//RepeatedTiming

r2 = Pick[
    coordinates,
    (1-UnitStep[#-7]) (1-UnitStep[6-#])&@coordinates[[All,1]],
    1
];//RepeatedTiming

r1 === r2

{0.10, Null}

{0.15, Null}

True
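The Clip trick works because Clip[x, {6, 7}, {0, 0}] leaves values in [6, 7] unchanged and replaces everything below 6 or above 7 with 0, and Unitize then turns every nonzero entry into 1. Two consequences, sketched below on made-up values (`xs` is illustrative only): the endpoints 6 and 7 are kept, so this is a closed interval unlike the res4 mask, and the trick assumes the interval does not contain 0, since a value of exactly 0 would be unitized to 0 and dropped.

```mathematica
xs = {5.5, 6., 6.5, 7., 7.5};

(* values outside [6, 7] become 0; values inside keep their (nonzero) value *)
clipped = Clip[xs, {6, 7}, {0, 0}];

(* Unitize maps nonzero entries to 1 and zeros to 0, giving a 0/1 mask for Pick *)
Pick[xs, Unitize[clipped], 1]
(* {6., 6.5, 7.} *)

(* caveat: for an interval containing 0, a point with x == 0 is lost *)
Unitize[Clip[{-0.5, 0., 0.5}, {-1, 1}, {0, 0}]]
(* {1, 0, 1} *)
```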

Carl Woll

My question: can the Select function be replaced by something that is faster.

Yes! Check out the BoolEval package.

SeedRandom[1];
coordinates = RandomReal[10, {4000000, 3}]; // AbsoluteTiming
(* {0.118832, Null} *)

selectedCoordinates = 
   Select[coordinates, #[[1]] > 6 && #[[1]] < 7 &]; // AbsoluteTiming
(* {6.08899, Null} *)
Needs["BoolEval`"]

selectedCoordinates2 = BoolPick[coordinates, 6 < coordinates[[All, 1]] < 7]; // AbsoluteTiming
(* {0.145518, Null} *)

selectedCoordinates == selectedCoordinates2
(* True *)

Be sure to read the documentation of the package to see more usage examples and learn about caveats.
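If, as in the question, the data are processed slab by slab along one axis, the UnitStep mask from the answers can be wrapped in a small helper. The function `slab` below is hypothetical, not part of BoolEval or of any answer, and it assumes half-open intervals lo <= x < hi (the question uses open intervals) so that consecutive slabs neither overlap nor leave gaps at the shared boundaries. A sketch:

```mathematica
(* hypothetical helper: rows of data whose column `axis` satisfies lo <= value < hi *)
slab[data_, axis_Integer, {lo_, hi_}] :=
  Module[{x = data[[All, axis]]},
    (* UnitStep[x - lo] is 1 for x >= lo; 1 - UnitStep[x - hi] is 1 for x < hi *)
    Pick[data, UnitStep[x - lo] (1 - UnitStep[x - hi]), 1]
  ];

SeedRandom[1];
coordinates = RandomReal[10, {4000000, 3}];

(* walk through unit-width slabs along the first axis, one after another *)
slabs = Table[slab[coordinates, 1, {k, k + 1}], {k, 0, 9}];
Length /@ slabs
```

Each call makes one vectorized pass over the full array, so n slabs cost n passes; going by the timings above, that should still be much cheaper than n calls to Select with a per-row pure function.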

Szabolcs