
I am creating a random 3D data set in Mathematica 12.1. Then I select all points that lie within a certain range along one axis.

I do the same in Python (same computer, Python 3.8.5, NumPy 1.19.2).

RESULT: Python selects much faster (1.7 s) than Mathematica (5.2 s). What is the reason for that? For the selection in Mathematica I used the fastest solution, which is by Carl Woll (see the answer at the bottom of the linked thread).
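Carl Woll's idiom builds a 0/1 mask instead of a list of Booleans: Clip[x, {6, 7}, {0, 0}] keeps values in [6, 7] and sends everything else to 0, Unitize maps every nonzero value to 1, and Pick keeps the rows whose mask entry is 1. As a rough cross-language sketch (the function names are hypothetical, and np.where stands in for Clip), the same mask can be written in NumPy and checked against the direct comparison used in the Python code. Note that Clip's bounds are inclusive while the Python code uses strict inequalities; for continuous random data this essentially never matters.

```python
import numpy as np

def select_by_clip_unitize(coords, lo=6.0, hi=7.0):
    """Hypothetical NumPy mirror of
    Pick[coords, Unitize@Clip[coords[[All, 1]], {lo, hi}, {0, 0}], 1].
    Assumes lo > 0, so an in-range value can never itself be 0."""
    x = coords[:, 0]
    clipped = np.where((x >= lo) & (x <= hi), x, 0.0)  # Clip[..., {lo, hi}, {0, 0}]
    unit = (clipped != 0).astype(np.int8)               # Unitize: nonzero -> 1, zero -> 0
    return coords[unit == 1]                            # Pick[..., 1]

def select_by_comparison(coords, lo=6.0, hi=7.0):
    """Direct boolean-mask selection, as in the question's Python code."""
    x = coords[:, 0]
    return coords[(x > lo) & (x < hi)]

np.random.seed(1)
coords = np.random.random_sample((100_000, 3)) * 10  # much smaller than the question's 10**8 rows
a = select_by_clip_unitize(coords)
b = select_by_comparison(coords)
print(a.shape, b.shape)
```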

SeedRandom[1];
coordinates = RandomReal[10, {100000000, 3}];

selectedCoordinates = Pick[coordinates, Unitize@Clip[coordinates[[All, 1]], {6, 7}, {0, 0}], 1]; // AbsoluteTiming

{5.16326, Null}

Dimensions[coordinates]

{100000000, 3}

Dimensions[selectedCoordinates]

{10003201, 3}
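The selected row count differs from the one the Python run reports (9997954) because SeedRandom[1] and np.random.seed(1) drive different generators, so the two data sets are not identical. Assuming both generators produce independent uniform draws on [0, 10), the count is binomial with mean 10^7 and standard deviation 3000, and a quick check shows both reported counts are well within the expected spread:

```python
import math

n_points = 100_000_000                  # rows generated in the question
p = (7 - 6) / 10                        # probability that a uniform x in [0, 10) falls in [6, 7]
mean = n_points * p                     # expected number of selected rows
sd = math.sqrt(n_points * p * (1 - p))  # binomial standard deviation

# Counts reported in the question for Mathematica and NumPy.
observed = {"Mathematica": 10_003_201, "NumPy": 9_997_954}
z_scores = {name: (count - mean) / sd for name, count in observed.items()}
for name, z in z_scores.items():
    print(f"{name}: {observed[name]} rows, z = {z:+.2f}")
```

Both counts lie about one standard deviation or less from the mean (z of roughly +1.07 and -0.68), so the difference says nothing about either selection being wrong.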

PYTHON CODE:

import time
import numpy as np

np.random.seed(1)
coordinates = np.random.random_sample((100000000,3))*10

start = time.time()
selectedCoordinates = coordinates[(coordinates[:,0] > 6) & (coordinates[:,0] < 7)]
end = time.time()

print(end-start)

print(coordinates.shape)

print(selectedCoordinates.shape)

1.6979997158050537

(100000000, 3)

(9997954, 3)
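A single time.time() interval lumps together building the Boolean mask and copying the selected rows. The sketch below swaps in time.perf_counter (a monotonic, high-resolution clock) and times the two steps separately on a smaller array; it is a measurement aid, not a claim about which step dominates on any particular machine. It also prints the dtype, which is float64 for np.random.random_sample.

```python
import time
import numpy as np

np.random.seed(1)
coordinates = np.random.random_sample((1_000_000, 3)) * 10  # smaller than the question's 10**8 rows

t0 = time.perf_counter()
x = coordinates[:, 0]              # strided view of the first column, no copy
mask = (x > 6) & (x < 7)           # two vectorized comparisons plus an elementwise AND
t1 = time.perf_counter()
selected = coordinates[mask]       # boolean indexing copies the matching rows
t2 = time.perf_counter()

print(f"mask: {t1 - t0:.4f} s, indexing: {t2 - t1:.4f} s")
print(coordinates.dtype, selected.shape)
```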

mrz

1 Answer


I would argue that it's not a fair comparison unless you consider the performance with NumericArray, as the default Mathematica list of lists has many other features (regarding numeric stability, etc.) that are not present in a mere list of Real32 numbers.

Let me demonstrate: In your code you do something like:

SeedRandom[1];
coordinates = RandomReal[10, {100000000, 3}];

Clear[f]
f[coordinates_] := (Pick[coordinates, Unitize@Clip[coordinates[[All, 1]], {6, 7}, {0, 0}], 1];) // AbsoluteTiming

f[coordinates] (* 4.0478 s on my MacBook *)

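Now let's convert coordinates into a NumericArray (a typed array, similar to what NumPy uses internally; note that "Real32" is single precision, whereas the NumPy array in the question holds 64-bit floats):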

coordinates32 = NumericArray[coordinates, "Real32"];
f[coordinates32] (* 1.09 s on my MacBook *)

This gives a 3.7x speedup (4.05 s vs. 1.09 s), comparable to the roughly 3x gap (5.16 s vs. 1.70 s) you observe between Mathematica and Python.
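On the Python side, the closest analogue of converting to "Real32" is casting the array to float32. The sketch below (a smaller array than the question's, with a hypothetical helper named select) shows that the float32 copy needs half the memory; whether it also selects faster depends on the machine, so no timing is claimed. Because float32 rounds values, an entry very close to 6 or 7 can land exactly on a bound, so the two selections may differ by a handful of rows.

```python
import numpy as np

np.random.seed(1)
coords64 = np.random.random_sample((1_000_000, 3)) * 10  # float64, as in the question
coords32 = coords64.astype(np.float32)                    # rough analogue of NumericArray[..., "Real32"]

def select(c):
    """Hypothetical helper: keep rows whose first coordinate lies strictly between 6 and 7."""
    x = c[:, 0]
    return c[(x > 6) & (x < 7)]

sel64 = select(coords64)
sel32 = select(coords32)
print(coords64.nbytes, coords32.nbytes)  # the float32 copy uses half the bytes
print(sel64.shape, sel32.shape)
```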

Joshua Schrier