Mathematica code execution speed testing

Question

I am testing the speed of Mathematica's latest version against a VB macro for simple loops to evaluate Pi.

So the VB macro is:

c = 0: n = 1000000: m = n * n

For i = 0 To n

    For j = 0 To n

        If (i * i + j * j) / m < 1 Then

            c = c + 1

        End If

    Next

Next

Selection.Text = 4 * c / m

I know how to write the corresponding code in Mathematica, but I do not know the intricacies of Mathematica regarding faster program execution. Can you help please by giving me an example of best optimized code for this example?

There are approximately 4 trillion operations in this example and the VB macro takes less than a day to execute...

This program partitions a quarter of a circular area of radius = 1 to smaller squares of area $1/n^2$ then counts them and adds them together. Essentially, a Monte Carlo Method but with randomness removed.

With your help I have created this faster code which uses parallel computing:

arg = Range[ 0, 1000000]; calcCompiled = 
Compile[{i}, 
Module[{c = 0}, Do[If[(i*i + j*j)/1000000^2 < 1, ++c], {j, 1000000}]; 
c], CompilationTarget -> "C", RuntimeAttributes -> {Listable}, 
Parallelization -> True, 
RuntimeOptions -> "Speed"];
AbsoluteTiming@N[4 Total[calcCompiled[arg]]/1000000^2, 20]

Any suggestions how to make this faster are welcome....

There is a buildt in benchmark function in Wolfram, see https://reference.wolfram.com/language/Benchmarking/ref/Benchmark.html. An alternative comparison would be to do the same calculations in VB. — FredrikD, Dec 26 '18 at 15:23
"Now I know how to write the corresponding code in Mathematica" ... so include your Mathematica code in the question. Copy and paste in Raw InputForm and format as a code block. — Bob Hanlon, Dec 26 '18 at 16:13
Ok Bob, sorry my mistake, I've been writing code for Mathematica since 2002 and Mathematica 4, what do you think? Mathematica 4 was waayyyyy slower then.... — dimachaerus, Dec 26 '18 at 16:25
Erm. The result of the code seems to be c=1 and that takes an unmeasurable small amount of time to execute. That is eactly one integer operation. But I am not fluent in Visual Basic, so I could be wrong. In total, I do not understand your request. — Henrik Schumacher, Dec 26 '18 at 16:32
@Henrik Schumacher Sorry, just edited the code for better speed, now looks ok. — dimachaerus, Dec 26 '18 at 16:38
@dimachaerus one issue with these kinds of comparisons is you're comparing apples to oranges. Procedural, loop-based code is not the kind of code that is fast in Mathematica (in the absence of Compile). Try a functional algorithm instead. But if VB compiles to low-level C anyway, this might still not quite be the right comparison. Speed tests are tricky if you need to get a truly meaningful result. — b3m2a1, Dec 26 '18 at 19:20

Henrik Schumacher · Answer 1 · 2018-12-26T18:06:50.083

10

This is an algorithm of complexity $O(n)$ instead of $O(n^2)$. It runs through in about 8 milliseconds.

n = 1000000;
piapprox = 
    Total[Floor[
       Sqrt[Ramp[Subtract[N[n^2] - 1., Range[1., n]^2]]]]] (4./n^2); //
   RepeatedTiming // First

piapprox - Pi

0.00812

-4.00401*10^-6

Moreover, you get more bang for your bucks with the trapezoidal rule as follows:

n = 1000000;
weights = ConstantArray[1./n, n + 1];
weights[[{1, -1}]] = 0.5/n;
piapprox2 = 4. (weights.Sqrt[1. - Subdivide[0., 1., n]^2]);
piapprox2 - Pi

-1.17598*10^-9

And this computes Pi with 16-digit precision by approximating the intersection of the unit disk with the first quadrant by $2^k$ congruent triangles; the area of a single triangle is computed (one half of the result of Det) and multiplied by $4\, 2^k$.

RepeatedTiming[
  iters = 25;
  {p, q} = N[IdentityMatrix[2]];
  piapprox = 2^(k + 1) Det[{p, Nest[x \[Function] Normalize[p + x], q, iters]}];
  ][[1]]
piapprox - Pi

0.000048

4.44089*10^-16

This uncompiled versions needs about 0.04 milliseconds.

edited Dec 26 '18 at 18:06

answered Dec 26 '18 at 16:55

Henrik Schumacher

106,770
7
179
309

Sorry, I am not looking for fast methods to evaluate Pi. I am testing the speed of Mathematica for the particular loop example against Visual Basic and I am finding Mathematica to be very slow compared. – dimachaerus Dec 26 '18 at 18:53
1

It is slower in Mathematica because you did it the wrong way. For is well-known to be very slow when not compiled. Try Do instead. – Henrik Schumacher Dec 26 '18 at 19:01
No, even simple: Do[ i * i + j * j, {i, 0, 10000}, {j, 0, 10000}] takes more time than VB. – dimachaerus Dec 26 '18 at 19:15
Then Compile it as it was shown by b3m2a1. – Henrik Schumacher Dec 27 '18 at 09:34

Ruslan · Answer 2 · 2018-12-27T07:21:50.633

You're comparing a compiled language with an interpreted one. Or, at least, you are using the Wolfram language in interpreted mode. You'll have better results if you use a Compiled function.

Example without Compile — directly translated code:

calcNaive = Function[{n},
   Module[{c = 0., m = N[n^2]},
    Do[
     Do[
      If[(i*i + j*j)/m < 1.,
       ++c
       ]
      , {i, 0., n}]
     , {j, 0., n}];
    4. c/m]];
AbsoluteTiming@calcNaive[10000]

{264.352184, 3.14199016}

First value in the output is timing in seconds, second is the function return value.

Note here the N[n^2] instead of simple n^2. This way we avoid using arbitrary-length integers in further code, which would result in a 17% longer computation for no real benefit. In general, if you don't need advanced precision, you'd better enter all your numbers as machine-precision numbers, e.g. 1000.0 instead of 1000.

Now the code using Compile (which always uses machine-precision numbers, in addition to C-language-compiler optimizations):

calcCompiled = Compile[{n},
   Module[{c = 0, m = n^2},
    Do[
     Do[
      If[(i*i + j*j)/m < 1,
       ++c
       ]
      , {i, 0, n}]
     , {j, 0, n}];
    4 c/m]
    , CompilationTarget -> "C"
    , RuntimeOptions -> "Speed"];
AbsoluteTiming@calcCompiled[10000]

{0.918117, 3.14199016}

Notice more than two orders of magnitude faster calculation.

Thanks. The right answer I was looking for. – dimachaerus Dec 26 '18 at 19:56 — dimachaerus, Dec 26 '18 at 19:56

score 6 · Answer 3 · answered Dec 26 '18 at 20:03

Here's another comparison. First let's look at the raw, inefficient version that someone who comes to Mathematica from another language might try:

badForMathematica[n_] :=
 Module[{i, j, c = 0., m = n^2},
  Do[
   If[(i^2 + j^2)/m < 1,
    c = c + 1
    ],
   {i, 0, n - 1},
   {j, 0, n - 1}
   ];
  (4*c)/m
  ]

badForMathematica[1000] // AbsoluteTiming

{3.07878, 3.14552}

Eeesh. But this is the dumb way. Here's a new approach. First we take the your algorithm and realize you're just subdividing and counting hits inside the circle. We can do this much faster using vectorized operations. Here's a way to get all the hits:

countHits[{i1_, i2_}, {j1_, j2_}, m_] :=

  With[{r1 = Range[i1, i2]^2, r2 = Range[j1, j2]^2},
   Total@UnitStep[m - Join @@ Outer[Plus, r1, r2]]
   ];

We generate the Outer product you're really generating, then totaling the counts. Simple as that. Now we cook this into a larger algorithm where we work in chunks because I don't have enough memory to work all at once:

goodForMathematica[n_, chunkSize_: 1000] :=
 Module[{m, c = 0, chunks},
  m = n^2;
  chunks =
   If[chunkSize >= n,
    {{0, n}},
    If[#[[-1]] =!= n, 
       Append[#, n],
       #
       ] &@Range[0, n, chunkSize]
    ];
  Do[c += countHits[c1, c2, m], {c1, chunks}, {c2, chunks}];
  (4.*c)/m
  ]

If you have lots of memory just do it straight out (but you probably don't have enough). Here's that test:

goodForMathematica[10000] // AbsoluteTiming

{3.37823, 3.14199}

We increased the count by an order of magnitude but had similar performance. Note that this algorithm has trash scaling, so we can expect that n = 20000 will take ~12s because we've got four blocks to work with:

1000000 will therefore take about ~35000s or ~10 hours. Not gonna actually do it.

Finally, here's probably the first thing to think of, which is to just use Compile:

compiledVersion =
  Compile[{{n, _Integer}},
   Module[{i, j, c = 0., m = n^2},
    Do[
     If[(i^2 + j^2)/m < 1,
      c = c + 1
      ],
     {i, 0, n - 1},
     {j, 0, n - 1}
     ];
    (4*c)/m
    ],
   RuntimeOptions -> "Speed",
   CompilationTarget -> "C"
   ];

compiledVersion[100000] // AbsoluteTiming

{14.8691, 3.14163}

And this blows everything else out of the water (notice I use 100000 per grid subdivision), but that makes sense, because we're really using C, not Mathematica.

There's no need to put i and j into Module: Do already localizes their names. — Ruslan, Dec 26 '18 at 20:04
@Ruslan ah yeah at first I was just directly copying the OPs For structure but then got annoyed with it. I'll prune it sometime later. — b3m2a1, Dec 26 '18 at 20:06
Excellent answer! I noticed that the "i" outer loop can be further subdivided into n separate smaller loops to utilize the n different cores of my processor and kernels of Mathematica 11.3 for parallel processing. Any idea how to optimize code for this task? — dimachaerus, Dec 26 '18 at 21:40

Mathematica code execution speed testing

3 Answers3

Linked