4

I am testing the speed of Mathematica's latest version against a VB macro for simple loops to evaluate Pi.

So the VB macro is:

c = 0: n = 1000000: m = n * n

For i = 0 To n

    For j = 0 To n

        If (i * i + j * j) / m < 1 Then

            c = c + 1

        End If

    Next

Next

Selection.Text = 4 * c / m

I know how to write the corresponding code in Mathematica, but I do not know the intricacies of Mathematica regarding faster program execution. Can you help please by giving me an example of best optimized code for this example?

There are approximately 4 trillion operations in this example and the VB macro takes less than a day to execute...

This program partitions a quarter of a circular area of radius = 1 to smaller squares of area $1/n^2$ then counts them and adds them together. Essentially, a Monte Carlo Method but with randomness removed.

With your help I have created this faster code which uses parallel computing:

arg = Range[ 0, 1000000]; calcCompiled = 
Compile[{i}, 
Module[{c = 0}, Do[If[(i*i + j*j)/1000000^2 < 1, ++c], {j, 1000000}]; 
c], CompilationTarget -> "C", RuntimeAttributes -> {Listable}, 
Parallelization -> True, 
RuntimeOptions -> "Speed"];
AbsoluteTiming@N[4 Total[calcCompiled[arg]]/1000000^2, 20]

Any suggestions how to make this faster are welcome....

dimachaerus
  • 197
  • 9
  • 1
    There is a buildt in benchmark function in Wolfram, see https://reference.wolfram.com/language/Benchmarking/ref/Benchmark.html. An alternative comparison would be to do the same calculations in VB. – FredrikD Dec 26 '18 at 15:23
  • "Now I know how to write the corresponding code in Mathematica" ... so include your Mathematica code in the question. Copy and paste in Raw InputForm and format as a code block. – Bob Hanlon Dec 26 '18 at 16:13
  • Ok Bob, sorry my mistake, I've been writing code for Mathematica since 2002 and Mathematica 4, what do you think? Mathematica 4 was waayyyyy slower then.... – dimachaerus Dec 26 '18 at 16:25
  • Erm. The result of the code seems to be c=1 and that takes an unmeasurable small amount of time to execute. That is eactly one integer operation. But I am not fluent in Visual Basic, so I could be wrong. In total, I do not understand your request. – Henrik Schumacher Dec 26 '18 at 16:32
  • @Henrik Schumacher Sorry, just edited the code for better speed, now looks ok. – dimachaerus Dec 26 '18 at 16:38
  • 2
    @dimachaerus one issue with these kinds of comparisons is you're comparing apples to oranges. Procedural, loop-based code is not the kind of code that is fast in Mathematica (in the absence of Compile). Try a functional algorithm instead. But if VB compiles to low-level C anyway, this might still not quite be the right comparison. Speed tests are tricky if you need to get a truly meaningful result. – b3m2a1 Dec 26 '18 at 19:20

3 Answers3

10

This is an algorithm of complexity $O(n)$ instead of $O(n^2)$. It runs through in about 8 milliseconds.

n = 1000000;
piapprox = 
    Total[Floor[
       Sqrt[Ramp[Subtract[N[n^2] - 1., Range[1., n]^2]]]]] (4./n^2); //
   RepeatedTiming // First

piapprox - Pi

0.00812

-4.00401*10^-6

Moreover, you get more bang for your bucks with the trapezoidal rule as follows:

n = 1000000;
weights = ConstantArray[1./n, n + 1];
weights[[{1, -1}]] = 0.5/n;
piapprox2 = 4. (weights.Sqrt[1. - Subdivide[0., 1., n]^2]);
piapprox2 - Pi

-1.17598*10^-9

And this computes Pi with 16-digit precision by approximating the intersection of the unit disk with the first quadrant by $2^k$ congruent triangles; the area of a single triangle is computed (one half of the result of Det) and multiplied by $4\, 2^k$.

RepeatedTiming[
  iters = 25;
  {p, q} = N[IdentityMatrix[2]];
  piapprox = 2^(k + 1) Det[{p, Nest[x \[Function] Normalize[p + x], q, iters]}];
  ][[1]]
piapprox - Pi

0.000048

4.44089*10^-16

This uncompiled versions needs about 0.04 milliseconds.

Henrik Schumacher
  • 106,770
  • 7
  • 179
  • 309
10

You're comparing a compiled language with an interpreted one. Or, at least, you are using the Wolfram language in interpreted mode. You'll have better results if you use a Compiled function.

Example without Compile — directly translated code:

calcNaive = Function[{n},
   Module[{c = 0., m = N[n^2]},
    Do[
     Do[
      If[(i*i + j*j)/m < 1.,
       ++c
       ]
      , {i, 0., n}]
     , {j, 0., n}];
    4. c/m]];
AbsoluteTiming@calcNaive[10000]

{264.352184, 3.14199016}

First value in the output is timing in seconds, second is the function return value.

Note here the N[n^2] instead of simple n^2. This way we avoid using arbitrary-length integers in further code, which would result in a 17% longer computation for no real benefit. In general, if you don't need advanced precision, you'd better enter all your numbers as machine-precision numbers, e.g. 1000.0 instead of 1000.

Now the code using Compile (which always uses machine-precision numbers, in addition to C-language-compiler optimizations):

calcCompiled = Compile[{n},
   Module[{c = 0, m = n^2},
    Do[
     Do[
      If[(i*i + j*j)/m < 1,
       ++c
       ]
      , {i, 0, n}]
     , {j, 0, n}];
    4 c/m]
    , CompilationTarget -> "C"
    , RuntimeOptions -> "Speed"];
AbsoluteTiming@calcCompiled[10000]

{0.918117, 3.14199016}

Notice more than two orders of magnitude faster calculation.

Ruslan
  • 7,152
  • 1
  • 23
  • 52
6

Here's another comparison. First let's look at the raw, inefficient version that someone who comes to Mathematica from another language might try:

badForMathematica[n_] :=
 Module[{i, j, c = 0., m = n^2},
  Do[
   If[(i^2 + j^2)/m < 1,
    c = c + 1
    ],
   {i, 0, n - 1},
   {j, 0, n - 1}
   ];
  (4*c)/m
  ]

badForMathematica[1000] // AbsoluteTiming

{3.07878, 3.14552}

Eeesh. But this is the dumb way. Here's a new approach. First we take the your algorithm and realize you're just subdividing and counting hits inside the circle. We can do this much faster using vectorized operations. Here's a way to get all the hits:

countHits[{i1_, i2_}, {j1_, j2_}, m_] :=

  With[{r1 = Range[i1, i2]^2, r2 = Range[j1, j2]^2},
   Total@UnitStep[m - Join @@ Outer[Plus, r1, r2]]
   ];

We generate the Outer product you're really generating, then totaling the counts. Simple as that. Now we cook this into a larger algorithm where we work in chunks because I don't have enough memory to work all at once:

goodForMathematica[n_, chunkSize_: 1000] :=
 Module[{m, c = 0, chunks},
  m = n^2;
  chunks =
   If[chunkSize >= n,
    {{0, n}},
    If[#[[-1]] =!= n, 
       Append[#, n],
       #
       ] &@Range[0, n, chunkSize]
    ];
  Do[c += countHits[c1, c2, m], {c1, chunks}, {c2, chunks}];
  (4.*c)/m
  ]

If you have lots of memory just do it straight out (but you probably don't have enough). Here's that test:

goodForMathematica[10000] // AbsoluteTiming

{3.37823, 3.14199}

We increased the count by an order of magnitude but had similar performance. Note that this algorithm has trash scaling, so we can expect that n = 20000 will take ~12s because we've got four blocks to work with:

1000000 will therefore take about ~35000s or ~10 hours. Not gonna actually do it.

Finally, here's probably the first thing to think of, which is to just use Compile:

compiledVersion =
  Compile[{{n, _Integer}},
   Module[{i, j, c = 0., m = n^2},
    Do[
     If[(i^2 + j^2)/m < 1,
      c = c + 1
      ],
     {i, 0, n - 1},
     {j, 0, n - 1}
     ];
    (4*c)/m
    ],
   RuntimeOptions -> "Speed",
   CompilationTarget -> "C"
   ];

compiledVersion[100000] // AbsoluteTiming

{14.8691, 3.14163}

And this blows everything else out of the water (notice I use 100000 per grid subdivision), but that makes sense, because we're really using C, not Mathematica.

b3m2a1
  • 46,870
  • 3
  • 92
  • 239
  • There's no need to put i and j into Module: Do already localizes their names. – Ruslan Dec 26 '18 at 20:04
  • @Ruslan ah yeah at first I was just directly copying the OPs For structure but then got annoyed with it. I'll prune it sometime later. – b3m2a1 Dec 26 '18 at 20:06
  • Excellent answer! I noticed that the "i" outer loop can be further subdivided into n separate smaller loops to utilize the n different cores of my processor and kernels of Mathematica 11.3 for parallel processing. Any idea how to optimize code for this task? – dimachaerus Dec 26 '18 at 21:40