I want to write some code to simulate a damped oscillator that is just as fast as code written using Numba's @njit decorator. I've written the Mathematica code below, and mine is 20-40x slower than the Python code written by YouTuber Jack of Some.
Here is the code from Jack of Some's video on speeding up Python code with Numba; I've changed it slightly so it runs in a single Jupyter cell:
import numpy as np
from numba import jit, njit, types, vectorize

@njit
def friction_fn(v, vt):
    if v > vt:
        return - v * 3
    else:
        return - vt * 3 * np.sign(v)

@njit
def simulate_spring_mass_funky_damper(x0, T=10, dt=0.0001, vt=1.0):
    times = np.arange(0, T, dt)
    positions = np.zeros_like(times)

    v = 0
    a = 0
    x = x0
    positions[0] = x0/x0

    for ii in range(len(times)):
        if ii == 0:
            continue
        t = times[ii]
        a = friction_fn(v, vt) - 100*x
        v = v + a*dt
        x = x + v*dt
        positions[ii] = x/x0

    return times, positions

_ = simulate_spring_mass_funky_damper(0.1)
%time _ = simulate_spring_mass_funky_damper(0.1)
The output is
CPU times: user 1.38 ms, sys: 337 µs, total: 1.72 ms
Wall time: 1.72 ms
vs my Mathematica code
ClearAll[friction, simulateSpring, jacksSpring];

friction = Compile[{{v, _Real}, {vt, _Real}},
   If[v > vt,
    -v*3.0,
    -vt*3.0*Sign[v]]
   ];

simulateSpring =
  Compile[{{x0, _Real}, {t, _Real}, {dt, _Real}, {vt, _Real}},
   Module[{τ, times, positions, v = 0.0, a = 0.0, x = x0},
    τ = t;
    times = Range[0.0, t, dt];
    positions = ConstantArray[0.0, Length@times];
    positions[[1]] = x0/x0;

    Do[
     τ = times[[i]];
     a = friction[v, vt] - 100*x;
     v = v + a*dt;
     x = x + v*dt;
     positions[[i]] = x/x0;
     ,
     {i, 2, Length@times}];

    {times, positions}
    ]
   ];

jacksSpring[x_] := simulateSpring[x, 10.0, 0.0001, 1.0];

Print["CPU time: ", Timing[jacksSpring[0.1]][[1]]*1000, " ms"]
from which we have
CPU time: 27.703 ms
Comments:

AbsoluteTiming that you want, not Timing. – C. E. Jul 04 '20 at 20:57

With SetOptions[Compile, CompilationTarget -> "C", RuntimeOptions -> "Speed"] I get AbsoluteTiming down to about 3 ms, on Windows. – Rolf Mertig Jul 04 '20 at 21:13

2.639 ms on my machine. I'm still interested if anyone has any ideas to beat the Python attempt, especially if they can rewrite the code in a more idiomatic fashion. – Diffycue Jul 04 '20 at 21:19

FunctionCompile. Though probably it will be slower than Compile. – Rolf Mertig Jul 04 '20 at 21:25

By the way, thank you for FeynCalc – it really helped me out in my QFT courses. – Diffycue Jul 04 '20 at 21:40

FunctionCompile, and the results were so close as to not be worth it on my machine (~6.1 ms with Compile and the options above, ~6.2 ms with FunctionCompile). I had to wrap Range in a KernelFunction as the built-in Native`Range does not support Real arguments, but since that's only called once I would be surprised if that were the bottleneck. Is there some way to replace the Do with some matrix maths? – Carl Lange Jul 04 '20 at 22:38
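For concreteness, here is a minimal sketch of what the comments suggest trying (I don't claim it's optimal): set the Compile options globally, re-evaluate the definitions above so they pick up those options, and measure wall-clock time with AbsoluteTiming rather than Timing.

SetOptions[Compile, CompilationTarget -> "C", RuntimeOptions -> "Speed"];

(* re-evaluate friction, simulateSpring and jacksSpring from above here,
   so they are compiled to C with the "Speed" runtime options *)

Print["wall time: ", First[AbsoluteTiming[jacksSpring[0.1]]]*1000, " ms"]

This is essentially how I got the ~2.6 ms figure mentioned in the comments, which is closer to, but still behind, the Numba timing.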