Fast evaluation of a function in many points

Question

I need to feed to an external program a number of points (in Complex128 format) generated from the numerical evaluation of some function, e.g. $e^{i \vec{k}\cdot\vec{x}}$. The function is evaluated at points of some evenly spaced grid. I currently generate the values with Table (possibly multidimensional, and then Flatten'd) and send it to a file with BinaryWrite. However, this is rather slow, because I use a number of points in the order of $2^{24}$ or more. I tried with ParallelTable, but I inspected with perf and it appears that when transferring the data to the main kernel there's a lot of UTF string encoding going on for whatever reason, which defeats the speedup in the parallelization.

Is there any smarter way to accomplish what I need?

I think you need to be more specific. What are the values? Are the values you generate a function of, say, the position in the (multidimensional) table? — Marius Ladegård Meyer, Dec 10 '15 at 11:17
Taking the example in the question, I need to generate the values of $e^{ikx}$ for a fixed $k$ and for $x = 0, .., 100$. I clarified in the question that the function is evaluated on some grid. — Lorenzo Pistone, Dec 10 '15 at 11:25

score 7 · Accepted Answer · edited Apr 13 '17 at 12:55

In cases like this where the function to be evaluated a) has the Listable attribute, and b) is not very compicated, we can get a huge benefit from exploiting this listability. In my experience, exploiting vectorized operations, listability and packed arrays, along with occational compilation to C, usually has a much larger impact on speed than parallellizing.

One dimension

For k = 2.5 and $2^{24}$ points:

First@AbsoluteTiming[ans1 = Exp[I*2.5*Range[2^24]]];
(* 1.270682 *)

First@AbsoluteTiming[ans2 = Table[Exp[I*2.5*x], {x, 2^24}]];
(* 19.620243 *)

First@AbsoluteTiming[ans3 = Exp /@ (I*2.5*Range[2^24])];
(* 3.358687 *)

ans1 == ans2 == ans3
(* True *)

Note that the last one uses the Listable attribute of Times, but not of Exp. On my machine it's still way faster than Table though.

Three dimensions

Since $e^{a+b} = e^a e^b$ we can still do most of the calculation vectorized. Note that we don't need a square grid. For $\vec{k} = \{k_x, k_y, k_z\} = \{2.5, 3.5, 4.5\}$:

(* using Table *)
First@AbsoluteTiming[
  ans1 = Table[Exp[I (k1 x + k2 y + k3 z)]
  , {x, 100}, {y, 101}, {z, 102}];
]
(* 1.979192 *)

(* using Table, but vectorizing over one coordinate *)
First@AbsoluteTiming[
  ans2 = Table[Exp[I (k1 x + k2 y + k3 Range[102])]
  , {x, 100}, {y, 101}];
]
(* 0.225951 *)

(* My favorite *)
First@AbsoluteTiming[
  x = Exp[I k1 Range[100]];
  y = Exp[I k2 Range[101]]; 
  z = Exp[I k3 Range[102]];
  ans3 = Outer[Times, x, y, z];
]
(* 0.028202 *)

They are equal up to roundoff:

Chop[ans1 - ans2] == Chop[ans1 - ans3] == ConstantArray[0, {100, 101, 102}]
(* True *)

More complicated example

When the function to be evaluated is not easily split into parts that depend only on one component, I usually check this list to see if it can be compiled. If for instance the function is

f[x_, y_, z_] := Sin[2x + Exp[I(x + y*z)]]

we find:

First@AbsoluteTiming[ans1 = Table[f[x, y, z], {x, 100}, {y, 101}, {z, 102}];]
(* 4.599261 *)

First@AbsoluteTiming[ans2 = Outer[f, Range[100], Range[101], Range[102]];]
(* 3.056824 *)

(* We can still vectorize over one coordinate *)
First@AbsoluteTiming[ans3 = Table[f[x, y, Range[102]], {x, 100}, {y, 101}];]
(* 1.463473 *)

But here, a compiled function is much faster:

comp = Compile[{},
  Table[Sin[2 x + Exp[I (x + y*z)]]
  ,{x, 100}, {y, 101}, {z, 102}], 
  CompilationTarget -> "C", RuntimeOptions -> "Speed"
];
First@AbsoluteTiming[ans4 = comp[];]
(* 0.145347 *)

Oddly enough the Fourier transform can be competitive until n is quite large. Only applies if the grid divides the unit circle into equal segments though. A modification of the example above should show what I mean. `In[501]:= n = 25; ivec = UnitVector[2^n, 2]; Timing[ans1 = Fourier[ivec];] AbsoluteTiming[ ans2 = Exp[2.PiI*Range[0, 2^n - 1]/2.^n]/Sqrt[2.^n];] Max[Abs[ans1 - ans2]]
Out[503]= {2.229677, Null}

Out[504]= {1.564221, Null}

Out[505]= 1.53329341668*10^-19` — Daniel Lichtblau, Dec 10 '15 at 15:17
I get the gist of it. However I'm thinking how this could be extended to the vector case (or in other words when the function depends on multiple variables). — Lorenzo Pistone, Dec 10 '15 at 20:14

Fast evaluation of a function in many points

1 Answers1

Linked