
General Context

Apparently, the memory leak in NIntegrate is unavoidable pre-MMA-10.2. I have some two-dimensional numerical integration to do on a complicated function (LeafCount[g] is approximately 70 million) that depends on a handful of parameters. Specifically, there is a function g[kx_, ky_, x_, y_, params__] for which I run

ParallelTable[
  NIntegrate[
    g[kx, ky, x, y, params]
    , {kx, 0, <large number>}
    , {ky, 0, <large number>}
  ]
  , {x, 0, 10, 0.2}
  , {y, 0, 10, 0.2}
]

where <large number> is large enough to guarantee reasonable convergence of the integral (the upper limit is actually infinite, and the integrand falls off at a rate of approximately 1/k^3). Run on 8 kernels for a single choice of params, each of these integrals typically takes about 100 seconds, and the entire evaluation of the ParallelTable eats up about 50 GB of a total 64 GB of RAM. Since NIntegrate doesn't release this memory, the computer crashes once the integration for the second set of parameters starts. Of course, I have many different parameter sets I'd like to run, so I'd like this to be automated to run over a weekend rather than having to Quit[] and rerun by hand for each parameter set. Some day, perhaps, I'll get access to 10.2, but for now I have to look for workarounds.
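
One possible workaround, sketched here under the assumption that the leaked memory lives in the subkernels, is to restart the parallel kernels between parameter sets so that the operating system reclaims their memory. The names runOneSet and paramSets are hypothetical placeholders for the ParallelTable above and the list of parameter sets:

results = Table[
  CloseKernels[]; (* discard subkernels holding leaked memory *)
  LaunchKernels[8]; (* launch fresh kernels for the next parameter set *)
  runOneSet[params] (* hypothetical: the ParallelTable above, for one params *)
  , {params, paramSets}];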

Specific Context of This Question

Because of this, I am trying my best to cut down on how much work NIntegrate has to do, in order to bring down the RAM usage. I am currently simplifying the function "by hand" as much as possible, but I also hit on one idea that might help.

The idea is this: within g there are repeated uses of another function f that has a somewhat large LeafCount of its own (maybe 2000 or so), and this repetition is part of the reason g has such a large LeafCount once f is substituted in. My thought was to leave f unsubstituted (rather than plugging in the corresponding expression involving kx, ky, x, y, and params) and somehow make only a single call to f when evaluating g at a particular point, in order to speed things up. One thought I had was to use memoization, as in the examples below. Another possibility is to define g with a symbol h in place of f[vars__], and define

g1[vars__] := Block[{h = f[vars]}, g[vars]];

Questions

(1) What is the best way to go about speeding up the evaluation of a function g with a large LeafCount due to a function f with large LeafCount appearing in the definition of g multiple times?

(2) What is the best way to accomplish (1) when the function g is to be NIntegrated?

I suspect that the answer to this second question depends a lot on the actual structure of both f and g. Given that the LeafCount of g is approximately 70 million, I'm willing to bet you don't want to see the actual definitions, so the best I can say is that they contain a lot of nested square roots (which my test function below does not).

Note that I have an answer below, but I welcome more answers, since this was the first thing that I hit upon, and it may not work for my actual function.

Minimal Examples, Possible Solutions, and Exploration

Let's take as our sample function

g[x_] = Nest[f[x] + 1./# &, f[x], 500];

which is a continued fraction of depth 500 with f[x] appearing at every level. For the purposes of having a function f that takes a little time to compute, let's use

f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];

Now, we note the following. (All timings were done with MMA 10.0.2.0 on a MacBook Air running OSX 10.10.5.)

Evaluation of g

There is a factor of ~50 speed-up for a memoized version of f over f defined using Set (and even more over f defined using SetDelayed):

Clear[f]
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
First@AbsoluteTiming@g[5.]
(* 0.195406 *)

Clear[f]
f[x_] := Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
First@AbsoluteTiming@g[5.]
(* 0.285290 *)

Clear[f]
f[x_] := f[x] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
First@AbsoluteTiming@g[5.]
(* 0.004211 *)

The Block solution gives a 6x speed-up over the Set version:

ClearAll[f, g]
g[x_] = Nest[h + 1./# &, h, 500];
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
g1[x_] := Block[{h = f[x]}, g[x]]
First@AbsoluteTiming@g1[5.]
(* 0.040647 *)

Plotting g

There is a factor of ~70 speed-up for the memoized version of f:

Clear[f]
f[x_] := Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
First@AbsoluteTiming@Plot[g[x], {x, 0, 5}]
(* 46.138497 *)

Clear[f]
f[x_] := f[x] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
First@AbsoluteTiming@Plot[g[x], {x, 0, 5}]
Length@DownValues@f
(* 0.652008 *)
(* 161 *)

f[x] was evaluated 160 times (one of the 161 DownValues is the defining rule itself).

The Block solution gives a factor-of-6 speed-up over the non-memoized version:

ClearAll[f, g]
g[x_] = Nest[h + 1./# &, h, 500];
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 1, 100}];
g1[x_] := Block[{h = f[x]}, g[x]]
First@AbsoluteTiming@Plot[g1[x], {x, 0, 5}]
(* 5.879858 *)

NIntegrating g

The memoized version doesn't help much at all, and I think that is because NIntegrate constructs the symbolic integrand completely before it begins the numerical integration process; that is, f[x] gets plugged into g[x] before any sampling occurs, which defeats the whole purpose:

Clear[f]
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
(* 0.824581 *)

Clear[f]
f[x_] := f[x] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
{First@#, Length@#}& @ DownValues@f
(* 0.512885 *)
(* {HoldPattern[f[5.]] :> 1.59104, 3} *)

Apart from the defining rule, f has accumulated only a couple of memoized values such as f[5.], rather than one per sample point, which seems to confirm my hypothesis.
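
One way to test this hypothesis, sketched below (I haven't timed it on the real problem), is to restrict f to numeric arguments; that should prevent the pre-substitution, so the memoized f would actually be hit at every sample point:

Clear[f]
f[x_?NumericQ] := f[x] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
Length@DownValues@f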

The Block version doesn't help either, again probably because everything gets plugged in before NIntegrate starts sampling:

ClearAll[f, g]
g[x_] = Nest[h + 1./# &, h, 500];
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
g1[x_] := Block[{h = f[x]}, g[x]]
First@AbsoluteTiming@NIntegrate[g1[x], {x, 0, 5.}]
(* 0.531379 *)
  • By using NIntegrate[...,Method->{Automatic,"SymbolicProcessing"->0}] I could again gain a factor of ~2 speed up for the integration. Maybe that also helps a little – Lukas Aug 27 '15 at 17:11
  • @Lukas. It's worth a shot, although I think some symbolic processing is good, as shown by the answer below: it seems like it might be a balancing act. – march Aug 27 '15 at 17:13
  • That's definitely true. I'm not experienced enough to find that balancing, though. But I am very curious if someone will... Or if someone has a completely different way to solve it – Lukas Aug 27 '15 at 17:19
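
For reference, the option Lukas suggests in the comments above looks like this in use (a sketch; as the comments say, whether skipping all symbolic preprocessing helps will depend on the integrand):

First@AbsoluteTiming@
 NIntegrate[g[x], {x, 0, 5.}, Method -> {Automatic, "SymbolicProcessing" -> 0}]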

1 Answer


Compile f and use a memoized version of it

Since it seems like NIntegrate decides to symbolically evaluate its argument first, I thought I'd force it not to by compiling the function f. This seems to make a significant difference:

Clear[f, f1, g]
g[x_] = Nest[f[x] + 1./# &, f[x], 500];
f1 = Compile[{x}, Sum[1/100 Erfc[-(x^2/k)], {k, 100}]];
f[x_?NumericQ] := f1[x]
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
(* 0.049145 *)

This is a factor-of-10 speed-up over the versions in the question. We can also memoize f, which yields slightly more speed-up, and along the way we can verify that the DownValues actually get assigned:

Clear[f, f1, g]
g[x_] = Nest[f[x] + 1./# &, f[x], 500];
f1 = Compile[{x}, Sum[1/100 Erfc[-(x^2/k)], {k, 100}]];
f[x_?NumericQ] := f[x] = f1[x]
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
Length@DownValues@f
(* 0.040601 *)
(* 35 *)

Now, I expect that memoizing f in this way is a bad idea in the long run: with NIntegrate running many times over different parameter sets, a huge number of DownValues will eventually be attached to f, possibly leading to slow-down. It may be reasonable to Block off the memoization so that it applies only within a single call to NIntegrate, as sketched below.
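
Here is a minimal sketch of that idea, assuming f1 is the compiled function defined above. Block localizes the symbol f itself, so the memoized DownValues accumulated during one integration are discarded as soon as the call returns:

integrateOnce[] := Block[{f},
  f[x_?NumericQ] := f[x] = f1[x]; (* memoization is local to this Block *)
  NIntegrate[g[x], {x, 0, 5.}]
 ]

After integrateOnce[] returns, the global f is untouched, so no DownValues pile up across parameter sets.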

Interestingly, the memoized version provides no extra speed-up when the very same integration is run a second time, even though every sampled value of f has already been stored:

g[x_] = Nest[f[x] + 1./# &, f[x], 500];
f1 = Compile[{x}, Sum[1/100 Erfc[-(x^2/k)], {k, 100}]];
f[x_?NumericQ] := f[x] = f1[x]
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
(* 0.041787 *)
(* 0.040696 *)

I don't understand why this makes no difference: the x-values chosen are surely the same, so the second NIntegrate calls g at those same values and hence uses values of f it has already computed. Perhaps evaluating the 500-level expression for g, rather than the compiled f1, dominates the timing, but I haven't checked.

On the other hand, if we Compile g instead and memoize, the first integration takes a long time, but after that it is very quick:

ClearAll[g, g1, g2]
g2[x_] = Nest[Sum[1/100 Erfc[-(x^2/k)], {k, 100}] + 1./# &, Sum[1/100 Erfc[-(x^2/k)], {k, 100}], 500];
g1 = Compile[{x}, g2[x]];
g[x_?NumericQ] := g[x] = g1[x];
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
(* 3.471362 *)
(* 0.003585 *)

I suspect that the first integration takes so much longer because NIntegrate can't examine the symbolic structure of g at the outset and so can't decide how to evaluate the integral cleverly. In any case, this approach would only be useful if we could guarantee that NIntegrate chooses the same x-values at which to evaluate g, which is not the case in general.
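
One way to check that last point is to record the abscissas each run actually uses, via NIntegrate's EvaluationMonitor option, and compare them (a sketch):

pts1 = Reap[NIntegrate[g[x], {x, 0, 5.}, EvaluationMonitor :> Sow[x]]][[2, 1]];
pts2 = Reap[NIntegrate[g[x], {x, 0, 5.}, EvaluationMonitor :> Sow[x]]][[2, 1]];
pts1 == pts2 (* True only if both runs sample the same points *)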
