General Context
Apparently, the memory leak in NIntegrate is unavoidable pre-MMA-10.2. I have some (two-dimensional) numerical integration to do on a complicated function (LeafCount[g] is approximately 70 million) of a handful of parameters. Specifically, there is a function g[kx_, ky_, x_, y_, params__] for which I run
ParallelTable[
NIntegrate[
g[kx, ky, x, y, params]
, {kx, 0, <large number>}
, {ky, 0, <large number>}
]
, {x, 0, 10, 0.2}
, {y, 0, 10, 0.2}
]
where <large number> is large enough to guarantee reasonable convergence of the integral (the upper limit is actually infinite, and the function goes to zero at a rate of approximately 1/k^3). Run on 8 kernels for a single choice of params, each of these integrals takes about 100 seconds, and the entire evaluation of the Table eats up about 50 GB of a total 64 GB of RAM. Since NIntegrate doesn't release this memory, the computer crashes once the integration for the second set of parameters starts. Of course, I have many different parameter sets I'd like to run, so I'd like this to be automated to run over a weekend or so, rather than having to Quit[] and rerun by hand for each set of parameters. Some day, perhaps, I'll get access to 10.2, but for now I have to look for work-arounds.
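One workaround along those lines, sketched below, is to relaunch the parallel kernels between parameter sets, so that whatever memory the subkernels have leaked is handed back to the OS. This is only a sketch (paramSets and kmax are placeholders), and it only helps to the extent that the leak lives in the subkernels rather than the master kernel:

(* Sketch: restart subkernels between parameter sets so leaked memory
   is reclaimed; paramSets and kmax are hypothetical placeholders *)
results = Association[];
Do[
 CloseKernels[];  (* discard subkernels holding leaked memory *)
 LaunchKernels[];
 results[p] = ParallelTable[
   NIntegrate[g[kx, ky, x, y, p], {kx, 0, kmax}, {ky, 0, kmax}],
   {x, 0, 10, 0.2}, {y, 0, 10, 0.2}];
 ClearSystemCache[],  (* flush the master kernel's caches as well *)
 {p, paramSets}]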
Specific Context of This Question
Because of this, I am trying my best to cut down on how much work NIntegrate has to do, in order to bring down the RAM usage. I am currently simplifying the function "by hand" as much as possible, but I hit on one idea that might help out.
The idea is that within the function g, there are repeated uses of another function f that has a somewhat large LeafCount on its own (maybe 2000 or so), and the repeated use of f inside g is part of the reason g has such a large LeafCount when I substitute f into g. My thought was to leave f in (rather than substitute in the corresponding expression involving kx, ky, x, y, and params) and somehow make a single call to f when evaluating g at a particular value in order to speed things up. One thought I had was to use memoization, as in the example below. Another possibility is to define g with a Symbol h in place of f[vars__], and define
g1[vars__] := Block[{h = f[vars]}, g[vars]];
Questions
(1) What is the best way to go about speeding up the evaluation of a function g with a large LeafCount due to a function f with large LeafCount appearing in the definition of g multiple times?
(2) What is the best way to accomplish (1) when the function g is to be NIntegrated?
I suspect that the answer to this second question depends a lot on the actual structure of both f and g. Given that the LeafCount of g is approximately 70 million, I'm willing to bet you don't want to see the actual definitions, so the best I can say is that there are a lot of nested square roots (my test function below does not have any).
Note that I have an answer below, but I welcome more answers, since this was the first thing that I hit upon, and it may not work for my actual function.
Minimal Examples, Possible Solutions, and Exploration
Let's take as our sample function
g[x_] = Nest[f[x] + 1./# &, f[x], 500];
which is a continued fraction of depth 500 with f[x] appearing at every level. For the purposes of having a function f that takes a little bit of time to compute, let's use
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
Now, we note the following. (All timings were done with MMA 10.0.2.0 on a MacBook Air running OS X 10.10.5.)
Evaluation of g
There is a factor of ~50 speed-up for a memoized version of f over f defined using Set (and even more over the SetDelayed version):
Clear[f]
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
First@AbsoluteTiming@g[5.]
(* 0.195406 *)
Clear[f]
f[x_] := Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
First@AbsoluteTiming@g[5.]
(* 0.285290 *)
Clear[f]
f[x_] := f[x] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
First@AbsoluteTiming@g[5.]
(* 0.004211 *)
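One caveat, since RAM is the whole issue here: the memoized values pile up as DownValues of f, so a long run may want to flush them periodically. A standard idiom (a sketch; I have not timed it) that drops the stored values while keeping the general definition:

(* Keep only rules whose left-hand side still contains a Pattern,
   i.e. the general definition f[x_]; drop the memoized numeric values *)
DownValues[f] = Select[DownValues[f], ! FreeQ[First[#], Pattern] &];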
The Block solution gives a ~5x speed-up over the Set version:
ClearAll[f, g]
g[x_] = Nest[h + 1./# &, h, 500];
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
g1[x_] := Block[{h = f[x]}, g[x]]
First@AbsoluteTiming@g1[5.]
(* 0.040647 *)
Plotting g
There is a factor of ~70 speed-up for the memoized version of f:
Clear[f]
f[x_] := Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
First@AbsoluteTiming@Plot[g[x], {x, 0, 5}]
(* 46.138497 *)
Clear[f]
f[x_] := f[x] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
First@AbsoluteTiming@Plot[g[x], {x, 0, 5}]
Length@DownValues@f
(* 0.652008 *)
(* 161 *)
f[x] was evaluated only 160 times (one of the 161 DownValues is the general definition).
The Block solution gives a factor of ~8 speed-up over the non-memoized version:
ClearAll[f, g]
g[x_] = Nest[h + 1./# &, h, 500];
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 1, 100}];
g1[x_] := Block[{h = f[x]}, g[x]]
First@AbsoluteTiming@Plot[g1[x], {x, 0, 5}]
(* 5.879858 *)
NIntegrating g
The memoized version doesn't help much at all, and I think that is because NIntegrate constructs the symbolic form of g completely before it starts the numerical integration process; that is, f[x] gets plugged into g[x] before the integration occurs, which defeats the whole purpose:
Clear[f]
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
(* 0.824581 *)
Clear[f]
f[x_] := f[x] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
First@AbsoluteTiming@NIntegrate[g[x], {x, 0, 5.}]
{First@#, Length@#}& @ DownValues@f
(* 0.512885 *)
(* {HoldPattern[f[5.]] :> 1.59104, 3} *)
Only three DownValues were created, and the only memoized numeric value is f[5.], rather than one value per sample point, which seems to verify my hypothesis.
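A further probe of this hypothesis (a sketch using the "SymbolicProcessing" suboption that also appears in the comment at the end) is to suppress the symbolic preprocessing and then watch whether the DownValues of f pile up, one per sample point:

Clear[f]
f[x_] := f[x] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}]
(* with symbolic preprocessing off, NIntegrate should call the memoized
   f once per sample point instead of expanding it symbolically *)
First@AbsoluteTiming@
 NIntegrate[g[x], {x, 0, 5.}, Method -> {Automatic, "SymbolicProcessing" -> 0}]
Length@DownValues@f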
The Block version doesn't help either, again probably because everything gets plugged in before the numerical integration starts:
ClearAll[f, g]
g[x_] = Nest[h + 1./# &, h, 500];
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
g1[x_] := Block[{h = f[x]}, g[x]]
First@AbsoluteTiming@NIntegrate[g1[x], {x, 0, 5.}]
(* 0.531379 *)
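One more variant worth trying (I have not timed it here): restrict g1 to numeric arguments with ?NumericQ, which is the usual way to keep NIntegrate's symbolic preprocessing from expanding the integrand at all:

ClearAll[f, g, g1]
g[x_] = Nest[h + 1./# &, h, 500];
f[x_] = Sum[1/100 Erfc[-(x^2/k)], {k, 100}];
(* g1 stays unevaluated for symbolic x, so NIntegrate cannot plug f in
   ahead of time; f is then called once per sample point at run time *)
g1[x_?NumericQ] := Block[{h = f[x]}, g[x]]
First@AbsoluteTiming@NIntegrate[g1[x], {x, 0, 5.}]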
Comment: With NIntegrate[..., Method -> {Automatic, "SymbolicProcessing" -> 0}] I could again gain a factor of ~2 speed-up for the integration. Maybe that also helps a little. – Lukas Aug 27 '15 at 17:11