NonlinearModelFit runs too slow when fitting data to a function that does multiple numerical integrations

Question

I have a function of several parameters defined in a module which does multiple numerical integrations. Then I try to fit that function to some sample data via with NonlinearModelFit, varying several parameters, using differential evolution as the minimization method, and applying constraints. The problem I have run into is that it takes approximately an hour for NonlinearModelFit to find a solution. Since playing with initial guesses is necessary to get a decent fit, this makes it too slow to be useful.

Below, fTest is a a simplified version of the model function I describe above. This function depends only on one parameter, D. It contains the same numerical integrals as my original function but I have removed all the other operations for simplicity. When I try to use it for fitting with NonlinearModelFit, it now takes approximately 7-10 minutes to find a solution.

particleSize = {a2 -> 3, b2 -> 2, c2 -> 1};

fTest[f_, D_?NumericQ] := 
   Module[{a, b, c, prefactor, Lx, Ly, Lz},
     a = a2 /. particleSize;
     b = b2 /. particleSize;
     c = c2 /. particleSize;
     prefactor = (Sqrt[a^2 + D] Sqrt[b^2 + D] Sqrt[c^2 + D])/2;
     Lx = 
       Re[
         prefactor 
           NIntegrate[
             1/((s + a^2 + D) Sqrt[(s + a^2 + D) (s + b^2 + D) (s + c^2 + D)]), 
             {s, 0, ∞}, 
             WorkingPrecision -> 10]];
     Ly = 
       Re[
         prefactor 
           NIntegrate[
             1/((s + b^2 + D) Sqrt[(s + a^2 + D) (s + b^2 + D) (s + c^2 + D)]), 
             {s, 0, ∞}, 
             WorkingPrecision -> 10]];
     Lz = 
       Re[
         prefactor 
           NIntegrate[
             1/((s + c^2 + D) Sqrt[(s + a^2 + D) (s + b^2 + D) (s + c^2 + D)]), 
             {s, 0, ∞}, 
             WorkingPrecision -> 10]];
     (*We know that Lx+Ly+Lz = 1, the below is for testing*)
     f (Lx + Ly + Lz)]

Testing

fTest[10, 0.1]

Generate some fake data to fit to the above function:

dataTest = Table[{n, n + RandomReal[{-0.6, 0.6}]}, {n, 1, 10}];
errorDataTest = 1 & /@ Range[10];

Then I do some non-linear fitting:

AbsoluteTiming[
  nlm = 
    NonlinearModelFit[
      dataTest,{fTest[f, Delta], 0 < Delta < 1}, 
      {{Delta, 0.085}}, f, 
      Weights -> errorDataTest , 
      Method -> {NMinimize, Method -> {"DifferentialEvolution"}}]]

The non-linear fitting takes more than 7-10 min to run, depending of the noise of the data.

Reducing WorkingPrecision to 5 instead of 10 introduces small errors and does not speed the calculation up.

To look at the fit results:

 Show[
   ListPlot[dataTest], 
   Plot[nlm[f], {f, 1, 10}, PlotStyle -> Orange], 
   PlotRange -> All]

 nlm["ParameterTable"]

 Grid[Transpose[{#, nlm[#]} &[{"AdjustedRSquared", "AIC", "BIC", "RSquared"}]], 
   Alignment -> Left]

I have also tested doing a fit varying two parameters instead of one. Defining a second test function, fTest2, with two parameters:

fTest2[f_, D_?NumericQ, offset_] := 
   Module[{a, b, c, prefactor, Lx, Ly, Lz},
     a = a2 /. particleSize;
     b = b2 /. particleSize;
     c = c2 /. particleSize;
     prefactor = (Sqrt[a^2 + D] Sqrt[b^2 + D] Sqrt[c^2 + D])/2;
     Lx = Re[prefactor NIntegrate[1/((s + a^2 + D) Sqrt[(s + a^2 + D) (s + b^2 + D) (s + c^2 + D)]), {s, 0, ∞}, WorkingPrecision -> 5]];
     Ly = Re[prefactor NIntegrate[1/((s + b^2 + D) Sqrt[(s + a^2 + D) (s + b^2 + D) (s + c^2 + D)]), {s, 0, ∞}, WorkingPrecision -> 5]];
     Lz = Re[prefactor NIntegrate[1/((s + c^2 + D) Sqrt[(s + a^2 + D) (s + b^2 + D) (s + c^2 + D)]), {s, 0, ∞}, WorkingPrecision -> 5]];
    (*We know that Lx+Ly+Lz = 1, the below is for testing*)
    offset + f (Lx + Ly + Lz)]

And now trying a non-linear fit varying two parameters instead of one:

AbsoluteTiming[
  nlm = 
    NonlinearModelFit[
      dataTest, {fTest2[f, Delta, offset1], 0 < Delta < 1, 0 < offset1 < 1},
      {{Delta, 0.085}, {offset1, 0}}, f,
      Weights -> errorDataTest ,
      Method -> {NMinimize, Method -> {"DifferentialEvolution"}}]]

The above takes approx. 25 minutes.

For the actual calculations I need to do (as opposed to the above test functions) the original function depends on various parameters and when I try the non-linear fitting varying four parameters, it takes around an hour. Since the fitting is very sensitive to initial guesses, I would need to play a bit with the guesses and also ideally with the parameters of the DifferentialEvolution method ("ScalingFactor", "CrossProbability"), but I cannot do this if it takes approximately one hour per attempt.

Is there any way to improve the speed or is this it?

After the comments I now have a faster 2-parameter test function (fTest4 below) that does Lz=1-Lx-Ly, and also I have turned SymbolicProcessing off in NIntegrate. The two changes make the two-parameter fitting (see below) take approx. 6 minutes instead of approx. 25 minutes for fTest2 above. So that's already a substantial improvement:

fTest4[f_, D_?NumericQ, offset_] := Module[{a, b, c, prefactor, Lx, Ly, Lz},
 a = a2 /. particleSize;
 b = b2 /. particleSize;
 c = c2 /. particleSize;
 prefactor = (Sqrt[a^2 + D] Sqrt[b^2 + D] Sqrt[c^2 + D])/2;
 Lx = Re[prefactor NIntegrate[1/((s + a^2 + D) Sqrt[(s + a^2 + D) (s + b^2 + D) (s + c^2 + D)]), {s, 0, \[Infinity]}, WorkingPrecision -> 10, 
  Method -> {Automatic, "SymbolicProcessing" -> 0}]];
 Ly = Re[prefactor NIntegrate[1/((s + b^2 + D) Sqrt[(s + a^2 + D) (s + b^2 + D) (s + c^2 + D)]), {s, 0, \[Infinity]}, WorkingPrecision -> 10, 
  Method -> {Automatic, "SymbolicProcessing" -> 0}]];
 Lz = 1 - Lx - Ly;
 (*We know that Lx+Ly+Lz = 1, the below is for testing*)
 offset + f (Lx + Ly + Lz)
]

Two-parameter fitting:

AbsoluteTiming[nlm = NonlinearModelFit[dataTest,
 {fTest4[f, Delta, offset1], 0 < Delta < 1, 0 < offset1 < 1},
 {{Delta, 0.085}, {offset1, 0}},
 f,
 Weights -> errorDataTest ,
 Method -> {NMinimize, Method -> {"DifferentialEvolution"}}
]]

Any more suggestions for speeding this up?

Since you know that Lx+Ly+Lz == 1 there is no need to calculate all three integrals. Calculate Lx and Ly then set Lz=1-Lx-Ly — Bob Hanlon, Aug 24 '17 at 20:15
Try to find a resonably accurate quadrature rule (evaluation points and weights) and apply it instead of NIntegrate. This should speed things up considerably. This may also reduce numerical noise (originating in subdivision strategies of NIntegrate) and help NonlinearModelFit to estimate derivates. The method choice here is probably the Gauss-Newton method (or variants of it) which requires first derivatives with respect to the unknown parameters. However, I am not familiar with the internals of NonlinearModelFit... — Henrik Schumacher, Aug 24 '17 at 20:24
@BobHanlon Many thanks. This is an obvious step to speed things up that I should have already included. The two-parameter solution now takes ~17 minutes instead of ~25min. Bit by bit. Many thanks. — ILLG, Aug 25 '17 at 10:46
@HenrikSchumacher What exactly do you mean by a quadrature rule to substitute NIntegrate? Could you point me to somewhere that explains this method? Many thanks. — ILLG, Aug 25 '17 at 11:08
RE @HenrikSchumacher comment:
In numeric integration we approximate the integral of a function as $\int {a}^{b}f(x),dx=\sum _{i=1}^{n}w{i}f(x_{i})$. I think what the suggestion was hinting at was to find an appropriate rule, both weighting function and number of integration points, such that the integration can be approximated without the use of NIntegrate. — Marchi, Aug 25 '17 at 14:58
@Marchi: Exactly. I tried to give an example using Gauss quadrature rules as an answer. — Henrik Schumacher, Aug 25 '17 at 15:03

score 2 · Answer 1 · edited Aug 26 '17 at 00:53

This is sort of an expansion of my comment above.

Function to integrate:

F = {s, a, b, c, d} \[Function] (Sqrt[a^2 + d] Sqrt[b^2 + d] Sqrt[c^2 + d])/2/((s + a^2 + d) Sqrt[(s + a^2 + d) (s + b^2 + d) (s + c^2 + d)]);

Transformation of the unit interval to $[0,\infty]$:

\[Phi] = t \[Function] Evaluate[Simplify[1/(1 - t) - 1]];

Computation of evaluation points and integration weights by using Gauss quadrature and transformation formula for integrals (incorporating @J.M.'s comment):

m = 3;
n = 100;
{p, \[Omega]} = Developer`ToPackedArray[N[Most[NIntegrate`GaussRuleData[m, 16]]]]/n;
(* quadrature data for the unit interval *)
pts0 = Join @@ Table[p + i/n, {i, 0, n - 1}];
weights0 = Join @@ ConstantArray[\[Omega], n];
(* quadrature data for the interval from 0 to Infinity *)
pts = (\[Phi] /@ pts0);
weights = (\[Phi]' /@ pts0) weights0;

Compilation of function:

Block[{s, a, b, c, d},
  With[{code = F[s, a, b, c, d]},
   cF = Compile[{{s, _Real}, {a, _Real}, {b, _Real}, {c, _Real}, {d, _Real}}, code,
     CompilationTarget -> "C",
     RuntimeAttributes -> {Listable},
     Parallelization -> True
     ]
   ]
];

Test example:

a = 3.; b = 2.; c = 1.; d = 1.;
bla = Simplify[Integrate[F[s, a, b, c, d], s]];
int = F[s, a, b, c, d];
trueLx = Limit[bla, s -> \[Infinity]] - Limit[bla, s -> 0]; //AbsoluteTiming // First
Lx = Re[ NIntegrate[int, {s, 0, \[Infinity]}, WorkingPrecision -> 5]];// AbsoluteTiming // First
myLx = weights.cF[pts, a, b, c, d]; // AbsoluteTiming // First
trueLx - myLx
trueLx - Lx

(* 0.001997 *)
(* 0.003963 *)
(* 0.000082 *)
(* -0.0000127242 *)
(* -0.0000624892 *)

Thus, with comparable error, we have a 50x speed-up. Accuracy can be increased by increasing m and n. See also the documentation of NumericalDifferentialEquationAnalysis`GaussianQuadratureWeights (this function is another way to compute the Gauss quadrature points and weights).

However, this method relies on the assumption that the integrant is several times differentiable (2 m-fold) and has no singularities in $[0,\infty]$. Moreover, it is hard to obtain more that single precision accuracy with this method.

Many thanks for your answer. I got a 4-5 times increase in speed by doing Lz = 1 - Lx - Ly and by turning off SymbolicProcessing in NIntegrate, so that was enough to get to do the fits I needed for now. I will revisit and implement your answer in a few weeks when I run more tests. Many thanks! — ILLG, Sep 11 '17 at 09:20

NonlinearModelFit runs too slow when fitting data to a function that does multiple numerical integrations

1 Answers1