Behavior remains unchanged through Mathematica version 13.0.0
(Cross-posted on Wolfram Community.)
I've spent much time finding a minimal example demonstrating this problem with FindMinimum. Normally one faces this problem when fitting very large and complicated functions which cannot be posted here. But the following is a simplified case which reproduces this behavior.
This behavior is somehow connected to the scale of the residuals. The essence of the problem seems to be that FindMinimum doesn't increase step size when it should. The outcome is a ridiculously slow minimization process where FindMinimum takes so little steps in the direction of the minimum that it virtually never achieves it. But with a simple workaround it is possible to get FindMinimum coming to the minimum in a few steps. I hope this post will help many people working with FindMinimum and also motivate the developers to improve this function.
The model
Let us define a model function kM[f] which depends on parameters ζ[1], ν0[1], Γ[1], ζ[2], ν0[2], Γ[2], ζ[3]:
ν2f = 29979245800*^-9;
f2ν = 1/ν2f;
oscRe[ν, j] := ((ζ[j] - ζ[j + 1]) (ν0[j]^2 - ν^2) ν0[j]^2)/((ν0[j]^2 - ν^2)^2 + (Γ[j] ν)^2);
oscIm[ν, j] := ((ζ[j] - ζ[j + 1]) Γ[j] ν0[j]^2 ν)/((ν0[j]^2 - ν^2)^2 + (Γ[j] ν)^2);
ϵRe[ζRe_, ζIm_] := ((1 - ζRe) (1 + 2 ζRe) - 2 ζIm^2)/((1 - ζRe)^2 + ζIm^2);
ϵIm[ζRe_, ζIm_] := (3 ζIm)/((1 - ζRe)^2 + ζIm^2);
nIm[εRe_, εIm_] := Sqrt[(Sqrt[εRe^2 + εIm^2] - εRe)/2];
ζReM[f_] = Total[Table[oscRe[f2ν f, j], {j, 2}]];
ζImM[f_] = Total[Table[oscIm[f2ν f, j], {j, 2}]];
εReM[f_] = ϵRe[ζReM[f], ζImM[f]];
εImM[f_] = ϵIm[ζReM[f], ζImM[f]];
kM[f_] = nIm[εReM[f], εImM[f]];
Data, residual vector and initial guess
data1 = Import[
"https://raw.githubusercontent.com/AlexeyPopkov/FindMinimum/master/data1.m"];
(* Uncomment the next line if you want higher speed: the overall picture won't change! *)
(* data1 = data1[[;; ;; 25]]; *)
ζ[1] = 0.2346231895551235`30 (* fixed parameter *)
residualVect = Table[data1[[i, 2]] - kM[ν2f data1[[i, 1]]], {i, Length[data1]}];
init = {{ν0[1], 665.2694828745317914180715698144229362278430.}, {Γ[1], 263.0114435543297811058213734371058867205430.},
{ζ[2], 0.210704184928628552537414816962272593332062980706619687734330.}, {ν0[2], 750}, {Γ[2], 100}, {ζ[3], 0.210704184928628552537414816962272593332062980706619687734330.}};
Note that all the parameters have precision 30 or higher as well as data1:
Precision[data1]
30
An auxiliary function
This function passes initial guess and options to FindMinimum, monitors the process of minimization and returns statistics and achieved minimum:
findMinimum[init_, opts__] :=
Module[{startTime}, steps = 0; PrintTemporary[Dynamic[steps]];
startTime = AbsoluteTime[];
min = FindMinimum[, init, opts];
{DateString[AbsoluteTime[] - startTime, {"Hour", ":", "Minute", ":", "Second"}], steps,
Block[{ζ, ν0, Γ}, Set @@@ min[[2]]; Total[residualVect^2]],
NumberForm[min[[2]], 5]}]
Note that the variable min is not scoped by Module on purpose.
Testing
All the timings here are for Mathematica 8.0.4. With versions from 9 to 10.4.1 these timings are more than 15 times slower on the same machine as compared to version 8.0.4 (the bug is fixed in version 11.0.0). You can uncomment the commented code line in the "Data, residual vector and initial guess" section in order to get higher speed: the overall picture won't change, but in this case you should also pay attention to the achieved minimum in every case - it will be telling!
The default setup seems to be working well in this simplified example:
findMinimum[init, MaxIterations -> 500, WorkingPrecision -> 20, PrecisionGoal -> 3,
StepMonitor :> ++steps, Method -> {"LevenbergMarquardt", "Residual" -> residualVect}]
{"00:02:39", 220, 0.00405321003823167, {ν0[1]->406.18, Γ[1]->346.16, ζ[2]->0.22879, ν0[2]->666.41, Γ[2]->239.54, ζ[3]->0.20278}}
But what if we multiply residualVect by 10:
findMinimum[init, MaxIterations -> 500, WorkingPrecision -> 20, PrecisionGoal -> 3,
StepMonitor :> ++steps, Method -> {"LevenbergMarquardt", "Residual" -> 10 residualVect}]
{"00:00:43", 51, 0.00405321003823167, {ν0[1]->406.18, Γ[1]->346.16, ζ[2]->0.22879, ν0[2]->666.41, Γ[2]->239.54, ζ[3]->0.20278}}
We have got the same minimum in 51 steps instead of 220 steps!
And what if we divide residualVect by 2:
findMinimum[init, MaxIterations -> 2000, WorkingPrecision -> 20, PrecisionGoal -> 3,
StepMonitor :> ++steps, Method -> {"LevenbergMarquardt", "Residual" -> residualVect/2}]
{"00:12:37", 1005, 0.00405321003823167, {ν0[1]->406.18, Γ[1]->346.16, ζ[2]->0.22879, ν0[2]->666.41, Γ[2]->239.54, ζ[3]->0.20278}}
Now we have got the same in 1005 steps! If we will divide residualVect by 20, even 3000 steps won't allow us to get this minimum (I have tried)!
The workaround
If we restrict FindMinimum to 100 iterations and then feed the results (saved in the global variable min) to the next FindMinimum we get the minimum in a reasonable time even if we divide residualVect by 20:
Quiet@findMinimum[init, MaxIterations -> 100, WorkingPrecision -> 20, PrecisionGoal -> 3,
StepMonitor :> ++steps,
Method -> {"LevenbergMarquardt", "Residual" -> residualVect/20}];
findMinimum[List @@@ min[[2]], MaxIterations -> 100, WorkingPrecision -> 20,
PrecisionGoal -> 3, StepMonitor :> ++steps,
Method -> {"LevenbergMarquardt", "Residual" -> residualVect/20}]
{"00:00:26", 20, 0.00405321003823471, {ν0[1]->406.18, Γ[1]->346.16, ζ[2]->0.22879, ν0[2]->666.41, Γ[2]->239.54, ζ[3]->0.20278}}
Now we have got the minimum in 100 + 20 = 120 steps!
The question
Why this happens? Is correct my hypothesis that FindMinimum for some reason can't increase the step size? Is it a bug? And the most important: is there a way to overcome this problem and always get the result in as little number of steps as possible?