
Hopefully this question has a simple answer. I have the following bit of code:

fits = Quiet[
  Check[
    NMinimize[
      {ssd, And @@ cons}, vars,
      AccuracyGoal -> accur,
      PrecisionGoal -> Infinity,
      Method -> {"DifferentialEvolution", "ScalingFactor" -> 1.0, "CrossProbability" -> 0.0},
      MaxIterations -> maxIter,
      StepMonitor :> (fitStep++)
    ],
    {}
  ]
];

where the variables accur and maxIter are set elsewhere. I've noticed that fitStep sometimes exceeds maxIter during the minimization, which seems to indicate that an "iteration" and a "step" are not the same thing in this context. If they're different, is there a way to monitor the "iteration" and not the "step"? I just want to be able to limit the fit to a certain number of iterations (to prevent it from going on too long) and monitor the progress against that limit.
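To make the mismatch concrete, here is a tiny self-contained toy (a stand-in objective, not my actual ssd; the exact counts vary from run to run) that tallies both monitors against MaxIterations:

```mathematica
(* Count StepMonitor steps and EvaluationMonitor evaluations on a toy objective. *)
steps = 0; evals = 0;
NMinimize[{Sin[x] + x^2/10, -10 < x < 10}, x,
  Method -> "DifferentialEvolution",
  MaxIterations -> 50,
  StepMonitor :> steps++,
  EvaluationMonitor :> evals++];
{steps, evals} (* steps need not equal MaxIterations; evals is far larger *)
```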

EDIT: 16 May 2013

Alright I've put together a self-contained chunk of code to demonstrate the MaxIterations issue. I know it's not the most basic example but it's sure to reproduce what I'm seeing. The chunk of code below is commented, but just to summarize: I have some spectral data that I'm trying to fit with a sum of Logistic Power Peak functions using the Differential Evolution method. The important variables here are vars, cons, args, and ssd, as those are the arguments fed to NMinimize[]. The ssd variable contains a very large expression so evaluate it with caution.

Evaluate the following expression first in a separate cell to watch the progress of the minimization.

(* Watch the progress here. *)

Dynamic["Iterating: " <> ToString[fitStep] <> " / " <> ToString[maxIter] <> " ..."]

Change maxIter and the "PostProcess" option inside NMinimize[] (bottom) to see the various behaviors.

(* Set the number of iterations. *)

maxIter = 500;

(* Here's some data I'm trying to fit. *)

pts={
    {525,  3.95}, {527,  5.07}, {529,  5.15}, {531,  5.58}, {533,  5.85},
    {535,  6.53}, {537,  7.01}, {539,  6.79}, {541,  7.01}, {543,  7.43},
    {545,  7.18}, {547,  6.92}, {549,  6.34}, {551,  7.21}, {553,  7.75},
    {555,  7.65}, {557,  8.71}, {559,  8.89}, {561,  9.02}, {563,  9.72},
    {565,  9.62}, {567,  9.88}, {569, 10.50}, {571,  9.65}, {573,  9.71},
    {575,  8.52}, {577,  7.70}, {579,  8.15}, {581,  8.29}, {583, 10.00},
    {585, 10.00}, {587,  9.28}, {589, 10.80}, {591, 11.90}, {593, 13.80},
    {595, 14.90}, {597, 14.30}, {599, 15.30}, {601, 16.20}, {603, 16.30},
    {605, 17.90}, {607, 17.70}, {609, 19.20}, {611, 20.40}, {613, 20.80},
    {615, 21.40}, {617, 20.30}, {619, 18.40}, {621, 19.80}, {623, 19.90},
    {625, 20.70}, {627, 19.80}, {629, 18.60}, {631, 19.40}, {633, 20.40},
    {635, 20.50}, {637, 23.10}, {639, 22.50}, {641, 25.20}, {643, 26.40},
    {645, 26.10}, {647, 25.60}, {649, 25.20}, {651, 23.30}, {653, 23.10},
    {655, 23.00}, {657, 24.00}, {659, 22.70}, {661, 22.00}, {663, 21.70},
    {665, 21.30}, {667, 19.60}, {669, 20.30}, {671, 17.60}, {673, 18.20},
    {675, 17.40}, {677, 16.10}, {679, 15.60}, {681, 14.10}, {683, 12.20},
    {685, 12.10}, {687, 13.50}, {689, 13.90}, {691, 12.50}, {693, 12.00},
    {695, 11.60}, {697, 11.10}, {699,  9.41}, {701,  8.56}, {703,  6.69},
    {705,  6.50}, {707,  7.20}, {709,  7.03}, {711,  6.75}, {713,  6.01},
    {715,  6.97}, {717,  7.54}, {719,  7.76}, {721,  7.47}, {723,  7.33},
    {725,  6.93}, {727,  6.64}, {729,  6.25}, {731,  4.93}, {733,  4.67},
    {735,  4.06}, {737,  4.17}, {739,  4.90}, {741,  4.64}, {743,  4.63},
    {745,  5.11}, {747,  4.75}, {749,  4.45}
    };

(* This is the Logistic Power Peak (LPP) function. *)
(* I'm fitting with a sum of these (the sum is handled below). *)
(* #1 = x = abscissa *)
(* #2 = a = height *)
(* #3 = b = center *)
(* #4 = c = width *)
(* #5 = d = asymmetry *)
(* #6 = s = reflection *)

LPP = (#2/#5)(#5+1)^((#5+1)/#5)
    * (1+Exp[(#6(#1-#3)+#4*Log[#5])/#4])^(-((#5+1)/#5))Exp[(#6(#1-#3)+#4*Log[#5])/#4]&;

(* These are just labels for each of the six peaks being fitted. *)

pkIndx = {121101022, 121111022, 121121022, 121131022, 121141022, 121151022};

(* For each of the six peaks, set initial values for the parameters a, b, c, d, and s. *)
(* a = height, b = center, c = width, d = asymmetry, s = reflection *)

MapThread[(a[#1] = #2) &, {pkIndx, {4.26, 6.41, 3.68, 11.70, 3.87, 15.38}}];
MapThread[(b[#1] = #2) &, {pkIndx, {534.67, 559.62, 583.72, 607.08, 629.76, 651.83}}];
MapThread[(c[#1] = #2) &, {pkIndx, {5.02, 5.49, 5.96, 6.43, 6.91, 7.39}}];
MapThread[(d[#1] = #2) &, {pkIndx, {5.94, 6.16, 6.38, 6.59, 6.79, 6.98}}];
MapThread[(s[#1] = #2) &, {pkIndx, {1, 1, 1, 1, 1, 1}}];

(* Set lower constraints for each peak parameter. *)

MapThread[(daMins[#1] = #2) &, {pkIndx, {3, 5, 2, 5, 0, 5}}];
MapThread[(dbMins[#1] = #2) &, {pkIndx, {10, 10, 10, 10, 10, 10}}];
MapThread[(dcMins[#1] = #2) &, {pkIndx, {3, 3, 3, 3, 3, 3}}];
MapThread[(ddMins[#1] = #2) &, {pkIndx, {3, 3, 3, 3, 3, 3}}];

(* Set upper constraints for each peak parameter. *)

MapThread[(daPlus[#1] = #2) &, {pkIndx, {5, 5, 5, 5, 5, 5}}];
MapThread[(dbPlus[#1] = #2) &, {pkIndx, {10, 10, 10, 10, 10, 10}}];
MapThread[(dcPlus[#1] = #2) &, {pkIndx, {3, 3, 3, 3, 3, 3}}];
MapThread[(ddPlus[#1] = #2) &, {pkIndx, {3, 3, 3, 3, 3, 3}}];

(* In my larger program, this checks to see which peak parameters are being fitted. *)
(* Here we're fitting all but s, which always takes the value 1. *)

bool = {
(daPlus[#] != 0 || daMins[#] != 0),
(dbPlus[#] != 0 || dbMins[#] != 0),
(dcPlus[#] != 0 || dcMins[#] != 0),
(ddPlus[#] != 0 || ddMins[#] != 0)
} & /@ pkIndx;

aPos = Flatten[Position[bool[[All, 1]], True]];
bPos = Flatten[Position[bool[[All, 2]], True]];
cPos = Flatten[Position[bool[[All, 3]], True]];
dPos = Flatten[Position[bool[[All, 4]], True]];

(* Make a table of variables for NMinimize[]. *)

vars = Join[
a0[#] & /@ pkIndx[[aPos]],
b0[#] & /@ pkIndx[[bPos]],
c0[#] & /@ pkIndx[[cPos]],
d0[#] & /@ pkIndx[[dPos]]
];

(* Make a table of constraints for NMinimize[]. *)

cons = Join[
(a[#] - daMins[#] < a0[#] < a[#] + daPlus[#]) & /@ pkIndx[[aPos]],
(b[#] - dbMins[#] < b0[#] < b[#] + dbPlus[#]) & /@ pkIndx[[bPos]],
(c[#] - dcMins[#] < c0[#] < c[#] + dcPlus[#]) & /@ pkIndx[[cPos]],
(d[#] - ddMins[#] < d0[#] < d[#] + ddPlus[#]) & /@ pkIndx[[dPos]]
];

(*
Make a table of arguments to be fed to the LPP[] function in the Sum below.
Normally this table contains numerical values for the parameters that aren't being fitted
and variables for the ones that are.
Here we're fitting everything except s, so the table contains only variables.
*)

args = MapThread[
  {
    aDum -> #2[[1]] /. {True -> a0[#1], False -> a[#1]},
    bDum -> #2[[2]] /. {True -> b0[#1], False -> b[#1]},
    cDum -> #2[[3]] /. {True -> c0[#1], False -> c[#1]},
    dDum -> #2[[4]] /. {True -> d0[#1], False -> d[#1]}
  } &,
  {pkIndx, bool}
];

(*
Create the sum of squared deviations for NMinimize[] to minimize.
The inner Sum[] sums the six LPP[] functions at a given x; each LPP[] gets its
arguments from the "args" table.
The outer Sum[] sums the squared deviations over all the data points.
*)

ssd = Sum[
  (
    Sum[
      LPP[pts[[i, 1]], aDum, bDum, cDum, dDum, s[pkIndx[[j]]]] /. args[[j]],
      {j, Length[pkIndx]}
    ] - pts[[i, 2]]
  )^2,
  {i, 1, Length[pts]}
];

(* Run the fit. *)

fitStep = 0;

fits = NMinimize[{ssd, And @@ cons}, vars,
  AccuracyGoal -> 3,
  PrecisionGoal -> Infinity,
  Method -> {
    "DifferentialEvolution",
    "ScalingFactor" -> 1.0,
    "CrossProbability" -> 0.0,
    "PostProcess" -> False
  },
  MaxIterations -> maxIter,
  StepMonitor :> (fitStep++)
];

My observations:

  • with "PostProcess->False":

    • MaxIterations = 300 ---> # iterations = 220
    • MaxIterations = 500 ---> # iterations = 250
    • MaxIterations = 1000 ---> # iterations = 500
    • MaxIterations = 1200 ---> # iterations = 600
  • with "PostProcess->FindMinimum":

    • MaxIterations = 300 ---> # iterations = 221
    • MaxIterations = 500 ---> # iterations = 251
    • MaxIterations = 1000 ---> # iterations = 501
  • with "PostProcess->True":

    • MaxIterations = 300 ---> # iterations = 286
    • MaxIterations = 500 ---> # iterations = 316
    • MaxIterations = 1000 ---> # iterations = 566
    • MaxIterations = 1500 ---> # iterations = 816

Lastly, I did find a comment in The Mathematica Guidebook for Numerics by Michael Trott claiming (paraphrasing) that in FindMinimum[], MaxIterations determines the effective number of evaluations in each search direction. I believe it was on page 371, but I'll have to double-check.

Adam
  • Can't you Sow them and find out? (I don't know a lot about how this works, so I could be wrong). In any case, please include a minimal example (i.e., definitions for ssd, cons, vars, etc.) – rm -rf May 14 '13 at 22:03
  • I'm not sure how to go about Sow-ing the iteration number. As far as I know, only StepMonitor and EvaluationMonitor allow access to information about the progress of the optimization. In this case I think EvaluationMonitor doesn't apply (I may be wrong) and I'm already using StepMonitor to keep track of the "step". I guess I'm actually claiming that they are different because I set the MaxIteration and I'm seeing that the output of the StepMonitor exceeds MaxIteration. – Adam May 14 '13 at 22:07
  • Also, not sure that the definitions of ssd, cons, and vars matter for this question. They're not simple definitions, but in short ssd is the "sum of squared deviations" that I'm minimizing, cons is a set of "Anded" constraints, and vars is the set of variables that I'm fitting. – Adam May 14 '13 at 22:09
  • It's best to post a working self-contained example. – Ajasja May 14 '13 at 22:10
  • ^ I'll see what I can do about posting an example. I'm going to have to dissect my code a little (this excerpt is part of a much larger program). – Adam May 14 '13 at 22:15
  • @Adam That's why I asked for a minimal example, not your actual problem. It could just be the simplest "Hello world" of NMinimize that is sufficient to illustrate your problem. Regardless of whether it matters or not, it makes life simple for everyone who might answer your question if they can simply copy your code and run it to test it. Most of us don't have the time or inclination to get creative and think up of a valid function to minimize, appropriate conditions, etc — all of this before we actually get to the heart of your problem. – rm -rf May 14 '13 at 22:15
  • ^ Understood. I just thought it might be as simple as the definition of "iteration" vs. "step". I'll put together a basic example. – Adam May 14 '13 at 22:18
  • An indicator that they're different: f[x_?NumericQ] := Sin[.4 x] + .7 Cos[.6 x] x; {steps, eval} = First /@ Last@ Reap[NMinimize[{f@x, -10 < x < 10}, {x}, Method -> "DifferentialEvolution", StepMonitor :> Sow[{x, f@x}, "steps"], EvaluationMonitor :> Sow[{x, f@x}, "eval"]], {"steps", "eval"}]; Plot[f@x, {x, -10, 10}, Epilog -> {{Red, Point[eval]}, {Green, Point[steps]}}] Also look at the lengths of steps and eval. – rm -rf May 14 '13 at 23:12
  • Regarding differential evolution itself: there is no conceptual distinction between a "step" and an "iteration" since this algorithm does not perform line searches or similar. As for what NMinimize thinks of as a step or an iteration, that's a different matter, especially for constrained optimizations where barrier parameters or penalty functions may need to be adjusted in between DE iterations to guide the process into the correct minimum. For more, see my answers here and here. – Oleksandr R. May 15 '13 at 00:30
  • I've figured it out, I think: "iterations" seems to be the sum of NMinimize steps and FindMinimum postprocessing steps. If you want to disable the postprocessing (not recommended, since FindMinimum converges very quickly), use Method -> {"DifferentialEvolution", "PostProcess" -> False}. Interestingly, "PostProcess" -> {FindMinimum, MaxIterations -> #} has no effect on the actual number of postprocessing iterations, while "PostProcess" -> {KKT, MaxIterations -> #} does. I guess FindMinimum is not so easily dissuaded from its mission. – Oleksandr R. May 15 '13 at 00:39
  • Can you confirm that you're using "ScalingFactor" -> 1.0, "CrossProbability" -> 0.0 deliberately so that the optimization never converges? This may influence how NMinimize behaves if after a certain number of iterations it continues to see no improvement. Anyway, I would repeat my previous suggestion to try this again with diagnostics switched on: this way, NMinimize will tell you explicitly what it is doing. If you specify MaxIterations much larger than needed for convergence, some may well be used to adjust barrier parameters, etc.; for small MaxIterations, the limit is obeyed. – Oleksandr R. May 17 '13 at 02:49
  • @Oleksandr: Yes, the settings are deliberate. I browsed publications, tutorials, etc. a little when I was selecting values for the ScalingFactor and CrossProbability and what I gathered (which may be totally wrong) is that low values of the CP lead to searches that are slow and robust, while larger values yield fewer improved solutions, but the improvements can be big. I chose to go with slow and robust. For the SF, 1 is supposed to allow greater mobility in the solution space. Anyway, I'll take a look at the diagnostics and report back what I find; thanks very much for the suggestion! – Adam May 17 '13 at 04:21
  • @Oleksandr: Also, if the optimization weren't converging with an accuracy of 3 wouldn't MMA generate a message? – Adam May 17 '13 at 04:36
  • It isn't clear to me whether you already took this into consideration, but it's important to note that Mathematica's CrossProbability is reversed relative to the literature--traditionally 1 retains the mutant vector while 0 keeps the target, but here it's the other way around. That is, a value of 0 (Mathematica's 1) yields a one-dimensional hill climbing approach, assuming that Mathematica implements DE as described by Storn and Price rather than using this as a literal probability. (I did look at the code at one point, but I've now forgotten what it does.) ... – Oleksandr R. May 18 '13 at 06:40
  • ... Good general settings are something like $F \approx 0.5-0.9$, $CR \approx 0.1, 0.9$, so your values are a bit extreme, but if it works for you, fine. As for the lack of message: NMinimize won't produce one if the global optimization doesn't converge but the later postprocessing does; both must fail for an error to be returned to the user. I suppose this is because global optimization algorithms like DE provide no useful error estimate, whereas local methods do, and thus the message is only really meaningful in the latter case. – Oleksandr R. May 18 '13 at 06:45
  • Oleksandr, thanks for those pointers. I had not noticed the difference in the CP definitions. So far I've been able to get my fits to converge using F = 1.0, CR = 0.0, but they take quite a long time (several minutes for ~5 peaks). I'll try adjusting the values a bit and see if it doesn't improve performance. Much appreciated. – Adam May 20 '13 at 06:02
