
I have a function of 20 parameters, of which 3 are my physical parameters and the rest are pull terms that absorb the errors. The goal is to find the global minimum of this function, i.e. the best-fit values of the three physical parameters that minimize it. I am using NMinimize with the DifferentialEvolution method; however, choosing different options for DifferentialEvolution changes my results drastically. I really don't know whether I am using those options correctly, and I can't be sure they are giving me the correct minimum. How should I choose these options: "CrossProbability", "InitialPoints", "PenaltyFunction", "PostProcess", "RandomSeed", "ScalingFactor", "SearchPoints", "Tolerance"?

My code is long and I can't make it much shorter, since I have to build some tables and do some interpolation before defining the final function; I include it here in case somebody needs it:

d11 = 667.9; d12 = 451.8; d13 = 304.8; d14 = 336.1; d15 = 513.9; d16 = 739.1;
d21 = 1556.5; d22 = 1456.2; d23 = 1395.9; d24 = 1381.3; d25 = 1413.8; d26 = 1490.1;
f11 = 0.0678; f12 = 0.1493; f13 = 0.3419; f14 = 0.2701; f15 = 0.115; f16 = 0.0558;
f21 = 0.1373; f22 = 0.1574; f23 = 0.1809; f24 = 0.1856; f25 = 0.178; f26 = 0.1608;

rhodatar = {{1.70059, 1.38938}, {1.88047, 1.24779}, {2.13609, 1.08850}, {2.39172, 0.93805},
   {2.68521, 0.76991}, {2.97870, 0.61947}, {3.42367, 0.45133}, {3.88757, 0.30973},
   {4.28521, 0.21239}, {4.68284, 0.14159}, {5.09941, 0.08850}, {5.55385, 0.06195},
   {5.88521, 0.03540}, {6.39645, 0.01770}, {6.99290, 0.01770}, {7.68402, 0.01770},
   {8.41302, 0.00885}, {9.25562, 0.00885}, {9.89941, 0.00885}, {10.89941, 0.00885},
   {12., 0.00885}};

rhor = Interpolation[rhodatar];

rhonorm = NIntegrate[rhor[x], {x, 1.8, 12}]; (* precompute the normalization once instead of re-integrating on every call *)
rhofinalr[x_] := rhor[x]/rhonorm;

sterm2ofp11 = NIntegrate[Sin[1.267*2.32*10^-3*d11/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f11sin1 = Table[{w, NIntegrate[Sin[1.267*w*d11/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp11 = Interpolation[f11sin1];
f11sin2 =Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d11/enu]^2*rhofinalr[enu], {enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm4ofp11 = Interpolation[f11sin2];
p11n =.;
p11n[y_, z_, w_] := f11*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp11 - ((1 + Sqrt[1 - y])/2)*z*sterm3ofp11[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp11[w]);


sterm2ofp12 =NIntegrate[Sin[1.267*2.32*10^-3*d12/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f12sin1 = Table[{w, NIntegrate[Sin[1.267*w*d12/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp12 = Interpolation[f12sin1];
f12sin2 = Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d12/enu]^2*rhofinalr[enu],{enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm4ofp12 = Interpolation[f12sin2];
p12n =.;
p12n[y_, z_, w_] :=f12*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp12 - ((1 + Sqrt[1 - y])/2)*z*sterm3ofp12[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp12[w]);


sterm2ofp13 =NIntegrate[Sin[1.267*2.32*10^-3*d13/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f13sin1=Table[{w,NIntegrate[Sin[1.267*w*d13/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w,Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp13 = Interpolation[f13sin1];
f13sin2=Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d13/enu]^2*rhofinalr[enu], {enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm4ofp13 = Interpolation[f13sin2];
p13n =.;
p13n[y_, z_, w_] :=f13*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp13 - ((1 + Sqrt[1-y])/2)*z*sterm3ofp13[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp13[w]);


sterm2ofp14=NIntegrate[Sin[1.267*2.32*10^-3*d14/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f14sin1=Table[{w,NIntegrate[Sin[1.267*w*d14/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w,Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp14 = Interpolation[f14sin1];
f14sin2=Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d14/enu]^2*rhofinalr[enu], {enu, 1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm4ofp14 = Interpolation[f14sin2];
p14n =.;
p14n[y_, z_, w_]:=f14*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp14 - ((1 + Sqrt[1 -y])/2)*z*sterm3ofp14[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp14[w]);


sterm2ofp15=NIntegrate[Sin[1.267*2.32*10^-3*d15/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f15sin1=Table[{w,NIntegrate[Sin[1.267*w*d15/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp15 = Interpolation[f15sin1];
f15sin2=Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d15/enu]^2*rhofinalr[enu], {enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}]; 
sterm4ofp15 = Interpolation[f15sin2];
p15n =.;
p15n[y_, z_, w_] := f15*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp15 - ((1 + Sqrt[1 - y])/2)*z*sterm3ofp15[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp15[w]);


sterm2ofp16=NIntegrate[Sin[1.267*2.32*10^-3*d16/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f16sin1 = Table[{w, NIntegrate[Sin[1.267*w*d16/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp16 = Interpolation[f16sin1];
f16sin2 =Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d16/enu]^2*rhofinalr[enu], {enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}]; 
sterm4ofp16 = Interpolation[f16sin2];
p16n =.;
p16n[y_, z_, w_] :=f16*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp16 - ((1 + Sqrt[1 -y])/2)*z*sterm3ofp16[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp16[w]);


sterm2ofp21=NIntegrate[Sin[1.267*2.32*10^-3*d21/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f21sin1 = Table[{w, NIntegrate[Sin[1.267*w*d21/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp21 = Interpolation[f21sin1];
f21sin2 =Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d21/enu]^2*rhofinalr[enu], {enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}]; 
sterm4ofp21 = Interpolation[f21sin2];
p21n =.;
p21n[y_, z_, w_] :=f21*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp21 - ((1 + Sqrt[1 -y])/2)*z*sterm3ofp21[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp21[w]);

sterm2ofp22=NIntegrate[Sin[1.267*2.32*10^-3*d22/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f22sin1 = Table[{w, NIntegrate[Sin[1.267*w*d22/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp22 = Interpolation[f22sin1];
f22sin2 =Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d22/enu]^2*rhofinalr[enu], {enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}]; 
sterm4ofp22 = Interpolation[f22sin2];
p22n =.;
p22n[y_, z_, w_] :=f22*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp22 - ((1 + Sqrt[1 -y])/2)*z*sterm3ofp22[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp22[w]);

sterm2ofp23=NIntegrate[Sin[1.267*2.32*10^-3*d23/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f23sin1 = Table[{w, NIntegrate[Sin[1.267*w*d23/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp23 = Interpolation[f23sin1];
f23sin2 =Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d23/enu]^2*rhofinalr[enu], {enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}]; 
sterm4ofp23 = Interpolation[f23sin2];
p23n =.;
p23n[y_, z_, w_] :=f23*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp23 - ((1 + Sqrt[1 -y])/2)*z*sterm3ofp23[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp23[w]);


sterm2ofp24=NIntegrate[Sin[1.267*2.32*10^-3*d24/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f24sin1 = Table[{w, NIntegrate[Sin[1.267*w*d24/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp24 = Interpolation[f24sin1];
f24sin2 =Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d24/enu]^2*rhofinalr[enu], {enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}]; 
sterm4ofp24 = Interpolation[f24sin2];
p24n =.;
p24n[y_, z_, w_] :=f24*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp24 - ((1 + Sqrt[1 -y])/2)*z*sterm3ofp24[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp24[w]);


sterm2ofp25=NIntegrate[Sin[1.267*2.32*10^-3*d25/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f25sin1 = Table[{w, NIntegrate[Sin[1.267*w*d25/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp25 = Interpolation[f25sin1];
f25sin2 =Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d25/enu]^2*rhofinalr[enu], {enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}]; 
sterm4ofp25 = Interpolation[f25sin2];
p25n =.;
p25n[y_, z_, w_] :=f25*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp25 - ((1 + Sqrt[1 -y])/2)*z*sterm3ofp25[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp25[w]);


sterm2ofp26=NIntegrate[Sin[1.267*2.32*10^-3*d26/enu]^2*rhofinalr[enu], {enu, 1.8, 12}];
f26sin1 = Table[{w, NIntegrate[Sin[1.267*w*d26/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}];
sterm3ofp26 = Interpolation[f26sin1];
f26sin2 =Table[{w,NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d26/enu]^2*rhofinalr[enu], {enu,1.8,12}]}, {w, Table[10^w, {w, -2.634512, -0.5, 0.01}]}]; 
sterm4ofp26 = Interpolation[f26sin2];
p26n =.;
p26n[y_, z_, w_] :=f26*(1 - y*((1 + Sqrt[1 - z])/2)^2*sterm2ofp26 - ((1 + Sqrt[1 -y])/2)*z*sterm3ofp26[w] - ((1 - Sqrt[1 - y])/2)*z*sterm4ofp26[w]);
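As a side note, the twelve nearly identical blocks above could also be generated in a single loop, which helps to avoid copy-paste slips (such as the missing commas noted in the comments). In this rough sketch, pn[i, j], dist, frac and wgrid are hypothetical names standing in for p11n..p26n, the d/f constants and the common w grid; chi2reno would then call pn[1, 1], pn[1, 2], ... instead of p11n, p12n, ...:

(* hypothetical consolidated form of the p11n..p26n definitions above *)
dist = {{d11, d12, d13, d14, d15, d16}, {d21, d22, d23, d24, d25, d26}};
frac = {{f11, f12, f13, f14, f15, f16}, {f21, f22, f23, f24, f25, f26}};
wgrid = Table[10^w, {w, -2.634512, -0.5, 0.01}];
Do[
 With[{d = dist[[i, j]], f = frac[[i, j]], i0 = i, j0 = j},
  With[{
    s2 = NIntegrate[Sin[1.267*2.32*10^-3*d/enu]^2*rhofinalr[enu], {enu, 1.8, 12}],
    s3 = Interpolation[Table[{w, NIntegrate[Sin[1.267*w*d/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, wgrid}]],
    s4 = Interpolation[Table[{w, NIntegrate[Sin[1.267*(w - 2.32*10^-3)*d/enu]^2*rhofinalr[enu], {enu, 1.8, 12}]}, {w, wgrid}]]},
   (* With injects the numeric integral and the two InterpolatingFunctions into each definition *)
   pn[i0, j0][y_, z_, w_] := f*(1 - y*((1 + Sqrt[1 - z])/2)^2*s2 -
      ((1 + Sqrt[1 - y])/2)*z*s3[w] - ((1 - Sqrt[1 - y])/2)*z*s4[w])]],
 {i, 2}, {j, 6}];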

normfactorfar = 17566.8; normfactornear = 151725.5;

obsnear = 149904.8; obsfar = 16161.5;

chi2reno[y_, z_, w_, a_, xinear_, fnear1_, fnear2_, fnear3_, fnear4_, fnear5_, fnear6_, xifar_, ffar1_, ffar2_, ffar3_, ffar4_, ffar5_, ffar6_, bnear_, bfar_] :=
  (obsnear + bnear - normfactornear*(1 + a + xinear)*((1 + fnear1)*p11n[y, z, w] +
        (1 + fnear2)*p12n[y, z, w] + (1 + fnear3)*p13n[y, z, w] +
        (1 + fnear4)*p14n[y, z, w] + (1 + fnear5)*p15n[y, z, w] +
        (1 + fnear6)*p16n[y, z, w]))^2/obsnear +
   (fnear1^2 + fnear2^2 + fnear3^2 + fnear4^2 + fnear5^2 + fnear6^2)/(0.009)^2 + xinear^2/(0.002)^2 + bnear^2/(1140.93)^2 +
   (obsfar + bfar - normfactorfar*(1 + a + xifar)*((1 + ffar1)*p21n[y, z, w] +
        (1 + ffar2)*p22n[y, z, w] + (1 + ffar3)*p23n[y, z, w] +
        (1 + ffar4)*p24n[y, z, w] + (1 + ffar5)*p25n[y, z, w] +
        (1 + ffar6)*p26n[y, z, w]))^2/obsfar +
   (ffar1^2 + ffar2^2 + ffar3^2 + ffar4^2 + ffar5^2 + ffar6^2)/(0.009)^2 + xifar^2/(0.002)^2 + bfar^2/(166.545)^2;

renovars = {y, z, w, a, xinear, fnear1, fnear2, fnear3, fnear4,fnear5, fnear6, xifar,ffar1, ffar2, ffar3, ffar4, ffar5, ffar6,bnear, bfar};

renobounds = {0. <= y <= 1, 0 <= z <= 1, 0.00232 <= w <= 0.1,bnear >= 0, bfar >= 0};

Changing the values of these options gives me very different results:

Do[Print[NMinimize[{chi2reno[y, z, w, a, xinear, fnear1, fnear2, 
fnear3, fnear4, fnear5, fnear6, xifar, ffar1, ffar2, ffar3, 
ffar4, ffar5, ffar6, bnear, bfar], renobounds}, renovars, 
Method -> {"DifferentialEvolution", "SearchPoints" -> Automatic, 
"ScalingFactor" -> 0.9, "CrossProbability" -> 0.1, 
"PostProcess" -> {FindMinimum, Method -> "QuasiNewton"},
"RandomSeed" -> i}]], {i, 10}]
  • @OleksandrR, I used the methods from the link you gave me; however, I still have the problem described in the above question. Do you know how I can solve it? – tenure track job seeker Mar 14 '13 at 06:03
  • Hello and welcome to the site. First of all, you have commas missing in your definitions of f12sin1 and f16sin1. Global minimization is a difficult task, and differential evolution, being a heuristic, can't give you strong performance guarantees. Appropriate values of the scaling factor $F$ and the crossover probability $C$ are strongly problem-dependent, and choosing these is arguably beyond Mathematica's scope--you should refer to the literature on differential evolution. One way to tune the parameters is to choose an easier model problem that shares some of the characteristics ... – Oleksandr R. Mar 14 '13 at 06:15
  • ... of the real one, and numerically minimize the result of the minimization of this function with respect to the parameter values. I have done this with "SearchPoints" -> 60 for the (20-dimensional) Rastrigin's function ("ScalingFactor" -> 0.2, "CrossProbability" -> 0.6) and the (20-dimensional) Rosenbrock's function ("ScalingFactor" -> 0.6, "CrossProbability" -> 0.1). Whether these results can be of use to you I don't know. (Incidentally, I didn't use Mathematica for this meta-optimization.) – Oleksandr R. Mar 14 '13 at 06:19 [A rough sketch of this tuning idea appears after this comment thread.]
  • Sorry, I didn't recognise your username from our previous discussion, or otherwise I wouldn't have welcomed you to the site again. :) Perhaps you'd like to choose a more memorable name? As another member once commented, this gives the site less of the feel of a prison and more of a happy community! – Oleksandr R. Mar 14 '13 at 06:23
  • Dear OleksandrR, thanks a lot for your answer. I chose DifferentialEvolution because I thought it was a better way of finding the global minimum; the other methods don't seem to work well. I am dealing with real experimental data in my problem, and to be able to compare my results with the ones they give, I have to use exactly the same function they have defined... – tenure track job seeker Mar 14 '13 at 06:28
  • ... However, what is your suggestion for solving my problem? What do you use for meta-optimization? Some people told me that since my function is somewhat difficult and full of parameters, it is better to use Mathematica. I'm sorry if my questions are too basic; I am very new to data analysis and any kind of code writing :). – tenure track job seeker Mar 14 '13 at 06:30
  • I agree that differential evolution is the best available method for this minimization but still the results may not be especially good. Maybe you have misunderstood my point about the meta-optimization--the substitute function is used only for choosing the values of the tuning parameters. You can use your real function if you know what the true minimum is, but usually you don't. FWIW your function is "more like" Rastrigin's function than Rosenbrock's--using the tuned parameters "SearchPoints" -> 60, "ScalingFactor" -> 0.2, "CrossProbability" -> 0.6 leads to somewhat more consistent results. – Oleksandr R. Mar 14 '13 at 06:33
  • @OleksandrR., thanks a lot again. I have just one more question: I don't understand what exactly RandomSeed does. I know it is the starting value for the random number generator, but I'm not sure I know exactly what that means... – tenure track job seeker Mar 14 '13 at 06:38
  • One can also try "SearchPoints" -> 80, "ScalingFactor" -> 0.2, "CrossProbability" -> 0.5. With these settings the result is fairly consistent albeit the true global minimum is still not being found. As for the random seed: differential evolution uses random starting values, which are perturbed in an attempt to minimize the function. A different seed will give you a different sequence of random numbers. If the minimizer is working correctly then each attempt should produce similar (ideally identical) results, as I think you suspected. – Oleksandr R. Mar 14 '13 at 12:11
  • Good results are also available using "SearchPoints" -> 80, "ScalingFactor" -> 0.55, "CrossProbability" -> 0.05, but the minima found here are of a somewhat different character. (These are the optimized parameters for minimizing the Rosenbrock's function.) If you can think of a better model function than these two I would be pleased to run the meta-optimization for you... although both functions I tried are considered fairly hard minimization problems, it seems your function may not be that similar to either of them. Because your definition is so complicated I have no intuition here. – Oleksandr R. Mar 14 '13 at 12:35
  • @OleksandrR.: Thanks a lot for all your help. I tried all these different options, and none of them worked! Then I talked to a professor who knows this function and has worked with it before; it seems that the problem is with the function itself. It is expected to behave this badly, which means my problem doesn't have a good answer! Thanks a lot by the way, I learnt many things from you. – tenure track job seeker Mar 14 '13 at 17:41
  • I'm glad that my comments were useful to you. From the perspective of the site, though, I think your questions ought to be closed as "too localized", because your difficulties stem from the particular problem rather than something inherent to Mathematica. I want to emphasize that there isn't any shame in having a question closed; it's just that it's considered bad form to have open and unanswered questions, which obviously causes difficulties when, as in this case, a question is intrinsically not answerable. Hope to see more questions from you in future, though! – Oleksandr R. Mar 16 '13 at 21:55
  • @OleksandrR., That's totally OK, thanks a lot :). – tenure track job seeker Mar 17 '13 at 00:40
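For reference, here is a rough sketch of the meta-optimization idea described in the comments above: tune "ScalingFactor" and "CrossProbability" on a model problem whose global minimum is known (the 20-dimensional Rastrigin function, minimum 0 at the origin) and carry the best pair over to chi2reno. The coarse grid scan below is only a simplification of a full meta-optimization, and detrial, u and uvars are illustrative names:

(* model problem: 20-dimensional Rastrigin function, global minimum 0 at the origin *)
rastrigin[v_List] := 10*Length[v] + Total[v^2 - 10*Cos[2*Pi*v]];
uvars = Array[u, 20];
(* value reached by one DifferentialEvolution run with the given ScalingFactor/CrossProbability *)
detrial[sf_?NumericQ, cp_?NumericQ] :=
  First[NMinimize[{rastrigin[uvars], Thread[-5.12 <= uvars <= 5.12]}, uvars,
    Method -> {"DifferentialEvolution", "SearchPoints" -> 60,
      "ScalingFactor" -> sf, "CrossProbability" -> cp, "RandomSeed" -> 1}]];
(* coarse scan over (F, CR); the pair giving the lowest value is the candidate for the real problem *)
scan = Flatten[Table[{sf, cp, detrial[sf, cp]}, {sf, 0.2, 0.8, 0.2}, {cp, 0.1, 0.9, 0.2}], 1];
First[SortBy[scan, Last]]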

1 Answer


[Not really a solid answer but more code than I want in a comment.]

I think your function is just very difficult to work with. It may have a very flat landscape, but I suspect, more likely, that it jumps around quite a bit and is numerically sensitive to any number of things. I notice that Interpolation was used to construct it, and that can give awkward wiggles. Possibly using Method -> "Spline" might help if that is causing trouble.
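For instance, the density interpolation at the start of the code could be rebuilt with spline interpolation (whether this actually smooths the objective here is untested):

rhor = Interpolation[rhodatar, Method -> "Spline"];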

I do a couple of things that seem to help slightly. One is to use large values for iterations and generation sizes. Another is to pull out all the stops on postprocessing.

This run gives results that tend to be consistently in the range of 0.1–0.3. I believe I have seen results much closer to zero, so this might not be the best attainable, but at least it is not returning results like 1000.

Timing[Do[
  Print[Timing[
    NMinimize[{chi2reno[y, z, w, a, xinear, fnear1, fnear2, fnear3, 
       fnear4, fnear5, fnear6, xifar, ffar1, ffar2, ffar3, ffar4, 
       ffar5, ffar6, bnear, bfar], renobounds}, renovars, 
     MaxIterations -> 2000, 
     Method -> {"DifferentialEvolution", 
       "SearchPoints" -> 200,
       "PostProcess" -> {FindMinimum, 
         Method -> {"QuasiNewton", "InteriorPoint", "KKT"}}, 
       "RandomSeed" -> i}]]], {i, 10}]]

Certain variables tend to be near zero in most or all cases. Others vary by a fair amount. This might just be a very sensitive function.

--- edit ---

As a general remark, this example seems to show pernicious effects from the bnear and bfar variables: the results have them varying wildly, whereas all the others, as best I can tell, stay in the same region from one result to another.

--- end edit ---

Daniel Lichtblau
  • +1; I agree with your assessment although the question should perhaps be closed as too localized or not answerable (through no fault of the OP; this is a general problem). Some comments follow. A large population is seldom useful for differential evolution except in the case of high dimensional functions (contrary to the documentation and the early literature on the method, $NP \approx d$ is better than $NP \gg d$ except where $d$ is so low that this would lead to an impractically small number of search points), and for 20-dimensional functions I find that $NP \approx 40-60$ ... – Oleksandr R. Mar 16 '13 at 21:42
  • ... is sufficient; 80 is ample and 200 is certainly too large. The main problem is tuning $F$ and $CR$. "SearchPoints" -> 80, "ScalingFactor" -> 0.55, "CrossProbability" -> 0.05 works quite well for this problem, though these values are meta-optimized based on the Rosenbrock's function. The function value with these settings usually turns out around $10^{-10}$. I wonder though, could you elaborate on what it means to specify more than one Method option for FindMinimum here? In particular, I'm unsure what relationship "QuasiNewton", "InteriorPoint", and "KKT" have to each other? – Oleksandr R. Mar 16 '13 at 21:45
  • @Oleksandr R. If I recall correctly, it means the postprocessing step will try all three to home in on a best result. – Daniel Lichtblau Mar 16 '13 at 21:56
  • @Oleksandr R. Re "SearchPoints" settings: my experience has been that if one is going to do many iterations, fairly high settings for this parameter help to ward off premature convergence. – Daniel Lichtblau Mar 16 '13 at 22:18
  • @Oleksandr R. Re "CrossProbability" and, to a lesser extent, "ScalingFactor" settings: I rarely have consistent luck with these except when some or all variables are integer valued. Your settings in this case do seem to give an improvement. That said, I will point out that it is inconsistent, and I have seen some amount of fluctuation even with those settings (for example, where using twice as many search points gives a significantly worse claimed optimum). – Daniel Lichtblau Mar 16 '13 at 22:24
  • Thanks for your further comments. $F$, $CR$, and $NP$ are all codependent and should not be adjusted individually. These particular settings probably aren't optimal for this problem, but there do seem to be specific values that work best for any given case. The problem is that finding them is really only feasible with meta-optimization, and for that you have to think of a model problem that shares salient features with your real one--and the Rosenbrock's function almost certainly is not that good of a model, otherwise the results would be more consistent than they are. – Oleksandr R. Mar 16 '13 at 22:31
  • Re: premature convergence, sometimes I have the feeling that this is actually the main problem NMinimize encounters and confess to not being a great fan of that particular convergence criterion in any case. Here, for instance, I found that NMinimize can't correctly find the minimum of the 50-dimensional Rosenbrock's function, which seems kind of strange, whereas my own implementation with a different convergence criterion (which I chose having in mind specifically population-based metaheuristics) has relatively little trouble. – Oleksandr R. Mar 16 '13 at 22:40
  • You can tune the parameter settings on the function itself by using a small number of iterations. (This may not work well if there are nonlinear constraints, but that's not the case here.) SearchPoints settings can typically also be set low for this purpose although that can mess with the ScalingFactor behavior. – Daniel Lichtblau Mar 16 '13 at 22:43 [A rough sketch of this appears after these comments.]
  • @Oleksandr R. I noticed a red up-arrow which means I was impressed the first time. I agree you almost certainly have this (mis)convergence problem figured out better. I forwarded the link to someone here who will start taking on some of our optimization code. – Daniel Lichtblau Mar 16 '13 at 22:47
  • The big problem with optimizing the tuning parameters with few iterations on an unknown problem is that by doing this one typically tends to optimize for a rapid decrease in the function value, which in many cases will lead to misconvergence unless the function is particularly well-behaved. (In the case of the Rosenbrock's function, for example, one would be likely to end up somewhere in the valley yet not at the relatively shallow true minimum.) – Oleksandr R. Mar 17 '13 at 04:01
  • @Oleksandr R. That might be why I sometimes need large search point sets, to counteract that effect of too-early convergence from parameters thusly optimized (but that does seem to provide a counteraction). – Daniel Lichtblau Mar 17 '13 at 04:10
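To illustrate the quick-tuning idea mentioned a few comments up (a deliberately small iteration and population budget on the real objective, used only to rank candidate settings), here is a minimal sketch; quickDE is an illustrative name and the grid of (F, CR) values is an arbitrary choice, not a recommendation:

(* cheap scan of (ScalingFactor, CrossProbability) directly on chi2reno;
   the small budget is only for ranking settings, not for the final minimization *)
quickDE[sf_?NumericQ, cp_?NumericQ] :=
  First[NMinimize[{chi2reno[y, z, w, a, xinear, fnear1, fnear2, fnear3, fnear4, fnear5,
      fnear6, xifar, ffar1, ffar2, ffar3, ffar4, ffar5, ffar6, bnear, bfar], renobounds},
    renovars, MaxIterations -> 100,
    Method -> {"DifferentialEvolution", "SearchPoints" -> 40,
      "ScalingFactor" -> sf, "CrossProbability" -> cp, "RandomSeed" -> 1}]];
SortBy[Flatten[Table[{sf, cp, quickDE[sf, cp]}, {sf, 0.2, 0.8, 0.2}, {cp, 0.1, 0.9, 0.2}], 1], Last]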