According to the documentation, when FindMinimum is told to use the method "QuasiNewton" on an unconstrained problem, it uses the BFGS approximation to the Hessian matrix of the function being optimized. How can I recover the Hessian matrix computed by Mathematica using StepMonitor, EvaluationMonitor or any other devious trick?
2 Answers
Unfortunately, this requires a lot more devious trickery than I would have preferred. As noted in the documentation at tutorial/UnconstrainedOptimizationQuasiNewtonMethods, the Hessian is not formed directly in the BFGS method, so we have to recover it from the Cholesky factors. However, all of this is done inside the kernel where we cannot access it using ordinary methods, and the only way I can see to obtain these factors is to hook calls to the BLAS function *TRSV (called as LinearAlgebra`BLAS`TRSV in Mathematica) and extract its arguments directly. Needless to say, this is hardly likely to be particularly robust, but hopefully it will work well enough for your needs.
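Before the full definition below, here is the hooking idiom in isolation, as a minimal sketch: Sin is just a harmless stand-in for the BLAS routine, and the "SinArg" tag is purely illustrative.
(* Sketch only: hook Sin so that every numerical argument is Sowed once per call,
   while the call itself is passed through unchanged via the pattern name call.
   The same pattern is applied to LinearAlgebra`BLAS`TRSV below. *)
Internal`InheritedBlock[{Sin},
 Unprotect[Sin];
 call : Sin[x_?NumericQ] /; (Sow[x, "SinArg"]; True) := call;
 Protect[Sin];
 Reap[Table[Sin[t], {t, 0., 1., 0.5}], "SinArg"]
]
The hook disappears automatically when the Internal`InheritedBlock is exited, so the built-in definition of the hooked function is left untouched afterwards.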
Let us define:
approximateHessianList[f_, vars_, start_] :=
 With[{fmarg2 = Thread[{vars, start}]},
  Module[{steps, lowerTriangles, upperTriangles},
   Internal`InheritedBlock[{LinearAlgebra`BLAS`TRSV},
    Unprotect[LinearAlgebra`BLAS`TRSV];
    (* Hook the BLAS triangular solver: Sow the triangular factor on each call,
       then pass the call through unchanged via the pattern name trsv *)
    trsv : LinearAlgebra`BLAS`TRSV[uplo_, trans_, diag_, a_, x_] /; (
       Switch[trans,
        "N", Sow[LowerTriangularize[a], "LowerTriangle"],
        "T", Sow[UpperTriangularize[a], "UpperTriangle"]
       ]; True
      ) := trsv;
    Protect[LinearAlgebra`BLAS`TRSV];
    {steps, lowerTriangles, upperTriangles} =
     Reap[
       FindMinimum[f, fmarg2,
        Method -> {"QuasiNewton", "StepMemory" -> Infinity},
        StepMonitor :> Sow[vars, "Step"]
       ], {"Step", "LowerTriangle", "UpperTriangle"}
      ][[2, {1, 2, 3}, 1]];
    (* Reassemble the BFGS Hessian approximations from their Cholesky factors *)
    Transpose[{steps, MapThread[Dot, {lowerTriangles, upperTriangles}]}]
   ]
  ]
 ];
SetAttributes[approximateHessianList, HoldAll];
We may now write:
approximateHessianList[Cos[x^2 - 3 y] + Sin[x^2 + y^2], {x, y}, {1, 1}]
which gives a list of steps in the BFGS optimization along with the approximate Hessians evaluated at those points. For the sake of brevity (and because it is obviously the most accurate), let us take only the last of these:
{{1.37638, 1.67868}, {{15.1553, 0.986982}, {0.986982, 20.2017}}}
which we may compare to the exact Hessian evaluated at this point:
D[Cos[x^2 - 3 y] + Sin[x^2 + y^2], {{x, y}, 2}] /. {
x -> 1.376384972443001`, y -> 1.6786760817546214`
}
{{15.1555, 0.983708}, {0.983708, 20.2718}}
Clearly, it is not a bad approximation.
Now, some caveats. Since the logic of the code inside the kernel is completely opaque, I am not sure whether this business with the upper and lower triangles is really necessary, since the argument a of LinearAlgebra`BLAS`TRSV appears to be the same for successive calls, firstly with trans == "N" and then with trans == "T". However, this represents at most an inefficiency. A more serious problem is that I am not sure whether the first Hessian obtained using this method corresponds to the initial point, $(x,y)=(1,1)$, or to the point at the end of the first step, which here is $(x,y)=(0.811216, 1.68144)$. While this does not really matter if the approximation has converged, it does substantially influence the interpretation of the earlier approximations, so hopefully someone with access to the kernel code will be able to clarify this aspect of FindMinimum`QuasiNewton's behaviour.
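As a rough consistency check on that caveat (my own addition; hess and result are just illustrative helper names), one can compare each recovered matrix against the exact Hessian at the step point sown alongside it:
(* Frobenius-norm distance between each recovered approximation and the exact
   Hessian evaluated at the corresponding sown step point *)
hess[pt_] := D[Cos[x^2 - 3 y] + Sin[x^2 + y^2], {{x, y}, 2}] /. Thread[{x, y} -> pt];
result = approximateHessianList[Cos[x^2 - 3 y] + Sin[x^2 + y^2], {x, y}, {1, 1}];
Norm[#[[2]] - hess[#[[1]]], "Frobenius"] & /@ result
If the discrepancies were systematically smaller when each matrix is compared against the previous step's point instead, that would suggest the factors belong to the start of each step rather than its end; I have not verified which interpretation is correct.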
Using the Experimental`NumericalFunction framework directly (FindMinimum uses it under the hood), it is straightforward to get a numerical approximation of the Hessian:
f = Experimental`CreateNumericalFunction[{x, y},
Cos[x^2 - 3 y] + Sin[x^2 + y^2], {1}, Hessian -> FiniteDifference];
f["Hessian"[{1.376384972443001`, 1.6786760817546214`}]]
{{{15.1555, 0.983708}, {0.983708, 20.2718}}}
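For what it's worth, the comments below indicate that Hessian also accepts Symbolic (and Automatic). Since the framework is undocumented, the following is only a sketch of what I would expect to work, reusing the same construction with a hypothetical name fSym; behaviour may vary between versions.
(* Sketch: same construction, but requesting a symbolic Hessian instead of
   finite differences *)
fSym = Experimental`CreateNumericalFunction[{x, y},
   Cos[x^2 - 3 y] + Sin[x^2 + y^2], {1}, Hessian -> Symbolic];
fSym["Hessian"[{1.376384972443001`, 1.6786760817546214`}]]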
- As far as Hessian is concerned, what are the other options apart from FiniteDifference? Thanks! – Chen Stats Yu Nov 29 '14 at 09:41
- As explained in the linked answer, Hessian takes three possible values: Automatic, Symbolic and FiniteDifference, with Automatic being equivalent to Symbolic AFAIK. – Alexey Popkov Nov 29 '14 at 09:54
- @Alexey Popkov How is the scale chosen? Are finite differences adaptive? Is the Richardson extrapolation methodology used? – Valerio May 18 '18 at 19:42
- @Valerio Since Experimental`NumericalFunction is undocumented, we can only guess and experiment. You can create a separate question on it. – Alexey Popkov May 19 '18 at 02:32
- Alexey, what's the {1}, in the definition of the function for? – An old man in the sea. Dec 02 '18 at 17:58
- @Anoldmaninthesea It is the dimensions of the output matrix produced by the expression, see here. – Alexey Popkov Dec 06 '18 at 22:49
- …Block (for example, if the definition is already inside an Internal`InheritedBlock). Here we pattern match the function call (with a named pattern), and take advantage of a Condition containing (code; True) so that code executes (once only) each time the hooked function is called. The call proper is passed through the hook by means of the pattern name (here trsv). – Oleksandr R. Mar 19 '12 at 23:21
- …trsv : ... := trsv to attach the hook. – rcollyer Mar 20 '12 at 14:59