According to the documentation, when FindMinimum is told to use the method "QuasiNewton" on an unconstrained problem, it uses the BFGS approximation to the Hessian matrix of the function being optimized. How can I recover the Hessian matrix computed by Mathematica using StepMonitor, EvaluationMonitor or any other devious trick?
2 Answers
Unfortunately, this requires a lot more devious trickery than I would have preferred. As noted in the documentation at tutorial/UnconstrainedOptimizationQuasiNewtonMethods, the Hessian is not formed directly in the BFGS method, so we have to recover it from the Cholesky factors. However, all of this is done inside the kernel where we cannot access it using ordinary methods, and the only way I can see to obtain these factors is to hook calls to the BLAS function *TRSV (called as LinearAlgebra`BLAS`TRSV in Mathematica) and extract its arguments directly. Needless to say, this is hardly likely to be particularly robust, but hopefully it will work well enough for your needs.
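Before the full definition below, here is the hooking idiom in isolation, as a minimal sketch: Sin is just a harmless stand-in for the BLAS routine, and the "SinArg" tag is purely illustrative.
(* Sketch only: hook Sin so that every numerical argument is Sowed once per call,
   while the call itself is passed through unchanged via the pattern name call.
   The same pattern is applied to LinearAlgebra`BLAS`TRSV below. *)
Internal`InheritedBlock[{Sin},
 Unprotect[Sin];
 call : Sin[x_?NumericQ] /; (Sow[x, "SinArg"]; True) := call;
 Protect[Sin];
 Reap[Table[Sin[t], {t, 0., 1., 0.5}], "SinArg"]
]
The hook disappears automatically when the Internal`InheritedBlock is exited, so the built-in definition of the hooked function is left untouched afterwards.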
Let us define:
approximateHessianList[f_, vars_, start_] :=
 With[{fmarg2 = Thread[{vars, start}]},
  Module[{steps, lowerTriangles, upperTriangles},
   Internal`InheritedBlock[{LinearAlgebra`BLAS`TRSV},
    Unprotect[LinearAlgebra`BLAS`TRSV];
    (* Hook the BLAS triangular solver: Sow the triangular factor on each call,
       then pass the call through unchanged via the pattern name trsv *)
    trsv : LinearAlgebra`BLAS`TRSV[uplo_, trans_, diag_, a_, x_] /; (
       Switch[trans,
        "N", Sow[LowerTriangularize[a], "LowerTriangle"],
        "T", Sow[UpperTriangularize[a], "UpperTriangle"]
       ]; True
      ) := trsv;
    Protect[LinearAlgebra`BLAS`TRSV];
    {steps, lowerTriangles, upperTriangles} =
     Reap[
       FindMinimum[f, fmarg2,
        Method -> {"QuasiNewton", "StepMemory" -> Infinity},
        StepMonitor :> Sow[vars, "Step"]
       ], {"Step", "LowerTriangle", "UpperTriangle"}
      ][[2, {1, 2, 3}, 1]];
    (* Reassemble the BFGS Hessian approximations from their Cholesky factors *)
    Transpose[{steps, MapThread[Dot, {lowerTriangles, upperTriangles}]}]
   ]
  ]
 ];
SetAttributes[approximateHessianList, HoldAll];
We may now write:
approximateHessianList[Cos[x^2 - 3 y] + Sin[x^2 + y^2], {x, y}, {1, 1}]
which gives a list of steps in the BFGS optimization along with the approximate Hessians evaluated at those points. For the sake of brevity (and because it is obviously the most accurate), let us take only the last of these:
{{1.37638, 1.67868}, {{15.1553, 0.986982}, {0.986982, 20.2017}}}
which we may compare to the exact Hessian evaluated at this point:
D[Cos[x^2 - 3 y] + Sin[x^2 + y^2], {{x, y}, 2}] /. {
x -> 1.376384972443001`, y -> 1.6786760817546214`
}
{{15.1555, 0.983708}, {0.983708, 20.2718}}
Clearly, it is not a bad approximation.
Now, some caveats. Since the logic of the code inside the kernel is completely opaque, I am not sure whether this business with the upper and lower triangles is really necessary, since the argument a of LinearAlgebra`BLAS`TRSV appears to be the same for successive calls, firstly with trans == "N" and then with trans == "T". However, this represents at most an inefficiency. A more serious problem is that I am not sure whether the first Hessian obtained using this method corresponds to the initial point, $(x,y)=(1,1)$, or to the point at the end of the first step, which here is $(x,y)=(0.811216, 1.68144)$. While this does not really matter if the approximation has converged, it does substantially influence the interpretation of the earlier approximations, so hopefully someone with access to the kernel code will be able to clarify this aspect of FindMinimum`QuasiNewton's behaviour.
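As a rough consistency check on that caveat (my own addition; hess and result are just illustrative helper names), one can compare each recovered matrix against the exact Hessian at the step point sown alongside it:
(* Frobenius-norm distance between each recovered approximation and the exact
   Hessian evaluated at the corresponding sown step point *)
hess[pt_] := D[Cos[x^2 - 3 y] + Sin[x^2 + y^2], {{x, y}, 2}] /. Thread[{x, y} -> pt];
result = approximateHessianList[Cos[x^2 - 3 y] + Sin[x^2 + y^2], {x, y}, {1, 1}];
Norm[#[[2]] - hess[#[[1]]], "Frobenius"] & /@ result
If the discrepancies were systematically smaller when each matrix is compared against the previous step's point instead, that would suggest the factors belong to the start of each step rather than its end; I have not verified which interpretation is correct.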
Using the Experimental`NumericalFunction framework directly (FindMinimum uses it under the hood), it is straightforward to get a numerical approximation of the Hessian:
f = Experimental`CreateNumericalFunction[{x, y},
Cos[x^2 - 3 y] + Sin[x^2 + y^2], {1}, Hessian -> FiniteDifference];
f["Hessian"[{1.376384972443001`, 1.6786760817546214`}]]
{{{15.1555, 0.983708}, {0.983708, 20.2718}}}
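For what it's worth, the comments below indicate that Hessian also accepts Symbolic (and Automatic). Since the framework is undocumented, the following is only a sketch of what I would expect to work, reusing the same construction with a hypothetical name fSym; behaviour may vary between versions.
(* Sketch: same construction, but requesting a symbolic Hessian instead of
   finite differences *)
fSym = Experimental`CreateNumericalFunction[{x, y},
   Cos[x^2 - 3 y] + Sin[x^2 + y^2], {1}, Hessian -> Symbolic];
fSym["Hessian"[{1.376384972443001`, 1.6786760817546214`}]]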
- As far as Hessian is concerned, what are the other options apart from FiniteDifference? Thanks! – Chen Stats Yu Nov 29 '14 at 09:41
- As explained in the linked answer, Hessian takes three possible values: Automatic, Symbolic and FiniteDifference, with Automatic being equivalent to Symbolic AFAIK. – Alexey Popkov Nov 29 '14 at 09:54
- @Alexey Popkov How is the scale chosen? Are finite differences adaptive? Is the Richardson extrapolation methodology used? – Valerio May 18 '18 at 19:42
- @Valerio Since Experimental`NumericalFunction is undocumented, we can only guess and experiment. You can create a separate question on it. – Alexey Popkov May 19 '18 at 02:32
- Alexey, what's the {1}, in the definition of the function for? – An old man in the sea. Dec 02 '18 at 17:58
- @Anoldmaninthesea It is the dimensions of the output matrix produced by the expression, see here. – Alexey Popkov Dec 06 '18 at 22:49
- …Block (for example, if the definition is already inside an Internal`InheritedBlock). Here we pattern match the function call (with a named pattern), and take advantage of a Condition containing (code; True) so that code executes (once only) each time the hooked function is called. The call proper is passed through the hook by means of the pattern name (here trsv). – Oleksandr R. Mar 19 '12 at 23:21
- …trsv : ... := trsv to attach the hook. – rcollyer Mar 20 '12 at 14:59