FindRoot evaluates the exact same point multiple times. Why?

Question

This is a follow-up question to this question.

Let's say we want to minimize the following function f wrt. x and y. The approach is to minimize first wrt. to y and then wrt. x. If this approach makes sense is not part of the question.

f[x_, y_] := (x - 2)^2 + x ( y - 2)^2
dx[x_, y_] := Block[{xs}, D[f[xs, y], xs] /. xs -> x]
dy[x_, y_] := Block[{ys}, D[f[x, ys], ys] /. ys -> y]
fy[x_?NumericQ] := y /. FindRoot[dy[x, y] == 0, {y, 20}]
dxnoy[x_] := dx[x, fy[x]]

Now using FindRoot on dxnoy yields the correct result:

FindRoot[{dxnoy[x] == 0}, {{x, 5}}]
(* {x -> 2.} *)

However, the process seems to be inefficient. Using EvaluationMonitor in FindRoot reveals that the exact same points x are evaluated multiple times:

Last[Reap[FindRoot[{dxnoy[x] == 0}, {{x, 5}}, EvaluationMonitor :> Sow[x]]]]
(* {{5., 5., 5., 2., 2., 2.}} *)

Why is that? Or better: How can it be avoided? It seems to me that the performance could be a factor 3 better if it wouldn't evaluate the points x=5 and x=2 three times each.

Displaying the result with NumberForm[%, 15] yields {{{5.}, {5.}, {5.000000074505806}, {2.}, {2.}, {2.0000000298023224}}}, so the points x == 5. and x == 2. actually are computed only twice each. Still, this is an interesting question. — bbgodfrey, Nov 23 '16 at 18:10
Even if the values are not exactly the same it would still be a strange sequence. Normally, the approach to the root is a smooth decay function. Now it is a step function. It seems something is not as it should be. — Felix, Nov 23 '16 at 18:17

score 11 · Accepted Answer · answered Nov 23 '16 at 18:36

I speculate that the following is occurring. As I noted in a comment above, the actual values of x returned by EvaluationMonitor are

Last[Reap[FindRoot[{dxnoy[x] == 0}, {{x, 5}}, EvaluationMonitor :> Sow[{x}]]]]
NumberForm[%, 25]
(* {{{5.}, {5.}, {5.000000074505806}, {2.}, {2.}, {2.0000000298023224}}} *)

To use the "Newton" method, FindRoot must evaluate both the dxnoy[x] and its derivative at each point. Since FindRoot cannot see inside dxnoy[x], it must evaluate the derivative numerically. This would seem to account for the second evaluation at x == 5., followed by the evaluation at x ==5.000000074505806. It then does the same at x == 2. Convergence is rapid, because dxnoy[x] is quadratic. Evaluation of the derivative numerically at x == 2. can be avoided by

Last[Reap[FindRoot[{dxnoy[x] == 0}, {{x, 5}}, EvaluationMonitor :> Sow[{x}], 
    Method -> {"Newton", "UpdateJacobian" -> 2}]]]
NumberForm[%, 25]
(* {{{5.}, {5.}, {5.000000074505806}, {2.}}} *)

The second evaluation at x == 5. still seems unnecessary.

True. I guess specifying the Jacobian would result in eliminating the extra points at x=5. — Felix, Nov 23 '16 at 18:50

score 4 · Answer 2 · answered Nov 23 '16 at 18:58

As explained here, the Newton's Method, which is used by default when you have only one starting point, computes Jacobian at every step, which is the most time consuming part. You can tell Mathematica to freeze the Jacobian for few steps, thus improving its performance.

RepeatedTiming[Last[Reap[FindRoot[{dxnoy[x] == 0}, {{x, 5}}, EvaluationMonitor :> Sow[x]]]]]
(* {0.0012, {{5., 5., 5., 2., 2., 2.}}} *)

RepeatedTiming[Last[Reap[FindRoot[{dxnoy[x] == 0}, {{x, 5}}, EvaluationMonitor :> Sow[x], Method -> {"Newton", "UpdateJacobian" -> 2}]]]]
(* {0.00084, {{5., 5., 5., 2.}}} *)

This, however, works not really efficient when your guess is close to the actual minimum point.

RepeatedTiming[Last[Reap[FindRoot[{dxnoy[x] == 0}, {{x, 2.1}}, EvaluationMonitor :> Sow[x]]]]]
(* {0.0014, {{2.1, 2.1, 2.1, 2., 2., 2., 2.}}} *)

RepeatedTiming[Last[Reap[FindRoot[{dxnoy[x] == 0}, {{x, 2.1}}, EvaluationMonitor :> Sow[x], Method -> {"Newton", "UpdateJacobian" -> 2}]]]]
(* {0.0012, {{2.1, 2.1, 2.1, 2., 2., 2.}}} *)

As an alternative, you can force FindRoot to use the Secant method instead (which is the default for two starting points). However, the evaluation will fail, as FindRoot can't work with an identity equation, which is what dy[x, y] == 0 becomes when x is evaluated at 0 (which is what Secant method will try to do). In order to fix it, we need to modify the fy:

xi=0.1;
fy[x_?NumericQ] := If[dya[x, _] =!= 0, y /. FindRoot[dya[x, y] == 0, {y, 20}], y /. FindRoot[dya[x + xi, y] == 0, {y, 20}]]

RepeatedTiming[Last[Reap[FindRoot[{dxnoya[x] == 0}, {{x, 2.1}}, EvaluationMonitor :> Sow[x], Method -> {"Secant"}]]]]
(* {0.00087, {{2.1, 0., 2.1, 2.}}} *)

You might need to try several methods depending on your function and the initial guess.

score 2 · Answer 3 · answered Jan 11 '20 at 17:38

In this sort of problem in which $y=h(x)$ is defined implicitly by $g(x,y)=0$, we can compute its derivative $h'(x)$ in terms of $g(x,h(x))$:

fy /: fy' = (fy'[#] /. First@Solve[D[dy[#, fy[#]] == 0, #], fy'[#]] //
      Evaluate) &;

Here $h$ is fy and $g$ is dy. Now FindRoot can calculate the derivative (Jacobian) of dxnoy.

ClearAll[f, fy, dx, dy, dxnoy];
f[x_, y_] := (x - 2)^2 + x (y - 2)^2;
dx[x_, y_] := Block[{xs}, D[f[xs, y], xs] /. xs -> x];
dy[x_, y_] := Block[{ys}, D[f[x, ys], ys] /. ys -> y];
mem : fy[x_?NumericQ] := mem = y /. FindRoot[dy[x, y] == 0, {y, 20}];
fy /: fy' = (fy'[#] /. First@Solve[D[dy[#, fy[#]] == 0, #], fy'[#]] //
      Evaluate) &;
dxnoy[x_] := dx[x, fy[x]];

Last[Reap[
  FindRoot[{dxnoy[x] == 0}, {{x, 5}}, EvaluationMonitor :> Sow[x]]]]
(*  {{5., 2.}}  *)

You may notice I memoized (cached) values of fy. That's because dxnoy'[x] evaluates fy[x].

(*
  dxnoy'[x]
  2 + (2 (2 - fy[x]) (-2 + fy[x]))/x  
*)

So what you gain in fewer functions evaluations of dxnoy[x] by FindRoot, you would lose in the evaluate of dxnoy'[x] if the values of fy[x] were not saved. Since fy[x] calls FindRoot, it is a comparatively expensive function to evaluate and probably worth memoizing. Local memoization may be obtained with Internal`InheritedBlock:

Internal`InheritedBlock[{fy},
 Last[Reap[
   FindRoot[{dxnoy[x] == 0}, {{x, 5}}, EvaluationMonitor :> Sow[x]]]]
 ]

The saved definitions will be deleted and the memory freed once excution leaves the InheritedBlock.

FindRoot evaluates the exact same point multiple times. Why?

3 Answers3