Nocedal and Wright on Conjugate Gradient Methods, p. 123, describe a
restart strategy ... whenever two consecutive gradients are far from orthogonal
$$\frac{| \nabla f_k^T \, \nabla f_{k-1} |}{\|\nabla f_k\|^2} \ge \nu ,$$ with $\nu$ typically 1/10.
Can anyone comment on CG with such restarts, or point to test cases on the web?
Or is the "popular choice $\max( 0, \beta^{PR} )$"
(see the Wikipedia article Nonlinear_conjugate_gradient_method)
good enough, satisficing?
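For concreteness, here is a minimal sketch (names are mine, not from either source) of how the two safeguards would sit inside one nonlinear CG direction update; g and gprev are the current and previous gradients, d the previous search direction:

import numpy as np

def cg_direction( g, gprev, d, nu=0.1 ):
    beta_pr = g.dot( g - gprev ) / gprev.dot( gprev )  # Polak-Ribiere
    beta = max( 0., beta_pr )  # "popular choice": clip negative beta, a.k.a. PR+
    if abs( g.dot( gprev ) ) / g.dot( g ) >= nu:  # N&W: gradients far from orthogonal
        beta = 0.  # restart: throw away d, take a steepest-descent step
    return -g + beta * d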
(A good answer to bfgs-vs-conjugate-gradient-method says,
"Anecdotal evidence points to restarting being a tricky issue, as it is sometimes unnecessary and sometimes very necessary."
Well, that's generally true of a lot of things (taxes come to mind).
Test cases with plots of $\beta_k$ or $\theta_k$ might be interesting.)
A possibly silly test case that led to the question is CG on an ill-conditioned quadratic in 2d:
import numpy as np
from scipy.optimize import fmin_cg

n = 2
cond = 100  # condition number of the (diagonal) Hessian
eigenvalues = np.linspace( 1./cond, 1, n )  # eigenvalues 1/cond ... 1
xmin = 1000 * np.ones( n )  # minimizer, far from x0

def fprime( x ):
    return eigenvalues * (x - xmin)

def f( x ):  # convex quadratic, Hessian = diag( eigenvalues )
    return (x - xmin).dot( eigenvalues * (x - xmin) ) / 2

x0 = np.zeros( n )
ret = fmin_cg( f, x0, fprime )  # was fmin_cg( func, ... ): func is undefined
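Re the plots of $\beta_k$ or $\theta_k$ above: one cheap way to log the orthogonality ratio without reimplementing CG is scipy's callback, which hands you $x_k$ once per iteration. A sketch, reusing f, fprime, x0 from the test case (the extra gradient evaluations are for logging only):

gradients = [fprime( x0 )]
def log_ratio( xk ):  # called by fmin_cg after each iteration
    g, gprev = fprime( xk ), gradients[-1]
    print( abs( g.dot( gprev ) ) / g.dot( g ) )  # >= nu ~ 0.1: N&W would restart here
    gradients.append( g )

ret = fmin_cg( f, x0, fprime, callback=log_ratio )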
Added:
- For n in [1, 2, 3, 4, 5] this takes 80, 81, 6, 40, 9 iterations and 721, 722, 28, 94, 27 function evaluations respectively. (Is CG generally very sensitive to the line search?)
- The Mathematica conjugate-gradient minimizer has a RestartThreshold with default 1/10. But, sorry, I don't speak Mathematica; any native speaker care to try this?