Nocedal and Wright on Conjugate Gradient Methods, p. 123, describe a
restart strategy ... whenever two consecutive gradients are far from orthogonal
$$\frac{| \nabla f_k^T \, \nabla f_{k-1} |}{\|\nabla f_k\|^2} \ge \nu ,$$ with $\nu$ typically 1/10.
Can anyone comment on CG with such restarts, or point to test cases on the web?
Or is the "popular choice $\max( 0, \beta^{PR} )$"
(see the Wikipedia article Nonlinear_conjugate_gradient_method)
good enough, satisficing?
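For concreteness, here is a minimal sketch (names are mine, not from either source) of how the two safeguards would sit inside one nonlinear CG direction update; g and gprev are the current and previous gradients, d the previous search direction:

import numpy as np

def cg_direction( g, gprev, d, nu=0.1 ):
    beta_pr = g.dot( g - gprev ) / gprev.dot( gprev )  # Polak-Ribiere
    beta = max( 0., beta_pr )  # "popular choice": clip negative beta, a.k.a. PR+
    if abs( g.dot( gprev ) ) / g.dot( g ) >= nu:  # N&W: gradients far from orthogonal
        beta = 0.  # restart: throw away d, take a steepest-descent step
    return -g + beta * d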
(A good answer to bfgs-vs-conjugate-gradient-method says,
"Anecdotal evidence points to restarting being a tricky issue, as it is sometimes unnecessary and sometimes very necessary."
Well, that's generally true of a lot of things (taxes come to mind).
Test cases with plots of $\beta_k$ or $\theta_k$ might be interesting.)
A possibly silly test case that led to the question is CG on an ill-conditioned quadratic in 2d:
import numpy as np
from scipy.optimize import fmin_cg

n = 2
cond = 100  # condition number of the (diagonal) Hessian
eigenvalues = np.linspace( 1./cond, 1, n )  # eigenvalues 1/cond ... 1
xmin = 1000 * np.ones( n )  # minimizer, far from x0

def fprime( x ):
    return eigenvalues * (x - xmin)

def f( x ):  # convex quadratic, Hessian = diag( eigenvalues )
    return (x - xmin).dot( eigenvalues * (x - xmin) ) / 2

x0 = np.zeros( n )
ret = fmin_cg( f, x0, fprime )  # was fmin_cg( func, ... ): func is undefined
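Re the plots of $\beta_k$ or $\theta_k$ above: one cheap way to log the orthogonality ratio without reimplementing CG is scipy's callback, which hands you $x_k$ once per iteration. A sketch, reusing f, fprime, x0 from the test case (the extra gradient evaluations are for logging only):

gradients = [fprime( x0 )]
def log_ratio( xk ):  # called by fmin_cg after each iteration
    g, gprev = fprime( xk ), gradients[-1]
    print( abs( g.dot( gprev ) ) / g.dot( g ) )  # >= nu ~ 0.1: N&W would restart here
    gradients.append( g )

ret = fmin_cg( f, x0, fprime, callback=log_ratio )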
Added:
- For n in [1, 2, 3, 4, 5] this takes 80, 81, 6, 40, 9 iterations and 721, 722, 28, 94, 27 function evaluations respectively. (Is CG generally very sensitive to the line search?)
- The Mathematica conjugate-gradient minimizer has a RestartThreshold with default 1/10. But, sorry, I don't speak Mathematica; any native speaker care to try this?