7

I have a one-dimensional convex function $$f : [a,b] \to \mathbb{R}$$ and want to find the minimum value $$\min_{a \le x \le b} f(x)$$ I know all derivatives of $f$, so the problem could easily be solved with any 1D minimization method, even ignoring the convexity. However, I would like to not ignore the convexity:

Question: How can I best take advantage of convexity to solve my 1D minimization?

For example, the values $f(a),f(b),f'(a),f'(b)$ define a triangular lower bound on the values of $f(x)$ on $[a,b]$, and the lowest vertex of this triangle (the intersection of the tangent lines at $a$ and $b$) is likely a good next guess.
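
Concretely, assuming $f'(a) < 0 < f'(b)$ so that the minimum is interior, that lowest vertex is the intersection of the two endpoint tangents, $$x^\star = \frac{f(b) - f(a) + a\,f'(a) - b\,f'(b)}{f'(a) - f'(b)}.$$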

Geoffrey Irving
  • 3,969
  • 18
  • 41
  • Actually, the lowest triangle vertex is a terrible guess. – Geoffrey Irving Mar 04 '14 at 00:58
  • Actually I think that depends on your function (for a piecewise linear function it is the exact solution). I think that it should work very well for almost every function, even using it as a procedure to refine between two points. – sebas Mar 04 '14 at 01:10
  • The lowest triangle vertex is $(a+b)/2$ if $f$ is quadratic. Ideally I would like second order convergence for smooth functions. – Geoffrey Irving Mar 04 '14 at 01:20
  • When you say that $f$ is quadratic, with this triangle vertex that you said, I assume that $f(x) = ((a+b)/2 - x)^2$, so for that function you can get the solution in one iteration. Am I missing something? – sebas Mar 04 '14 at 01:27
  • I can imagine that the method will work fine for fairly symmetric functions, but the approximation will not be good when the derivatives at $a$ and $b$ are very different... I am just drawing random parabolas in my notebook and trying... – sebas Mar 04 '14 at 01:29
  • 3
    How much more do you know about the function? For instance, strong convexity matters with first-order methods. For second-order methods a similar characteristic of the third derivative probably applies (e.g., self-concordance). – Michael Grant Mar 04 '14 at 03:55
  • @sebas: In order to understand why $(a+b)/2$ is a bad guess, it's important to consider parabolas which do not happen to have a minimum at $(a+b)/2$ (see the short check after these comments). – Geoffrey Irving Mar 04 '14 at 04:17
  • @MichaelC.Grant: I can usually assume strong convexity, but not self-concordance: the motivating problem is finding the heightfield traced out by a convex mill bit. – Geoffrey Irving Mar 04 '14 at 04:25
  • Convexity is usually used to show that local minima are also global minima. Strong convexity has been used to show that first-order methods converge more quickly. I presume you've tried a Newton method? The modifications I'm aware of are typically features designed to overcome nonconvexity, and you don't need those; presumably, you'd want faster convergence, if possible? – Geoff Oxberry Mar 04 '14 at 05:45
  • Yes, I'm looking for faster convergence if possible, and ideally to avoid some of the complexity of general 1D minimizations. I have to solve billions of these problems, so fixed iteration count straight line code might be ideal if it can be accurate enough. – Geoffrey Irving Mar 04 '14 at 05:59
  • I think a simple gradient descent would do the job. Billions of them could be solved in parallel if you have certain resources. If you are too far from the minimum, I would suggest heuristically finding some good initial guesses, depending on the nature of your problem, and then descending to the minimum. – Tolga Birdal Mar 04 '14 at 09:37
  • 1
    Another thing we don't know is the computational complexity of computing the values and derivatives. My intuition here is that gradient descent or Newton will be as good as you can expect, with the choice depending on the cost of computing $f''$. If the billions of problems are closely related, then a warm start from a nearby solution will probably help. – Michael Grant Mar 04 '14 at 13:43
  • An example is $f(x) = -\sqrt{a-\max(0,\sqrt{Q(x)}-b)^2}+L(x)$ where $Q(x)$ is a quadratic and $L(x)$ is a line. Thus, $f''$ is fairly cheap. Of course, by automatic differentiation $f''$ is at most a small constant factor slower than $f$ for any function. Michael: Would you want to upgrade your statements to an answer? This is a basic question, but also fundamental enough that I think even negative results are very good to know. – Geoffrey Irving Mar 04 '14 at 17:54
  • Sure, I don't care about the points, but if you think it will be helpful to have them front and center I will do that. – Michael Grant Mar 05 '14 at 02:45
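
To make the exchange above concrete, here is a small check (a sketch; the `tangent_intersection` helper is illustrative): for a quadratic $f(x) = (x-m)^2$ the intersection of the endpoint tangents is always the midpoint $(a+b)/2$, no matter where the true minimizer $m$ sits, which is why iterating on it does not home in on the minimum.

```python
# Sketch: for a quadratic f(x) = (x - m)^2 on [a, b], the tangent lines at a
# and b always intersect at the midpoint (a + b)/2, independent of m.
def tangent_intersection(f, df, a, b):
    # x where f(a) + f'(a)(x - a) = f(b) + f'(b)(x - b)
    return (f(b) - f(a) + a * df(a) - b * df(b)) / (df(a) - df(b))

a, b = 0.0, 1.0
for m in (0.1, 0.3, 0.9):  # true minimizers, none at the midpoint
    f = lambda x, m=m: (x - m) ** 2
    df = lambda x, m=m: 2.0 * (x - m)
    print(m, tangent_intersection(f, df, a, b))  # always prints 0.5
```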

2 Answers

6

If you have derivatives available, no method can beat Newton's method in practice unless you use very specific features of your objective function. This is true whether you want to solve one or a billion problems: each of them is most efficiently solved with Newton's method, since it is the only one that guarantees quadratic convergence, and this in turn typically means convergence to practical accuracies within fewer than 10 iterations, often significantly fewer.

Newton's method gets into a bit of trouble occasionally if your objective function is not convex, in which case you need to modify the Hessian appropriately. But, as you say, this is not important in your application.
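
As an illustration only, here is a minimal sketch of what such an iteration might look like for the 1D problem, assuming you can evaluate $f'$ and $f''$ (the function and helper names are placeholders, and a bisection safeguard keeps the iterate inside $[a,b]$):

```python
def minimize_convex(fp, fpp, a, b, iters=20, tol=1e-12):
    """Safeguarded Newton for a smooth convex f on [a, b], given f' and f''.

    Solves f'(x) = 0; convexity means f' is nondecreasing, so [a, b] can be
    kept as a shrinking bracket, with bisection as a fallback whenever the
    Newton step leaves the bracket.
    """
    if fp(a) >= 0.0:   # minimum at the left endpoint
        return a
    if fp(b) <= 0.0:   # minimum at the right endpoint
        return b
    lo, hi = a, b      # invariant: fp(lo) < 0 < fp(hi)
    x = 0.5 * (a + b)
    for _ in range(iters):
        g, h = fp(x), fpp(x)
        if g < 0.0:
            lo = x
        else:
            hi = x
        x_new = x - g / h if h > 0.0 else 0.5 * (lo + hi)
        if not (lo < x_new < hi):      # Newton left the bracket: bisect
            x_new = 0.5 * (lo + hi)
        if abs(x_new - x) <= tol * (1.0 + abs(x)):
            return x_new
        x = x_new
    return x


# Illustrative example: f(x) = (x - 0.3)**4 + x**2 on [0, 1]
fp  = lambda x: 4.0 * (x - 0.3) ** 3 + 2.0 * x
fpp = lambda x: 12.0 * (x - 0.3) ** 2 + 2.0
print(minimize_convex(fp, fpp, 0.0, 1.0))
```

With strong convexity and a reasonable starting point the bisection fallback is rarely taken, and the loop could even be unrolled to a fixed iteration count.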

Wolfgang Bangerth
  • 55,373
  • 59
  • 119
5

As requested, I'm upgrading my comments to an answer.

To answer the original question, it's necessary to understand how much you know about the function. How many derivatives can you readily compute? For instance, strong convexity matters with first-order methods. For second-order methods, a similar characteristic of the third derivative probably applies (e.g., self-concordance).

For first-order methods, if you have strong convexity, then gradient search can do quite a good job. If you don't, then consider the so-called "accelerated first-order methods". Theoretically, these methods require Lipschitz continuity of the gradient, but in practice you can estimate and adapt the Lipschitz constant and do fine.
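
A rough sketch of what that might look like, with a simple backtracking estimate of the gradient's Lipschitz constant and projection onto $[a,b]$ (names and constants are illustrative, not a tuned implementation):

```python
def projected_gradient(f, fp, a, b, x0=None, iters=200, L0=1.0, tol=1e-10):
    """Projected gradient descent on [a, b] with a backtracking running
    estimate L of the gradient's Lipschitz constant (a sketch, not tuned)."""
    clip = lambda t: min(max(t, a), b)
    x = clip(0.5 * (a + b) if x0 is None else x0)
    L = L0
    for _ in range(iters):
        g = fp(x)
        while True:
            x_new = clip(x - g / L)
            d = x_new - x
            # standard sufficient-decrease test for a step of length 1/L
            if f(x_new) <= f(x) + g * d + 0.5 * L * d * d + 1e-15:
                break
            L *= 2.0            # step too long: increase the Lipschitz estimate
        if abs(d) <= tol:
            return x_new
        x = x_new
        L *= 0.5                # be slightly optimistic on the next step
    return x
```

Given the fixed-iteration-count goal mentioned in the comments, you would likely replace the tolerance test with a small fixed number of steps.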

For second-order methods, you really can't beat Newton, unless you exploit specific knowledge of your function. That's a big "unless" though.

Another thing we don't know is the computational complexity of computing the values and derivatives. My intuition here is that gradient descent or Newton will be as good as you can expect, with the choice depending on the cost of computing $f''$. Unless the second derivative is wickedly expensive, and in your case it sounds like it's not, Wolfgang wins.

If the billions of problems are closely related, then a warm start from a nearby solution will probably help, especially if you can start within the region of quadratic convergence for Newton.
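
For example (a hypothetical sketch, assuming consecutive minimizers are close, stay interior to $[a,b]$, and strong convexity keeps $f'' > 0$):

```python
# Hypothetical sketch: fixed-iteration Newton across many closely related
# problems, warm-started from the previous solution. 'problems' is assumed
# to yield (fp, fpp, a, b) tuples giving f', f'' and the interval.
def solve_all(problems):
    xs, x = [], None
    for fp, fpp, a, b in problems:
        x = 0.5 * (a + b) if x is None else min(max(x, a), b)  # warm start
        for _ in range(4):          # a few Newton steps, clamped to [a, b]
            x = min(max(x - fp(x) / fpp(x), a), b)
        xs.append(x)
    return xs
```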

Michael Grant
  • 2,062
  • 11
  • 24