I'm interested in (locally) minimizing a smooth nonconvex objective function: $$ f(\textbf{x}_1, \textbf{y}_1,\cdots, \textbf{x}_n, \textbf{y}_n)=\sum_{i=1}^ng(\textbf{x}_i, \textbf{y}_i) $$ subject to $\textbf{y}_{i+1}=\textbf{h}(\textbf{x}_i, \textbf{y}_i)$ for $i=1,\dots,n-1$.
What are the pros and cons of using the penalty method vs. the augmented Lagrangian method for this problem?
**Penalty method** (sketched in code below). Until convergence:
- Minimize with respect to $\textbf{x}_i, \textbf{y}_i$ the objective $f+\sum_i\nu_i\|\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i)\|^2$ with gradient descent.
- Update $\nu_i\gets\nu_i+\alpha\|\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i)\|^2$ for each $i$.
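For concreteness, here is a minimal JAX sketch of this loop. The quadratic $g$, the linear $\textbf{h}$, and all the constants ($n$, $d$, $\alpha$, the step size, the iteration counts) are just placeholder choices, not part of my actual problem:

```python
import jax
import jax.numpy as jnp

# Hypothetical toy instance: g(x, y) = ||x||^2 + ||y||^2, h(x, y) = x + y,
# n blocks of dimension d. All of these are placeholder choices.
n, d = 3, 2

def g(x, y):
    return jnp.sum(x**2) + jnp.sum(y**2)

def h(x, y):
    return x + y

def residuals(X, Y):
    # r_i = y_{i+1} - h(x_i, y_i), one row per constraint (i = 1..n-1)
    return Y[1:] - jax.vmap(h)(X[:-1], Y[:-1])

def penalized_objective(params, nu):
    X, Y = params
    f = jnp.sum(jax.vmap(g)(X, Y))
    r = residuals(X, Y)
    return f + jnp.sum(nu * jnp.sum(r**2, axis=1))  # f + sum_i nu_i ||r_i||^2

grad_fn = jax.jit(jax.grad(penalized_objective))

X, Y = jnp.ones((n, d)), jnp.ones((n, d))
nu = jnp.ones(n - 1)       # one penalty weight per constraint
alpha, lr = 0.5, 0.05      # penalty growth rate and GD step size

for outer in range(20):
    for inner in range(200):             # inner loop: gradient descent
        gX, gY = grad_fn((X, Y), nu)
        X, Y = X - lr * gX, Y - lr * gY
    r = residuals(X, Y)
    nu = nu + alpha * jnp.sum(r**2, axis=1)   # nu_i += alpha * ||r_i||^2
```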
**Augmented Lagrangian method** (sketched in code below). Until convergence:
- Minimize with respect to $\textbf{x}_i, \textbf{y}_i$ the objective $f+\sum_i\boldsymbol\lambda_i\cdot(\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i))+\sum_i\nu_i\|\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i)\|^2$ with gradient descent.
- Update $\boldsymbol \lambda_i\gets \boldsymbol\lambda_i+\alpha(\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i))$ and $\nu_i\gets\nu_i+\alpha\|\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i)\|^2$ for each $i$.
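And the corresponding sketch for the augmented Lagrangian loop, on the same hypothetical toy instance; the only changes from the penalty sketch are the multiplier term in the objective and its update at the end of each outer iteration:

```python
import jax
import jax.numpy as jnp

# Same placeholder toy instance as in the penalty sketch above.
n, d = 3, 2

def g(x, y):
    return jnp.sum(x**2) + jnp.sum(y**2)

def h(x, y):
    return x + y

def residuals(X, Y):
    # r_i = y_{i+1} - h(x_i, y_i), one row per constraint (i = 1..n-1)
    return Y[1:] - jax.vmap(h)(X[:-1], Y[:-1])

def auglag_objective(params, lam, nu):
    X, Y = params
    f = jnp.sum(jax.vmap(g)(X, Y))
    r = residuals(X, Y)
    return (f
            + jnp.sum(lam * r)                      # sum_i lambda_i . r_i
            + jnp.sum(nu * jnp.sum(r**2, axis=1)))  # sum_i nu_i ||r_i||^2

grad_fn = jax.jit(jax.grad(auglag_objective))

X, Y = jnp.ones((n, d)), jnp.ones((n, d))
lam = jnp.zeros((n - 1, d))   # one multiplier vector per constraint
nu = jnp.ones(n - 1)
alpha, lr = 0.5, 0.05

for outer in range(20):
    for inner in range(200):             # inner loop: gradient descent
        gX, gY = grad_fn((X, Y), lam, nu)
        X, Y = X - lr * gX, Y - lr * gY
    r = residuals(X, Y)
    lam = lam + alpha * r                        # lambda_i += alpha * r_i
    nu = nu + alpha * jnp.sum(r**2, axis=1)      # nu_i += alpha * ||r_i||^2
```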