I'm interested in (locally) minimizing a smooth nonconvex objective function: $$ f(\textbf{x}_1, \textbf{y}_1,\cdots, \textbf{x}_n, \textbf{y}_n)=\sum_{i=1}^ng(\textbf{x}_i, \textbf{y}_i) $$ subject to $\textbf{y}_{i+1}=\textbf{h}(\textbf{x}_i, \textbf{y}_i)$ for $i=1,\dots,n-1$.
What are the pros and cons of using the penalty method vs. the augmented Lagrangian method for this problem?
**Penalty method** (sketched in code below). Until convergence:
- Minimize with respect to $\textbf{x}_i, \textbf{y}_i$ the objective $f+\sum_i\nu_i\|\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i)\|^2$ with gradient descent.
- Update $\nu_i\gets\nu_i+\alpha\|\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i)\|^2$ for each $i$.
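For concreteness, here is a minimal JAX sketch of this loop. The quadratic $g$, the linear $\textbf{h}$, and all the constants ($n$, $d$, $\alpha$, the step size, the iteration counts) are just placeholder choices, not part of my actual problem:

```python
import jax
import jax.numpy as jnp

# Hypothetical toy instance: g(x, y) = ||x||^2 + ||y||^2, h(x, y) = x + y,
# n blocks of dimension d. All of these are placeholder choices.
n, d = 3, 2

def g(x, y):
    return jnp.sum(x**2) + jnp.sum(y**2)

def h(x, y):
    return x + y

def residuals(X, Y):
    # r_i = y_{i+1} - h(x_i, y_i), one row per constraint (i = 1..n-1)
    return Y[1:] - jax.vmap(h)(X[:-1], Y[:-1])

def penalized_objective(params, nu):
    X, Y = params
    f = jnp.sum(jax.vmap(g)(X, Y))
    r = residuals(X, Y)
    return f + jnp.sum(nu * jnp.sum(r**2, axis=1))  # f + sum_i nu_i ||r_i||^2

grad_fn = jax.jit(jax.grad(penalized_objective))

X, Y = jnp.ones((n, d)), jnp.ones((n, d))
nu = jnp.ones(n - 1)       # one penalty weight per constraint
alpha, lr = 0.5, 0.05      # penalty growth rate and GD step size

for outer in range(20):
    for inner in range(200):             # inner loop: gradient descent
        gX, gY = grad_fn((X, Y), nu)
        X, Y = X - lr * gX, Y - lr * gY
    r = residuals(X, Y)
    nu = nu + alpha * jnp.sum(r**2, axis=1)   # nu_i += alpha * ||r_i||^2
```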
**Augmented Lagrangian method** (sketched in code below). Until convergence:
- Minimize with respect to $\textbf{x}_i, \textbf{y}_i$ the objective $f+\sum_i\boldsymbol\lambda_i\cdot(\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i))+\sum_i\nu_i\|\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i)\|^2$ with gradient descent.
- Update $\boldsymbol \lambda_i\gets \boldsymbol\lambda_i+\alpha(\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i))$ and $\nu_i\gets\nu_i+\alpha\|\textbf{y}_{i+1}-\textbf{h}(\textbf{x}_i, \textbf{y}_i)\|^2$ for each $i$.
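And the corresponding sketch for the augmented Lagrangian loop, on the same hypothetical toy instance; the only changes from the penalty sketch are the multiplier term in the objective and its update at the end of each outer iteration:

```python
import jax
import jax.numpy as jnp

# Same placeholder toy instance as in the penalty sketch above.
n, d = 3, 2

def g(x, y):
    return jnp.sum(x**2) + jnp.sum(y**2)

def h(x, y):
    return x + y

def residuals(X, Y):
    # r_i = y_{i+1} - h(x_i, y_i), one row per constraint (i = 1..n-1)
    return Y[1:] - jax.vmap(h)(X[:-1], Y[:-1])

def auglag_objective(params, lam, nu):
    X, Y = params
    f = jnp.sum(jax.vmap(g)(X, Y))
    r = residuals(X, Y)
    return (f
            + jnp.sum(lam * r)                      # sum_i lambda_i . r_i
            + jnp.sum(nu * jnp.sum(r**2, axis=1)))  # sum_i nu_i ||r_i||^2

grad_fn = jax.jit(jax.grad(auglag_objective))

X, Y = jnp.ones((n, d)), jnp.ones((n, d))
lam = jnp.zeros((n - 1, d))   # one multiplier vector per constraint
nu = jnp.ones(n - 1)
alpha, lr = 0.5, 0.05

for outer in range(20):
    for inner in range(200):             # inner loop: gradient descent
        gX, gY = grad_fn((X, Y), lam, nu)
        X, Y = X - lr * gX, Y - lr * gY
    r = residuals(X, Y)
    lam = lam + alpha * r                        # lambda_i += alpha * r_i
    nu = nu + alpha * jnp.sum(r**2, axis=1)      # nu_i += alpha * ||r_i||^2
```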