5

I have a standard set of linear equations $Ax=b$ where the Hessian matrix $A$ has the special block structure shown below:

$A= \begin{pmatrix} T & U\\ U^T & V \end{pmatrix}$,

$x= \begin{pmatrix} c\\ d \end{pmatrix}$

where $A$ is a square Hessian matrix whose dimension is $n = c_n + d_n$, with $c_n$ and $d_n$ the lengths of the $c$ and $d$ blocks. This compound dimension is quite large, typically $10^6-10^{10}$. The gradient vector $b$ has the same block structure as $x$. This system is solved for a full Newton step within an iterative procedure.

One can think of the matrices $T$ and $V$ as two different basis expansions of a given problem, with the coupling between these two bases represented by $U$. This set of linear equations is usually solved with CG or a related method, as the matrix $A$ is too large to store explicitly.
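Concretely, the matrix-free setup I have in mind is roughly of the following form (a minimal Python/SciPy sketch; `apply_T`, `apply_U`, `apply_UT`, and `apply_V` are placeholders for however the block applications are actually implemented):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def make_block_operator(apply_T, apply_U, apply_UT, apply_V, nc, nd):
    """Matrix-free operator for A = [[T, U], [U^T, V]] acting on x = [c; d]."""
    def matvec(x):
        c, d = x[:nc], x[nc:]
        return np.concatenate([apply_T(c) + apply_U(d),    # top block row
                               apply_UT(c) + apply_V(d)])  # bottom block row
    return LinearOperator((nc + nd, nc + nd), matvec=matvec)

# A_op = make_block_operator(apply_T, apply_U, apply_UT, apply_V, nc, nd)
# x, info = cg(A_op, b)   # info == 0 indicates convergence
```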

What is interesting here is that the "coupling" block $U$ is quite small. Often, the $c$ and $d$ blocks are solved for alternately without considering $U$; however, this can lead to somewhat slow convergence.

It would appear that this can be solved in a somewhat decoupled manner, where the Hessian-vector products $Ax$ are split into blocks as follows:

$(Ax)_1 = Tc + Ud$

$(Ax)_2 = U^Tc + Vd$

For example, $(Ax)_1$ could be updated only on every third iteration of $(Ax)_2$. As the computation of $(Ax)_1$ is much more costly than that of $(Ax)_2$, this would be quite advantageous.
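To make the intent concrete, here is a rough sketch of the bookkeeping I have in mind (purely illustrative, with the same hypothetical `apply_*` block applications as above; no convergence claims attached):

```python
import numpy as np

class LaggedBlockMatvec:
    """Hessian-vector product that recomputes the expensive (Ax)_1 block only
    every `lag` calls, reusing the cached value in between (sketch only)."""

    def __init__(self, apply_T, apply_U, apply_UT, apply_V, nc, lag=3):
        self.apply_T, self.apply_U = apply_T, apply_U
        self.apply_UT, self.apply_V = apply_UT, apply_V
        self.nc, self.lag = nc, lag
        self.calls, self.top = 0, None

    def __call__(self, x):
        c, d = x[:self.nc], x[self.nc:]
        if self.top is None or self.calls % self.lag == 0:
            self.top = self.apply_T(c) + self.apply_U(d)   # costly block row
        bottom = self.apply_UT(c) + self.apply_V(d)        # cheap block row
        self.calls += 1
        return np.concatenate([self.top, bottom])
```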

I assume that this kind of system is well described somewhere, but have been unable to google the correct combination of words. Does anyone know what this kind of system of equations is called?

Also, if anyone has a good recommendation for material on approximate Newton methods for ill-conditioned systems that would be quite helpful as well.


Full disclosure: I asked the same question on another site with a 100-point bounty, without luck. I decided to try again here, as this community has more experience with practical solutions.

Ophion
  • This is a saddle point system and numerical methods for this kind of system have been well studied. Related: http://scicomp.stackexchange.com/questions/7288/which-preconditioners-and-solver-in-petsc-for-indefinite-symmetric-systems-sho/7291#7291. Chances are, you may need a Krylov subspace method coupled with an approximate commutator preconditioner. – Paul Apr 06 '16 at 16:57
  • @Paul My understanding is that $V$ would have to be 0 for this to be a saddle point problem? I am currently solving this with Krylov methods utilizing good preconditioners; what I am more curious about is how to decouple the problem, since the $U$ blocks are so small. – Ophion Apr 06 '16 at 17:49
  • What you're after is a block Jacobi preconditioner, and the resulting convergence rate (e.g. using PCG) can easily be bounded. Is the system symmetric? Edit: What do you mean when you say the off-diagonal blocks are small? Spectral norm? Frobenius norm? Low-rank? – Richard Zhang Apr 06 '16 at 17:59
  • @Paul: In order for this to be a saddle-point system, $V$ would have to be negative semidefinite. But OP is currently solving it using CG, which wouldn't necessarily work under that setting. – Richard Zhang Apr 06 '16 at 18:08
  • @RichardZhang The system, $T$, and $V$ blocks are symmetric. The $U$ blocks are small in the Euclidean norm (or simply, the relative magnitude of their elements is much smaller than in the other blocks). – Ophion Apr 06 '16 at 18:37
  • @Paul The system is positive definite, although the existence of negative eigenvalues is a problem. – Ophion Apr 06 '16 at 18:39
  • @Ophion -- your statement is self-contradictory: the system can not be both positive definite and have negative eigenvalues. – Wolfgang Bangerth Apr 07 '16 at 21:36
  • @WolfgangBangerth It is formally positive definite; however, a poor basis or starting guess can result in negative eigenvalues that are easily circumvented with level shifting. I probably should not have mentioned it to avoid confusion. – Ophion Apr 08 '16 at 14:54
  • @Ophion -- I still don't understand. A matrix either has negative eigenvalues, or it doesn't. If it doesn't, then it is positive definite, if it does have negative eigenvalues, it is not positive definite. It cannot be both. – Wolfgang Bangerth Apr 08 '16 at 18:55
  • @WolfgangBangerth I don't quite understand the confusion; the Hessian ($A$) is not static, but depends on a given problem. Since the matrix depends on many factors, it's a bit simplistic to simply state that it is always positive definite; the reasons it may not be are beyond the scope of this question. – Ophion Apr 08 '16 at 19:46
  • @Ophion -- ok, so you are saying that it is sometimes positive definite. That's not a useful property, in practice, though, since it can't be used in an actual algorithm. – Wolfgang Bangerth Apr 11 '16 at 02:21
  • @WolfgangBangerth Regarding that: you can condition or shift out the negative eigenvalues. Unless there is a more efficient answer which does not assume positive definiteness (which I doubt), this whole argument is moot, as the assumption works perfectly fine in actual algorithms. – Ophion Apr 11 '16 at 12:03

1 Answer

2

Consider the alternating directions method $$ c_{k+1}=T^{-1}(b_{1}-Ud_{k}),\qquad d_{k+1}=V^{-1}(b_{2}-U^{T}c_{k}), $$ where $b_{1}$ and $b_{2}$ are the blocks of the right-hand side $b$. If $\|U\|$ is sufficiently small (in a rigorously defined sense described below) then the sequence $\{c_{k},d_{k}\}$ is guaranteed to converge to the solution.
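A minimal sketch of this iteration (Python/NumPy, with `solve_T` and `solve_V` standing in for whatever inner solves you use for $T$ and $V$, and `apply_U`/`apply_UT` for the coupling) could look like:

```python
import numpy as np

def alternating_directions(solve_T, solve_V, apply_U, apply_UT,
                           b1, b2, maxiter=100, tol=1e-8):
    """Block-Jacobi splitting: c_{k+1} = T^{-1}(b1 - U d_k),
    d_{k+1} = V^{-1}(b2 - U^T c_k)."""
    c, d = np.zeros_like(b1), np.zeros_like(b2)
    for _ in range(maxiter):
        c_new = solve_T(b1 - apply_U(d))
        d_new = solve_V(b2 - apply_UT(c))   # uses the old c (Jacobi version)
        step = max(np.linalg.norm(c_new - c), np.linalg.norm(d_new - d))
        c, d = c_new, d_new
        if step < tol:
            break
    return c, d
```

Using `c_new` instead of `c` in the second solve gives the Gauss-Seidel variant of the same splitting.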

But conjugate gradients will always converge at least as fast. To see this, define the matrix $P$ as the version of $A$ in which the coupling is eliminated altogether, $$ P=\begin{bmatrix}T & 0\\ 0 & V \end{bmatrix}, $$ and note that the alternating directions method is really just the fixed-point iteration $$ Pz_{k+1}+(A-P)z_{k}=b\iff z_{k+1}=P^{-1}\bigl(b-(A-P)z_{k}\bigr),\qquad z_{k}=\begin{bmatrix}c_{k}\\ d_{k}\end{bmatrix}. $$ It is a well-known theorem that preconditioned conjugate gradients (PCG) with $P$ as the preconditioner converges as fast as or faster than the corresponding fixed-point iteration. But of course this may not matter if the alternating directions method already converges very quickly.
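For comparison, PCG with the block-diagonal preconditioner $P$ can be set up along these lines (again a sketch; the inner solves may themselves be approximate):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def block_jacobi_pcg(matvec_A, solve_T, solve_V, b1, b2):
    """PCG on A z = b with P = blkdiag(T, V) as the preconditioner (sketch)."""
    nc, nd = b1.size, b2.size
    n = nc + nd
    A = LinearOperator((n, n), matvec=matvec_A)
    # Preconditioner application: r -> P^{-1} r, applied blockwise.
    M = LinearOperator((n, n), matvec=lambda r: np.concatenate(
        [solve_T(r[:nc]), solve_V(r[nc:])]))
    z, info = cg(A, np.concatenate([b1, b2]), M=M)
    return z[:nc], z[nc:], info
```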

Theorem. Define $\alpha=\sigma_{\max}(U)$ and $\beta=\sqrt{\lambda_{\min}(T)\lambda_{\min}(V)}$. If $\alpha<\beta$, then the alternating directions iteration converges within $$ \frac{\beta}{\beta-\alpha}\log\epsilon^{-1}\text{ iterations,} $$ and PCG with $P$ as the preconditioner converges within $$ \sqrt{\frac{\beta}{2(\beta-\alpha)}}\log(2\epsilon^{-1})\text{ iterations.} $$

Proof. After preconditioning, the matrix $P^{-1/2}AP^{-1/2}=I+G$, where $G=P^{-1/2}(A-P)P^{-1/2}$ is written explicitly as $$ G=\begin{bmatrix}0 & T^{-1/2}UV^{-1/2}\\ V^{-1/2}U^{T}T^{-1/2} & 0 \end{bmatrix}, $$ and the spectral radius of the iteration matrix is $$\begin{align*} \rho(G) & =\|T^{-1/2}UV^{-1/2}\|\\ & \le\|U\|\sqrt{\|T^{-1}\|\|V^{-1}\|} = \frac{\alpha}{\beta}, \end{align*}$$ where we have used that, for a positive definite matrix $M$, $\|M^{-1}\|^{-1}=\lambda_{\min}(M)$. If the spectral radius is strictly less than 1, then the iterated sequence converges, with each step contracting the error by a factor of at most $\rho(G)$. Finally, the condition number of $P^{-1/2}AP^{-1/2}$ is $\kappa=(1+\rho(G))/(1-\rho(G))\le(\beta+\alpha)/(\beta-\alpha)$. A fixed-point iteration contracting at this rate converges within $\frac{1}{2}(\kappa+1)\log\epsilon^{-1}\le\frac{\beta}{\beta-\alpha}\log\epsilon^{-1}$ iterations, and PCG converges within $\frac{1}{2}\sqrt{\kappa+1}\log(2\epsilon^{-1})\le\sqrt{\frac{\beta}{2(\beta-\alpha)}}\log(2\epsilon^{-1})$ iterations.
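A quick numerical sanity check of the bound $\rho(G)\le\alpha/\beta$ on a small random example (dense NumPy, purely illustrative):

```python
import numpy as np

def inv_sqrt(M):
    """Inverse symmetric square root via an eigendecomposition."""
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(w ** -0.5) @ Q.T

rng = np.random.default_rng(0)
nc, nd = 30, 20
R1 = rng.standard_normal((nc, nc)); T = R1 @ R1.T + nc * np.eye(nc)
R2 = rng.standard_normal((nd, nd)); V = R2 @ R2.T + nd * np.eye(nd)
U = 0.01 * rng.standard_normal((nc, nd))               # weak coupling

alpha = np.linalg.norm(U, 2)                           # sigma_max(U)
beta = np.sqrt(np.linalg.eigvalsh(T)[0] * np.linalg.eigvalsh(V)[0])

B = inv_sqrt(T) @ U @ inv_sqrt(V)                      # off-diagonal block of G
rho_G = np.linalg.norm(B, 2)                           # spectral radius of G
print(rho_G, alpha / beta)                             # rho_G <= alpha / beta
```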


References

  • Shewchuk, Jonathan Richard. "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain." (1994).
  • Saad, Yousef. Iterative Methods for Sparse Linear Systems. SIAM, 2003.
  • Greenbaum, Anne. Iterative Methods for Solving Linear Systems. Vol. 17. SIAM, 1997.
  • Kelley, C. T. Iterative Methods for Linear and Nonlinear Equations. SIAM, 1995.
Richard Zhang
  • It is quite interesting that CG will always converge faster; however, practicalities (the cost of $V$ being much greater than that of $U$, and the number of iterations typically being 6-10) make a decoupled approach (if possible) much more cost-effective. The alternating directions method may be the answer I am searching for. – Ophion Apr 08 '16 at 15:10
  • I should emphasize that alternating directions can diverge if $\alpha \ge \beta$, whereas CG will at worst be stagnant. Also, the Jacobi splitting is only one of several possible "alternating directions" methods, with prominent others being the Gauss-Seidel and SSOR versions. Regardless, I like Jacobi here because it's really simple to implement, and its convergence is guaranteed to be monotonic. Every iteration gets you a multiplicative factor of $\alpha / \beta$ closer to the true solution. – Richard Zhang Apr 08 '16 at 16:24
  • Combining the two is proving to be quite successful: PCG for several iterations until $\|Up_k\|$ (where $p_k$ is a CG conjugate vector) is below some threshold, and then switching to the alternating directions method. I'll give this question a few more days, but I think this is what I was looking for. – Ophion Apr 08 '16 at 19:58