
The numerical methods for finding (the largest) eigenvalues and (the largest) Lyapunov exponents (LEs) look similar.

The power method applies the matrix $B$ to a vector $z$ repeatedly; the expansion/shrinking rate of the vector is the largest eigenvalue: $\frac {u^T B^{m+1}z}{u^T B^{m}z} \approx \lambda_1$, where $u$ is a more or less arbitrarily chosen test vector.
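
For concreteness, a minimal sketch in Python (the matrix $B$ and the vectors are my own arbitrary illustrations, not from the reference; $z$ is rescaled at each step to avoid overflow, which does not change the ratio):

```python
import numpy as np

def power_method(B, m=200, seed=0):
    """Estimate the dominant eigenvalue via u^T B^{m+1} z / u^T B^m z."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(B.shape[0])   # arbitrary start vector z
    u = rng.standard_normal(B.shape[0])   # arbitrary test vector u
    for _ in range(m):
        z = B @ z
        z /= np.linalg.norm(z)            # rescale; the ratio is unaffected
    return (u @ (B @ z)) / (u @ z)

B = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(power_method(B))   # ~3.618, the largest eigenvalue (5 + sqrt(5))/2
```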

The largest LE is computed similarly, by finding the growth rate of a vector under the action of the Jacobian matrices $D_t$ along a trajectory.
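
A minimal sketch of that idea on a toy system of my own choosing (the logistic map; in 1-D the Jacobian $D_t$ is just the scalar $f'(x_t)$):

```python
import numpy as np

# Largest LE of the logistic map x_{t+1} = r x_t (1 - x_t): push a tangent
# "vector" v through D_t = f'(x_t) = r (1 - 2 x_t) at each step, record its
# growth, and renormalise so it never over/underflows.
r, x, v = 4.0, 0.3, 1.0
log_growth, T = 0.0, 100_000
for _ in range(T):
    v *= r * (1.0 - 2.0 * x)        # transform v by the Jacobian D_t
    log_growth += np.log(abs(v))    # record how much v grew this step
    v = 1.0                         # renormalise the tangent vector
    x = r * x * (1.0 - x)           # advance the orbit
print(log_growth / T)               # ~ log 2 = 0.693... for r = 4
```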

Q1:

Is the QR approach similar to the power method? That is, when we use QR to find the eigenvalues of a matrix $B$, do we iterate $Q_{s+1}=B Q_s$, where $Q_s=[ q_1 \dots q_n]$ and the $q_i$ are tangent vectors forming a basis of the tangent space;
then compute $Q_m = B^m Q_0$ and decompose it as $Q_m = QR$ (where the columns of $Q$ are unit vectors), so that the diagonal entries of $R$ are the growth rates of the tangent vectors under the transformation by $B$, and therefore the eigenvalues? $^{1}$
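
To make this concrete, a minimal sketch of the iteration I have in mind (a toy matrix of my own; re-orthonormalising at each step rather than forming $B^m Q_0$ in one go, since the per-step $R$ factors multiply up to the single $R$ of $B^m Q_0$ while avoiding overflow):

```python
import numpy as np

# Simultaneous/orthogonal iteration: multiply an orthonormal frame by B and
# re-orthonormalise with a QR decomposition at every step.  For distinct
# eigenvalue magnitudes, diag(R) converges to those magnitudes, descending.
B = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.standard_normal((3, 3)))[0]   # random orthonormal Q_0
for _ in range(200):
    Q, R = np.linalg.qr(B @ Q)                     # Q_{s+1} R_{s+1} = B Q_s
print(np.abs(np.diag(R)))                          # -> eigenvalue magnitudes
print(np.sort(np.abs(np.linalg.eigvalsh(B)))[::-1])  # compare
```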

What confuses me is that QR seems to be more complicated than stated above, involving Hessenberg matrices and Householder reflections. I am not sure what role these two play in QR.

Q2:

If my understanding of QR in [1] is correct, then I am confused by the way QR is used to compute LEs.

Since the LEs are (the logarithms of) the eigenvalues of $\Lambda=(T_t^T T_t)^{1/2t}$, where $T_t = D_{t-1}\dots D_0$, it seems we should compute the eigenvalues of $D_{t-1},\dots, D_0$ separately and then multiply them; i.e. we would use QR to compute $Q_{m,t}=(D_t)^m Q_0$ for each $t$ (i.e. transform the tangents $Q_0$ repeatedly by the same matrix $D_t$), then decompose $Q_{m,t}=Q_t R_t$, and the diagonal entries of $R_t$ would be the eigenvalues of $D_t$. Then we multiply these eigenvalues to get the eigenvalues of $T_t^T T_t$, i.e. $(\prod_t R_{t,ii})^2$ for each $i \in \{1,\dots, n\}$. $^{2}$

But what we actually do is use different $D_t$'s ($D_0,D_1, \dots, D_{t-1}$), each applied once, to transform the tangents $Q_0$, i.e. $Q_{t}=(D_{t-1})^1 (D_{t-2})^1\dots (D_{0})^1 Q_0$. Why can we do this? Without applying each $D_t$ sufficiently many times, it seems we cannot get the eigenvalues of $D_t$; so how can we get the LEs (which in my eyes are the product of eigenvalues of the $D_t$'s [2])? $^{3}$
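
For reference, a minimal sketch of that actual procedure (my own toy example, the Hénon map, not from any paper): each local Jacobian $D_t$ is applied once, the frame is re-orthonormalised by QR, and the LEs come out as averages of $\log|R_{ii}|$:

```python
import numpy as np

# Discrete QR method for the full LE spectrum, on the Henon map
# x' = 1 - a x^2 + y, y' = b x, with Jacobian D_t = [[-2 a x, 1], [b, 0]].
a, b = 1.4, 0.3
x, y = 0.1, 0.1
Q = np.eye(2)
log_r = np.zeros(2)
T = 100_000
for _ in range(T):
    D = np.array([[-2.0 * a * x, 1.0],
                  [b, 0.0]])            # local Jacobian D_t at this point
    Q, R = np.linalg.qr(D @ Q)          # Q_{t+1} R_{t+1} = D_t Q_t
    log_r += np.log(np.abs(np.diag(R))) # accumulate per-step growth rates
    x, y = 1.0 - a * x * x + y, b * x   # advance the orbit
print(log_r / T)                        # ~ [0.42, -1.62] for these parameters
```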

Q3:

Since (whether we use [2] or [3] to compute LEs) we use the $D_t$'s ($D_0,D_1, \dots, D_{t-1}$), where $t$ is somewhat arbitrarily defined (it is discrete time, while the actual dynamical system is in continuous time), the choice of the $t$'s may matter. Does the time step (the duration between $0, 1, \dots, t-2, t-1$) matter for the computed value of the LEs? If not (as it should be), why not?


In short, the three methods look similar, but their relations seem as tangled as intertwined threads to me.

My main reference is Section 4.8, "The eigenvalue problem", in Conte, Elementary Numerical Analysis.

Related questions: What is integrating a variational equation?, How to understand the largest Lyapunov exponent?.


Supplemental:

What is the meaning of the (blue) text, which looks important?

[figure omitted]

  • You need to distinguish between the QR decomposition (usually done via Householder reflectors or Givens rotations) and the QR algorithm that iterates the QR decomposition. Here indeed a transformation to Hessenberg form as a first step can reduce the computational effort, as the QR decomposition then follows and preserves the zero pattern below the (sub-)diagonal. – Lutz Lehmann Jun 05 '22 at 09:14
  • In Conte's book there is a first step about Hessenberg form. But in the QR algorithm mentioned in the Lyapunov spectrum paper there is no such step. Are the QR decomposition and the QR algorithm very different? Is there an article discussing the difference? // I see that a QR decomposition exists in each iteration of the QR algorithm. So in each iteration $Q_{t} = D_{t-1}Q_{t-1}$, we calculate $(D_{t-1})^m$ where $m$ is large enough that the $R$ converges. – Charlie Chang Jun 05 '22 at 09:16
  • As for the LE computation, that enters the region of woolly thinking (like fuzzy logic, only more so). It is a compromise between the wanted result, exponential growth, and ease of computation. While there are various approaches to compute approximations of the concept, I've yet to see what the ideal object, the limit to approximate, is. But the usual methods are good enough to produce consistently something like "the local average of the ordered singular values of the Jacobian". – Lutz Lehmann Jun 05 '22 at 09:21
  • [addition] .. a QR decomposition (and therefore a Hessenberg-form initiation step) exists in each iteration of the QR algorithm .. // I am confused by the QR method for LEs: it seems we assume that the $D_t$ converge to the same matrix, or to matrices with the same eigenvalues (i.e. the exponent of the 'exponential growth' converges), so that the local average (i.e. $\lambda_i$ at the same $i$; 'local' meaning the position of an eigenvalue) converges, and so we can use such an average exponent to represent the expansion/shrinking of a perturbation? If so, to what extent is that assumption valid? – Charlie Chang Jun 05 '22 at 09:27
  • Since it is 'woolly', (w.r.t. Q3) the time $t$ can be chosen somewhat arbitrarily, and cannot be absolutely defined or optimized? // Then what is a better way to understand the QR method for LEs, if I cannot understand it from a mathematically rigorous perspective? Should I just write and run code several times and thus gain some intuitive understanding? – Charlie Chang Jun 05 '22 at 09:31
  • Yes, the averaging time is arbitrary. It should be long enough that the attractor is sufficiently filled. As far as I know, LEs serve to exclude chaos if they all have the same sign, and in the case where chaos and a strange attractor are established by other means, LEs can be used to estimate the fractal dimension. I'm not aware of any application where a higher accuracy than two or three decimals is required. – Lutz Lehmann Jun 05 '22 at 09:37
  • Isn't the QR decomposition related to the Gram-Schmidt orthonormalisation? – Rodrigo de Azevedo Jun 05 '22 at 15:30
  • @RodrigodeAzevedo: Yes, that is the third method to compute it, but it is also the one that is least numerically stable. The other two split off orthogonal matrices, which is known to control the produced rounding errors tightly. – Lutz Lehmann Jun 05 '22 at 16:17

Answer:

QR decomposition is just the matrix factorization $QR=A$ with orthogonal $Q$ and upper triangular $R$. The sign structure of the diagonal of $R$ is not fixed; getting it to be all positive requires extra effort.
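
A minimal illustration in Python (my own example; note that numpy's `qr` does not enforce a positive diagonal):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Q, R = np.linalg.qr(A)                  # A = Q R, Q orthogonal, R upper triangular
print(np.allclose(Q @ R, A))            # True
print(np.allclose(Q.T @ Q, np.eye(2)))  # True
print(np.diag(R))                       # diagonal signs are not fixed
```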

The QR algorithm is the most basic form of the Francis eigenvalue algorithm. The Francis algorithm itself is based on a single-shifted QR decomposition (double-shifted for non-real eigenvalues of non-symmetric matrices). The zero-shift algorithm iterates $Q_kR_k=A_k$, $A_{k+1}=R_kQ_k$. $A_0$ is either the input matrix $A$ or its Hessenberg form. In the latter case all $A_k$ and $Q_k$ remain in Hessenberg form.
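
A minimal sketch of the zero-shift iteration in Python (a toy symmetric matrix of my choosing; real implementations add the Hessenberg reduction and shifts):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
A0 = A.copy()
for _ in range(100):
    Q, R = np.linalg.qr(A)         # A_k = Q_k R_k
    A = R @ Q                      # A_{k+1} = R_k Q_k, similar to A_k
print(np.diag(A))                  # ~ eigenvalues, ordered by magnitude
print(np.linalg.eigvalsh(A0))      # compare (ascending order)
```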

Careful observation shows that in the first column the algorithm essentially follows the power iteration; if one takes the first $k$ columns, it corresponds to a subspace iteration; and starting from the last column it is (approximately?) the inverse power iteration.

Lyapunov exponents The motivation is to produce something like the eigenvalue analysis around stationary points, or fixed points of a Poincaré map, also for orbits that are not closed but still restricted to some bounded area. In the interesting cases the divergence of the trajectories is exponential, so the (state-space) Jacobian $J(t,t_0)$ of the propagator $x(t)=P(t,t_0)(x_0)$ rapidly becomes ill-conditioned. Hence the idea to compute the singular values $\sigma_j(t_{k+1},t_k)$ for small segments $[t_k,t_{k+1}]$ and average their logarithms, $\ln(\sigma_j(t_{k+1},t_k))\sim\lambda_j·(t_{k+1}-t_k)$. As a side effect, these exponents no longer belong to global solution curves as in the motivation. For the small local segments one can produce initial points so that the solution difference to the reference solution grows or shrinks approximately according to the singular values of the propagator Jacobian. However, the initial points for the current local segment will be (slightly) shifted from the end points of the last segment, even if only the direction is considered.

Thus the LEs, taken over a stretch of the reference solution that fills the attractor set densely enough, or over several solutions that achieve this, will give numbers that are characteristic of the attractor, but not really connected to the difference of any two particular solutions.

The heuristic for the usual LE computations seems to be that in the progression of SVDs $$U_k\Sigma_kV_k^T=J(t_{k+1},t_k),$$ $t_{k+1}=t_k+h$, the orthogonal bases at $t_k$ are close enough that $U_{k-1}=V_k+O(h^2)$, so that in the QR decomposition $$Q_kR_k=U_{k-1}^TJ(t_{k+1},t_k)U_{k-1}$$ the product $U_{k-1}Q_k$ is close to $U_k$ and the non-diagonal entries of $R_k$ are $O(h^2)$, so that this abbreviated computation is good enough as an approximate SVD.
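
A minimal sketch of this recursion in Python, on the Lorenz system as a stand-in example (my own choice; the segment propagator is a single Euler step $J\approx I+h\,\nabla_xF$, and the initialization $U_0=I$ is the "not catastrophically wrong" choice discussed below):

```python
import numpy as np

# Lorenz system: F(x, y, z) = (sigma (y - x), x (rho - z) - y, x y - beta z).
sigma, rho, beta, h = 10.0, 28.0, 8.0 / 3.0, 1e-3
x = np.array([1.0, 1.0, 1.0])
U = np.eye(3)                            # cumulative rotation, U_0 = I
log_r = np.zeros(3)
T = 200_000                              # i.e. up to time T*h = 200
for _ in range(T):
    DF = np.array([[-sigma, sigma, 0.0],
                   [rho - x[2], -1.0, -x[0]],
                   [x[1], x[0], -beta]])
    J = np.eye(3) + h * DF               # segment Jacobian, error O(h^2)
    Q, R = np.linalg.qr(U.T @ J @ U)     # Q_k R_k = U_{k-1}^T J U_{k-1}
    U = U @ Q                            # U_k ~ U_{k-1} Q_k
    log_r += np.log(np.abs(np.diag(R)))  # accumulate ln(sigma_j) per segment
    x = x + h * np.array([sigma * (x[1] - x[0]),
                          x[0] * (rho - x[2]) - x[1],
                          x[0] * x[1] - beta * x[2]])
print(log_r / (T * h))   # roughly (0.9, 0.0, -14.6) for these parameters
```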

This indeed looks like the QR algorithm, like a gliding eigenvalue computation. The question remains whether the $R$ factors do indeed sufficiently approximate the singular values of the segment.

Note that $$J(t_{k+1},t_k)=I+h\nabla_xF(t_k,x_k)+O(h^2),$$ so that the factors in any decomposition are $O(h)$ close to the identity matrix in all non-trivial entries. Demanding that the other entries are approximately zero thus needs a smaller tolerance, like $O(h^2)$.
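
For completeness, a one-line derivation of this expansion (assuming the flow $\dot x = F(t,x)$, this is just the variational equation integrated over one step):

$$\dot J(t)=\nabla_xF(t,x(t))\,J(t),\qquad J(t_k)=I \quad\Longrightarrow\quad J(t_k+h)=I+h\,\nabla_xF(t_k,x_k)+O(h^2).$$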

In consequence one could say that the iteration should start with an SVD of $J(t_1,t_0)$, or at least a good enough approximation in the above sense, with error $O(h^2)$ in the off-diagonal entries of the middle factor. But starting with identity matrices is also not catastrophically wrong. Using the transformation to Hessenberg form for initialization is somewhere in between. Usually one lets the computation of the orthogonal frame run for some time before starting the averaging of the singular values. This reduces the importance of the initialization.

– Lutz Lehmann
  • I am a bit more perplexed now. The 'QR decomposition' in my post actually seems to mean the QR algorithm for finding eigenvalues. I am comparing the QR algorithm for eigenvalues with the QR method for LEs. – Charlie Chang Jun 05 '22 at 10:53
  • That is what the last part should express: the heuristic for LEs gives an algorithm that has some structural similarity to the QR algorithm. However, the intent of the LEs is exponential factors of growth, which gives singular values, not eigenvalues. There is some dissonance there that might get explored in better sources. – Lutz Lehmann Jun 05 '22 at 11:03
  • $Q_kR_k=A_k \dots Q_kR_k=J(t_{k+1},t_k)U_{k-1}$, so $A^{(k)}$ in the QR algorithm for eigenvalues is analogous to $D_tQ_t$ (or $JU$ in your notation) in the QR method for LEs. – Charlie Chang Jun 05 '22 at 11:10
  • The notation is indeed changing somewhat. In the QR algorithm the $Q_k$ are the incremental rotations, while in the LE computation I have set them to be the cumulative rotation. So it would be more consistent to set $Q_kR_k=U_{k-1}^TJ_kU_{k-1}$ with then $U_k\approx U_{k-1}Q_k$. – Lutz Lehmann Jun 05 '22 at 11:16
  • The discussions here are based on the assumption that the divergence of the trajectories is exponential. Is there a proof for that? – Charlie Chang Jun 05 '22 at 12:18
  • In the usual situations of interest, this is actually impossible, as all solutions remain bounded. So even if two solutions diverge maximally, their distance will have an upper bound. However, entries in the Jacobian can grow indefinitely and exponentially. But even that may break down, and the entries may oscillate rather than grow, contracting phases alternating with expanding ones. – Lutz Lehmann Jun 05 '22 at 12:38
  • Then why do we often talk about exponential growth of perturbations? – Charlie Chang Jun 05 '22 at 12:57
  • Because the first term of the perturbation expansion is the linearization around the reference solution, here the unperturbed solution. Globally one can sometimes restrict that with a multi-scale expansion. But locally, in the initial phase where the linear terms dominate, the growth of the distance is usually exponential. – Lutz Lehmann Jun 05 '22 at 13:05
  • I see. Near the initial condition $x(0)$ the perturbation's growth can be exponential (corresponding roughly to the fact that after Taylor linearization of $f(x)$ in $\dot x = f(x)$ to $\dot x = Ax$ near a fixed point $x_0$ where $f(x_0) = 0$ (shifting coordinates so that $x_0 = 0$), the perturbation $dx(t)$ grows exponentially with $t$). Since the Taylor linearization is usually valid only in a neighborhood, it and the resulting exponential growth become invalid when the perturbation grows large. (One needs to scale down the perturbation to keep doing similar computations of the time evolution of the perturbation.) – Charlie Chang Jun 05 '22 at 13:36
  • .. (even) by repeatedly scaling down the perturbation (e.g. $dx(t_0) \to \epsilon\, dx(t_0)$) and evolving it exponentially again (the growth from $dx(t_0)$ to $dx(t_0+t)$ will not be proportional to that from $\epsilon\, dx(t_0)$ to $\epsilon\, dx(t_0+t)$, because of the nonlinearity away from $x(t_0)$), we still limit the growth of the perturbation, so that the perturbation does not grow into an unbounded region. – Charlie Chang Jun 05 '22 at 13:48
  • .. It is as if LEs are more of a diagnostic/characterizing tool, not corresponding to the physical truth (of two flows) or to mathematical rigor; partly because of the beyond-neighborhood nonlinearity above; partly because LEs are not based on strict math (but on a simplified, perhaps misused version of Oseledets' difficult original work; and (I am not sure) it may be improper to attribute this idea of characterizing chaos to Lyapunov), e.g. the norm $\|\cdot\|$ (introduced into the time-limit computation of LEs) is not well defined for high-dimensional (phase) spaces. (Reference: Cvitanović, Chaos Book, ch. 6, Lyapunov exponents) – Charlie Chang Jun 05 '22 at 14:41