
I've been using Intel MKL's SVD (dgesvd through SciPy) and noticed that the results are significantly different between float32 and float64 when my matrix is badly conditioned / not full rank. Is there a guide to the minimum amount of regularization I should add to make the results insensitive to the float32 -> float64 change?

In particular, doing $A=UDV^{T}$, I see that the $L_\infty$ norm of $V^{T}X$ moves by about 1 when I change precision from float32 to float64. The $L_2$ norm of $A$ is $10^5$, and it has about 200 zero eigenvalues out of 784 total.

Doing SVD on $\lambda I + A$ with $\lambda=10^{-3}$ made the difference vanish.
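
For concreteness, here is a rough sketch of the kind of comparison I mean, on a synthetic rank-deficient stand-in (this is not my actual matrix, which is linked in the comments below; the construction of A and the value of lam here are placeholders):

import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((784, 584))
A = G @ G.T                           # symmetric PSD, rank 584 -> ~200 zero eigenvalues
A *= 1e5 / np.linalg.norm(A, 2)       # rescale so the L2 norm is 1e5, as above

def svd_of(M, dtype, lam=0.0):
    """ SVD of lam*I + M in the given floating-point precision. """
    M = (lam * np.eye(M.shape[0]) + M).astype(dtype)
    return np.linalg.svd(M)

for lam in (0.0, 1e-3):
    U32, s32, Vt32 = svd_of(A, np.float32, lam)
    U64, s64, Vt64 = svd_of(A, np.float64, lam)
    # raw entrywise difference of the right singular vectors (sign/order ambiguities included)
    print("lam = %g:  max |sigma32 - sigma64| = %.2e,  max |Vt32 - Vt64| = %.2f"
          % (lam, np.abs(s32 - s64).max(), np.abs(Vt32 - Vt64).max()))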

Yaroslav Bulatov
  • What is the size $N$ of an $N\times N$ matrix $A$ for that example (is it even a square matrix)? 200 zero eigenvalues or singular values? A Frobenius norm $||A||_\text{F}$ for a representative example would be also helpful. – Anton Menshov May 01 '17 at 18:22
  • In this case 784 x 784 matrix, but I'm more interested in general technique to find good value of lambda – Yaroslav Bulatov May 01 '17 at 19:59
  • So, is the difference in $V$ only in the last columns corresponding to the zero singular values? – Nick Alger May 02 '17 at 07:05
  • If there are several equal singular values, the svd is not unique. In your example, I guess that the problem comes from the multiple zero singular values and that a different precision leads to a different choice of the basis for the respective singular space. I do not know why that does change when you regularize... – Dirk May 02 '17 at 17:11
  • ...what is $X$? – Federico Poloni May 03 '17 at 10:59
  • @Dirk It changes simply because there aren't multiple singular values anymore, I guess. – Federico Poloni May 03 '17 at 11:01
  • (Indeed, note that adding a multiple of $I$ does not change singular values predictably -- unlike eigenvalues.) – Federico Poloni May 08 '17 at 07:02
  • Would you know of $A \ x$ similar to yours on the web, to look at ? Thanks – denis Feb 18 '18 at 10:09
  • Yup, here's a code to download/render into numpy the matrix I was looking at -- https://github.com/yaroslavvb/stuff/blob/master/svd_test.py#L27 – Yaroslav Bulatov Feb 22 '18 at 03:55
  • Fwiw, a plot of the eigenvalues looks much like that of random correlation matrices. See e.g. this on stats.stack . – denis Mar 01 '18 at 18:38

2 Answers


The singular value decomposition of a symmetric positive semidefinite matrix $A=A^{T}$ is one and the same as its canonical eigendecomposition (i.e. with an orthonormal matrix-of-eigenvectors), while the SVD of a general nonsymmetric matrix $M=U \Sigma V^T$ is just the canonical eigendecomposition of the symmetric matrix
$$
H=\begin{bmatrix}0 & M\\ M^{T} & 0 \end{bmatrix}=\begin{bmatrix}U & 0\\ 0 & V \end{bmatrix}\begin{bmatrix}0 & \Sigma\\ \Sigma & 0 \end{bmatrix}\begin{bmatrix}U & 0\\ 0 & V \end{bmatrix}^{T}.
$$
Hence, without loss of generality, let us consider a closely related question: if two symmetric matrices are approximately the same, should we expect their canonical eigendecompositions to also be approximately the same?
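A quick numerical check of that correspondence between the SVD of $M$ and the eigendecomposition of $H$ (a sketch added for illustration; $M$ here is just a random $5\times5$ matrix, nothing from the question):

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))

# Hermitian dilation H = [[0, M], [M^T, 0]]
Z = np.zeros((5, 5))
H = np.block([[Z, M], [M.T, Z]])

sigma = np.linalg.svd(M, compute_uv=False)   # singular values of M
lam = np.linalg.eigvalsh(H)                  # eigenvalues of H

# the eigenvalues of H are exactly +/- the singular values of M
print(np.allclose(np.sort(lam), np.sort(np.concatenate([-sigma, sigma]))))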

The answer is a surprising no. Let $\epsilon>0$ be small, and consider the two matrices
$$
A_{\epsilon}=\begin{bmatrix}1 & \epsilon\\ \epsilon & 1 \end{bmatrix}=V\Lambda_{\epsilon}V^{T},\qquad B_{\epsilon}=\begin{bmatrix}1+\epsilon & 0\\ 0 & 1-\epsilon \end{bmatrix}=U\Lambda_{\epsilon}U^{T},
$$
both of which have eigenvalues $\Lambda_{\epsilon}=\mathrm{diag}(1+\epsilon,1-\epsilon)$, but whose eigenvectors are
$$
V=\frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\ 1 & -1 \end{bmatrix},\qquad U=\begin{bmatrix}1 & 0\\ 0 & 1 \end{bmatrix}.
$$
While the matrices $A_{\epsilon} \approx B_{\epsilon}$ are approximately the same, their matrices-of-eigenvectors $V$ and $U$ are very different. Indeed, since the eigendecompositions are unique for $\epsilon>0$, there is no choice of $U,V$ such that $U\approx V$.
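The same $2\times2$ example, evaluated numerically (an illustrative sketch; the value of eps is arbitrary):

import numpy as np

eps = 1e-7
A_eps = np.array([[1.0, eps], [eps, 1.0]])
B_eps = np.array([[1.0 + eps, 0.0], [0.0, 1.0 - eps]])

lam_A, V = np.linalg.eigh(A_eps)    # eigenvalues ascending: 1 - eps, 1 + eps
lam_B, U = np.linalg.eigh(B_eps)

print(lam_A, lam_B)   # nearly identical eigenvalues
print(V)              # columns ~ (1, -1)/sqrt(2) and (1, 1)/sqrt(2), up to signs
print(U)              # columns are standard basis vectors, up to ordering and signs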

Now, applying this insight back to the SVD under finite precision, let us write $M_{0}=U_{0}\Sigma_{0}V_{0}^{T}$ for your matrix in float64 precision, and $M_{\epsilon}=U_{\epsilon}\Sigma_{\epsilon}V_{\epsilon}^{T}$ for the same matrix in float32 precision. If we assume that the SVDs themselves are exact, then by Weyl's inequality the singular values $\Sigma_{0},\Sigma_{\epsilon}$ can differ by no more than about $\epsilon\,\|M_{0}\|$ with $\epsilon\approx10^{-7}$, but the singular vectors $U_{0},U_{\epsilon}$ and $V_{0},V_{\epsilon}$ can differ by an arbitrarily large quantity. Hence, as shown, there is no way to make the SVD "stable" in the sense of the singular vectors.
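A quick check of that singular-value bound, on a random matrix with an arbitrary perturbation at roughly float32 rounding level (an illustration, not the matrix from the question):

import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((200, 200))
E = 1e-7 * rng.standard_normal((200, 200))     # small perturbation, roughly float32-sized relative to ||M||

s0 = np.linalg.svd(M, compute_uv=False)
s1 = np.linalg.svd(M + E, compute_uv=False)

# Weyl's inequality: |sigma_i(M + E) - sigma_i(M)| <= ||E||_2 for every i
print(np.abs(s1 - s0).max(), "<=", np.linalg.norm(E, 2))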

Richard Zhang

Although the question has a great answer, here's a rule of thumb for small singular values, with a plot.

If a singular value is nonzero but very small, you should define its reciprocal to be zero, since its apparent value is probably an artifact of roundoff error, not a meaningful number. A plausible answer to the question "how small is small ?" is to edit in this fashion all singular values whose ratio to the largest is less than $N$ times the machine precision $\epsilon$ .

$\qquad$ — Numerical Recipes p. 795

Added: the following couple of lines calculate this rule-of-thumb.

#!/usr/bin/env python2

from __future__ import division
import numpy as np
from scipy.sparse.linalg import svds  # works on sparse, dense or LinearOperator

#...............................................................................
def howsmall( A, singmax=None ):
    """ Singular values < N * float_eps * sing_max may be iffy, questionable.
        "How small is small ?"
        [Numerical Recipes p. 795](http://apps.nrbook.com/empanel/index.html?pg=795)
    """
    # e.g. print( "%d singular values are small, iffy" % (sing < howsmall(A)).sum() )
    # small |eigenvalues| too ?
    if singmax is None:
        singmax = svds( A, 1, return_singular_vectors=False )[0]  # largest; v0 random
    return max( A.shape ) * np.finfo( A.dtype ).eps * singmax
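
A possible way to call it, on a made-up rank-deficient matrix (this example is mine, not part of the rule of thumb):

if __name__ == "__main__":
    np.random.seed(0)
    R = np.random.rand(784, 200).astype(np.float32)
    A = R.dot(R.T)                        # 784 x 784, rank 200
    sing = np.linalg.svd(A, compute_uv=False)
    cutoff = howsmall(A, singmax=sing[0])
    print("%d of %d singular values are below the cutoff %.3g"
          % ((sing < cutoff).sum(), len(sing), cutoff))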


The Hilbert matrix seems to be widely used as a test case for roundoff error:

[plot: singular values of the Hilbert matrix under reduced-precision truncation, as described below]

Here the low-order bits in the mantissas of the Hilbert matrix are zeroed, A.astype(np.float__).astype(np.float64), then np.linalg.svd is run in float64. (Results with the whole svd done in float32 are about the same.)
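A sketch along those lines (the size n and the float32 round-trip below are my choices, not necessarily those behind the plot):

import numpy as np
from scipy.linalg import hilbert

n = 100
A = hilbert(n)                                   # notoriously ill-conditioned
A32 = A.astype(np.float32).astype(np.float64)    # zero the low-order mantissa bits

s64 = np.linalg.svd(A, compute_uv=False)
s32 = np.linalg.svd(A32, compute_uv=False)

# the rule of thumb above, with float32 eps since the data were truncated to float32
cutoff = max(A.shape) * np.finfo(np.float32).eps * s32[0]
print("%d of %d singular values fall below the float32 cutoff %.3g"
      % ((s32 < cutoff).sum(), n, cutoff))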

Simply truncating to float32 might even be useful for denoising high-dimensional data, e.g. for train / test classification.

Real test cases would be welcome.

denis
  • btw, scipy seems to add a factor of 1e3 for float32 and 1e6 for float64, curious where these came from – Yaroslav Bulatov Sep 28 '19 at 04:02
  • @Yaroslav Bulatov, numpy and scipy.linalg.svd call LAPACK gesdd, see parameter JOBR in dgejsv: "Specifies the RANGE for the singular values. Issues the licence to set to zero small positive singular values if they are outside ..." (scipy.sparse.linalg.svds wraps ARPACK and has a parameter tol, Tolerance for singular values.) – denis Oct 08 '19 at 08:25