
Suppose $h$ is a vector of $d$ positive numbers adding up to 1. I'm looking for an $O(d)$ algorithm to estimate the eigenvalues of the following diagonal plus rank-1 matrix:

$$A=2\operatorname{diag}(h)-hh^T$$

Empirically, $2h$ appears to give a good starting point for the eigenvalues of $A$. How could I improve this estimate in $O(d)$ time?
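For concreteness, a minimal NumPy sketch of the setup; the power-law choice of $h$ here is an arbitrary example, not part of the question:

```python
import numpy as np

d = 10
h = 1.0 / np.arange(1, d + 1) ** 1.5          # power-law decay (arbitrary example)
h /= h.sum()                                  # positive entries summing to 1

A = 2 * np.diag(h) - np.outer(h, h)           # diagonal + rank-1

exact = np.sort(np.linalg.eigvalsh(A))[::-1]  # exact eigenvalues, O(d^3) reference
start = np.sort(2 * h)[::-1]                  # the 2h starting point

print(np.abs(exact - start).max())            # the gap I'd like to shrink in O(d)
```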


Background: $f(t)=\operatorname{Tr} \exp(-t A)$ estimates the mean error of SGD after $t/2$ steps of solving linear least squares on normally distributed data with covariance eigenvalues $h$ and a random starting distribution of the error. The values of $h$ are expected to decay at least as fast as a power law with exponent $>1$.

This follows from Equation 5 of the Bordelon paper after discarding small terms, using the approximation $(I-A)^t\approx e^{-t A}$, and this trace result.
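In particular, given the eigenvalues (or $O(d)$ estimates of them), $f(t)$ reduces to a plain sum of exponentials; a minimal sketch:

```python
import numpy as np

def f(t, eigvals):
    # Tr exp(-t A) = sum_i exp(-t * lambda_i); with estimated eigenvalues
    # (e.g. the 2h starting point) each evaluation costs O(d).
    return np.exp(-t * np.asarray(eigvals)).sum()
```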

Equivalently, $f(t)$ gives the expected value of $\|e\|^2$ after $t/2$ iterations of the form $e\leftarrow e-x\langle e, x\rangle$ for $x\sim \text{Normal}(0,\operatorname{diag}(h))$ and isotropic starting $e_0$ with $\|e_0\|^2=1$.
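A quick Monte Carlo check of this equivalence; this is a sketch, and the dimension, seed, and sample counts are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d, trials, steps = 10, 2000, 40

h = 1.0 / np.arange(1, d + 1) ** 1.5
h /= h.sum()
A = 2 * np.diag(h) - np.outer(h, h)
lam = np.linalg.eigvalsh(A)

# isotropic unit-norm starting errors, one per trial
e = rng.standard_normal((trials, d))
e /= np.linalg.norm(e, axis=1, keepdims=True)

for step in range(1, steps + 1):
    x = rng.standard_normal((trials, d)) * np.sqrt(h)  # x ~ Normal(0, diag(h))
    e -= x * np.einsum('ij,ij->i', e, x)[:, None]      # e <- e - x <e, x>
    if step % 10 == 0:
        mc = (e ** 2).sum(axis=1).mean()               # Monte Carlo E ||e||^2
        tr = np.exp(-2 * step * lam).sum()             # f(t) at t = 2 * step
        print(step, mc, tr)
```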

Notebook

Yaroslav Bulatov
  • See this: https://en.m.wikipedia.org/wiki/Bunch%E2%80%93Nielsen%E2%80%93Sorensen_formula – lightxbulb Apr 20 '23 at 08:05
  • You might also be interested in some references I gave (ten years ago) in answering this Math.SE question about numerical eigenvalues of a diagonal-plus-rank-one matrix (a more general problem than the one you've asked about). Readers will benefit from learning more about the context in which your version arose. – hardmath Apr 20 '23 at 12:57
  • @hardmath Added clarifications. – Yaroslav Bulatov Apr 20 '23 at 16:07
  • Thanks, it helps to have an idea about the importance of approximating eigenvalues vs. exact values. Also to think about how likely the components of $h$ are to be nearly equal. – hardmath Apr 20 '23 at 16:14
  • @hardmath Thanks. I've indeed found a couple of $O(d^2)$ methods; one of them led to a discussion and an efficient implementation. However, nothing I've seen suggests a way to approximate the eigenvalues in $O(d)$ time. – Yaroslav Bulatov Apr 20 '23 at 20:07
  • @hardmath More background on this problem is given here. – Yaroslav Bulatov Apr 21 '23 at 06:13
  • @lightxbulb That formula is about computing the eigenvectors given the eigenvalues. – Federico Poloni Apr 22 '23 at 11:25
  • @FedericoPoloni There is a paper from Golub in the references. It's $O(n^2)$, though, not $O(n)$. The characteristic polynomial has a pretty specific form in the above case; it's described in the paper. – lightxbulb Apr 22 '23 at 23:54

1 Answer


I get a substantially better estimate if I sort $h$ in descending order and estimate the eigenvalues as

$$ \tilde\lambda_i = h_i + h_{i+1} $$


This works because a symmetric DPR1 (diagonal-plus-rank-one) matrix with a positive rank-one modification and no duplicate eigenvalues has the strict interlacing property, which means that

$$ \lambda_1 > d_1 > \lambda_2 > d_2 > \cdots > \lambda_n > d_n $$

where the $d_i$ are the entries of the diagonal matrix without the rank-one modification. Details are in this paper, introduction up to Equation 5. In your matrix, the rank-one modification is negative, meaning

$$ A = D + \rho h h^T,\quad\rho < 0 $$

with $D = 2\operatorname{diag}(h)$ and $\rho = -1$ in your case, and thus the strict interlacing property applies to the negated matrix, whose symmetric rank-one modification is positive:

$$ -A = -D + (-\rho) h h^T,\quad -\rho > 0 $$

For your original matrix, that means

$$ d_i > \lambda_i > d_{i+1} $$

My estimator simply takes the midpoint of the upper and lower bounds: since $d_i = 2h_i$, this gives $\tilde\lambda_i = (2h_i + 2h_{i+1})/2 = h_i + h_{i+1}$. The image below shows the result for a random 15-element vector. Your $2h$ estimator is equivalent to the upper bound shown here.

[Figure: estimator performance comparison]
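A sketch of the resulting estimator, which is $O(d)$ after the $O(d\log d)$ sort; padding with $h_{d+1}=0$ for the last lower bound is my assumption, since interlacing alone doesn't supply it:

```python
import numpy as np

def interlacing_estimate(h):
    # Midpoint of the interlacing bounds 2*h[i+1] < lambda_i < 2*h[i].
    hs = np.sort(h)[::-1]                # descending
    upper = 2 * hs
    lower = 2 * np.append(hs[1:], 0.0)   # h_{d+1} = 0 (assumption for i = d)
    return (upper + lower) / 2           # equals h_i + h_{i+1}

rng = np.random.default_rng(1)
h = rng.random(15)
h /= h.sum()
A = 2 * np.diag(h) - np.outer(h, h)

exact = np.sort(np.linalg.eigvalsh(A))[::-1]
print(np.abs(exact - interlacing_estimate(h)).max())
```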

Rainer P.
  • Interesting! The $2h$ estimator essentially approximates the eigenvalues by the diagonal values, and the error can be bounded by the Gershgorin circle theorem. Not sure why $h_i+h_{i+1}$ works better. – Yaroslav Bulatov Apr 22 '23 at 15:51
  • Thanks for the update, this makes sense. I tried your average formula on random matrices; it seems a lot better than the upper/lower bounds, although biased. – Yaroslav Bulatov May 03 '23 at 22:40