Based on the comments from the OP above, the problem originates from computing the gradient of a function of the eigenvalues of a non-constant matrix. Below I propose a method for computing this gradient.
(If there is interest, I could extend this method to also return the variation of the eigenvectors)
First, I'll assume that we're dealing with real symmetric matrices with no duplicate eigenvalues (I extend this to all complex matrices at the end). Let $M$ be the matrix whose eigenvalues you're computing. Then you can use:
$$\delta \lambda = S . \delta M . S^T$$
to compute the variation in the eigenvalues, where $S$ is the similarity matrix whose rows are the orthonormal eigenvectors (as returned by Eigenvectors). As a simple example, consider the following matrix:
SeedRandom[49];
mat[x_, y_] = (# + Transpose[#]) &[
    RandomInteger[10, {3, 3}] +
    RandomInteger[10, {3, 3}] (RandomChoice[{x, y}, {3, 3}]^RandomInteger[5, {3, 3}])
];
mat[x, y] //TeXForm
$\left(
\begin{array}{ccc}
36 & x+14 & 13 \\
x+14 & 40 & 3 x^4+10 y^2+4 \\
13 & 3 x^4+10 y^2+4 & 20 y+16 \\
\end{array}
\right)$
This example is simple enough that we can compute the exact eigenvalues:
eigs = Eigenvalues[mat[x,y]];
You're interested in the gradient of $\sum _{i=1}^3 \lambda _i \log \left(\lambda _i\right)$, which we can evaluate (at $x=0$, $y=0$) with:
D[eigs . Log[eigs], {{x, y}}] /. Thread[{x,y}->0] //N
{0.768965, 71.0006}
Now, let's use Eigensystem to compute the same result (using N avoids long exact expressions, and also forces Eigensystem to return normalized vectors):
{λ, vecs} = Eigensystem @ N[mat[0,0]];
λ
vecs //TeXForm
{55.645,27.0316,9.32339}
$\left(
\begin{array}{ccc}
-0.675197 & -0.678307 & -0.289842 \\
0.560288 & -0.727182 & 0.396589 \\
-0.479778 & 0.105381 & 0.871039 \\
\end{array}
\right)$
As a check, we can use the above eigenvectors to regenerate the eigenvalues:
λ == Tr[vecs . mat[0, 0] . Transpose[vecs], List]
True
The variation of the eigenvalues with respect to $x$ and $y$ is given by:
δλx = Tr[vecs . Derivative[1,0][mat][0, 0] . Transpose[vecs], List]
δλy = Tr[vecs . Derivative[0,1][mat][0, 0] . Transpose[vecs], List]
{0.915982, -0.814863, -0.101119}
{1.68017, 3.14566, 15.1742}
Since the variation of $\lambda \log (\lambda )$ is given by $(1 + \log (\lambda ))\, \delta \lambda$, we can reproduce our earlier results with:
(1 + Log[λ]) . δλx
(1 + Log[λ]) . δλy
0.768965
71.0006
Note that this method doesn't explicitly depend on the size of the matrix, so you should be able to apply it to any numerical matrix for which Eigensystem can return a result.
Summarizing:
- compute
D[mat[x, y], {{x, y}}]/.{x->x0,y->y0}
- compute
Eigensystem[mat[x0, y0]]
- compute $\delta \lambda$ using $\delta \lambda = S . \delta M . S^T$
- compute your trace
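For readers outside Mathematica, the four steps above can be sketched in numpy with the same example matrix. Note two conventions that differ from Eigensystem: numpy's eigh returns eigenvalues in ascending order and eigenvectors as columns, so $S$ is the transpose of the returned matrix. This is only a cross-check of the recipe, not part of the original answer.

```python
import numpy as np

# The example matrix from the answer, and its analytic parameter derivatives.
def M(x, y):
    a = 3*x**4 + 10*y**2 + 4
    return np.array([[36.0,   x + 14, 13.0],
                     [x + 14, 40.0,   a],
                     [13.0,   a,      20*y + 16.0]])

def dM_dx(x, y):
    d = 12*x**3
    return np.array([[0.0, 1.0, 0.0],
                     [1.0, 0.0, d],
                     [0.0, d,   0.0]])

def dM_dy(x, y):
    d = 20*y
    return np.array([[0.0, 0.0, 0.0],
                     [0.0, 0.0, d],
                     [0.0, d,   20.0]])

# Steps 1-3: eigendecompose at (0, 0), then dlambda_i = (S . dM . S^T)_ii,
# where the rows of S are the orthonormal eigenvectors.
lam, V = np.linalg.eigh(M(0.0, 0.0))
S = V.T
dlam_x = np.diag(S @ dM_dx(0.0, 0.0) @ S.T)
dlam_y = np.diag(S @ dM_dy(0.0, 0.0) @ S.T)

# Step 4: chain rule for f = sum(lambda * log(lambda)).
grad = np.array([(1 + np.log(lam)) @ dlam_x,
                 (1 + np.log(lam)) @ dlam_y])
print(grad)   # approximately {0.768965, 71.0006}, matching the exact computation
```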
Here is the extension to all complex matrices. It is more difficult because it requires both the left and right eigenvectors of the matrix. The relevant equation is:
$$\delta \lambda _i = \frac{\overline{l_i}. \delta M .r_i}{\overline{l_i}.r_i}$$
where $r_i$ is the right eigenvector associated with $\lambda _i$, and $\overline{l_i}$ is the complex conjugate of the left eigenvector. Specifically, $\overline{l_i}$ satisfies:
$$\overline{l_i}.M = \lambda _i \overline{l_i}$$
or
$$M^{\dagger } . l_i = \bar{\lambda }_i\, l_i$$
When $M$ is Hermitian, there is an obvious relationship between these vectors (the left and right eigenvectors coincide), but when $M$ is not Hermitian, I don't know of such a relationship. Hence, the general case will require two Eigensystem calls.
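As an aside, the second eigendecomposition can be avoided when taking a matrix inverse is acceptable: if $R$ holds the right eigenvectors as columns, the rows of $R^{-1}$ are exactly the conjugated left eigenvectors, normalized so that $\overline{l_i} . r_i = 1$, hence $\delta \lambda _i = \left(R^{-1} . \delta M . R\right)_{ii}$. A numpy sketch on a small hypothetical non-Hermitian matrix (not the one used below):

```python
import numpy as np

# Hypothetical non-Hermitian test matrix depending on one parameter t,
# chosen to have well-separated eigenvalues.
def M(t):
    return np.array([[1.0 + 2.0j, t,            0.3],
                     [0.5,        5.0 + 1.0j*t, 0.2],
                     [t,          0.1,          9.0 - 1.0j]])

def dM_dt(t):
    return np.array([[0.0, 1.0,  0.0],
                     [0.0, 1.0j, 0.0],
                     [1.0, 0.0,  0.0]])

t0 = 0.3
lam, R = np.linalg.eig(M(t0))      # right eigenvectors as columns of R
L = np.linalg.inv(R)               # rows of R^-1 are the conjugated left eigenvectors
dlam = np.diag(L @ dM_dt(t0) @ R)  # dlambda_i = l_i . dM . r_i (denominators are 1)
print(dlam)
```

This trades the second eigendecomposition for an inverse, which is fine for well-conditioned eigenvector matrices.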
Let's see how this works in practice. Here is a random complex matrix:
SeedRandom[42];
mat[x_,y_] = RandomInteger[10,{3,3,2}].{1,I} +
RandomInteger[10,{3,3}](RandomChoice[{x,y},{3,3}]^RandomInteger[5,{3,3}]);
mat[x,y] //TeXForm
$\left(
\begin{array}{ccc}
5 x^2+6 & 16+2 i & 9 x^5 \\
8+6 i & 9 y^2+(4+i) & 8 y+(10+i) \\
2 x+8 & 7+10 i & 9 x^5+(3+5 i) \\
\end{array}
\right)$
Here is the exact case as a control:
eigs=Eigenvalues[mat[x,y]];
D[eigs, {{x, y}}] /. Thread[{x,y}->1.] //Transpose //TeXForm
$\left(
\begin{array}{ccc}
14.179\, +10.2654 i & 12.0111\, -13.4113 i & 28.8099\, +3.14593 i \\
6.90469\, -1.59718 i & 1.46593\, +0.158321 i & 9.62938\, +1.43886 i \\
\end{array}
\right)$
Now, we need to find the eigensystems of mat[1, 1] and ConjugateTranspose @ mat[1, 1]:
{rλ, rvecs} = Eigensystem[mat[1.,1.]];
{lλ, lvecs} = Eigensystem[ConjugateTranspose @ mat[1.,1.]];
It is important that the eigenvalues match, so that the vectors match as well:
rλ == Conjugate[lλ]
True
The variation of the eigenvalues can then be determined with:
Divide[
Tr[Conjugate@lvecs . Derivative[1,0][mat][1,1] . Transpose[rvecs],List],
Tr[Conjugate@lvecs . Transpose[rvecs],List]
]
Divide[
Tr[Conjugate@lvecs . Derivative[0,1][mat][1,1] . Transpose[rvecs],List],
Tr[Conjugate@lvecs . Transpose[rvecs],List]
]
{28.8099 + 3.14593 I, 14.179 + 10.2654 I, 12.0111 - 13.4113 I}
{9.62938 + 1.43886 I, 6.90469 - 1.59718 I, 1.46593 + 0.158321 I}
This is the same as the control, up to a change in the order of eigenvalues.
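One practical caveat: when the two eigendecompositions come back in different orders, the left and right vectors must be paired by eigenvalue (using the relation rλ == Conjugate[lλ]) before applying the formula. A numpy sketch of one way to do the pairing, on a hypothetical matrix with well-separated eigenvalues:

```python
import numpy as np

# Hypothetical non-Hermitian matrix depending on one parameter t.
def M(t):
    return np.array([[2.0 + 1.0j, t,   0.4],
                     [0.3,        6.0, t],
                     [t*t,        0.2, 10.0 - 2.0j]])

def dM(t):
    return np.array([[0.0,   1.0, 0.0],
                     [0.0,   0.0, 1.0],
                     [2.0*t, 0.0, 0.0]])

t0 = 0.5
lam_r, VR = np.linalg.eig(M(t0))           # right eigenvectors
lam_l, VL = np.linalg.eig(M(t0).conj().T)  # left eigenvectors; eigenvalues are
                                           # the conjugates of lam_r, possibly reordered

# Pair each right eigenvalue with the left eigenvalue whose conjugate is closest.
p = [int(np.argmin(np.abs(np.conj(lam_l) - lr))) for lr in lam_r]

# dlambda_i = (conj(l_i) . dM . r_i) / (conj(l_i) . r_i)
dlam = np.array([(VL[:, p[i]].conj() @ dM(t0) @ VR[:, i])
               / (VL[:, p[i]].conj() @ VR[:, i]) for i in range(3)])
print(dlam)
```

The nearest-match pairing is safe as long as the eigenvalues are well separated, which is the assumption made throughout this answer.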