1

I am a little bit confused with the chain rule of matrix derivatives. For example, let

$f(X) = \text{tr}([\log(W X W^\top + B)]^2)$,

where $\log(X)$ is the matrix logarithm of the matrix $X$, $X$ is an $m\times m$ symmetric positive definite (SPD) matrix, $B$ is an $n \times n$ SPD matrix ($n>m$), and $W\in \mathbb{R}^{n\times m}$ is a rectangular matrix. If I use the chain rule, I should have

$\frac{\partial f}{\partial X} = 2\log(W X W^\top + B) W^\top(W X W^\top + B)^{-1}W$.

However, the dimensions of $\log(W X W^\top + B)$ and $W^\top(W X W^\top + B)^{-1}W$ are $n \times n$ and $m \times m$, respectively. So there must be something wrong with my derivation, but I don't know where.

Any comments? Thanks a lot!

* Addition *

Let $Z=(W X W^\top + B)S$.

What if $f(X) = \text{tr}((\log(Z))^\top\log(Z))$, where $S$ is also an SPD matrix? Do we have

$\frac{\partial f}{\partial X} = 2W^\top \log[(W X W^\top + B)S] (W X W^\top + B)^{-1}W$,

or

$\frac{\partial f}{\partial X} = 2W^\top(W X W^\top + B)^{-1}S^{-1} \log[(W X W^\top + B)S] SW$?

Using the notation in this post, my solution is:

Define $Z=(W X W^\top + B)S$, and $\phi=\text{tr}([\log(Z)]^2)$. Then we have

$d\phi = 2\log(Z)\cdot Z^{-\top}:dZ=2\log(Z)\cdot Z^{-\top}: WdXW^\top S = 2W^\top\log(Z)\cdot Z^{-\top}SW: dX$.

Therefore, we have

$\frac{\partial \phi}{\partial X} = 2W^\top\log(Z)\cdot Z^{-\top}SW=2W^\top \log[(W X W^\top + B)S] (W X W^\top + B)^{-1}W$.

Is this correct?

  • The correct gradient is $$\frac{\partial f}{\partial X} = 2\,W^T(WXW^T+B)^{-1}\log(WXW^T+B)W$$ – greg May 23 '18 at 23:50

2 Answers

1

To answer your second question, consider this calculation $$\eqalign{ Z &= WXW^TS+BS \cr L &= \log Z \cr \phi &= {\rm tr\,}L^2 \cr d\phi &= (2LZ^{-1})^T:dZ \cr &= 2L^TZ^{-T}:W\,dX\,W^TS \cr &= 2W^TL^TZ^{-T}S^TW:dX \cr \frac{\partial\phi}{\partial X} &= 2W^TL^TZ^{-T}S^TW \cr }$$ Your first question is a special case of your second, in which $S$ equals the identity matrix. This choice of $S$ makes $(L,Z)$ symmetric, so you can omit the transposes on those terms.
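The formula above (for $\phi = {\rm tr\,}L^2$) can be sanity-checked numerically. The following is only a minimal sketch: the helper names (`logm_eig`, `random_spd`, `phi`, `grad_phi`), the sizes, and the random test matrices are assumptions made for illustration. It treats the entries of $X$ as independent variables and compares the closed form with entrywise central finite differences. The logarithm is computed by eigendecomposition, which assumes $Z$ is diagonalizable with positive eigenvalues (true for a product of two SPD matrices).

```python
import numpy as np

def logm_eig(Z):
    """Matrix logarithm via eigendecomposition (assumes Z is diagonalizable
    with positive real eigenvalues, e.g. a product of two SPD matrices)."""
    w, V = np.linalg.eig(Z)
    L = (V * np.log(w.astype(complex))) @ np.linalg.inv(V)
    return L.real

def random_spd(k, rng):
    """Hypothetical helper: a well-conditioned random SPD matrix."""
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

def phi(X, W, B, S):
    L = logm_eig((W @ X @ W.T + B) @ S)
    return np.trace(L @ L)

def grad_phi(X, W, B, S):
    # Closed form from the answer: 2 W^T L^T Z^{-T} S^T W
    Z = (W @ X @ W.T + B) @ S
    L = logm_eig(Z)
    return 2.0 * W.T @ L.T @ np.linalg.inv(Z).T @ S.T @ W

rng = np.random.default_rng(0)
m, n = 3, 5
W = rng.standard_normal((n, m))
X, B, S = random_spd(m, rng), random_spd(n, rng), random_spd(n, rng)

# Entrywise central finite differences of phi with respect to X
h = 1e-5
G_fd = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        E = np.zeros((m, m))
        E[i, j] = 1.0
        G_fd[i, j] = (phi(X + h * E, W, B, S) - phi(X - h * E, W, B, S)) / (2 * h)

G = grad_phi(X, W, B, S)
print("max abs difference:", np.max(np.abs(G - G_fd)))
```

Setting `S = np.eye(n)` reduces this to the first question, where $L$ and $Z$ are symmetric.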

greg
  • 35,825
  • Thanks a lot for the comments! However, I don't quite understand how you get $d\phi = (2LZ^{-1})^T:dZ $. Could you please show a more detailed derivation? – user3138073 Oct 09 '18 at 23:23
  • @user3138073 Consider the scalar function $$f(\lambda)=(\log\lambda)^2$$ whose derivative is $$f'(\lambda)=\frac{2\log\lambda}{\lambda}$$ Apply this to the trace of the corresponding matrix function via the formula $$d\,{\rm tr}f(Z)=f'(Z^T):dZ$$ It's also worth pointing out that the matrices $(L,Z)$ commute, since $L$ is a function of $Z$. – greg Oct 09 '18 at 23:36
  • It appears that this answer was altered to match a slight change in the question. Specifically the function $\,{\rm tr}(L^2) \implies {\rm tr}(L^TL).\,$ Despite the seemingly minor nature of this change, there is no simple closed-form solution to the new question. In particular, the expressions for $d\phi$ given in my answer are no longer valid. – greg Nov 22 '18 at 16:44
  • Hi @greg, thanks a lot for the feedback! But it seems that your answer is based on $\phi = \text{tr} L^\top L$, not $\phi = \text{tr} L^2$. Is my understanding correct? – user3138073 Nov 26 '18 at 17:01
  • No. My original answer dealt with the function $\phi={\rm tr}(L^2)$. Five months later that single line was edited to read $\phi={\rm tr}(L^TL)$. But everything after that line is no longer true for the new function. – greg Nov 26 '18 at 19:42
  • Yes. The original question was $\text{tr}(L^2)$, and I changed it to $\text{tr}(L^\top L)$. Is there no analytical solution in this case? I thought your answer was applicable to the second case, because it was edited on Oct 10 at 20:56, after I modified my question. – user3138073 Nov 28 '18 at 00:23
  • Also, it seems that $d\text{tr}f(Z) = f'(Z^\top) : dZ$ is a very general rule, but I didn't find any related reference. To me, the definition of $f'(Z^\top)$ is also not clear, since in my case, $f(Z) = \log(Z)^\top \log(Z)$ is also a matrix, not a scalar. Could you please tell me where it is from? Thanks! – user3138073 Dec 03 '18 at 05:51
1

Let $Z=WXW^T+B$; it is a symmetric positive definite matrix. Since $\log(Z)$ and $Z^{-1}$ commute, the derivative is

$Df_X:K\in M_{m,m}\mapsto 2\,\text{tr}(\log(Z)Z^{-1}WKW^T)=2\,\text{tr}(W^T\log(Z)Z^{-1}WK)$.

Then the gradient is

$\nabla(f)(X)=2W^TZ^{-1}\log(Z)W\in M_{m,m}$.

While writing this, I saw that greg obtained the same result.
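As a quick numerical sanity check of this gradient, here is a minimal sketch. The helper names (`spd_log`, `random_spd`, `f`, `grad_f`), the sizes, and the random test matrices are assumptions for illustration only. It compares the directional derivative $\text{tr}((\nabla f(X))^TK)$ with a central finite difference along a symmetric direction $K$, so that $X \pm hK$ stays symmetric.

```python
import numpy as np

def spd_log(Z):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(Z)
    return (V * np.log(w)) @ V.T

def random_spd(k, rng):
    """Hypothetical helper: a well-conditioned random SPD matrix."""
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

def f(X, W, B):
    L = spd_log(W @ X @ W.T + B)
    return np.trace(L @ L)

def grad_f(X, W, B):
    # Closed form from the answer: 2 W^T Z^{-1} log(Z) W
    Z = W @ X @ W.T + B
    return 2.0 * W.T @ np.linalg.solve(Z, spd_log(Z)) @ W

rng = np.random.default_rng(0)
m, n = 3, 5
W = rng.standard_normal((n, m))
X, B = random_spd(m, rng), random_spd(n, rng)

# Symmetric perturbation direction K
K = rng.standard_normal((m, m))
K = (K + K.T) / 2

h = 1e-5
fd = (f(X + h * K, W, B) - f(X - h * K, W, B)) / (2 * h)
G = grad_f(X, W, B)
analytic = np.sum(G * K)  # <G, K> = tr(G^T K)

print("finite difference:", fd)
print("analytic         :", analytic)
```

The result `G` has shape $m\times m$, matching $X$, which is the dimension check the question was missing.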

EDIT. Comment on the addition by @user3138073 . The answer is no but you will have trouble understanding why...

Assume that $U(t)$ is a function of $t\in \mathbb{R}$ and let $f(t)=\text{tr}((\log(U))^2)$; then $f'(t)=2\,\text{tr}((\log(U))'\log(U))=2\,\text{tr}(\log(U)U^{-1}U')$. Indeed, behind this there is a series: thanks to the trace, and because $\log(U)$ is a polynomial in $U$ (true when $U$ has no negative eigenvalues), we can move $U'$ to the right side of the trace and obtain the series that gives $U^{-1}$ (this is absolutely not obvious!).

Next, choose $g(t)=\text{tr}((\log(U)S)^2)$. Then $g'(t)=2\,\text{tr}((\log(U))'S\log(U)S)$. Unfortunately, $(\log(U))'=U^{-1}U'$ is absolutely false (it's much more complicated than that!). If you move $U'$ to the right side of the trace, then you break the series (cf. above) because $S,U$ don't commute.

In other words, $\text{tr}(U^2U'U^3\log(U))=\text{tr}(U^5U'\log(U))$ but $\text{tr}(U^2U'U^3S\log(U)S)\neq \text{tr}(U^5U'S\log(U)S)$.
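These two trace statements are easy to test numerically. The sketch below uses assumed random matrices (an SPD $U$, an SPD $S$, and an arbitrary matrix `dU` standing in for $U'$); the helper names are hypothetical. It only illustrates the identities for one random instance and proves nothing.

```python
import numpy as np

def random_spd(k, rng):
    """Hypothetical helper: a random SPD matrix with moderate eigenvalues."""
    A = rng.standard_normal((k, k))
    return A @ A.T / k + np.eye(k)

def spd_log(U):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(U)
    return (V * np.log(w)) @ V.T

rng = np.random.default_rng(1)
k = 4
U = random_spd(k, rng)
S = random_spd(k, rng)
dU = rng.standard_normal((k, k))  # stands in for U'(t)
L = spd_log(U)
P = np.linalg.matrix_power

# Without S: powers of U and log(U) commute, so U' can be moved under the trace
lhs = np.trace(P(U, 2) @ dU @ P(U, 3) @ L)
rhs = np.trace(P(U, 5) @ dU @ L)

# With S: S and U do not commute in general, so the same move changes the value
lhs_S = np.trace(P(U, 2) @ dU @ P(U, 3) @ S @ L @ S)
rhs_S = np.trace(P(U, 5) @ dU @ S @ L @ S)

print("without S:", lhs, rhs)
print("with S   :", lhs_S, rhs_S)
```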

  • Thanks a lot for the help! Could you tell me if there is any textbook/tutorial about all these matrix derivative things? – user3138073 May 24 '18 at 03:19
  • @user3138073, you can read any book about differential calculus. greg does not use the same notation as I do. I prefer to consider the derivative as a linear function and to deduce the gradient from the equality $Df_X(K)=\langle\nabla (f)(X),K\rangle=\text{tr}((\nabla(f)(X))^TK)$. –  May 25 '18 at 13:37
  • Thanks for the feedback! I need to spend some time on matrix calculus, since I am not very familiar with the notation you used. Here you use the fact that $\log(Z)$ and $Z^{-1}$ commute. But what if $Z$ is not symmetric? I've updated my question. Any comments? Thanks! – user3138073 May 25 '18 at 21:05
  • Thanks for the comments! I made a mistake (now corrected): it is not $(\log(U))^2$, but $(\log(U))^\top \log(U)$. I thought $U$ was symmetric, so I ignored the transpose. – user3138073 Oct 10 '18 at 20:22