
Suppose I have two mean-zero multivariate Gaussian random variables $X$ and $Y$ on $\mathbb R^d$, with covariance matrices $A$ and $B$, both assumed to have full rank.

I have many choices of joint distributions such that the pair $(X,Y)\in\mathbb R^{2d}$ is Gaussian. I want to choose one so that $X$ and $Y$ are as close to each other as possible.

Since $X-Y$ is Gaussian, I want to minimise the trace of its covariance matrix.

I'm pretty sure the way to do this is to choose some matrix $M$ such that $B = M^T A M$ and set $Y = M^T X$.

Intuitively, this reduces the range of values that the pair $(X,Y)$ can take, so it must reduce the variance of $X-Y$. For my purposes this reasoning is good enough.

I already have $A$ and $B$ in the form $A= \alpha^T\alpha$, $B=\beta^T\beta$ so I think I just need to choose a unitary matrix $U$ and set $M = \alpha^{-1} U \beta$.
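For what it's worth, this ansatz does respect the marginal of $Y$ for *any* unitary $U$: since $\alpha^{-T}A\alpha^{-1}=I$, we get $M^TAM=\beta^TU^TU\beta=B$. A quick numpy sketch (with random placeholder factors $\alpha$, $\beta$) checking this:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Random full-rank factors with A = alpha^T alpha, B = beta^T beta
alpha = rng.standard_normal((d, d))
beta = rng.standard_normal((d, d))
A = alpha.T @ alpha
B = beta.T @ beta

# Any orthogonal (real unitary) U gives a valid M = alpha^{-1} U beta
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
M = np.linalg.inv(alpha) @ U @ beta

# With Y = M^T X, Cov(Y) = M^T A M = B, regardless of the choice of U
assert np.allclose(M.T @ A @ M, B)
```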

I've managed to forget everything I know about linear algebra, so I'm having trouble minimising over $U$; the naïve approach (differentiate everything in sight and set to $0$) just produces a mess.

Is there a nice way of doing this?

Tim
  • I think with $X=MY$ you want $A=MBM^\top$ instead of $B=M^\top AM$? – joriki May 05 '13 at 21:45
  • Let me be sure I understand this: you have the ability to select the joint distribution to be among any that satisfies $\mathop{\textbf{E}}XX^T=A$ and $\mathop{\textbf{E}}YY^T=B$. You wish to choose one that minimizes $\mathop{\textrm{Tr}} \mathop{\textbf{E}} (X-Y)(X-Y)^T$. Correct? – Michael Grant May 06 '13 at 00:45
  • D'oh. Silly me for missing your comment and misinterpreting. I am getting a different trace than you and thought this might be why. Apologies! Deleted my comment. – Michael Grant May 06 '13 at 16:39
  • Thanks for both of the corrections. I was going to go back and correct it myself once I remembered how to do linear algebra again. I'm going to edit it to put in a bit of reasoning why it's enough to choose a $U$ in case someone else ever finds this question interesting. But for the record I wanted to get X as a function of $Y$ anyway, it's much easier to implement what I'm trying to do that way. – Tim May 06 '13 at 16:56
  • I've edited my answer to reflect the required correction in the question; it turns out that also considerably simplifies the answer. – joriki May 06 '13 at 17:06

1 Answer


The trace of the covariance matrix of $X-Y$ is

$$\def\Tr{\operatorname{Tr}}\Tr\langle(X-Y)^\top(X-Y)\rangle=\langle\Tr X^\top X+\Tr Y^\top Y-2\Tr X^\top Y\rangle\;,$$

where angled brackets denote the expected value. The only part that varies with $U$ is $\langle\Tr X^\top Y\rangle$.
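A Monte Carlo sketch of this decomposition, using the coupling $Y=M^\top X$ from the question with an arbitrary placeholder $M$, and comparing the sampled value of $\langle(X-Y)^\top(X-Y)\rangle$ against the exact traces:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3

# Illustrative full-rank factor; A = alpha^T alpha as in the question
alpha = rng.standard_normal((d, d))
A = alpha.T @ alpha
M = rng.standard_normal((d, d))  # an arbitrary coupling matrix, Y = M^T X

# Exact value: Tr A + Tr B - 2 <X^T Y>, with B = M^T A M and <X^T Y> = Tr(A M)
exact = np.trace(A) + np.trace(M.T @ A @ M) - 2 * np.trace(A @ M)

# Monte Carlo estimate of <(X-Y)^T (X-Y)>
n = 200_000
X = rng.standard_normal((n, d)) @ alpha  # rows are samples of X ~ N(0, A)
Y = X @ M                                # rows are samples of Y = M^T X
est = ((X - Y) ** 2).sum(axis=1).mean()

assert np.isclose(est, exact, rtol=0.05)
```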

Since Michael has now confirmed that there's an error in your ansatz, I'll write out the calculation again in what I think is a correct form.

In order to leave most of your setup of $\alpha$, $\beta$, $A$, $B$ and $U$ intact, I'll just replace $X=MY$ by $M^\top X=Y$. Then we have

\begin{align} \langle\Tr X^\top Y\rangle &= \langle\Tr X^\top M^\top X\rangle \\ &= \langle\Tr X^\top\beta^\top U^\top\alpha^{-1\top} X\rangle \\ &= \Tr\alpha^{-1\top}\langle XX^\top\rangle\beta^\top U^\top \\ &= \Tr\alpha^{-1\top}A\beta^\top U^\top \\ &= \Tr\alpha^{-1\top}\alpha^\top\alpha\beta^\top U^\top \\ &= \Tr\alpha\beta^\top U^\top \;. \end{align}

Thus the optimal $U$ is the unitary matrix closest to $\alpha\beta^\top$, which you can compute using singular value decomposition. As clarified in an exchange of comments between Michael and myself, the minimization of $\lVert R-A\rVert_F$ mentioned in that Wikipedia article is equivalent to the maximization of $\operatorname{Tr}A^\top R$, since $\lVert R-A\rVert_F^2=\operatorname{Tr}(R^\top R-R^\top A-A^\top R+A^\top A)$, where the first and last terms are constant and the other two both yield $-\operatorname{Tr}A^\top R$.
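A numerical sketch of this recipe (with random placeholder factors $\alpha$, $\beta$; in the real case "unitary" means orthogonal): the optimal $U$ comes from the SVD $\alpha\beta^\top = W\Sigma V^\top$ as $U=WV^\top$, it attains $\operatorname{Tr}\Sigma$, and no other orthogonal matrix does better.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Random placeholder factors with A = alpha^T alpha, B = beta^T beta
alpha = rng.standard_normal((d, d))
beta = rng.standard_normal((d, d))

# Optimal U: the orthogonal matrix closest to alpha @ beta.T,
# obtained from its SVD (the orthogonal Procrustes solution)
W, s, Vt = np.linalg.svd(alpha @ beta.T)
U_opt = W @ Vt

def cross_trace(U):
    # Tr(alpha beta^T U^T), the only term that varies with U
    return np.trace(alpha @ beta.T @ U.T)

# The optimum attains the sum of the singular values of alpha @ beta.T
assert np.isclose(cross_trace(U_opt), s.sum())

# No randomly drawn orthogonal U exceeds it
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    assert cross_trace(Q) <= cross_trace(U_opt) + 1e-9
```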

joriki
  • That looks great, but I'll have to spend a bit of time getting my head around it. I'll check my working on your comment too, but we're in the same time zone so I'm going to bed. Do the angular brackets mean anything? Or are they just brackets? – Tim May 05 '13 at 23:00
  • Sudden thought, your solution for the optimal $U$ (and so the optimal $M$) depends on $Y$. But $Y$ is random, so choosing $M$ dependent on $Y$ affects the marginal distribution of $X$. So I need to pick an $M$ that only depends on $A$ and $B$. – Tim May 05 '13 at 23:49
  • @Tim: One answer to both of your comments: The angular brackets denote the expectation value. Sorry, I guess that's less widespread in mathematics than in physics; you're not the first one on this site to ask about it; I guess I should start introducing that notation when I use it. – joriki May 06 '13 at 04:01
  • That makes a lot more sense. I've never seen angular brackets used for expectation before. I guess it means expectation with respect to some canonical probability space called "the universe" that mathematicians aren't aware of. I want to check I understand the whole procedure, but that was exactly what I was looking for. – Tim May 06 '13 at 10:42
  • I don't see how it is established that this is going to minimize the trace covariance. – Michael Grant May 06 '13 at 15:29
  • @Michael: Could you please be more specific about which part of the derivation you're doubting? – joriki May 06 '13 at 15:34
  • The original poster asked that the trace of the covariance of $X-Y$ be minimized. You claim your value of $U$ is optimal in that context, but you do not explain why it must be. – Michael Grant May 06 '13 at 15:47
  • @Michael Probably my fault. I figured you could get any coupling from choosing $U$ at random independently from $Y$, so the way to minimize that variance was to choose the optimal deterministic $U$. – Tim May 06 '13 at 16:02
  • It's possible that this accomplishes it, I just don't think it is at all clear. – Michael Grant May 06 '13 at 16:07
  • @Michael: I don't see any basis for your interpretation of the question in the text of the post. There's only one question in the post, "Is there a nice way of doing this?", where "this" refers to "minimising over $U$". The rest just provides the context of what the OP wants to do; it's not asking us to do anything. I never claimed that the trace can be minimized by minimizing over $U$; only that this is how to minimize over $U$. (It does seem plausible to me, however, that this should be the right way to minimize the trace overall.) – joriki May 06 '13 at 16:23
  • "I never claimed that the trace can be minimized by minimizing over U; only that this is how to minimize over U." This sentence doesn't even make sense to me. You can't say "minimize over U" without specifying what it is being minimized. He asked to minimize the trace of the covariance matrix. What is it that you claim to be minimizing? – Michael Grant May 06 '13 at 16:28
  • @Michael: Sorry, it seems I misunderstood you. I thought you were criticizing that I'd limited myself to the ansatz presented in the question. Yes, I'm minimizing the trace of the covariance matrix, but only in the framework of the OP's ansatz, $X=MY$ and $M=\alpha^{-1}U\beta$ (or however either of these needs to be slightly corrected in the light of our comments under the question). So I'm minimizing the trace of the covariance matrix among all possible values achievable by varying $U$ in this ansatz, but not among all possible joint distributions of $X$ and $Y$. – joriki May 06 '13 at 16:39
  • OK, great! Then we are on the same page---except... I am not seeing how your result minimizes the trace of the covariance matrix, even under the condition that $X=MY$. You didn't prove it---you just jumped from the last trace expression to a declaration that the "closest" unitary matrix (or rather, its transpose) accomplishes it. In what sense do you mean "closest", firstly; and what theorem establishes this is the minimizing value, secondly? That was my original question above :P – Michael Grant May 06 '13 at 16:44
  • Furthermore, the discrepancy I noted in my comment above (and deleted) may still be throwing you off. Since $\langle YY^T\rangle=B=\beta^T\beta$, your last trace involves $\beta\beta^T\beta\alpha^{-1} U$. Intuitively this seems off: I expect something like $\beta\alpha^{-1}U$. – Michael Grant May 06 '13 at 16:46
  • @Michael: Aha, we have made progress -- sometimes long comment threads can be productive after all :-) On your first comment: Did you follow the Wikipedia link I included? The minimization of $\lVert R-A\rVert_F$ they perform is equivalent to minimizing $\operatorname{Tr}(A^\top R)$ (or so I think). On your second comment: Now that you confirmed that this is a mistake in the question, I'm writing out my answer for the corrected setup. – joriki May 06 '13 at 16:52
  • And now we are really back to my original concern: I do not think you can claim an equivalence between minimizing $|R-A|_F$ and $\mathop{\textrm{Tr}}(AR)$. After all, $$|R-A|_F^2=\mathop{\textrm{Tr}}(R^TR-R^TA-A^TR-A^TA).$$ – Michael Grant May 06 '13 at 16:56
  • @Michael: Sorry, I was missing a transposition there. (I snuck it in at the 4.5-th minute. :-) Your equation is precisely why I think this is equivalent -- the two terms $R^\top R$ and $A^\top A$ are constant, and the other two are transposes of each other and thus have the same trace, which is the trace we're minimizing here. (The sign on $A^\top A$ should be positive, by the way.) – joriki May 06 '13 at 16:58
  • Ah, of course, because $U$ is unitary. Very good. Thank you for your patience. Might I recommend including this in your modified writeup... it seems obvious to me now that you've made it clear. – Michael Grant May 06 '13 at 16:59
  • @Michael: OK, I did that. Thank you for your perseverance. :-) The correction of the mistake in the question actually led to a considerable simplification in the answer, much closer to what you intuitively expected. – joriki May 06 '13 at 17:12
  • @joriki Perhaps you know the answer to the following similar question? https://math.stackexchange.com/questions/3094275/constraining-the-sum-of-gaussian-random-variables – Kagaratsch Jan 30 '19 at 23:31