Suppose I have two mean-zero multivariate Gaussian random variables $X$ and $Y$ on $\mathbb R^d$, with covariance matrices $A$ and $B$, both assumed to have full rank.
There are many joint distributions for which the pair $(X,Y)\in\mathbb R^{2d}$ is Gaussian, and I want to choose one so that $X$ and $Y$ are as close to each other as possible.
Since $X-Y$ is then a mean-zero Gaussian, I want to minimise the trace of its covariance matrix, which equals $\mathbb E\|X-Y\|^2$.
I'm pretty sure the way to do this is to choose a matrix $M$ such that $B = M^T A M$ and set $Y = M^T X$.
Intuitively, making $Y$ a deterministic function of $X$ reduces the range of values that the pair $(X,Y)$ can take, so it should reduce the variance of $X-Y$. For my purposes this reasoning is good enough.
I already have $A$ and $B$ in the form $A= \alpha^T\alpha$, $B=\beta^T\beta$, so I think I just need to choose an orthogonal matrix $U$ and set $M = \alpha^{-1} U \beta$; then $M^T A M = \beta^T U^T U \beta = B$ automatically.
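For concreteness, here is a quick numerical sketch of this construction (assuming NumPy, and taking $\alpha$, $\beta$ to be Cholesky-type factors; the matrix names are mine). Since $\operatorname{Cov}(X,Y) = AM$ for $Y = M^T X$, the quantity to minimise works out to $\operatorname{tr}A + \operatorname{tr}B - 2\operatorname{tr}(AM)$, and any orthogonal $U$ satisfies the constraint:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Two random full-rank covariance matrices A and B.
G = rng.standard_normal((d, d)); A = G @ G.T + d * np.eye(d)
H = rng.standard_normal((d, d)); B = H @ H.T + d * np.eye(d)

# Factor A = alpha^T alpha and B = beta^T beta.
# np.linalg.cholesky gives lower-triangular L with A = L L^T,
# so alpha = L^T satisfies alpha^T alpha = A.
alpha = np.linalg.cholesky(A).T
beta = np.linalg.cholesky(B).T

def objective(U):
    """tr Cov(X - Y) for the coupling Y = M^T X with M = alpha^{-1} U beta."""
    M = np.linalg.solve(alpha, U @ beta)
    # The constraint B = M^T A M holds for any orthogonal U.
    assert np.allclose(M.T @ A @ M, B)
    # Cov(X - Y) = A + B - AM - M^T A, and tr(M^T A) = tr(AM) since A = A^T.
    return np.trace(A) + np.trace(B) - 2 * np.trace(A @ M)

# Evaluate at U = I and at a random orthogonal U (via QR):
print(objective(np.eye(d)))
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
print(objective(Q))
```

So the problem left over is exactly which orthogonal $U$ makes `objective(U)` smallest.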
I've managed to forget everything I know about linear algebra, so I'm having trouble minimising over $U$: the naïve approach (differentiate everything in sight and set it to $0$) just produces a mess.
Is there a nice way of doing this?