0

In the following multivariate normal distribution formula, the $\Sigma^{-1}$ seems very unintuitive.

$$f(x)=\frac{1}{\sqrt{(2\pi)^k|\Sigma|}}\exp\!\left(-\frac12(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$

Is $(x-\mu)^T\Sigma^{-1}(x-\mu)$ the same as $\Sigma(x-\mu)^2$? If not, what is it and why?

Thanks in advance.

  • https://en.wikipedia.org/wiki/Covariance_matrix – Sten Apr 30 '20 at 05:49
  • The exponent $^{-1}$ denotes matrix inversion. –  Apr 30 '20 at 05:59
  • 1
    The $\Sigma$ allows you to define normal distributions that are not rotationally symmetric about the mean. Intuitively, this means if you start from the mean and walk in different directions, you're more likely to see samples from the distribution in some directions than in others. See this image from the above wikipedia article for an example of such a normal. If you only want a perfectly symmetric normal, you can take $\Sigma$ to be a scalar multiple of the identity. – Jair Taylor Apr 30 '20 at 06:04
  • In other words, $\Sigma$ lets you rotate and/or stretch the distribution along desired axes. – Jair Taylor Apr 30 '20 at 06:06
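The stretch-and-rotate picture from the comments can be sketched numerically. In this sketch (all specific values are arbitrary assumptions), $\Sigma$ is built from a rotation $R$ and per-axis variances $D$, and samples $R D^{1/2} Z$ of standard normals $Z$ then have covariance approximately $\Sigma$:

```python
import numpy as np

# Sketch (assumed values): Sigma stretches and rotates the distribution.
rng = np.random.default_rng(1)
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotate by 30 degrees
D = np.diag([4.0, 0.25])                         # stretch the axes differently
Sigma = R @ D @ R.T                              # resulting covariance matrix

Z = rng.standard_normal((2, 100_000))            # isotropic standard normals
X = R @ np.sqrt(D) @ Z                           # stretched, then rotated samples

# The empirical covariance of the transformed samples matches Sigma.
assert np.allclose(np.cov(X), Sigma, atol=0.1)
```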

4 Answers

3

$\Sigma$ is a $k \times k$ matrix and $\Sigma^{-1}$ is its inverse.

1

In the univariate case, the argument of the exponential is $$-\frac{(x-\mu)^2}{2\sigma^2}=-\frac12(x-\mu)(\sigma^2)^{-1}(x-\mu).$$

It is a univariate quadratic form, the coefficient of which is the inverse of variance.

As the generalization of the variance to the multivariate case is the variance-covariance matrix, it is quite logical that you get a quadratic form based on the inverse of the variance-covariance matrix.

$$-\frac12(x-\mu)^T\Sigma^{-1}(x-\mu).$$

This is very intuitive.
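As a quick numerical sanity check (with arbitrary assumed values), the multivariate quadratic form reduces to the familiar univariate one when $k=1$:

```python
import numpy as np

# Sketch: for k = 1, (x - mu)^T Sigma^{-1} (x - mu) equals (x - mu)^2 / sigma^2.
x = np.array([2.5])
mu = np.array([1.0])
sigma2 = 4.0                              # variance sigma^2
Sigma = np.array([[sigma2]])              # 1x1 covariance "matrix"

d = x - mu
quad_multi = d @ np.linalg.inv(Sigma) @ d  # (x-mu)^T Sigma^{-1} (x-mu)
quad_uni = d[0] ** 2 / sigma2              # (x-mu)^2 / sigma^2

assert np.isclose(quad_multi, quad_uni)    # both equal 0.5625 here
```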


The normalization constant is also quite logical:

$$\frac1{\sqrt{\sigma^2}}=\sqrt{(\sigma^2)^{-1}}$$

versus

$$\frac1{\sqrt{|\Sigma|}}=\sqrt{|\Sigma^{-1}|}.$$
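The identity $1/\sqrt{|\Sigma|}=\sqrt{|\Sigma^{-1}|}$ follows from $|\Sigma^{-1}|=|\Sigma|^{-1}$, and can be checked numerically on an arbitrary (assumed) positive-definite matrix:

```python
import numpy as np

# Sketch: verify 1/sqrt(|Sigma|) = sqrt(|Sigma^{-1}|) for an assumed
# positive-definite covariance matrix.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

lhs = 1.0 / np.sqrt(np.linalg.det(Sigma))
rhs = np.sqrt(np.linalg.det(np.linalg.inv(Sigma)))

assert np.isclose(lhs, rhs)
```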

1

You are asking about the density function when the covariance matrix is $\Sigma$. As Kavi Rama Murthy pointed out, if $\Sigma$ is invertible, then $\Sigma^{-1}$ is its inverse matrix.

However, a problem arises when $\Sigma$ is degenerate (singular). In that case, $\Sigma^{-1}$ is defined to be the Moore-Penrose inverse.

For more information about this inverse, you can look it up on Wikipedia, and you can read this post: Show that $\mathbb{E}(X_{0}|X_{1},\cdots,X_{n})=c_{0}+c_{1}X_{1}+\cdots+c_{n}X_{n}$ for $(X_{0},\cdots, X_{n})$ Gaussian.


But note that your covariance matrix will be invertible if the distribution of $X$ is absolutely continuous with respect to Lebesgue measure, in which case the density function exists and serves as the Radon-Nikodym derivative.

If $\Sigma$ is degenerate, then $X$ does not have a density function. Take a Gaussian $X$ as an example:

In this case, $\Sigma$ has some number $k<n$ of strictly positive eigenvalues. Since $X$ is Gaussian, we can write $X=AZ+\mu$, where $Z$ is a $k$-dimensional vector of i.i.d. standard normals and $A$ is an $n\times k$ matrix.

Consider the $k$-dimensional affine subspace $$E:=\{x\in\mathbb{R}^{n}:x=Az+\mu\ \text{for some}\ z\in\mathbb{R}^{k}\}.$$ Because $\mathbb{P}(Z\in\mathbb{R}^{k})=1$, it follows that $\mathbb{P}(X\in E)=1$. If $X$ had a pdf $f_{X}$, then $$1=\mathbb{P}(X\in E)=\int_{E}f_{X}(x)\,dx.$$ But the $n$-dimensional volume of $E$ is zero since the dimension of $E$ is $k<n$, a contradiction.

So in this case $X$ does not have a pdf, at least not on $\mathbb{R}^{n}$, since the distribution of $X$ is no longer absolutely continuous with respect to $n$-dimensional Lebesgue measure.
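A minimal numerical sketch of this degenerate case (the rank-1 matrix $A$ and the values of $\mu$ are assumptions for illustration): $\Sigma=AA^{T}$ is singular, every sample of $X=AZ+\mu$ lies on the line $\mu+\operatorname{span}(A)$ in $\mathbb{R}^2$, yet the Moore-Penrose inverse still gives a well-defined quadratic form:

```python
import numpy as np

# Sketch (assumed values): a degenerate Gaussian in R^2 with rank-1 covariance.
rng = np.random.default_rng(0)
A = np.array([[1.0], [2.0]])            # n x k with k = 1 < n = 2
mu = np.array([0.5, -0.5])
Sigma = A @ A.T                          # rank 1, hence singular

assert np.isclose(np.linalg.det(Sigma), 0.0)   # Sigma is not invertible

Z = rng.standard_normal((1, 1000))
X = A @ Z + mu[:, None]                  # every column lies on mu + span(A)

Sigma_pinv = np.linalg.pinv(Sigma)       # Moore-Penrose inverse replaces Sigma^{-1}
assert np.allclose(Sigma @ Sigma_pinv @ Sigma, Sigma)  # pseudoinverse property

d = X[:, 0] - mu
quad = d @ Sigma_pinv @ d                # well-defined even though Sigma is singular
```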


So for most of the distributions you work with, the density is defined; otherwise they would be hard to work with.

The problem arises for the conditional expectation, but you can see from the link above (and the link referenced inside that link) that we can always find a formula that holds regardless of whether the covariance matrix is degenerate.


This long answer may not relate directly to your problem, but it is worth discussing. Hope it helps :)

0

The multivariate normal distribution is based on vectors and matrices. In particular, $\Sigma$ is the covariance matrix, so $\Sigma^{-1}$ is simply its inverse. The covariance matrix captures the covariance between each possible pair of elements of your vector $\mathbf{x}$.
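One consequence worth checking numerically: when $\Sigma$ is diagonal (no covariance between components), the multivariate density factors into a product of univariate normal densities. The sketch below uses arbitrary assumed values:

```python
import numpy as np

# Sketch: for a diagonal covariance matrix, the joint normal density factors
# into a product of univariate normal densities (assumed example values).
mu = np.array([0.0, 1.0])
variances = np.array([2.0, 0.5])
Sigma = np.diag(variances)
x = np.array([0.5, 0.5])

k = len(mu)
d = x - mu
joint = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / np.sqrt(
    (2 * np.pi) ** k * np.linalg.det(Sigma))

# Univariate densities of each component.
marginals = np.exp(-d ** 2 / (2 * variances)) / np.sqrt(2 * np.pi * variances)

assert np.isclose(joint, np.prod(marginals))
```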