3

Problem motivation

Suppose $Z \sim \mathcal{N}( \mu, \Sigma )$ is an $n$-dimensional multivariate Gaussian random variable, and let $f( z ; \mu, \Sigma)$ be its density function. I'm interested in calculating the Fisher information matrix, whose blocks are given by (in vector notation) $$ \begin{align} & E\left[ - \frac{\partial^2 \log f(Z ; \mu, \Sigma)}{ \partial \mu \, \partial \mu^\top } \right] \\ & E\left[ - \frac{\partial^2 \log f(Z ; \mu, \Sigma)}{ \partial \mu \, \partial \Sigma} \right] \\ & E\left[ - \frac{\partial^2 \log f(Z ; \mu, \Sigma)}{ \partial \Sigma \, \partial \Sigma} \right] \end{align} $$

I'm aware that the Gaussian density is smooth enough that we can interchange the order of integration and differentiation. Thus the above three quantities can equivalently be calculated as $$ \begin{align} & - \frac{\partial^2}{\partial \mu \, \partial \mu^\top } E\left[ \log f(Z ; \mu, \Sigma) \right] \\ & - \frac{\partial^2}{\partial \mu \, \partial \Sigma } E\left[ \log f(Z ; \mu, \Sigma) \right] \\ & - \frac{\partial^2}{\partial \Sigma \, \partial \Sigma } E\left[ \log f(Z ; \mu, \Sigma) \right] \\ \end{align} $$
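For reference, a standard fact (my addition, not derived in the question itself): when $\mu$ and $\Sigma$ depend on a parameter vector $\theta$, the Fisher information of a Gaussian has the known closed form $$ I(\theta)_{ij} = \frac{\partial \mu^\top}{\partial \theta_i} \Sigma^{-1} \frac{\partial \mu}{\partial \theta_j} + \frac{1}{2} \operatorname{tr}\left( \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_i} \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_j} \right), $$ so in particular the $\mu$-$\mu$ block is $\Sigma^{-1}$ and the block mixing $\mu$ with the covariance parameters vanishes. The question is whether Mathematica can reproduce this symbolically.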

Attempt at Mathematica implementation

This suggests the following Mathematica coding approach. I know that Mathematica has a built-in Expectation function, and there's also the MultinormalDistribution family of objects. As an illustration, suppose we take $n = 2$. Then consider simply computing the expectation of the log-likelihood:

muvec = Table[ x[i], {i, 1, 2} ] ;
covmat = { { x[3], x[4]}, {x[4], x[5]} } ; 
normaldist = MultinormalDistribution[ muvec, covmat ]; 
expectedscore = Expectation[Log[ PDF[normaldist, {z[1], z[2]}] ], {z[1], z[2]} \[Distributed] normaldist] 

I can't even get the above code to finish computing on a decent computer! If I can't even compute $E [ \log f(Z; \mu, \Sigma) ]$, then there's clearly no hope of getting to the entries of the Fisher information matrix. And I note that this is a very modest dimension of $n = 2$!
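For what it's worth (an addition of mine, not part of the original question): the value that symbolic Expectation call should return has a simple closed form. When the expectation is taken at the same $(\mu, \Sigma)$ that generate $Z$, $E[\log f(Z ; \mu, \Sigma)]$ equals the negative differential entropy of the Gaussian, $-\tfrac{1}{2} \log\left( (2 \pi e)^n \det \Sigma \right)$. A seeded Monte Carlo sketch in Python, with arbitrary parameter values, illustrates this for $n = 2$:

```python
import math
import random

random.seed(0)
mu1, mu2 = 0.3, -0.5                 # arbitrary means
s1, s2, r = 1.2, 0.7, 0.4            # arbitrary sigma1, sigma2, rho
det = (s1 * s2) ** 2 * (1 - r * r)   # determinant of the 2x2 covariance

def logpdf(x1, x2):
    """Log density of the bivariate normal at (x1, x2)."""
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (z1 * z1 - 2 * r * z1 * z2 + z2 * z2) / (1 - r * r)
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * q

# Monte Carlo estimate of E[log f(Z; mu, Sigma)] with Z ~ N(mu, Sigma)
N = 200_000
total = 0.0
for _ in range(N):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x1 = mu1 + s1 * z1
    x2 = mu2 + s2 * (r * z1 + math.sqrt(1 - r * r) * z2)
    total += logpdf(x1, x2)
estimate = total / N

# Closed form: negative differential entropy, -(1/2) log((2 pi e)^2 det Sigma)
closed_form = -0.5 * math.log((2 * math.pi * math.e) ** 2 * det)
```

The two values agree to Monte Carlo accuracy, which suggests the bottleneck in the symbolic call is simplification rather than the underlying integral.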

Questions:

  1. Is the above Mathematica method the only or best approach to get the analytical form of the Fisher information matrix?

  2. If not, is there a better way?

user32416
  • 1,203
  • 8
  • 14
  • This may be of relevance? https://mathematica.stackexchange.com/a/94072/1089 – chris Feb 18 '18 at 07:42
  • Can you really interchange the order of integration and differentiation? The expected value of the log of the density doesn't include any of the $\mu$'s. – JimB Feb 19 '18 at 03:13

2 Answers

1

Here's a brute force way:

pdf = PDF[BinormalDistribution[{μ1, μ2}, {σ1, σ2}, ρ], {x1, x2}];
logpdf = FullSimplify[Log[pdf] //. Log[z_ w_] -> Log[z] + Log[w] //. 
  Log[1/z_] -> -Log[z] //. Log[Exp[z_]] -> z]
hessian = D[logpdf, {{μ1, μ2, σ1, σ2, ρ}, 2}];
FullSimplify[Expectation[-hessian, {x1, x2} \[Distributed] 
    BinormalDistribution[{μ1, μ2}, {σ1, σ2}, ρ]]] // MatrixForm

$$\left( \begin{array}{ccccc} \frac{1}{\sigma_1^2-\rho^2 \sigma_1^2} & -\frac{\rho}{\sigma_1 \sigma_2-\rho^2 \sigma_1 \sigma_2} & 0 & 0 & 0 \\ -\frac{\rho}{\sigma_1 \sigma_2-\rho^2 \sigma_1 \sigma_2} & \frac{1}{\sigma_2^2-\rho^2 \sigma_2^2} & 0 & 0 & 0 \\ 0 & 0 & \frac{\rho^2-2}{\left(\rho^2-1\right) \sigma_1^2} & -\frac{\rho^2}{\sigma_1 \sigma_2-\rho^2 \sigma_1 \sigma_2} & \frac{\rho}{\left(\rho^2-1\right) \sigma_1} \\ 0 & 0 & -\frac{\rho^2}{\sigma_1 \sigma_2-\rho^2 \sigma_1 \sigma_2} & \frac{\rho^2-2}{\left(\rho^2-1\right) \sigma_2^2} & \frac{\rho}{\left(\rho^2-1\right) \sigma_2} \\ 0 & 0 & \frac{\rho}{\left(\rho^2-1\right) \sigma_1} & \frac{\rho}{\left(\rho^2-1\right) \sigma_2} & \frac{\rho^2+1}{\left(\rho^2-1\right)^2} \\ \end{array} \right)$$

And if one already knows the answer for a multivariate normal:

μ = {μ1, μ2};
Σ = {{σ1^2, ρ σ1 σ2}, {ρ σ1 σ2, σ2^2}};
fim[x_, y_] := FullSimplify[FullSimplify[D[μ, x].Inverse[Σ].D[μ, y]] + 
  (1/2) Tr[Inverse[Σ].D[Σ, x].Inverse[Σ].D[Σ, y]]]
θ = {μ1, μ2, σ1, σ2, ρ};
Table[fim[θ[[i]], θ[[j]]], {i, Length[θ]}, {j, Length[θ]}] // MatrixForm

$$\left( \begin{array}{ccccc} \frac{1}{\sigma_1^2-\rho^2 \sigma_1^2} & -\frac{\rho}{\sigma_1 \sigma_2-\rho^2 \sigma_1 \sigma_2} & 0 & 0 & 0 \\ -\frac{\rho}{\sigma_1 \sigma_2-\rho^2 \sigma_1 \sigma_2} & \frac{1}{\sigma_2^2-\rho^2 \sigma_2^2} & 0 & 0 & 0 \\ 0 & 0 & \frac{\rho^2-2}{\left(\rho^2-1\right) \sigma_1^2} & -\frac{\rho^2}{\sigma_1 \sigma_2-\rho^2 \sigma_1 \sigma_2} & \frac{\rho}{\left(\rho^2-1\right) \sigma_1} \\ 0 & 0 & -\frac{\rho^2}{\sigma_1 \sigma_2-\rho^2 \sigma_1 \sigma_2} & \frac{\rho^2-2}{\left(\rho^2-1\right) \sigma_2^2} & \frac{\rho}{\left(\rho^2-1\right) \sigma_2} \\ 0 & 0 & \frac{\rho}{\left(\rho^2-1\right) \sigma_1} & \frac{\rho}{\left(\rho^2-1\right) \sigma_2} & \frac{\rho^2+1}{\left(\rho^2-1\right)^2} \\ \end{array} \right)$$
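As a numeric spot check of the trace formula above (my addition; the parameter point is arbitrary), the same Fisher information matrix can be assembled in plain Python with hand-coded 2x2 linear algebra and compared against entries of the displayed matrix:

```python
# Arbitrary bivariate parameter point theta = (mu1, mu2, sigma1, sigma2, rho)
mu1, mu2, s1, s2, r = 0.3, -0.5, 1.2, 0.7, 0.4

# Sigma and its inverse (2x2, done by hand)
S = [[s1 * s1, r * s1 * s2], [r * s1 * s2, s2 * s2]]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
Sinv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

# Analytic derivatives of mu and Sigma w.r.t. each component of theta
dmu = [[1, 0], [0, 1], [0, 0], [0, 0], [0, 0]]
dS = [
    [[0, 0], [0, 0]],                  # d/d mu1
    [[0, 0], [0, 0]],                  # d/d mu2
    [[2 * s1, r * s2], [r * s2, 0]],   # d/d sigma1
    [[0, r * s1], [r * s1, 2 * s2]],   # d/d sigma2
    [[0, s1 * s2], [s1 * s2, 0]],      # d/d rho
]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def fim(i, j):
    # dmu_i^T . Sigma^-1 . dmu_j  +  (1/2) tr(Sigma^-1 dS_i Sigma^-1 dS_j)
    quad = sum(dmu[i][a] * Sinv[a][b] * dmu[j][b]
               for a in range(2) for b in range(2))
    M = matmul(matmul(Sinv, dS[i]), matmul(Sinv, dS[j]))
    return quad + 0.5 * (M[0][0] + M[1][1])

I = [[fim(i, j) for j in range(5)] for i in range(5)]
```

For these values, `I[0][0]` matches $1/(\sigma_1^2(1-\rho^2))$, `I[0][1]` matches $-\rho/(\sigma_1\sigma_2(1-\rho^2))$, and `I[4][4]` matches $(\rho^2+1)/(\rho^2-1)^2$ from the matrix above.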

JimB
  • 41,653
  • 3
  • 48
  • 106
  • While this is a good solution for the bivariate Gaussian (indeed, a closed form is readily available by hand), a solution for a general n-dimensional distribution is most relevant. After all, I want Mathematica to (at least try to?) solve things that I can't do by hand! Thanks for the thoughts though! – user32416 Jan 19 '18 at 08:12
  • 3
    Your question was about your problem with a bivariate Gaussian. If you need an approach for a general n-dimensional distribution, you should modify your question. – JimB Jan 19 '18 at 16:15
1

My intent was to make this a general approach, but in practice the code below only gets you results for n = 3 (7 seconds) and n = 4 (10 minutes).

This approach uses a direct construction of the log of the probability density and a substitution rule rather than the Expectation function. (It also assumes that the standard deviation of the $i$-th random variable ($\sigma_i$) is the parameter of interest rather than the variance ($\sigma_i^2$), and likewise the correlation ($\rho_{ij}$) rather than the covariance ($\mathrm{Cov}(x_i,x_j)$).)

(* Multivariate normal parameters *)
n = 2;
covmat = Table[If[i == j, σ[i]^2, σ[i] σ[j] ρ[Min[i, j], Max[i, j]]], {i, n}, {j, n}];
muvec = Table[μ[i], {i, n}];
parameters = Variables[{muvec, covmat}];
xx = Table[x[i], {i, n}];

(* Log of multivariate normal density *)
logDensity = -(1/2) (xx - muvec).Inverse[covmat].(xx - muvec) - 
  (n/2) Log[2 π] - (1/2) Log[Det[covmat]];

(* Hessian *)
hessian = Expand[D[logDensity, {parameters, 2}]];

(* Take expectation by using a rule *)
expectationRule = {x[i_]^2 -> σ[i]^2 + μ[i]^2 , 
   x[i_] x[j_] -> ρ[i, j] σ[i] σ[j] + μ[i] μ[j],
   x[i_] -> μ[i]};
expectedHessian = Simplify[hessian /. expectationRule];

$$\left( \begin{array}{ccccc} \frac{1}{\left(\rho (1,2)^2-1\right) \sigma (1)^2} & \frac{\rho (1,2)}{\sigma (1) \sigma (2)-\rho (1,2)^2 \sigma (1) \sigma (2)} & 0 & 0 & 0 \\ \frac{\rho (1,2)}{\sigma (1) \sigma (2)-\rho (1,2)^2 \sigma (1) \sigma (2)} & \frac{1}{\left(\rho (1,2)^2-1\right) \sigma (2)^2} & 0 & 0 & 0 \\ 0 & 0 & -\frac{\rho (1,2)^2+1}{\left(\rho (1,2)^2-1\right)^2} & \frac{\rho (1,2)}{\sigma (1)-\rho (1,2)^2 \sigma (1)} & \frac{\rho (1,2)}{\sigma (2)-\rho (1,2)^2 \sigma (2)} \\ 0 & 0 & \frac{\rho (1,2)}{\sigma (1)-\rho (1,2)^2 \sigma (1)} & \frac{2-\rho (1,2)^2}{\left(\rho (1,2)^2-1\right) \sigma (1)^2} & \frac{\rho (1,2)^2}{\sigma (1) \sigma (2)-\rho (1,2)^2 \sigma (1) \sigma (2)} \\ 0 & 0 & \frac{\rho (1,2)}{\sigma (2)-\rho (1,2)^2 \sigma (2)} & \frac{\rho (1,2)^2}{\sigma (1) \sigma (2)-\rho (1,2)^2 \sigma (1) \sigma (2)} & \frac{2-\rho (1,2)^2}{\left(\rho (1,2)^2-1\right) \sigma (2)^2} \\ \end{array} \right)$$

This can be made a bit more compact with the following:

(* Compact output *)
compactRule = {μ[i_] :> ToExpression["μ" <> ToString[i]],
  σ[i_] :> ToExpression["σ" <> ToString[i]],
  ρ[i_, j_] :> ToExpression["ρ" <> ToString[i] <> ToString[j]]}
parameters /. compactRule
expectedHessian /. compactRule // MatrixForm

$$\left( \begin{array}{ccccc} \frac{1}{\left(\rho_{12}^2-1\right) \sigma_1^2} & \frac{\rho_{12}}{\sigma_1 \sigma_2-\rho_{12}^2 \sigma_1 \sigma_2} & 0 & 0 & 0 \\ \frac{\rho_{12}}{\sigma_1 \sigma_2-\rho_{12}^2 \sigma_1 \sigma_2} & \frac{1}{\left(\rho_{12}^2-1\right) \sigma_2^2} & 0 & 0 & 0 \\ 0 & 0 & -\frac{\rho_{12}^2+1}{\left(\rho_{12}^2-1\right)^2} & \frac{\rho_{12}}{\sigma_1-\rho_{12}^2 \sigma_1} & \frac{\rho_{12}}{\sigma_2-\rho_{12}^2 \sigma_2} \\ 0 & 0 & \frac{\rho_{12}}{\sigma_1-\rho_{12}^2 \sigma_1} & \frac{2-\rho_{12}^2}{\left(\rho_{12}^2-1\right) \sigma_1^2} & \frac{\rho_{12}^2}{\sigma_1 \sigma_2-\rho_{12}^2 \sigma_1 \sigma_2} \\ 0 & 0 & \frac{\rho_{12}}{\sigma_2-\rho_{12}^2 \sigma_2} & \frac{\rho_{12}^2}{\sigma_1 \sigma_2-\rho_{12}^2 \sigma_1 \sigma_2} & \frac{2-\rho_{12}^2}{\left(\rho_{12}^2-1\right) \sigma_2^2} \\ \end{array} \right)$$
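The moment identities that power expectationRule, namely $E[x_i] = \mu_i$, $E[x_i^2] = \sigma_i^2 + \mu_i^2$, and $E[x_i x_j] = \rho_{ij} \sigma_i \sigma_j + \mu_i \mu_j$, are standard, but here is a quick seeded Monte Carlo spot check in Python (my own addition, with arbitrary parameter values):

```python
import math
import random

random.seed(1)
mu1, mu2, s1, s2, r = 0.3, -0.5, 1.2, 0.7, 0.4  # arbitrary parameters

# Draw correlated bivariate normal samples and accumulate raw moments
n = 200_000
m1 = m2 = m11 = m12 = 0.0
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x1 = mu1 + s1 * z1
    x2 = mu2 + s2 * (r * z1 + math.sqrt(1 - r * r) * z2)
    m1 += x1          # estimates E[x1]
    m2 += x2          # estimates E[x2]
    m11 += x1 * x1    # estimates E[x1^2]
    m12 += x1 * x2    # estimates E[x1 x2]
m1, m2, m11, m12 = m1 / n, m2 / n, m11 / n, m12 / n
```

Each sample moment lands close to its closed-form counterpart ($\mu_1$, $\mu_2$, $\sigma_1^2 + \mu_1^2$, and $\rho_{12} \sigma_1 \sigma_2 + \mu_1 \mu_2$ respectively), so the substitution rule is doing exactly what Expectation would do, just without the symbolic integration.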

JimB
  • 41,653
  • 3
  • 48
  • 106