
Let $X_1, X_2, \dots, X_n$ be independent and normally distributed with mean $\mu$ and variance $\sigma^2$. Then $\frac{(n-1) S^2}{\sigma^2}$ has a chi-square distribution with $n-1$ degrees of freedom.

How come it's chi-square distributed?

Attempt:

$S^2 = \frac{1}{n-1} \sum(X_i-\overline X)^2 = \frac{1}{n-1}\left(\sum X_i ^2 - \frac{1}{n}\left(\sum X_i\right)^2\right)$.

Here, the first term $\sum X_i^2$ is chi-squared. But you also need to subtract the second term, which involves $\overline X$ (normally distributed with mean $\mu$ and variance $\frac{\sigma^2}{n}$). How does that make the result chi-square distributed?
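The computational formula for $S^2$ carries a factor $\frac{1}{n}$ on the squared sum: $\sum(X_i-\overline X)^2 = \sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2$. A minimal Python check of that shortcut against the standard library's `statistics.variance` (which also divides by $n-1$); the helper name `sample_variance_shortcut` and the data values are made up for illustration.

```python
import statistics


def sample_variance_shortcut(xs):
    """Hypothetical helper: S^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    total = sum(xs)
    sum_sq = sum(x * x for x in xs)
    # The subtracted term is (sum x)^2 / n = n * xbar^2, not (sum x)^2.
    return (sum_sq - total * total / n) / (n - 1)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
print(sample_variance_shortcut(data))  # shortcut formula
print(statistics.variance(data))       # standard library, divisor n - 1
```

Both lines print the same value up to floating-point rounding. For large values with small spread the shortcut can lose precision, which is why libraries usually compute deviations from the mean directly.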

Dave
kou
  • There is a trick, an orthogonal linear change of variables, that exhibits $(n-1)S^2/\sigma^2$ as a sum of $n-1$ squared standard normals. – kimchi lover Aug 06 '17 at 22:10
  • See https://stats.stackexchange.com/questions/121662/why-is-the-sampling-distribution-of-variance-a-chi-squared-distribution – Wraith1995 Aug 06 '17 at 22:15
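One standard choice for the orthogonal change of variables mentioned in the first comment is the Helmert transformation (the comment does not name the matrix, so this is an assumption about which trick is meant). Write $Y = HX$, where the first row of $H$ is $(1/\sqrt n, \dots, 1/\sqrt n)$ and the remaining rows are orthonormal and each sums to zero. Then $Y_1 = \sqrt n\,\overline X$ and $\sum_{k=2}^n Y_k^2 = \sum X_i^2 - n\overline X^2 = (n-1)S^2$. Because $H$ is orthogonal and $X \sim N(\mu\mathbf{1}, \sigma^2 I)$, the $Y_k$ are independent with $Y_2, \dots, Y_n \sim N(0, \sigma^2)$, so $(n-1)S^2/\sigma^2$ is a sum of $n-1$ squared standard normals and is independent of $\overline X$. The sketch below uses NumPy; `helmert_matrix` is a hypothetical helper, and it checks only the algebraic part numerically.

```python
import numpy as np


def helmert_matrix(n):
    """Hypothetical helper: orthogonal n x n Helmert matrix.

    Row 0 is (1/sqrt(n), ..., 1/sqrt(n)); each later row k has k equal
    positive entries, then -k times that entry, then zeros, so it sums to 0.
    """
    h = np.zeros((n, n))
    h[0, :] = 1.0 / np.sqrt(n)
    for k in range(1, n):
        scale = 1.0 / np.sqrt(k * (k + 1))
        h[k, :k] = scale       # first k entries
        h[k, k] = -k * scale   # entry that makes the row sum to zero
    return h


rng = np.random.default_rng(0)
n, mu, sigma = 8, 3.0, 2.0
x = rng.normal(mu, sigma, size=n)

H = helmert_matrix(n)
y = H @ x
ss_dev = np.sum((x - x.mean()) ** 2)  # (n - 1) * S^2

print(np.allclose(H @ H.T, np.eye(n)))             # H is orthogonal
print(np.isclose(y[0], np.sqrt(n) * x.mean()))     # Y_1 = sqrt(n) * xbar
print(np.isclose(np.sum(y[1:] ** 2), ss_dev))      # Y_2..Y_n carry (n-1) S^2
```

All three checks should print `True`; the distributional claims then follow from $Y$ being an orthogonal transform of a spherical normal vector.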

1 Answer


The first proof people generally encounter involves MGFs. Note that

$$ \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^n\left(\frac{X_i-\bar{X}}{\sigma}\right)^2 = \sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma}\right)^2 - \left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right)^2$$

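Observe that the first term on the RHS is a chi-square RV with $n$ degrees of freedom, since the $(X_i-\mu)/\sigma$ are independent standard normals. The second term is a chi-square RV with $1$ degree of freedom, since $\bar{X} \sim N(\mu, \sigma^2/n)$. Rearranging,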

$$ \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right)^2 = \sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma}\right)^2 $$

Now take the MGFs of both sides:

$$ (1-2t)^{-1/2}M(t) = (1-2t)^{-n/2} \Longrightarrow M(t) = (1-2t)^{-n/2+1/2} = (1-2t)^{-(n-1)/2}, \qquad t < \tfrac{1}{2},$$

where $M(t)$ is the MGF of $\frac{(n-1)S^2}{\sigma^2}$. Thus, the MGF of $\frac{(n-1)S^2}{\sigma^2}$ is that of a chi-square RV with $n-1$ degrees of freedom.
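For completeness, the factor $(1-2t)^{-1/2}$ is the MGF of a chi-square RV with $1$ degree of freedom. A short derivation, assuming $Z \sim N(0,1)$ and substituting $u = z\sqrt{1-2t}$:

$$ M_{Z^2}(t) = E\left[e^{tZ^2}\right] = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(1-2t)z^2/2}\,dz = \frac{1}{\sqrt{1-2t}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\,du = (1-2t)^{-1/2}, \qquad t < \tfrac{1}{2}. $$

For a sum of $n$ independent such squares the MGFs multiply, giving $(1-2t)^{-n/2}$. Since an MGF that is finite on an interval around $0$ determines the distribution, $M(t) = (1-2t)^{-(n-1)/2}$ identifies the $\chi^2_{n-1}$ distribution.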

Note: the above used the identity $M_{X+Y}(t) = M_X(t)M_Y(t)$, which holds when $X$ and $Y$ are independent. That detail was skipped: one also needs to show that $S^2$ and $\bar{X}$ are independent, which is true when the sample is normally distributed.
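A Monte Carlo sanity check of both the conclusion and the independence caveat, as a sketch with NumPy; the values of `n`, `mu`, `sigma`, the seed, and the number of replications are arbitrary choices, not from the answer. It compares the simulated $(n-1)S^2/\sigma^2$ with the mean $n-1$, variance $2(n-1)$, and MGF $(1-2t)^{-(n-1)/2}$ of $\chi^2_{n-1}$, then contrasts the correlation of $\bar X$ and $S^2$ for normal versus exponential samples. A near-zero correlation is consistent with independence but does not prove it.

```python
import numpy as np

rng = np.random.default_rng(12345)
n, mu, sigma = 5, 10.0, 3.0
reps = 200_000

# Each row is one normal sample X_1, ..., X_n.
x = rng.normal(mu, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)          # S^2 with divisor n - 1
q = (n - 1) * s2 / sigma**2         # (n - 1) S^2 / sigma^2

# A chi-square with k degrees of freedom has mean k and variance 2k.
print("mean:", q.mean(), "target:", n - 1)
print("variance:", q.var(), "target:", 2 * (n - 1))

# Empirical MGF at one point t < 1/2 versus (1 - 2t)^(-(n-1)/2).
t = 0.1
emp_mgf = np.mean(np.exp(t * q))
theory_mgf = (1 - 2 * t) ** (-(n - 1) / 2)
print("MGF at t:", emp_mgf, "target:", theory_mgf)

# Medians against direct chi-square(n - 1) draws.
ref = rng.chisquare(n - 1, size=reps)
print("medians:", np.median(q), np.median(ref))

# Correlation of xbar and S^2: about 0 for normal samples,
# clearly positive for (skewed) exponential samples.
corr_normal = np.corrcoef(x.mean(axis=1), s2)[0, 1]
e = rng.exponential(scale=1.0, size=(reps, n))
corr_exp = np.corrcoef(e.mean(axis=1), e.var(axis=1, ddof=1))[0, 1]
print("corr(xbar, S^2), normal:", corr_normal)
print("corr(xbar, S^2), exponential:", corr_exp)
```

The exponential case shows why the normality assumption matters for the independence step: without it, $\bar X$ and $S^2$ are correlated and the MGF factorization no longer applies.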

  • Thanks for your answer. May I ask how you got from here to here? $$\sum_{i=1}^n\left(\frac{X_i-\bar{X}}{\sigma}\right)^2 = \sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma}\right)^2 - \left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right)^2$$ – kou Aug 06 '17 at 23:54
  • Ignoring the $\sigma^2$, note that $(X_i - \bar{X})^2 = (X_i - \mu + \mu - \bar{X})^2 = (X_i - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu) + (\bar{X}-\mu)^2$. Summing, and using $\sum_{i=1}^n(X_i - \mu) = n(\bar{X} - \mu)$, gives $\sum_{i=1}^n(X_i - \bar{X})^2 = \sum_{i=1}^n(X_i - \mu)^2 - 2n(\bar{X} -\mu)^2 + n(\bar{X}-\mu)^2 = \sum_{i=1}^n(X_i - \mu)^2 - n(\bar{X}-\mu)^2$. Dividing by $\sigma^2$ gives the identity. –  Aug 07 '17 at 00:09
  • $$ \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right)^2 $$ Your argument is at best incomplete if it does not mention that these two terms are independent. And their independence is not instantly obvious; that should be explained as well. A substantial step in showing that is to show that $\operatorname{cov}\big(\overline X,\, X_i-\overline X\big) = 0$. – Michael Hardy Nov 29 '19 at 18:07
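A sketch of that covariance step, assuming $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma^2)$, so that $\operatorname{cov}(X_j, X_i) = \sigma^2$ if $j = i$ and $0$ otherwise:

$$ \operatorname{cov}\big(\overline X,\, X_i - \overline X\big) = \operatorname{cov}\big(\overline X,\, X_i\big) - \operatorname{var}\big(\overline X\big) = \frac{1}{n}\sum_{j=1}^n \operatorname{cov}(X_j, X_i) - \frac{\sigma^2}{n} = \frac{\sigma^2}{n} - \frac{\sigma^2}{n} = 0. $$

Since $(\overline X, X_1 - \overline X, \dots, X_n - \overline X)$ is a linear transformation of the normal vector $(X_1, \dots, X_n)$, it is jointly normal, and for jointly normal vectors zero covariance between blocks implies independence. So $\overline X$ is independent of the vector of deviations, and hence of $S^2$, which is a function of that vector.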