I am trying to understand confidence intervals (CIs) from the simplest article I could find on the topic. I got part of the way and then got stuck at a crucial point. Suppose we decide that the confidence level we want is 95%; then
**95% of all "95% Confidence Intervals" will include the true mean.**
This is where many people infer the wrong thing (assuming that a confidence interval says there is a 95% chance that the true mean lies within it). I could avoid this trap by focusing on the highlighted sentence above. However, when I work through it step by step, I get stuck on how this can be the case.
- Suppose I have a population distribution $Y$ with mean $\mu$ and standard deviation $\sigma$. For simplicity, let it already be normal.
- I take a 1st sample set of size $n$, denoted by $n1$, described by the random variable $X_1 = \{x_1,x_2,\cdots, x_{n1}\}$, with samples picked from the population. I find its mean $\overline{X_1} = \frac{x_1 + x_2 + \cdots + x_{n1}}{n1}$ and standard deviation $S_1$. For the moment, let's say it is normal.
- Similarly, I take a 2nd sample set of size $n$, denoted by $n2$, described by the random variable $X_2 = \{x_1,x_2,\cdots, x_{n2}\}$, with samples picked from the population. I find its mean $\overline{X_2} = \frac{x_1 + x_2 + \cdots + x_{n2}}{n2}$ and standard deviation $S_2$. Again, we assume it is normal.
I decide I want a confidence level of 95%.
- If I transform my population distribution to the standard normal $Z$, then 95% of the area lies within $Z= \pm 1.96$. Since $Z = \dfrac {Y-\mu}{\sigma}$, in the original population distribution 95% of data points fall within $Y = \mu \pm 1.96\sigma$. $$ \color{blue}{\Pr(\mu-1.96\sigma < Y < \mu+1.96\sigma) = 0.95} \tag{1} $$
- If I transform my sample set $n1$ to the standard normal $Z$ (because I am assuming it is normal), again 95% of the $n1$ data points fall within $\overline{X_1} \pm 1.96S_1$ $$ \color{blue}{\Pr(\overline{X_1}-1.96S_1 < X_1 < \overline{X_1}+1.96S_1) = 0.95} \tag{2} $$
- If I transform my sample set $n2$ to the standard normal $Z$, again 95% of the $n2$ data points fall within $\overline{X_2} \pm 1.96S_2$ $$ \color{blue}{\Pr(\overline{X_2}-1.96S_2 < X_2 < \overline{X_2}+1.96S_2) = 0.95} \tag{3} $$
- Obviously, I would take many sample sets $n3,n4,n5, \cdots, nk$, so my eventual sampling distribution of sample means, described by the random variable $X$, would be normal, with mean $\overline{X} \rightarrow \mu$ and standard deviation $S \rightarrow \dfrac{\sigma}{\sqrt{n}}$ $$ \color{blue}{\Pr(\overline{X}-1.96S < X < \overline{X}+1.96S) = 0.95} \tag{4} $$ $$ \color{blue}{\Pr(\mu-1.96\dfrac{\sigma}{\sqrt{n}} < X < \mu+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95} \tag{5} $$
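To make the last step concrete, here is a minimal simulation sketch of the sampling distribution of sample means, using only the Python standard library. The values `mu = 10`, `sigma = 2`, `n = 25`, `k = 20000` and the function name `sample_means` are my own illustrative assumptions, not something taken from the article.

```python
import random
import statistics

def sample_means(mu, sigma, n, k, seed=0):
    """Draw k sample sets of size n from Normal(mu, sigma); return the k sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(k)
    ]

# Illustrative (assumed) population parameters and sample sizes.
mu, sigma, n, k = 10.0, 2.0, 25, 20000
means = sample_means(mu, sigma, n, k)

print("mean of sample means:", statistics.fmean(means))   # should be close to mu
print("SD of sample means:  ", statistics.stdev(means))   # should be close to sigma / sqrt(n)
print("sigma / sqrt(n):     ", sigma / n ** 0.5)
```

If the reasoning behind eq. 5 holds, the second and third printed numbers should nearly agree.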
My questions:
- Each sample set $n_k$ has its own interval derived from its mean $\overline{X_k}$ and standard deviation $S_k$. How is it that, when I take many of them, we can suddenly say that 95% of all those individual confidence intervals will contain the true population mean $\mu$? What is the missing link here? Below is my derivation. Is it correct, and can we say that it proves that 95% of CIs will contain $\mu$?
From eq. $5$,
$\Pr(\mu-1.96\dfrac{\sigma}{\sqrt{n}} < X < \mu+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$
Adding $-\mu$ to all parts of the inequality:
$\Pr(-\mu + \mu-1.96\dfrac{\sigma}{\sqrt{n}} < -\mu + X < -\mu + \mu+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$
$\Pr(-1.96\dfrac{\sigma}{\sqrt{n}} < X - \mu < 1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$
Adding $-X$ to all parts of the inequality:
$\Pr(-X-1.96\dfrac{\sigma}{\sqrt{n}} < -X+X - \mu < -X+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$
$\Pr(-X-1.96\dfrac{\sigma}{\sqrt{n}} < - \mu < -X+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$
Multiplying all parts of the inequality by $-1$ (which reverses the inequality signs):
$\Pr(X+1.96\dfrac{\sigma}{\sqrt{n}} > \mu > X-1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$
This is the same as:
$$\color{blue}{
\Pr(X-1.96\dfrac{\sigma}{\sqrt{n}} < \mu < X+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95 \tag{6}
}
$$
Eq. $6$ simply means that, when we take an enormous number of samples to arrive at the sampling distribution of sample means described by $X$, the probability that $\mu$ lies within the interval $X \pm 1.96\dfrac{\sigma}{\sqrt{n}}$ is 95%.
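A numerical check of eq. 6 could look like the sketch below: build the interval $\overline{X_k} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$ for each of many sample sets and count how often it contains $\mu$. It assumes a normal population with known $\sigma$; the parameter values and the function name `coverage_known_sigma` are hypothetical choices of mine.

```python
import random
import statistics

def coverage_known_sigma(mu, sigma, n, k, z=1.96, seed=1):
    """Fraction of k intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5   # same width for every sample set, since sigma is known
    hits = 0
    for _ in range(k):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / k

# Illustrative (assumed) values; the result should be near 0.95.
print(coverage_known_sigma(mu=10.0, sigma=2.0, n=25, k=20000))
```

Note that in this sketch $\mu$ is fixed and the interval moves from one sample set to the next, which is how I read the highlighted sentence.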
Also, 95% of the sample mean values $\overline{X_k}$ fall within this same interval $X \pm 1.96\dfrac{\sigma}{\sqrt{n}}$.
Because of this, can we also say that 95% of the CIs associated with $\overline{X_k}$ also fall within this same interval $X \pm 1.96\dfrac{\sigma}{\sqrt{n}}$?
I think I am getting closer, but a link is still missing. Kindly help here.
Since many sample sets have to be calculated to arrive at the sampling distribution, do we divide by $n$ or $n-1$ (unbiased) for each sample set? (This choice will influence the CI calculation.)
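To see what difference the divisor makes, a small sketch like the following averages both variance estimates over many sample sets. The helper names `variance` and `average_variance_estimates`, and the values `sigma = 2`, `n = 5`, `k = 20000`, are assumptions made only for illustration.

```python
import random

def variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by len(xs) - ddof."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - ddof)

def average_variance_estimates(sigma, n, k, seed=3):
    """Average the divide-by-n and divide-by-(n-1) estimates over k sample sets of size n."""
    rng = random.Random(seed)
    total_n, total_n_minus_1 = 0.0, 0.0
    for _ in range(k):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_n += variance(xs, ddof=0)          # divide by n
        total_n_minus_1 += variance(xs, ddof=1)  # divide by n - 1
    return total_n / k, total_n_minus_1 / k

# Illustrative (assumed) values: true variance is sigma^2 = 4.
avg_n, avg_n_minus_1 = average_variance_estimates(sigma=2.0, n=5, k=20000)
print("true variance:             ", 2.0 ** 2)
print("average, dividing by n:    ", avg_n)          # near (n - 1) / n * sigma^2
print("average, dividing by n - 1:", avg_n_minus_1)  # near sigma^2
```

With a small $n$ the gap between the two averages is visible; as $n$ grows, the two estimates nearly coincide.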
What happens to the above questions when I do not have a normal distribution to start with for the population (say, some arbitrary distribution, uniform, or Bernoulli instead)? The eventual sampling distribution might be normal, but we are talking about the few sample sets at the beginning for which we calculate confidence intervals. I ask this because the intermediate Z transformation I described earlier would not be possible, as those sample sets may not be normally distributed.
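One way to explore this empirically is the sketch below, which estimates how often the interval $\overline{X_k} \pm 1.96\dfrac{S_k}{\sqrt{n}}$ contains the true mean for a Bernoulli and a uniform population at several sample sizes. The population parameters (Bernoulli with $p = 0.3$, uniform on $[0, 1]$), the sample sizes, and the names `coverage`, `bernoulli`, and `uniform` are all my own assumptions.

```python
import math
import random

def coverage(draw, true_mean, n, k, z=1.96, seed=2):
    """Fraction of k intervals xbar +/- z*s/sqrt(n) that contain true_mean.

    draw(rng) is a hypothetical helper returning one observation from the population.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(k):
        xs = [draw(rng) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # divides by n - 1
        half_width = z * s / math.sqrt(n)
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / k

def bernoulli(rng):
    """One draw from Bernoulli(p = 0.3); true mean is 0.3."""
    return 1.0 if rng.random() < 0.3 else 0.0

def uniform(rng):
    """One draw from Uniform(0, 1); true mean is 0.5."""
    return rng.uniform(0.0, 1.0)

for n in (5, 30, 100):
    print(n, coverage(bernoulli, 0.3, n, 4000), coverage(uniform, 0.5, n, 4000))
```

Comparing the printed fractions across sample sizes shows how far the actual coverage is from the nominal 95% when the sample sets are small and the population is not normal.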