
I am trying to understand confidence intervals (CIs) from the simplest article I could find about them. I followed it to a point and then got stuck at a crucial step. Suppose we decide the confidence level we want is 95%; then

95% of all "95% Confidence Intervals" will include the true mean.

This is what many people infer incorrectly (assuming a confidence interval says there is a 95% chance that the true mean lies within it). I could avoid this trap by focusing on the highlighted sentence above. However, when I dig in step by step, I get caught on how it could be the case.

  1. Suppose I have a population distribution $Y$ with mean $\mu$ and standard deviation $\sigma$. For brevity, let it already be normal.
  2. I take a 1st sample set of size $n_1$, described by random variable $X_1$ with values $\{x_1,x_2,\cdots, x_{n_1}\}$ picked from the population. I find its mean $\overline{X_1} = \frac{x_1 + x_2 + \cdots + x_{n_1}}{n_1}$ and standard deviation $S_1$. For the moment, let's say it is normal.
  3. Similarly, a 2nd sample set of size $n_2$, described by random variable $X_2$ with values $\{x_1,x_2,\cdots, x_{n_2}\}$ picked from the population. I find its mean $\overline{X_2} = \frac{x_1 + x_2 + \cdots + x_{n_2}}{n_2}$ and standard deviation $S_2$. Again we assume it is normal.

I decide I want confidence level of 95%.

  1. If I transform my population distribution to the standard normal $Z$, then 95% of the area lies within $Z= \pm 1.96$. Since $Z = \dfrac {Y-\mu}{\sigma}$, in the original population distribution 95% of the data points fall within $Y = \mu \pm 1.96\sigma$. $$ \color{blue}{\Pr(\mu-1.96\sigma < Y < \mu+1.96\sigma) = 0.95} \tag{1} $$
  2. If I transform my sample set $n_1$ to the standard normal (because we assume it is normal), again 95% of the $n_1$ data points fall within $\overline{X_1} \pm 1.96S_1$ $$ \color{blue}{\Pr(\overline{X_1}-1.96S_1 < X_1 < \overline{X_1}+1.96S_1) = 0.95} \tag{2} $$
  3. If I transform my sample set $n_2$ to the standard normal, again 95% of the $n_2$ data points fall within $\overline{X_2} \pm 1.96S_2$ $$ \color{blue}{\Pr(\overline{X_2}-1.96S_2 < X_2 < \overline{X_2}+1.96S_2) = 0.95} \tag{3} $$
  4. Obviously, I would take many sample sets $n_3,n_4,n_5, \cdots, n_k$, so my eventual sampling distribution of sample means, described by random variable $X$, would be normal, with mean $\overline{X} \rightarrow \mu$ and standard deviation $S \rightarrow \dfrac{\sigma}{\sqrt{n}}$ $$ \color{blue}{\Pr(\overline{X}-1.96S < X < \overline{X}+1.96S) = 0.95} \tag{4} $$ $$ \color{blue}{\Pr\left(\mu-1.96\dfrac{\sigma}{\sqrt{n}} < X < \mu+1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95} \tag{5} $$
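As a sanity check on eq. (5), here is a rough simulation sketch of my own (not from the article; the values $\mu = 10$, $\sigma = 2$, $n = 50$ are arbitrary choices for illustration):

```python
import numpy as np

# A rough check of eq. (5). mu, sigma, n are arbitrary illustration values.
rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 2.0, 50
reps = 100_000  # number of sample sets

# Each row is one sample set; take its mean
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Fraction of sample means inside mu +/- 1.96*sigma/sqrt(n)
half_width = 1.96 * sigma / np.sqrt(n)
inside = np.mean(np.abs(means - mu) < half_width)
print(inside)  # close to 0.95
```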

My questions:

  1. Each sample set $n_k$ has its own interval derived from its mean $\overline{X_k}$ and standard deviation $S_k$. How come, when I take many of them, we can suddenly say that 95% of all those individual confidence intervals will contain the true population mean $\mu$? What is the missing link here? Below is my derivation; is it correct, and can we say that because of it, it is proved that 95% of CIs will contain $\mu$?

From eq. $5$,
$\Pr(\mu-1.96\dfrac{\sigma}{\sqrt{n}} < X < \mu+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$

Subtracting $\mu$ from all parts of the inequality:
$\Pr(-\mu + \mu-1.96\dfrac{\sigma}{\sqrt{n}} < -\mu + X < -\mu + \mu+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$
$\Pr(-1.96\dfrac{\sigma}{\sqrt{n}} < X - \mu < 1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$

Subtracting $X$ from all parts: $\Pr(-X-1.96\dfrac{\sigma}{\sqrt{n}} < -X+X - \mu < -X+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$
$\Pr(-X-1.96\dfrac{\sigma}{\sqrt{n}} < - \mu < -X+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$

Multiplying all parts by $-1$ (which reverses the inequalities): $\Pr(X+1.96\dfrac{\sigma}{\sqrt{n}} > \mu > X-1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95$

This is the same as
$$\color{blue}{ \Pr(X-1.96\dfrac{\sigma}{\sqrt{n}} < \mu < X+1.96\dfrac{\sigma}{\sqrt{n}}) = 0.95 \tag{6} } $$

Eq. $(6)$ simply means that when we take an enormous number of samples to arrive at the sampling distribution of sample means described by $X$, the probability of $\mu$ lying within the interval $X \pm 1.96\dfrac{\sigma}{\sqrt{n}}$ is 95%.
Also, 95% of the sample mean values $\overline{X_k}$ fall within this same interval $X \pm 1.96\dfrac{\sigma}{\sqrt{n}}$.
Because of this, can we also say that 95% of the CIs associated with the $\overline{X_k}$ also fall within this same interval $X \pm 1.96\dfrac{\sigma}{\sqrt{n}}$?
I think I am close, with a narrowing missing link. Kindly help here.
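To make the question concrete, here is my own quick coverage sketch (arbitrary parameters, and using the KNOWN $\sigma$): for each sample set, the event "the CI $\overline{X_k} \pm 1.96\sigma/\sqrt{n}$ contains $\mu$" is exactly the event "$\overline{X_k}$ falls within $1.96\sigma/\sqrt{n}$ of $\mu$":

```python
import numpy as np

# Coverage check: one CI per sample set (using the known sigma), counting
# how many CIs contain mu. mu, sigma, n are arbitrary illustration values.
rng = np.random.default_rng(1)
mu, sigma, n = 10.0, 2.0, 50
reps = 100_000

means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
half_width = 1.96 * sigma / np.sqrt(n)

# "CI contains mu" is the same event as "mean is within half_width of mu"
covered = np.mean((means - half_width < mu) & (mu < means + half_width))
print(covered)  # close to 0.95
```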

  1. Since many sample sets are calculated to arrive at the sampling distribution, do we divide by $n$ or $n-1$ (unbiased) for each sample set? (This will influence the CI calculation.)
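The difference between dividing by $n$ and $n-1$ can be seen numerically; here is a rough sketch of my own, with an arbitrary Normal(0, 2) population (true variance 4):

```python
import numpy as np

# Compare dividing by n (ddof=0) vs n-1 (ddof=1) when estimating the
# variance of each sample set. Population is Normal(0, 2), true variance 4;
# the choices of n and reps are arbitrary.
rng = np.random.default_rng(2)
n, reps = 10, 200_000

samples = rng.normal(0.0, 2.0, size=(reps, n))
biased = samples.var(axis=1, ddof=0).mean()    # divide by n
unbiased = samples.var(axis=1, ddof=1).mean()  # divide by n-1

print(biased, unbiased)  # roughly 3.6 (= 4*(n-1)/n) vs roughly 4.0
```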

  2. What happens to the above questions when the population I start with is not normally distributed (say it is uniform or Bernoulli instead)? The eventual sampling distribution might be normal, but we are talking about the few sample sets at the beginning, for which we calculate the confidence intervals. I ask this because the intermediate Z transformation I described earlier would not be possible, as those sample sets may not be normally distributed.
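A rough simulation sketch of my own (with an arbitrary Uniform(0, 1) population, whose mean and SD are known exactly) suggests how the CLT rescues the $z$-based interval even for a non-normal population:

```python
import numpy as np

# Non-normal population: Uniform(0, 1), with mean 0.5 and SD sqrt(1/12).
# Check how often the sample mean lands within mu +/- 1.96*sigma/sqrt(n)
# for small and moderate n. The choices of n and reps are arbitrary.
rng = np.random.default_rng(3)
mu, sigma = 0.5, np.sqrt(1 / 12)
reps = 100_000

coverages = {}
for n in (5, 30):
    means = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    half_width = 1.96 * sigma / np.sqrt(n)
    coverages[n] = np.mean(np.abs(means - mu) < half_width)

print(coverages)  # close to 0.95; the CLT kicks in quickly here
```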

  • It is absolutely wrong to say that 95% of the data points lie within a 95% confidence interval. In fact, there are examples in which none of them do. – Michael Hardy Aug 15 '18 at 18:24
  • But I did not say that. I said 95% of CIs will contain the population mean, not data points. Isn't that correct? – Parthiban Rajendran Aug 15 '18 at 18:27

4 Answers


While the article you refer to correctly defines the concept of a confidence interval (your highlighted text), it does not correctly treat the case of a normal distribution with unknown standard deviation. You may want to search "Neyman confidence interval" to see an approach that produces confidence intervals with the property you highlighted.

For each true value of the parameter of interest, the Neyman procedure selects a region containing 95% of the possible outcomes. The confidence interval is then the union of all parameter values for which the observation lies within the selected region. For the true parameter value, the probability that the observation falls within the selected region is 95%, and only for those observations will the confidence interval contain the true value. Therefore the procedure guarantees the property you highlighted.

If the standard deviation is known and not a function of the mean, the Neyman central confidence intervals turn out to be identical to those described in the article.


Thank you for the link to Neyman's book - interesting to read the original source! You ask for a simpler description, but that is what my second paragraph was meant to be. Perhaps a few examples will help illustrate: Examples 1 and 1b could be considered trivial, whereas Example 2 would not be handled correctly by the article you refer to.

Example 1. Uniform random variable. Let X follow a uniform distribution, $$f(x)=1/2 {\mathrm{\ \ for\ \ }}\theta-1\le x\le \theta+1 $$ and zero otherwise. We can make a 100% confidence interval for $\theta$ by considering all possible outcomes $x$, given $\theta$, ie. $x \in [\theta-1,\theta+1]$. Now consider an observed value, $x_0$. The union of all possible values of $\theta$ for which $x_0$ is a possible outcome is $[x_0-1,x_0+1]$. That is the 100% confidence interval for $\theta$ for this problem.

Example 1b. Uniform random variable. Let X follow the same uniform distribution. We can make a 95% central confidence interval for $\theta$ by selecting the 95% central outcomes $x$, given $\theta$, ie. $x \in [\theta-0.95,\theta+0.95]$. Now consider an observed value, $x_0$. The union of all possible values of $\theta$ for which $x_0$ is within the selected range is $[x_0-0.95,x_0+0.95]$. That is the 95% confidence interval for $\theta$ for this problem.
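Example 1b can be checked numerically; here is a quick sketch (the value $\theta = 3$ is an arbitrary choice):

```python
import numpy as np

# Example 1b: one observation x0 ~ Uniform(theta - 1, theta + 1); the
# interval [x0 - 0.95, x0 + 0.95] should contain theta 95% of the time.
rng = np.random.default_rng(4)
theta = 3.0  # arbitrary true value
reps = 100_000

x0 = rng.uniform(theta - 1, theta + 1, size=reps)
covered = np.mean((x0 - 0.95 < theta) & (theta < x0 + 0.95))
print(covered)  # close to 0.95
```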

Example 2. Uniform random variable. Let X follow a uniform distribution, $$f(x)=1/\theta {\mathrm{\ \ for\ \ }}{1\over2}\theta \le x \le {3\over2}\theta $$ and zero otherwise. We can make a 100% confidence interval for $\theta$ by considering all possible outcomes $x$, given $\theta$, ie. $x \in [{1\over2}\theta,{3\over2}\theta]$. Now consider an observed value, $x_0$. The union of all possible values of $\theta$ for which $x_0$ is a possible outcome is $[{2\over3}x_0,2x_0]$. That is the 100% confidence interval for $\theta$ for this problem. (You can confirm this by inserting the endpoints of the confidence interval into the pdf and see they are at the boundaries of the pdf). Note that the central confidence interval is not centered on the point estimate for $\theta$, $\hat\theta = x_0$.

Example 3. Normal distribution with mean $\theta$ and standard deviation $1$. The 68% central confidence interval would be constructed identically to example 1, that is the selected region for $X$ would be $[\theta-1,\theta+1]$. The 68% central confidence interval is therefore the same as in Example 1, $[x_0-1,x_0+1]$. You can extend this to 95% and arbitrary KNOWN standard deviation $\sigma$ to be $[x_0-1.96\sigma,x_0+1.96\sigma]$.

Example 4. Normal distribution with mean $\theta$ and standard deviation $\theta/2$. The 68% central confidence interval would be constructed identically to example 2. The 68% central confidence interval for $\theta$ is therefore the same as in Example 2, $[{2\over3}x_0,2x_0]$.

The authors of the article you refer to and the other commenters on your question would not get Example 2 or 4 right. Only by following a procedure like Neyman's will the confidence interval have the property that you highlighted in your post. The other methods are approximations for the general problem of building confidence intervals.

The exact solution to the problem with a normal distribution and UNKNOWN standard deviation is more difficult to work out than the examples above.

Dean
  • I tried here, but it's really too much math and heavy jargon; I may need to read a substantial portion or the entire book to understand (CI is almost at the last section of the book), and I do not have time for that (I am taking 2 weeks just to understand the basics of sampling distributions). Is there no easier explanation available? – Parthiban Rajendran Aug 14 '18 at 07:41
  • @PaariVendhan : I see: you wrote “95% data points fall within $\ldots$” etc. Speaking in that way of "data points" made me think you were talking about a confidence interval. It's not 95% of data points that are within the interval $\mu\pm1.96\sigma$; rather, it is 95% of the population. – Michael Hardy Aug 15 '18 at 18:35
  • oh sorry about that. In eq. (1), the data points meant those of the population (if I transform the original population distribution to Z). In eqs. (2) and (3), the data points meant samples from the population, assuming the samples in the sample sets were also normally distributed. – Parthiban Rajendran Aug 15 '18 at 18:44
  • @Dean Great update. I am examining your answer in depth. Meanwhile, a small doubt: in Example 1b, you meant a 90% CI, not 95%, right? It's not adding up; the area of $f(x)$ within the $\theta$ bounds comes out as 0.9. – Parthiban Rajendran Aug 16 '18 at 12:34
  • Right! I have corrected that above. – Dean Aug 16 '18 at 12:49
  • @Dean Practically speaking, are there any real life scenarios possible, where we know population variance, but not population mean? – Parthiban Rajendran Aug 17 '18 at 07:26
  • I had gone through your examples and created detailed step-by-step insights here. Can you kindly glance at least at the last 2 sections (CI for normal and sampling) and say whether they are okay and in line with what you tried to explain? I could have avoided documenting your initial trivial examples, but I had started anyway, and this was also my first time documenting fully in LaTeX and learning TikZ, so it was quite an exercise. – Parthiban Rajendran Aug 17 '18 at 08:37
  • @PaariVendhan: In answer to first question, yes, and is often the case in scientific measurement of physical parameters. The variance is due to the measurement apparatus, and that variance is well understood. The population mean corresponds to the true value of the physical parameter. [In this case the population being the distribution of infinite repetitions of the experiment.] – Dean Aug 17 '18 at 17:22
  • @PaariVendhan : Your write-up outlines what appears to be a correct treatment for examples 1-4. You may also want to consider drawing what is called the confidence belt (on a plot of $\theta$ vs $x$, show the area that corresponds to the selected region for $x$ for each $\theta$). Then draw a vertical line at $x_0$; its intersection with the belt is the confidence interval for $\theta$. For a sample drawn from a normal distribution with unknown variance, I would recommend using the Student's t distribution (as mentioned in another person's response here), to be more general. – Dean Aug 17 '18 at 17:29
  • sure, already taken too long on this topic, but want to document all of my understanding as much as possible. Coming to t distribution, the constant for 95% CI will be 2.093 instead of 1.96? I am already exploring my statistical results and finding another disturbing inference. Will soon hit SE with that ;) – Parthiban Rajendran Aug 17 '18 at 17:46
  • @Dean I did not get the belt part fully. Is it possible to illustrate? even a rough diagram will do – Parthiban Rajendran Aug 17 '18 at 18:30
  • @Dean Can you also please have a look at my new question on CI here – Parthiban Rajendran Aug 18 '18 at 18:20
  • @PaariVendhan - The factor 2.093 would be appropriate only for a problem where the sample size is 20. (The number of degrees of freedom is 19). The Wikipedia page gives a table for the boundary of the 95% central confidence belt for other sample sizes. – Dean Aug 20 '18 at 18:19
  • @PaariVendhan - Regarding the confidence belt construction: Look at example 1b: Make a plot, vertical axis is $\theta$; horizontal axis is $x$. Draw a horizontal line at $\theta=0$ between $x=-0.95$ to $x=0.95$. At $\theta=1$ draw a horizontal line between $x=0.05$ and $x=1.95$. At $\theta=2$... Now draw two diagonal lines that connect the endpoints of those horizontal lines. Shade the region between those diagonal lines - that shaded region is the confidence belt. The intersection of the confidence belt with a vertical line located at $x=x_0$ is the confidence interval for $\theta$. – Dean Aug 20 '18 at 18:27
  • @Dean thanks Dean. Can you please check my new question also here. Statistically getting very weird results on confidence interval. – Parthiban Rajendran Aug 20 '18 at 18:29

Let me address your question item by item:

  1. If I transfer my population distribution to Z standard deviation, then 95% area occurs at $Z= \pm 1.96$. Since $Z = \dfrac {Y-\mu}{\sigma}$, in original population distribution, 95% data points fall within $Y = \mu \pm 1.96\sigma$. $$ \color{blue}{\Pr(\mu-1.96\sigma < Y < \mu+1.96\sigma) = 0.95} \tag{1} $$

This is correct.

  2. If I transform my sample set $n_1$ to the standard normal (because we assume it is normal), again 95% of the $n_1$ data points fall within $\overline{X_1} \pm 1.96S_1$ $$ \color{blue}{\Pr(\overline{X_1}-1.96S_1 < X_1 < \overline{X_1}+1.96S_1) = 0.95} \tag{2} $$

This is problematic because you have not defined the meaning of $X_1$. You have defined a sample mean $\bar X_1 = (Y_1 + \cdots + Y_{n_1})/n_1$, but it is not clear what you mean by $X_1$. Moreover, $\bar X_1$ and $S_1$ are statistics, and as such are random variables, not parameters.

A correct statement would be something like $$\Pr\left[\mu - 1.96 \frac{\sigma}{\sqrt{n_1}} < \bar X_1 < \mu + 1.96 \frac{\sigma}{\sqrt{n_1}}\right] = 0.95, \tag{2a}$$ where here we have used the fact that $$\bar X_1 \sim \operatorname{Normal}(\mu, \sigma/\sqrt{n_1}),$$ being the sample mean of $n_1$ independent and identically distributed normal random variables with mean $\mu$ and standard deviation $\sigma$.

Another correct statement is $$\Pr\left[\bar X_1 - t^*_{n_1-1,0.975} \frac{S_1}{\sqrt{n_1}} < \mu < \bar X_1 + t^*_{n_1-1,0.975} \frac{S_1}{\sqrt{n_1}}\right] = 0.95, \tag{2b}$$ where $t^*_{n_1-1,0.975}$ is the critical value of the Student's $t$ distribution with $n_1 - 1$ degrees of freedom; i.e., it is the $97.5\%$ quantile satisfying $$\Pr[T \le t^*_{n_1-1,0.975}] = 0.975.$$ The first statement (2a) is about the two-sided probability of the sampling distribution. The second statement (2b) pertains to the coverage probability of a confidence interval constructed from the estimates of the mean and standard deviation of the data.
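The distinction between (2a) and (2b) can be checked by simulation; here is a sketch (arbitrary parameters, with $n$ small so the $t$ vs. $z$ difference is visible):

```python
import numpy as np

# Coverage of interval (2b) built with the t critical value, versus the
# same interval built naively with 1.96, when S (not sigma) is used.
# mu, sigma, n, reps are arbitrary illustration values.
rng = np.random.default_rng(5)
mu, sigma, n = 10.0, 2.0, 8
reps = 100_000
tcrit = 2.3646  # 97.5% quantile of Student's t with n - 1 = 7 df (from a table)

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)  # sample SD with n - 1

half_t = tcrit * s / np.sqrt(n)
coverage_t = np.mean((xbar - half_t < mu) & (mu < xbar + half_t))

half_z = 1.96 * s / np.sqrt(n)  # naive: z critical value with estimated S
coverage_z = np.mean((xbar - half_z < mu) & (mu < xbar + half_z))

print(coverage_t, coverage_z)  # t interval near 0.95; z interval falls short
```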

  3. If I transform my sample set $n_2$ to the standard normal, again 95% of the $n_2$ data points fall within $\overline{X_2} \pm 1.96S_2$ $$ \color{blue}{\Pr(\overline{X_2}-1.96S_2 < X_2 < \overline{X_2}+1.96S_2) = 0.95} \tag{3} $$

See above.

  4. Obviously, I would take many sample sets $n_3,n_4,n_5, \cdots, n_k$ so my eventual sampling distribution has $\overline{X} \rightarrow \mu$ and $S \rightarrow \sigma$ $$ \color{blue}{\Pr(\overline{X}-1.96S < X < \overline{X}+1.96S) = 0.95} \tag{4} $$ $$ \color{blue}{\Pr(\mu-1.96\sigma < X < \mu+1.96\sigma) = 0.95} \tag{5} $$

Again, your notation is unclear because you have not precisely defined what you mean by $\bar X$, $X$, and $S$.

The rest of your questions should not be addressed until you have understood the meaning of, and difference between, equations (2a) and (2b) as I have written them, and after you have defined your notation in terms of the underlying population distribution $Y$.


In so far as inverting the test statistic to obtain a $100(1-\alpha)\%$ confidence interval, suppose the population standard deviation $\sigma$ is known. Then $$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim \operatorname{Normal}(0,1)$$ is a pivotal quantity. Consequently, $$\Pr\left[|Z| < z^*_{\alpha/2} \right] = 1 - \alpha,$$ where $z^*_{\alpha/2}$ is the critical value. Hence $$\begin{align*} 1 - \alpha &= \Pr\left[-z^*_{\alpha/2} < Z < z^*_{\alpha/2} \right] \\ &= \Pr\left[z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}} > -\bar X + \mu > -z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right] \\ &= \Pr\left[\bar X - z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar X +z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right].\end{align*}$$ What you wrote in your question is pretty much the same thing. When the data are known to be drawn from a normal distribution, the pivotal quantity is exactly standard normal. If not, for sufficiently large $n$, it is asymptotically normal.
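The pivotal property can be verified numerically; a quick sketch (the values of $\mu$, $\sigma$, and $n$ are arbitrary, which is the point):

```python
import numpy as np

# Check that Z = (xbar - mu)/(sigma/sqrt(n)) behaves like a standard normal
# regardless of the (arbitrary) choices of mu, sigma, and n.
rng = np.random.default_rng(6)
mu, sigma, n = 7.0, 3.0, 25
reps = 100_000

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
z = (xbar - mu) / (sigma / np.sqrt(n))  # the pivotal quantity

print(z.mean(), z.std())          # near 0 and 1
print(np.mean(np.abs(z) < 1.96))  # near 0.95
```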

heropup
  • thank you very much for the detailed answer; I have updated my question now, can you please check? $X_k$ is a random variable taking on the values of the $k$th sample set. $\overline{X_k}$ is just the mean of that sample set. I wanted to use $\overline{\widehat{X_k}}$, the hat denoting that it is a statistic (a random variable), but for brevity, as noted in another question, I left out the hat. Is my question clearer now? If so, my doubt about your eq. (2a) would be: how could $X_1$ have a normal distribution, given it is a single statistic? – Parthiban Rajendran Aug 15 '18 at 07:13
  • As per my understanding, your eq. (2a) is my eq. (5) after updating an error in my equations. If so, we are only sure that 95% of our sample mean values (statistics) would fall within that range. How could we automatically also say that 95% of the CIs associated with each of those sample means fall under the 95% region? – Parthiban Rajendran Aug 15 '18 at 07:38
  • Equation (2a) in my answer has nothing to do with the confidence interval. It is simply a statement about the sampling distribution. It is Equation (2b) that captures the meaning of "95% confidence." You must remember that a confidence interval is an estimate calculated from data, thus is random; the parameter $\mu$, is unknown but fixed. Statements such as Equation (2a) do not furnish an interval estimate because as you can see, the endpoints are functions of unknown parameters, not to one or more statistics. – heropup Aug 15 '18 at 08:34
  • yeah, I did not say Eq. $2a$ is or anyway a CI. I just said, (2a) says, that in sampling distribution of sample means, any random sample mean value $\overline{X}$ would fall within intervals $\mu \pm 1.96S$, 95% of the time. 1. Is that statement correct? This is direct inference because of nature of sampling distribution of sample means being a normal curve, with mean always nearing population mean due to CLT. 2. Is it possble to show how eq. {2b} is derived? – Parthiban Rajendran Aug 15 '18 at 10:17
  • When the population standard deviation is known, the critical value becomes a $z$-score, and the sample standard deviation can be replaced by $\sigma$, making the derivation of the CI a straightforward exercise of inverting the hypothesis test. However, in the case of unknown standard deviation, one would first want to show, for example, that the test statistic $$T = (\bar X - \mu)/(S/\sqrt{n}) \sim t_{n-1}$$ is Student $t$-distributed with $n-1$ degrees of freedom, which was the topic of the original paper by Gosset in which he elucidated this distribution. – heropup Aug 15 '18 at 14:39
  • yes, I first wanted to understand for known population parameters. In that case, can you kindly confirm my equation {6} is correct. If so, can you kindly answer the associated question in that 1st question in My questions? Though that sounds straight forward, I am getting blocked intuitively as explained in that 1st question. After that I could go to other questions. – Parthiban Rajendran Aug 15 '18 at 14:53
  • Can you also please have a look at my new question on CI here – Parthiban Rajendran Aug 18 '18 at 18:20