6

Often I see books and professors arguing that, to make a good experiment, many measurements are necessary because, by the Law of Large Numbers, the average value of a quantity then lies closer to its expected value.

But the actual law (either the weak or the strong version) only gives the limit as $n\to\infty$, where $n$ is the number of measurements. In reality we only deal with a finite number of measurements, so what can we say in that case?

Also, I read through this post, but it fails to answer my question. To paraphrase Littlewood, what can we say about the rate of convergence?

Minethlos
  • 981
  • Remember that $n\to\infty$ tells you how things behave for finite $n$: it states that for a sufficiently large $n$ the indicated behaviour dominates. This means that your final question is the important one. – Taemyr Oct 13 '14 at 09:57
  • @Taemyr Well, indeed, but if I make $n_0$ measurements in the lab, how can I tell if $n_0$ is 'sufficiently large'? I believe answers here have tackled this successfully. – Minethlos Oct 13 '14 at 13:59

3 Answers

6

This is the exact reason why we do statistical hypothesis testing and compute confidence intervals. In essence, the confidence interval we get from such a test is a quantitative measure of "how far we've converged". For example, consider an experiment to test whether a coin is biased or not. Our null hypothesis is that it is not: in symbols, our hypothesis is ${\rm Pr}(H) = \frac{1}{2}$: the probability of a head is one half.

Now we work out from the binomial distribution the limits on the number of heads you will see in an experiment of $N$ tosses, given the null hypothesis, and check whether the observed number falls within them. The interval within which the observed number of heads falls with a probability of, say, 0.999 is then calculated: for small $N$, you'll need to compute this by brute force from the binomial distribution. As $N$ gets bigger, we use Stirling's approximation to the factorial, which shows that the binomial distribution tends to the normal distribution. Your 0.999 confidence interval, as a proportion of $N$, gets smaller and smaller as $N\to\infty$, and these calculations are exactly what you use to see how fast it does so.
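
As a rough sketch of the kind of calculation described above (my own illustration, not part of the original answer), the exact interval can be pulled from the binomial distribution; the use of `scipy.stats.binom` and the 0.999 level are illustrative choices:

```python
# Sketch: exact central 99.9% interval for the number of heads under the
# null hypothesis Pr(H) = 1/2, and its width as a proportion of N.
from scipy.stats import binom

p_null = 0.5        # null hypothesis: fair coin
confidence = 0.999  # illustrative confidence level

for N in [20, 100, 1000, 10000]:
    lo, hi = binom.interval(confidence, N, p_null)
    # The interval's absolute width grows roughly like sqrt(N), so its
    # width as a proportion of N shrinks roughly like 1/sqrt(N).
    print(f"N={N:6d}: heads in [{int(lo)}, {int(hi)}], "
          f"width/N = {(hi - lo) / N:.3f}")
```

The printed width/$N$ shrinks roughly like $1/\sqrt{N}$, which is precisely the convergence rate the question asks about.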

I like to call the law of large numbers the "law of pointier and pointier distributions", because this aspect of the convergence shows us why a weak form of the second law of thermodynamics is true, as I discuss in the linked answer: the law of large numbers says that in the large-number limit there are samples that look almost like the maximum likelihood sample, and almost nothing else. Put differently: there are microstates which look almost exactly the same as the maximum entropy ones, and almost nothing else. Therefore, almost certainly, a system will be found near its maximum entropy microstate, and, if by chance a system is found in one of the rare, significantly-less-than-maximum entropy states, it will almost certainly progress towards the maximum entropy microstate, simply by a "random walk".

  • Your remark about hypothesis testing answers my main question about convergence rates. But the rest, it seems, has nothing to do with LLN. LLN does say that the distributions get pointier eventually, but the fact that the binomial and Maxwell-Boltzmann distributions have this property for all $n$ is a property of the respective distributions. And this property doesn't follow from LLN, as far as I know! – Minethlos Oct 13 '14 at 08:22
  • 1
    @Minethlos I disagree. The distributions behave in this way from very general considerations: the other answer gives you some general inequalities. Another thing worth citing (if you don't already know of it) is the mechanisms used to prove various forms of the central limit theorem: a simple one is to look at raising the characteristic function to higher and higher powers (corresponding to the convolution of a pdf with itself). This too gives you bounds (grounded on the normal distribution) for the proportional spread of many .... – Selene Routley Oct 13 '14 at 09:48
  • ... distributions from very mild assumptions. – Selene Routley Oct 13 '14 at 09:49
  • Alright, it is a general behaviour. My point was that it doesn't follow from LLN. I wasn't aware of the CLT proof, and will read up on it. – Minethlos Oct 13 '14 at 15:01
3

In general, we can say nothing about finite $n$, but most of the time, we can safely assume some "niceness" of the distributions in question.

If, for example, we assume a finite variance $\sigma^2$ (quite a common feature), we can use Chebyshev's inequality for a rough error estimate of the form

$$P\left(|\bar{X}_n - \mu| > \alpha\right) \leq \frac{\sigma^2}{\alpha^2 n}.$$

Stronger (but still reasonable) assumptions lead to stronger inequalities; see e.g. Cramér's theorem (the second one).
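
To make the bound concrete, here is a small Monte Carlo sketch (my own illustration, not part of the answer) comparing Chebyshev's bound with the observed deviation probability of a sample mean; the Exponential(1) distribution and the threshold $\alpha$ are arbitrary choices:

```python
# Sketch: compare Chebyshev's bound sigma^2 / (alpha^2 n) with the
# empirical probability that the sample mean deviates from mu by more
# than alpha, for Exponential(1) data (mu = 1, sigma^2 = 1).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0   # mean and variance of Exponential(1)
alpha = 0.2             # deviation threshold (illustrative)
trials = 20_000         # Monte Carlo repetitions

for n in [10, 100, 1000]:
    means = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
    empirical = np.mean(np.abs(means - mu) > alpha)
    bound = min(sigma2 / (alpha**2 * n), 1.0)
    print(f"n={n:5d}: empirical {empirical:.4f}, Chebyshev bound {bound:.4f}")
```

As expected, the bound holds for every finite $n$, though it is typically quite loose compared with the observed frequencies.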

Jens
  • 191
1

What you can say is that you have a distribution for the results you are going to get (be it a discrete or continuous random variable), and when you calculate the average of a large sample, you are adding the random variables and multiplying by a constant. The addition of random variables translates into a convolution of the probability density functions, which as $n \rightarrow \infty$ converges to a normal random variable (that is, in a sense, a way to prove the LLN, although you could call it overkill). And under slightly stronger hypotheses than those of the central limit theorem, you have the Berry-Esseen theorem, which gives you a convergence rate of $n^{-1/2}$ to the normal distribution in the Kolmogorov-Smirnov distance (sup norm).
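
As a quick numerical illustration of that $n^{-1/2}$ rate (my own sketch, not part of the answer), one can standardise sample means and measure their Kolmogorov distance to the normal law; the Exponential(1) distribution and the sample sizes are assumptions made for the example:

```python
# Sketch: the sup-norm (Kolmogorov) distance between the standardised
# sample mean of Exponential(1) data and the standard normal shrinks
# roughly like n^(-1/2), in line with Berry-Esseen.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0   # mean and standard deviation of Exponential(1)

for n in [4, 16, 64, 256]:
    means = rng.exponential(1.0, size=(50_000, n)).mean(axis=1)
    z = (means - mu) / (sigma / np.sqrt(n))  # standardised sample means
    d = kstest(z, "norm").statistic          # empirical Kolmogorov distance
    print(f"n={n:4d}: KS distance {d:.4f}   n^(-1/2) = {n ** -0.5:.4f}")
```

(For large $n$ the comparison is eventually limited by Monte Carlo noise of order a few thousandths, but the trend is visible.)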

In any case, if you want the exact "confidence" in a particular case, your only option is to convolve the particular distribution you are using with itself $n$ times and read off the confidence margins.
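
A minimal sketch of that brute-force convolution approach (again my own illustration, assuming a discretised Uniform(0,1) distribution and a 99% margin) might look like this:

```python
# Sketch: brute-force distribution of the sample mean of n Uniform(0,1)
# draws, obtained by repeatedly convolving a discretised density, and a
# central 99% margin read off from the resulting cdf.
import numpy as np

n = 10                    # number of measurements (illustrative)
m = 2001                  # grid points on [0, 1]
pdf = np.ones(m) / m      # discretised Uniform(0,1) density

dist = pdf.copy()
for _ in range(n - 1):
    dist = np.convolve(dist, pdf)  # density of the running sum
dist /= dist.sum()

x = np.linspace(0.0, n, len(dist)) / n  # grid of possible sample means
cdf = np.cumsum(dist)
lo, hi = x[np.searchsorted(cdf, 0.005)], x[np.searchsorted(cdf, 0.995)]
print(f"99% of sample means of {n} Uniform(0,1) draws lie in [{lo:.3f}, {hi:.3f}]")
```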

nabla
  • 633