7

What is the probability of getting more than $ \frac { 3 N } 4 $ heads in $ N $ flips of coins?

I know we need to use binomial distribution formula for this and sum it from $ N = \frac { 3 N } 4 $ to $ N $.

I can solve this when numbers are given but I'm struggling to solve because a general case is given. Any help appreciated.

r.e.s.
  • 14,371
  • 1
    For large N you can use the normal approximation – Godfather Sep 09 '21 at 06:30
  • Do you need the exact value, or just lower and upper bounds (how tight is enough?), or do you just want the limit as $N \to \infty$? For a lower bound, the probability mass function at the first point of the tail is surprisingly good. If not good enough, include some more terms. – Jukka Kohonen Sep 09 '21 at 07:39
  • Normal approximation works @Henry. What results do you get for n=200? – Godfather Sep 09 '21 at 14:45
  • 1
    n <- 200; pbinom(1/4*n,n,1/2); pnorm(1/4*n+1/2,n/2,sqrt(n/4)) gives me 4.19651e-13 and 1.276554e-12 in R, a factor of $3$ different. Use n <- 1000 and the factor is over $300$ – Henry Sep 09 '21 at 16:08

3 Answers3

5

Approximating the Tail

Using Stirling's Approximation, we get $$ \frac1{2^n}\binom{n}{\left\lceil\frac{3n}{4}\right\rceil}\sim\left(\frac13\right)^{\frac{n\bmod4}4}\frac{(16/27)^{n/4}}{\sqrt{3\pi n/8}}\tag1 $$ Since $\frac{\binom{n}{k+1}}{\binom{n}{k}}=\frac{n-k}{k+1}\sim\frac13$ for $k\sim\frac{3n}4$, the sum of the tail will be asymptotic to a geometric series with ratio $\frac13$: that is, the sum is asymptotic to $\frac32=\frac1{1-\frac13}$ of the first term. Thus, $$ \begin{align} \frac1{2^n}\sum_{k=\lceil3n/4\rceil}^n\binom{n}{k} &\sim\frac32\left(\frac13\right)^{\frac{n\bmod4}4}\frac{(16/27)^{n/4}}{\sqrt{3\pi n/8}}\tag{2a}\\[3pt] &=\bbox[5px,border:2px solid #C0A000]{\left(\frac13\right)^{\frac{n\bmod4}4}\frac{(16/27)^{n/4}}{\sqrt{\pi n/6}}}\tag{2b} \end{align} $$ Here are some comparisons $$ \begin{array}{r|l|l|l} n\ \ &\quad\frac1{2^n}\sum\limits_{k=\left\lceil\frac{3n}4\right\rceil}^n\binom{n}{k}&\left(\frac13\right)^{\frac{n\bmod4}4}\frac{(16/27)^{n/4}}{\sqrt{\pi n/6}}&\quad\Delta\\\hline 100&2.818141\times10^{-7}&2.880091\times10^{-7}&2.198\%\\ 200&4.196510\times10^{-13}&4.244208\times10^{-13}&1.136\%\\ 300&7.167016\times10^{-19}&7.221983\times10^{-19}&0.766\%\\ 400&1.295943\times10^{-24}&1.303444\times10^{-24}&0.579\%\\ 500&2.418405\times10^{-30}&2.429646\times10^{-30}&0.465\%\\ 1000&6.738128\times10^{-59}&6.753911\times10^{-59}&0.234\%\\ 1001&4.487638\times10^{-59}&4.500358\times10^{-59}&0.283\%\\ 1002&2.987810\times10^{-59}&2.998741\times10^{-59}&0.366\%\\ 1003&1.988589\times10^{-59}&1.998164\times10^{-59}&0.481\% \end{array} $$


Approximating with the Normal Distribution

As shown in this answer $$ \frac1{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2}\,\mathrm{d}t \sim\frac1{\sqrt{2\pi}\,x}\,e^{-x^2/2}\tag3 $$ which adjusted for a standard deviation of $\sqrt{n/4}$ gives $$ \frac1{\sqrt{\pi n/2}}\int_x^\infty e^{-2t^2/n}\,\mathrm{d}t \sim\frac{\sqrt{n/4}}{\sqrt{2\pi}\,x}\,e^{-2x^2/n}\tag4 $$ Using $(4)$ to estimate the probability of being at least $n/4$ above the mean gives $$ \frac1{\sqrt{\pi n/2}}\,e^{-n/8}\tag5 $$ Since $\frac{16}{27}\lt e^{-1/2}$, as $n\to\infty$, the binomial tail sum from $(2)$ decays exponentially faster than the tail of the normal distribution from $(5)$ does.

The Central Limit Theorem says that $$ \lim_{n\to\infty}\sup_{\lambda\in\mathbb{R}}\left[P\!\left(\sqrt{n}\left(\bar{X}_n-\mu\right)\gt\lambda\sigma\right)-\frac1{\sqrt{2\pi}}\int_\lambda^\infty e^{-t^2/2}\,\mathrm{d}t\right]=0\tag6 $$ The Berry-Esseen Theorem gives a bound on how the limit in $(6)$ tends to $0$: $$ \sup_{\lambda\in\mathbb{R}}\left|\,P\!\left(\sqrt{n}\left(\bar{X}_n-\mu\right)\gt\lambda\sigma\right)-\frac1{\sqrt{2\pi}}\int_\lambda^\infty e^{-t^2/2}\,\mathrm{d}t\,\right|\le\frac12\frac\rho{\sigma^3\sqrt{n}}\tag7 $$ where $\rho=\frac18$ and $\sigma=\frac12$ for fair coin flipping (the constant $\frac12$ can actually be improved, but we'll use it for simplicity).

However, the bound in $(7)$ is far from being sharp enough to compare probabilities as small as those in $(2)$ and $(5)$. These theorems are better for approximating tails above a fixed number of standard deviations. In the case given, the number of standard deviations above the mean is $\sqrt{n/4}$, not a fixed number.


Details About $\bf{(1)}$

If $n\bmod4=0$, then Stirling's Approximation says $$ \begin{align} \frac1{2^n}\binom{n}{\left\lceil\frac{3n}{4}\right\rceil} &=\frac1{2^n}\binom{n}{\frac{3n}{4}}\tag{8a}\\[6pt] &=\frac1{2^n}\frac{n!}{\frac{3n}4!\,\frac{n}4!}\tag{8b}\\ &\sim\frac1{2^n}\frac{\sqrt{2\pi n}\frac{n^n}{e^n}}{\sqrt{3\pi n/2}\frac{(3n/4)^{3n/4}}{e^{3n/4}}\,\sqrt{\pi n/2}\frac{(n/4)^{n/4}}{e^{n/4}}}\tag{8c}\\ &=\frac1{2^n}\frac1{\sqrt{3\pi n/8}\,(3/4)^{3n/4}\,(1/4)^{n/4}}\tag{8d}\\[6pt] &=\frac{(16/27)^{n/4}}{\sqrt{3\pi n/8}}\tag{8e} \end{align} $$ Explanation:
$\text{(8a)}$: $\frac{3n}4\in\mathbb{Z}$
$\text{(8b)}$: $\binom{n}{k}=\frac{n!}{k!\,(n-k)!}$
$\text{(8c)}$: apply Stirling's Approximation three times
$\text{(8d)}$: cancel
$\text{(8e)}$: simplify

Furthermore, $$ \begin{align} \frac1{2^{n+1}}\binom{n+1}{\left\lceil\vphantom{\frac{3n}4}\right.\!\frac{3(n+1)}4\left.\vphantom{\frac{3n}4}\right\rceil} &=\frac1{2^{n+1}}\binom{n+1}{\frac{3n}4+1}\tag{9a}\\ &=\frac12\frac{n+1}{\frac{3n}4+1}\frac1{2^n}\binom{n}{\frac{3n}4}\tag{9b}\\ &\sim\frac23\frac{(16/27)^{n/4}}{\sqrt{3\pi n/8}}\tag{9c}\\ &\sim\left(\frac13\right)^{1/4}\frac{(16/27)^{(n+1)/4}}{\sqrt{3\pi(n+1)/8}}\tag{9d} \end{align} $$ Explanation:
$\text{(9a)}$: $\left\lceil\vphantom{\frac{3n}4}\right.\!\frac{3(n+1)}4\left.\vphantom{\frac{3n}4}\right\rceil=\frac{3n}4+1$
$\text{(9b)}$: $\binom{n+1}{k+1}=\frac{n+1}{k+1}\binom{n}{k}$
$\text{(9c)}$: $\frac12\frac{n+1}{\frac{3n}4+1}\sim\frac23$ and apply $(8)$
$\text{(9d)}$: $\sqrt{n+1}\sim\sqrt{n}$

and $$ \begin{align} \frac1{2^{n+2}}\binom{n+2}{\left\lceil\vphantom{\frac{3n}4}\right.\!\frac{3(n+2)}4\left.\vphantom{\frac{3n}4}\right\rceil} &=\frac1{2^{n+2}}\binom{n+2}{\frac{3n}4+2}\tag{10a}\\ &=\frac12\frac{n+2}{\frac{3n}4+2}\frac12\frac{n+1}{\frac{3n}4+1}\frac1{2^n}\binom{n}{\frac{3n}4}\tag{10b}\\ &\sim\frac49\frac{(16/27)^{n/4}}{\sqrt{3\pi n/8}}\tag{10c}\\ &\sim\left(\frac13\right)^{2/4}\frac{(16/27)^{(n+2)/4}}{\sqrt{3\pi(n+2)/8}}\tag{10d} \end{align} $$ Explanation:
$\text{(10a)}$: $\left\lceil\vphantom{\frac{3n}4}\right.\!\frac{3(n+2)}4\left.\vphantom{\frac{3n}4}\right\rceil=\frac{3n}4+2$
$\text{(10b)}$: $\binom{n+2}{k+2}=\frac{n+2}{k+2}\frac{n+1}{k+1}\binom{n}{k}$
$\text{(10c)}$: $\frac12\frac{n+2}{\frac{3n}4+2}\frac12\frac{n+1}{\frac{3n}4+1}\sim\frac49$ and apply $(8)$
$\text{(10d)}$: $\sqrt{n+2}\sim\sqrt{n}$

and $$ \begin{align} \frac1{2^{n+3}}\binom{n+3}{\left\lceil\vphantom{\frac{3n}4}\right.\!\frac{3(n+3)}4\left.\vphantom{\frac{3n}4}\right\rceil} &=\frac1{2^{n+3}}\binom{n+3}{\frac{3n}4+3}\tag{11a}\\ &=\frac12\frac{n+3}{\frac{3n}4+3}\frac12\frac{n+2}{\frac{3n}4+2}\frac12\frac{n+1}{\frac{3n}4+1}\frac1{2^n}\binom{n}{\frac{3n}4}\tag{11b}\\ &\sim\frac8{27}\frac{(16/27)^{n/4}}{\sqrt{3\pi n/8}}\tag{11c}\\ &\sim\left(\frac13\right)^{3/4}\frac{(16/27)^{(n+3)/4}}{\sqrt{3\pi(n+3)/8}}\tag{11d} \end{align} $$ Explanation:
$\text{(11a)}$: $\left\lceil\vphantom{\frac{3n}4}\right.\!\frac{3(n+3)}4\left.\vphantom{\frac{3n}4}\right\rceil=\frac{3n}4+3$
$\text{(11b)}$: $\binom{n+3}{k+3}=\frac{n+3}{k+3}\frac{n+2}{k+2}\frac{n+1}{k+1}\binom{n}{k}$
$\text{(11c)}$: $\frac12\frac{n+3}{\frac{3n}4+3}\frac12\frac{n+2}{\frac{3n}4+2}\frac12\frac{n+1}{\frac{3n}4+1}\sim\frac8{27}$ and apply $(8)$
$\text{(11d)}$: $\sqrt{n+3}\sim\sqrt{n}$

Putting together $\overset{\underbrace{0\bmod4}}{(8)}$, $\overset{\underbrace{1\bmod4}}{(9)}$, $\overset{\underbrace{2\bmod4}}{(10)}$, and $\overset{\underbrace{3\bmod4}}{(11)}$, we get $(1)$.

robjohn
  • 345,667
  • Note that $\frac14\log(16/27)=-0.1308\lesssim-\frac18$, which says that $(2)$ agrees with Jack D'Aurizio's bound of $e^{-n/8}$. – robjohn Sep 10 '21 at 14:52
2

Consider each coin toss as the outcome for a random variable $X_k$ which is equally distributed over $\{0,1\}$. Assuming that $X_1,\ldots,X_n$ are independent, we are looking for $$\mathbb{P}[S_n=X_1+X_2+\ldots+X_n > 3n/4] $$ where the central limit theorem ensures the convergence of $S_n$ to a normal variable with mean $n/2$ and variance $n/4$. For any moderately large $n$ ($n\geq 5$) the previous probability is very well approximated (see the Berry-Esseen theorem for understanding how well) by

$$ \int_{3n/4}^{+\infty}\sqrt{\frac{2}{\pi n}} e^{-\frac{2(x-n/2)^2}n^2}\,dx =\frac{1}{2}\,\operatorname{Erfc}\left(\sqrt{\frac{n}{8}}\right)\approx e^{-n/8}\sqrt{\frac{2}{\pi n}},$$ due to the continued fraction representation for the complementary error function.
In a weak sense (see the first comment) we have

$$ \mathbb{P}[S_n > n/2] \approx \frac{e^{-n/8}}{2\sqrt{\pi}}\sqrt{\frac{8}{n}} = e^{-n/8}\sqrt{\frac{2}{\pi n}}.$$

There is an arithmetic perturbation given by the residue class $n\pmod{4}$, such that the ratio between $\mathbb{P}[S_n > 3n/4]$ and the RHS does not converge as $n\to +\infty$. In any case Hoeffding's inequality ensures

$$\mathbb{P}\left[S_n \geq \frac{3n}{4}\right]\leq e^{-n/8}.$$

Jack D'Aurizio
  • 353,855
  • With $n=1000$ this approximation gives $1.30 \cdot 10^{-56}$ while direct summation of the binomial probabilities gives $6.74 \cdot 10^{-59}$. The absolute error is of course minuscule, but the relative error is not: the approximation overestimates the tail probability by almost 200-fold. – Jukka Kohonen Sep 09 '21 at 14:47
  • @JukkaKohonen: that's because I made a little mess with the constants, I will fix this later. On the other hand the exponential decay is certain, by Hoeffding's inequality. – Jack D'Aurizio Sep 09 '21 at 15:05
  • 1
    (+1) when $x\sim\sigma^2$ (both here are $n/4$), the density of the Normal Distribution is about the same as the tail above, so exactly how much we offset the Normal CDF to approximate the Binomial CDF can have a large percentage effect on how close the approximation is. – robjohn Sep 10 '21 at 14:48
  • The normal distribution does not approximate too well this far out in the tail. Actually, the ratio between $\mathbb{P}[S_n\gt3n/4]$ and $e^{-n/8}\sqrt{\frac2{\pi n}}$ tends to $0$ according to $(2)$ from my answer. Furthermore, the cyclic variation mod $4$ is shown there. – robjohn Sep 19 '21 at 18:12
  • @robjohn, you are right and I think this is exactly the point that Henry was trying to make already in his first Sep 9 comment: that despite the folklore that "normal approximation is very good", it is not good everywhere in all respects; even this "far" in the tails its relative performance is (surprisingly?) bad. – Jukka Kohonen Oct 06 '21 at 19:02
0

This isn't an answer, simply a long comment. If you don't care about polynomial terms (and you shouldn't when they are overwhelmed with exponential terms) you can use Stirling's approximation formula to get that for large $n\choose k$ looks likes $\frac{n^n}{k^k(n-k)^{n-k}}=2^{nh_2(k/n)}$ with $h_2(p)=-p\log_2(p)-(1-p)\log_2(1-p)$. There is a formal definition of "looks like" which I won't go over, this is only for intuition and not a proof. Observe that on $[1/2,1]$, $h_2$ is decreasing and so for $3/4n\leq k\leq n$, $h_2(k/n)\leq h_2(3/4)$. Writing the sum you get \begin{align*} S&=\sum_{k=3/4n}^n{n\choose k}\frac{1}{2^n}\\ &\sim\sum_{k=3/4n}^n2^{n(h_2(k/n)-1)}\\ &\leq 2^{n(h_2(3/4)-1)}\cdot \frac{n}{4}\\ &\sim 2^{n(h_2(3/4)-1)} \end{align*} Now as $h_{2}(3/4)<1$, we get some exponential decrease.

P. Quinton
  • 6,031