
In an earlier post, I asked "Can Mathematica find an expression for the distribution of the median of N i.i.d. random variables?" JimB found this very neat solution:

pdf[n_, x_] := Piecewise[{
   {-((4 ((1 - 2 x) x)^(n/2)*Gamma[n] Hypergeometric2F1[1 - n/2, n/2, (2 + n)/2, x/(-1 + 2 x)])/((-1 + 2 x) Gamma[n/2]^2)), 0 < x < 1/2},
   {(2^(2 - n) n!)/((-1 + n) ((-1 + n/2)!)^2), x == 1/2},
   {(4 (-1 + (3 - 2 x) x)^(n/2)*Gamma[n]*Hypergeometric2F1[1 - n/2, n/2, (2 + n)/2, (-1 + x)/(-1 + 2 x)])/((-1 + 2 x) Gamma[n/2]^2), 1/2 < x < 1}},
  0]

Plotting this solution as $n$ increases shows, visually at least, that the distribution becomes increasingly narrow. Unfortunately, my simple attempt to take the limit as $n$ approaches infinity seems to be beyond Mathematica.

Any suggestions on how to find the limit, or at least get a good sense of what the limit is?

user120911

1 Answer


For large $n$, your distribution is very well approximated by a normal distribution with mean $\mu=\frac12$ and variance $\sigma^2=\langle(x-\mu)^2\rangle=\frac{n}{4(n+1)(n+2)}$:

P[n_] = NormalDistribution[1/2, 1/2 Sqrt[n/((n+1)(n+2))]];

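From this distribution you can read off the limiting behavior for $n\to\infty$: a Gaussian with mean $\mu=\frac12$ and variance $\sigma^2\approx\frac{1}{4n}$. Since the variance goes to zero, the distribution concentrates ever more tightly around $x=\frac12$.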
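
To make the $\frac{1}{4n}$ scaling explicit, here is a short large-$n$ expansion of the variance formula quoted above. The rescaled variable $z$ is notation introduced only for this sketch, and the last statement assumes the Gaussian approximation holds.

$$\sigma^2=\frac{n}{4(n+1)(n+2)}=\frac{1}{4n}\cdot\frac{1}{\left(1+\frac1n\right)\left(1+\frac2n\right)}=\frac{1}{4n}-\frac{3}{4n^2}+O\!\left(n^{-3}\right)$$

With $z=2\sqrt{n}\,\left(x-\frac12\right)$, the variance of $z$ is $4n\,\sigma^2=\frac{n^2}{(n+1)(n+2)}\to1$, so under this approximation $z$ tends to a standard normal variable while $x$ itself collapses onto $\frac12$.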

Check for $n=100$:

With[{n = 100},
  Plot[{pdf[n, x], PDF[P[n], x]}, {x, 0, 1}, PlotRange -> All]]

(Plot: pdf[100, x] and the PDF of P[100] for $0<x<1$.)

The formula for the variance was found by calculating the variance exactly for $n=2,\ldots,10$ and then identifying a closed form with FindSequenceFunction:

Table[{n, Integrate[(x - 1/2)^2*pdf[n, x], {x, 0, 1}]}, {n, 2, 10}]
(*    {{2, 1/24}, {3, 3/80}, {4, 1/30}, {5, 5/168}, {6, 3/112},
       {7, 7/288}, {8, 1/45}, {9, 9/440}, {10, 5/264}}    *)

FindSequenceFunction[%, n] // FullSimplify
(*    n/(8 + 12 n + 4 n^2)    *)
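
As a quick sanity check (a worked step, not part of the Mathematica session above), the denominator returned by FindSequenceFunction factors into the form used for the variance at the top of this answer:

$$\frac{n}{8+12n+4n^2}=\frac{n}{4\left(n^2+3n+2\right)}=\frac{n}{4(n+1)(n+2)}$$

For instance, $n=2$ gives $\frac{2}{48}=\frac{1}{24}$ and $n=10$ gives $\frac{10}{528}=\frac{5}{264}$, matching the first and last entries of the table.
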
Roman
  • Roman, can you please add a few comments about your rationale for variance? How did you arrive at exactly that formula that includes n as a variable? – user120911 Jul 25 '19 at 11:43
  • The rationale for using a Gaussian approximation is the Central Limit Theorem. This implies that we can characterize the distribution knowing only the mean and the variance (as $n\to\infty$). I calculated the variance (by exact integration) for $n$ up to 20, then used FindSequenceFunction to discover the formula for general $n$. – Roman Jul 25 '19 at 15:43
  • Roman, that sounds very interesting. I did not know that function. Can I ask you to show the details? – user120911 Jul 25 '19 at 16:01
  • I've added some details, let me know if you have further questions. – Roman Jul 25 '19 at 17:41