I woke up this morning with the following question on my mind:

What is (the distribution of) the number of draws you need to sample (with replacement) from an urn with $n$ distinct objects until you have seen every object at least once?

Calling this number $N$, it is quite easy to see that $\mathbb{E}[N]$ is of order $n\log n$. Moreover, simulations suggest that $$ \frac{N-n \log n}{n} $$ converges in distribution to some non-trivial, non-normal distribution $\mu$.
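A simulation of the kind mentioned above could look like the following minimal sketch. It is not the code behind the original observation; the function names, the seed, and the choices $n=100$ and $2000$ trials are assumptions made for illustration.

```python
import math
import random


def draws_until_complete(n, rng):
    """Draw uniformly with replacement from n objects until all have been seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def normalized_samples(n, trials, seed=0):
    """Return samples of (N - n log n) / n, with log the natural logarithm."""
    rng = random.Random(seed)
    shift = n * math.log(n)
    return [(draws_until_complete(n, rng) - shift) / n for _ in range(trials)]


# Illustrative parameters (assumed, not taken from the question).
samples = normalized_samples(n=100, trials=2000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print("sample mean:", mean)
print("sample variance:", variance)
```

A histogram of `samples` should come out visibly skewed to the right, which is consistent with the limit not being normal.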

So my more precise question is: Has this observation been proven, and, if so, what is the distribution $\mu$?

Thanks.

MJD
Eckhard

2 Answers

This problem is known as the coupon collector's problem. For instance, it is known that $E(N) = n \log n + \gamma n + \frac{1}{2} + o(1)$, where $\gamma \approx 0.577$ is the Euler-Mascheroni constant, and that $\mathrm{Var}(N) \leq 2n^2$. Hence the quantity $\frac{N - n \log n}{n}$ you are interested in satisfies:

\begin{align} E\left(\frac{N - n \log n}{n}\right) &= \gamma + \frac{1}{2n} + o\left(\frac{1}{n}\right), \\ \mathrm{Var}\left(\frac{N - n \log n}{n}\right) &\leq 2. \end{align}
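These asymptotics can be checked numerically. The sketch below relies on the standard decomposition of $N$ into independent geometric waiting times, which gives $E(N) = n H_n$ and $\mathrm{Var}(N) = n^2 \sum_{k=1}^n k^{-2} - n H_n$ with $H_n$ the $n$-th harmonic number. The function names are made up for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def exact_mean(n):
    """E(N) = n * H_n, where H_n is the n-th harmonic number."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def exact_variance(n):
    """Var(N) = n^2 * sum_{k<=n} 1/k^2 - n * H_n."""
    return n * n * sum(1.0 / k ** 2 for k in range(1, n + 1)) - exact_mean(n)


for n in (10, 100, 1000):
    asymptotic = n * math.log(n) + EULER_GAMMA * n + 0.5
    print(n, exact_mean(n), asymptotic, exact_variance(n) / n ** 2)
```

The last column is the variance of $\frac{N - n \log n}{n}$; it stays below $\pi^2/6 \approx 1.645$, comfortably within the bound of $2$.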

Several other results that bound the tails of the distribution are known; see the Wikipedia page for more on that.

TMM

The limiting distribution you refer to was determined by Erdős and Rényi in On a Classical Problem of Probability Theory (Magyar Tud. Akad. Mat. Kutató Int. Közl. 6 (1961), 215–220):

$$ \lim_{n\to\infty}\textsf{Pr}\left(\frac{N-n\log n}n\lt x\right)=\mathrm e^{-\mathrm e^{-x}}\;. $$

In addition to the two derivations given in the paper, I believe this can also be derived by applying the approximation

$$ \left\{{n \atop k}\right\} \sim \frac{\sqrt{n-k}}{\sqrt{n (1-G)}\ G^k\ (v-G)^{n-k}} \left(\frac{n-k}{e}\right)^{n-k} \left({n \atop k}\right)\;, $$

where $\left\{{n\atop k}\right\}$ is a Stirling number of the second kind, $v=\frac nk$ and $G=-W_0\left(-v\mathrm e^{-v}\right)$, with $W_0$ the main branch of the Lambert $W$ function (see Wikipedia and the references given there), to

$$ \textsf{Pr}(N\le m)=\frac{n!}{n^m}\left\{{m\atop n}\right\} $$

(see Probability distribution in the coupon collector's problem), but I haven't worked through all the details.
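As a numerical sanity check rather than a derivation, the exact formula can be evaluated for moderate $n$ and compared with the Gumbel limit. The sketch below does not use the Stirling-number approximation; it uses the standard inclusion-exclusion identity $n!\left\{{m\atop n}\right\}=\sum_{j=0}^{n}(-1)^j\binom nj(n-j)^m$ with exact integer arithmetic. The function names and the choice $n=200$ are assumptions for illustration, and `math.comb` requires Python 3.8 or later.

```python
import math
from fractions import Fraction


def coverage_probability(n, m):
    """Pr(N <= m): all n objects have been seen within m draws (exact)."""
    # Number of surjections from m draws onto n objects, by inclusion-exclusion.
    surjections = sum(
        (-1) ** j * math.comb(n, j) * (n - j) ** m for j in range(n + 1)
    )
    return Fraction(surjections, n ** m)


def gumbel_cdf(x):
    """Limiting distribution function exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


n = 200  # illustrative choice
for x in (-1.0, 0.0, 1.0, 2.0):
    m = round(n * math.log(n) + x * n)
    x_eff = (m - n * math.log(n)) / n  # value of x actually hit after rounding m
    print(x, float(coverage_probability(n, m)), gumbel_cdf(x_eff))
```

The two printed columns should already be close at $n=200$; the remaining gap is a finite-$n$ effect and shrinks as $n$ grows.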

joriki