
It is given that the coupon collector finishes his collection on average at time $\mathbb{E}(C) = rH_r$, where $H_r = \sum_{i=1}^r \frac{1}{i}$. Let's say that the coupon collector gives all of his duplicate coupons to his little sibling. When the coupon collector finishes, how far is the little sibling, on average, from finishing their collection?

According to Flajolet, the little sibling is on average $H_r$ coupons away from finishing their collection. In another paper, I read that the expected number of coupon types with at least one duplicate (that is, the number of distinct coupons the sibling ends up with) is given by $P_2^r = \sum_{i=1}^r \frac{i-1}{i}$, which would then give us our answer since $r - \sum_{i=1}^r \frac{i-1}{i}= H_r$, but I don't know how to go about proving that. If anyone could explain where $\sum_{i=1}^r \frac{i-1}{i}$ comes from, that would help immensely in solving this problem.
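For reference, the algebraic step itself can be checked by writing $r$ as a sum of $r$ ones. This only confirms the identity; it does not explain where the sum comes from:

$$r - \sum_{i=1}^r \frac{i-1}{i} = \sum_{i=1}^r \left(1 - \frac{i-1}{i}\right) = \sum_{i=1}^r \frac{1}{i} = H_r.$$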

Edit: Since I believe my friends and I have a constructive proof, it would also be cool if someone could provide a proof using EGFs.

User203940
  • 2,473

2 Answers


This happens to be very similar to a problem that has recently appeared here, namely at this MSE link. We ask the reader to consult this link for an introduction and additional background material.

We start with the species of ordered set partitions with sets of at least two elements marked. This is

$$\mathfrak{S}(\mathcal{U}\mathcal{Z} +\mathcal{U}\mathcal{V}\mathfrak{P}_{\ge 2}(\mathcal{Z})).$$

We thus obtain the generating function

$$G(z, u, v) = \frac{1}{1-u(v\exp(z)-vz+z-v)}.$$

With $n$ the number of coupon types (the $r$ of the question) and $T$ the draw on which the collection is completed, we then get for the probability that $T=m$

$$P[T=m] = \frac{1}{n^m} {n\choose n-1} (m-1)! [z^{m-1}] [u^{n-1}] G(z, u, v).$$

What happens here is very simple. We choose the $n-1$ coupons that go into the prefix consisting of $m-1$ draws. Then we partition those draws into sets, one for each type of coupon, containing the positions where it appeared. We mark sets of at least two elements. Doing the extraction in $u$ we find

$$P[T=m] = \frac{1}{n^m} {n\choose n-1} (m-1)! [z^{m-1}] (v\exp(z)-vz+z-v)^{n-1}.$$

Now to do the usual sanity check that we have a probability distribution we remove the marking in $v$ and obtain

$$\sum_{m\ge 1} P[T=m] = \sum_{m\ge 1} \frac{n!}{n^m} (m-1)! [z^{m-1}] \frac{(\exp(z)-1)^{n-1}}{(n-1)!}.$$

This was evaluated at the cited link and the sanity check goes through, more or less by inspection in fact. Continuing with the expected number of coupons that were drawn more than once, we differentiate with respect to $v$ and set $v=1$, getting

$$\frac{n! \times (m-1)!}{n^m} [z^{m-1}] (n-1) \left.\frac{(v\exp(z)-vz+z-v)^{n-2}}{(n-1)!} \times (\exp(z)-z-1)\right|_{v=1} \\ = \frac{n! \times (m-1)!}{n^m} [z^{m-1}] \frac{(\exp(z)-1)^{n-2}}{(n-2)!} \times (\exp(z)-1-z).$$

We write this in two pieces, namely

$$\frac{n! \times (m-1)!}{n^m} [z^{m-1}] \frac{(\exp(z)-1)^{n-1}}{(n-2)!} \\ - \frac{n! \times (m-1)!}{n^m} [z^{m-2}] \frac{(\exp(z)-1)^{n-2}}{(n-2)!}.$$

Summing over $m\ge 1$ and consulting the results from the main link we find for these two pieces

$$n-1 - (H_n - 1) = n - H_n.$$

We thus have for the answer that the sibling collects on average $n-H_n$ of the $n-1$ coupon types that appear in the prefix and hence is missing $H_n-1$ of them. Furthermore and deterministically, the sibling never sees the last coupon collected because it is always a singleton. Hence the sibling is missing

$$\bbox[5px,border:2px solid #00A000]{H_n}$$

coupons. We may add the halting singleton because it does not involve any additional probability and is determined by the set partition of the prefix.

What have we learned? On seeing this result it immediately becomes evident that these two parameters (singletons and duplicates) are perfectly additive on the level of generating functions, and we could have concluded by inspection, citing the result for singletons from the link without any extra calculation.
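The sums over $m$ used above can also be checked numerically. The following is a minimal sketch, not part of the derivation: it assumes the standard identity $(m-1)!\,[z^{m-1}] (\exp(z)-1)^{k}/k! = S(m-1,k)$ for Stirling numbers of the second kind, truncates the series at a hypothetical cutoff `max_m`, and uses the illustrative function names `stirling2_table` and `truncated_sums`. It evaluates the normalization sum and the two pieces with values $n-1$ and $H_n-1$.

```python
from fractions import Fraction
from math import factorial

def stirling2_table(max_m, max_k):
    """S[m][k] = Stirling number of the second kind, via
    S(m, k) = k * S(m-1, k) + S(m-1, k-1)."""
    S = [[0] * (max_k + 1) for _ in range(max_m + 1)]
    S[0][0] = 1
    for m in range(1, max_m + 1):
        for k in range(1, min(m, max_k) + 1):
            S[m][k] = k * S[m - 1][k] + S[m - 1][k - 1]
    return S

def truncated_sums(n, max_m=400):
    """Return (sum of P[T=m], first piece, second piece), truncated at max_m.

    Assumes n >= 2, so the m = 1 term vanishes and the loop starts at m = 2.
    """
    assert n >= 2
    S = stirling2_table(max_m, n)
    total = Fraction(0)
    first = Fraction(0)
    second = Fraction(0)
    for m in range(2, max_m + 1):
        weight = Fraction(factorial(n), n ** m)
        p = weight * S[m - 1][n - 1]          # P[T = m] with the marking removed
        total += p
        first += (n - 1) * p                  # piece with (exp(z)-1)^(n-1)/(n-2)!
        second += weight * (m - 1) * S[m - 2][n - 2]  # piece with [z^(m-2)]
    return float(total), float(first), float(second)

if __name__ == "__main__":
    n = 5
    total, first, second = truncated_sums(n)
    h_n = sum(1 / i for i in range(1, n + 1))
    print(total, first, second, h_n - 1)
```

For moderate $n$ the truncation error decays geometrically, roughly like $((n-1)/n)^m$, so the cutoff only needs to grow linearly with $n$.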

Marko Riedel
  • 61,317

I believe some friends of mine and I found an answer. Line up the coupons in a row from $1$ to $r$ in the order in which the coupon collector first gets them:

$$ 1, \ldots, i, \ldots, r.$$

Let $S_i$ be the random variable with $S_i = 1$ if the sibling of the coupon collector does not get the $i$ coupon, and $S_i = 0$ if the sibling does get it. We then want to find $P(S_i = 1)$.

The sibling misses the $i$ coupon exactly when it is not drawn a second time before the collection is finished. Notice that we have $r-i+1$ different places in which the $i$ coupon could appear again: by symmetry, its next appearance is equally likely to fall in any of the $r-i+1$ positions relative to the first appearances of the $r-i$ coupons not yet collected. Also note that if the $i$ coupon appears after the $r$ coupon, then it does not count, since the coupon collector terminates the sequence after finishing his collection. So, in order to find $P(S_i = 1)$, we simply consider the case in which the $i$ coupon appears after the $r$, which has probability $\frac{1}{r-i+1}$.

Let $S = S_1 + \cdots + S_r$ be the number of coupons the sibling is missing. By the linearity of expectation, $\mathbb{E}(S) = \mathbb{E}(S_1) + \cdots + \mathbb{E}(S_r)$. Hence, to find the expected value, we sum over all possible $i$ to get

$$\mathbb{E}(S) = \sum_{i=1}^r \mathbb{E}(S_i) = \sum_{i=1}^r \frac{1}{r-i+1}. $$

Let $r-i+1 = j$. Then we have

$$ \mathbb{E}(S) = \sum_{j=1}^r \frac{1}{j},$$

or in other words the expected number of coupons the sibling is missing is $H_r = \sum_{i=1}^r \frac{1}{i}$.
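One way to test the claim $P(S_i = 1) = \frac{1}{r-i+1}$ empirically is a short Monte Carlo simulation. The sketch below is only an illustration: the function name `simulate_missing`, the seed, and the number of trials are assumptions chosen for the example, and the estimates are subject to sampling noise.

```python
import random

def simulate_missing(r, trials, seed=0):
    """Estimate P(S_i = 1) for i = 1..r.

    Coupon i is the i-th distinct type the collector obtains; S_i = 1 means
    that type was drawn exactly once, so the sibling never receives it.
    """
    rng = random.Random(seed)
    missing = [0] * r
    for _ in range(trials):
        counts = {}   # type -> number of times drawn
        order = []    # types in order of first appearance
        while len(order) < r:
            c = rng.randrange(r)
            if c not in counts:
                counts[c] = 0
                order.append(c)
            counts[c] += 1
        for pos, c in enumerate(order):
            if counts[c] == 1:
                missing[pos] += 1
    return [m / trials for m in missing]

if __name__ == "__main__":
    r = 5
    freqs = simulate_missing(r, trials=20000)
    for i, f in enumerate(freqs, start=1):
        print(i, round(f, 3), round(1 / (r - i + 1), 3))
    print("estimated E(S):", round(sum(freqs), 3))
```

The last coupon is always missing, so the estimate for $i = r$ is exactly $1$; the other estimates should sit close to $\frac{1}{r-i+1}$, and their sum close to $H_r$.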

If any of this looks wrong let me know and I will think about it longer.
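For an exact cross-check that relies on neither the symmetry argument nor generating functions, one can track the state (number of distinct types seen, number of those drawn exactly once) and solve the resulting recursion with rational arithmetic. This is a sketch under that modelling assumption; `expected_missing` and the state variables `k` and `s` are names introduced here for illustration.

```python
from fractions import Fraction
from functools import lru_cache

def expected_missing(r):
    """Exact expected number of types drawn exactly once when the collector finishes.

    State (k, s): k distinct types seen so far, s of them drawn exactly once.
    A draw hits a new type with probability (r-k)/r, a singleton with
    probability s/r, and otherwise leaves the state unchanged.
    """
    @lru_cache(maxsize=None)
    def f(k, s):
        if k == r:
            return Fraction(s)
        # Condition on the next draw that changes the state
        # (self-loops are removed by renormalizing with r - k + s).
        total = Fraction(r - k) * f(k + 1, s + 1)
        if s > 0:
            total += s * f(k, s - 1)
        return total / (r - k + s)
    return f(0, 0)

if __name__ == "__main__":
    for r in range(1, 7):
        harmonic = sum(Fraction(1, j) for j in range(1, r + 1))
        print(r, expected_missing(r), harmonic)
```

For small $r$ the two printed columns can be compared directly as fractions.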

User203940
  • 2,473
  • I don't understand. You said: "the $i$ coupon appears after the $r$ coupon, then it does not count" and then "So, in order to find $P(S_i=1)$, we simply consider the case in which the $i$ coupon appears after the $r$". If it doesn't count, why are you considering the case? – Gaston Jun 17 '17 at 11:56
  • It's been a while since I've written this solution, but to me it looks like I included it just for completeness reasons. Since it can never happen, you don't have to consider the case. – User203940 Jun 19 '17 at 17:41