
Suppose you have a fair $N$-sided die. You decide to roll it until $M$ unique values have been produced (i.e. any roll that repeats a previously seen value is re-rolled, though it still counts as a roll). How many times will you roll the die? (Given $2 \leq M \leq N$.)
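
For reference, the process is straightforward to simulate, which gives a way to sanity-check any candidate formula; a minimal Monte Carlo sketch (function and variable names are illustrative, not from the question):

```python
import random

def rolls_until_unique(n, m, rng):
    """Roll a fair n-sided die until m distinct values have appeared;
    return the total number of rolls (repeats still count as rolls)."""
    seen = set()
    rolls = 0
    while len(seen) < m:
        seen.add(rng.randrange(n))
        rolls += 1
    return rolls

# Rough check of the expected value for N = 6, M = 6 (the coupon collector):
# E[X] = 6*(1/6 + 1/5 + 1/4 + 1/3 + 1/2 + 1) = 14.7
rng = random.Random(0)
samples = [rolls_until_unique(6, 6, rng) for _ in range(20000)]
est = sum(samples) / len(samples)
print(est)  # should land close to 14.7
```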

I know that for the special case of $M=2$ it's simply a matter of how many times you have to re-roll your attempt for the second value, so the distribution is: $$P^N_2(X=u) = \left(\frac{1}{N}\right)^{u-2}\left(1-\frac{1}{N}\right) = \frac{N-1}{N^{u-1}}$$

And that for any $M$ the probability of the lowest possible outcome $X=M$ (i.e. no re-rolls): $$P^N_M(X=M) = \prod_{i=0}^{M-1}\left(1-\frac{i}{N}\right) = \frac{1}{N^M}\prod_{i=0}^{M-1}\left(N-i\right) = \frac{N!}{N^M(N-M)!}$$
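
This no-re-roll probability is easy to verify numerically; a quick sketch (names are mine) checking the product form against the factorial form:

```python
from math import factorial, prod

def p_no_rerolls(n, m):
    # P^N_M(X=M): each of the first m rolls shows a previously unseen value.
    return prod(1 - i / n for i in range(m))

# The product and factorial forms agree, e.g. N=6, M=3: (6*5*4)/6^3 = 5/9
assert abs(p_no_rerolls(6, 3) - factorial(6) / (6**3 * factorial(3))) < 1e-12
```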

The final clue I've got is that the probability distributions for subsequent values of $M$ can be defined using the probability distribution of the previous, like so:

$$P^N_{M}(X=u) = \sum_{i=1}^{u-M+1}\left(P^N_{M-1}(X=u-i)\left(\frac{M-1}{N}\right)^{i-1}\left(1-\frac{M-1}{N}\right)\right)$$
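
The recursion is directly computable; a sketch (the base case $M=1$, where the first roll is always a new value, is my addition so the recursion has somewhere to bottom out):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def pmf_recursive(n, m, u):
    """P^N_M(X=u) computed via the recursion above."""
    if u < m:
        return 0.0
    if m == 1:
        return 1.0 if u == 1 else 0.0   # first roll is always a new value
    p_repeat = (m - 1) / n              # chance of re-rolling one of the m-1 seen values
    return sum(pmf_recursive(n, m - 1, u - i) * p_repeat**(i - 1) * (1 - p_repeat)
               for i in range(1, u - m + 2))

# e.g. N=6, M=2, u=3: one repeat then a new value = (1/6)*(5/6) = 5/36
print(pmf_recursive(6, 2, 3))
```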

With that I can determine the probability distribution for any value of $M$ I want, for instance $M=3$:

$$P^N_3(X=u) = \sum_{i=1}^{u-3+1}\left(P^N_2(X=u-i)\left(\frac{3-1}{N}\right)^{i-1}\left(1-\frac{3-1}{N}\right)\right)$$

$$= \sum_{i=1}^{u-2}\left(\left(\frac{N-1}{N^{u-i-1}}\right)\left(\frac{2}{N}\right)^{i-1}\left(1-\frac{2}{N}\right)\right)$$

$$= \sum_{i=1}^{u-2}\left(\left(\frac{N-1}{N^{u-1}}\right)N^i\left(\frac{N}{2}\right)\left(\frac{2}{N}\right)^i\left(\frac{N-2}{N}\right)\right)$$

$$= \frac{(N-1)(N-2)}{2 \cdot N^{u-1}}\sum_{i=1}^{u-2}\left(2^i\right)$$

$$= \frac{(N-1)(N-2)}{N^{u-1}}\sum_{i=0}^{u-3}\left(2^i\right)$$

$$= \left(\frac{(N-1)(N-2)}{N^{u-1}}\right)\left(\frac{2^{u-2}-1}{2-1}\right)$$ $$= \frac{(N-1)(N-2)(2^{u-2}-1)}{N^{u-1}}$$
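
Note that halving the geometric sum gives $\frac{1}{2}\sum_{i=1}^{u-2}2^i = \sum_{i=0}^{u-3}2^i = 2^{u-2}-1$, so the closed form works out to $\frac{(N-1)(N-2)(2^{u-2}-1)}{N^{u-1}}$. That can be checked numerically against the recursion specialized to $M=3$ (a quick sketch, names mine):

```python
def p2(n, u):
    # P^N_2(X=u) = (N-1)/N^(u-1)
    return (n - 1) / n**(u - 1)

def p3(n, u):
    # The recursion above, specialized to M=3.
    return sum(p2(n, u - i) * (2 / n)**(i - 1) * (1 - 2 / n)
               for i in range(1, u - 1))

def p3_closed(n, u):
    # Closed form from the geometric sum: (N-1)(N-2)(2^(u-2)-1)/N^(u-1)
    return (n - 1) * (n - 2) * (2**(u - 2) - 1) / n**(u - 1)

for u in range(3, 12):
    assert abs(p3(6, u) - p3_closed(6, u)) < 1e-12
```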

However, I have no idea how to turn this into a generic formula that will allow me to calculate the probability for any $N$, $M$, and $u$ without going through the process of figuring out the PMF of every value of $M$ leading up to the one I want.

Travis Reed
  • @Masacroso That answer appears to give the expected value of this PMF, which is useful but not what I asked for. (Admittedly, it's all I needed for the practical problem I'm facing, but I'm still academically interested in getting the full PMF if I can.) EDIT: the original comment seems to have vanished, so here's the link for reference: https://math.stackexchange.com/questions/1639339/different-solution-of-probability-problem-from-textbook/1639566#1639566 – Travis Reed May 24 '19 at 17:08
  • but you are asking about "how many times you need to roll the die", this could be whatever number, so you are asking for the probability to get $M$ distinct values in $m$ throws? – Masacroso May 24 '19 at 17:17
  • @Masacroso No, I'm effectively asking for the probability that the $m$-th throw specifically will produce the $M$-th distinct value. Which I suppose is the probability that $m-1$ throws will produce $M-1$ distinct values times the probability that the final throw will produce a new value... Actually, that sounds like a much easier way to go about this. What is the probability of getting exactly $M$ distinct values in $m$ throws, then? – Travis Reed May 24 '19 at 17:24
  • Actually I think my previous formula (in a now-deleted comment) for the probability of getting $M$ distinct values from $m$ rolls was wrong, and the correct formula would be: $$X_M^N(m)=\left(\frac{M}{N}\right)^m-\left(\frac{M-1}{N}\right)^m$$ (That's just the probability that all $m$ rolls would be only one of $M$ values minus the probability that all $m$ rolls would be one of $M-1$ values.) – Travis Reed May 24 '19 at 17:42
  • So, if I'm right then the full PMF I was looking for can be written as: $$P_M^N(u)=\left(\left(\frac{M-1}{N}\right)^{u-1}-\left(\frac{M-2}{N}\right)^{u-1}\right)\left(1-\frac{M-1}{N}\right)$$ I'll answer my own question in a bit unless anyone else has an objection. – Travis Reed May 24 '19 at 17:49
  • take a look at this paper; it analyzes the same problem (theorem 2 is the PMF you want, but in a generalized form assuming a weighted distribution for the probability of getting a specific value). Also take a look here – Masacroso May 25 '19 at 11:50

1 Answer


Okay, so thanks to @Masacroso I've had an epiphany.

Basically, the question I'm asking is: "What is the probability that $u-1$ throws of an $N$-sided die will produce exactly $M-1$ distinct values, and that the next roll then produces a new distinct value?"

$$P_M^N(u) = X_{M-1}^N(u-1) \cdot \left(1-\frac{M-1}{N}\right)$$

The probability of $u$ throws producing exactly $M$ distinct values is equal to the probability of $u$ throws each being any one of $M$ possible values minus the probability of $u$ throws each being any one of $M-1$ possible values:

$$X_M^N(u) = Y_M^N(u) - Y_{M-1}^N(u)$$

$$Y_M^N(u) = \binom{N}{M}\left(\frac{M}{N}\right)^u = \frac{N!}{M!(N-M)!}\left(\frac{M}{N}\right)^u$$

$$\therefore X_M^N(u) = \frac{N!}{M!(N-M)!}\left(\frac{M}{N}\right)^u - \frac{N!}{(M-1)!(N-M+1)!}\left(\frac{M-1}{N}\right)^u$$

$$\therefore X_{M-1}^N(u-1) = \frac{N!}{(M-1)!(N-M+1)!}\left(\frac{M-1}{N}\right)^{u-1} - \frac{N!}{(M-2)!(N-M+2)!}\left(\frac{M-2}{N}\right)^{u-1}$$

$$\therefore X_{M-1}^N(u-1) = \frac{(N-1)!}{(M-2)!(N-M+1)!}\left(\frac{M-1}{N}\right)^{u-2} - \frac{(N-1)!}{(M-3)!(N-M+2)!}\left(\frac{M-2}{N}\right)^{u-2}$$

$$\therefore X_{M-1}^N(u-1) = \frac{(N-1)!}{N^{u-2}(M-3)!(N-M+1)!}\left(\frac{(M-1)^{u-2}}{(M-2)} - \frac{(M-2)^{u-2}}{(N-M+2)}\right)$$

Therefore, the full PMF can be written as follows:

$$\therefore P_M^N(u) = \frac{(N-1)!}{N^{u-2}(M-3)!(N-M+1)!}\left(\frac{(M-1)^{u-2}}{(M-2)} - \frac{(M-2)^{u-2}}{(N-M+2)}\right) \cdot \left(\frac{N-M+1}{N}\right)$$

$$\therefore P_M^N(u) = \frac{(N-1)!}{N^{u-1}(M-3)!(N-M)!}\left(\frac{(M-1)^{u-2}}{(M-2)} - \frac{(M-2)^{u-2}}{(N-M+2)}\right)$$

$$\therefore P_M^N(u) = \frac{(N-1)!}{N^{u-1}(M-2)!(N-M)!}\left((M-1)^{u-2} - \frac{(M-2)^{u-1}}{(N-M+2)}\right)$$

All this given: $2 \leq M \leq N,\quad u \geq M$

And there we have it. I've checked this solution for the $M=2$ case above, and it works. I haven't checked it against the recursive definitions, though. EDIT: The solution doesn't seem to match for $M=3$, so I seem to have gone wrong somewhere...
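
The mismatch noted in the EDIT can be seen numerically; a quick sketch (function names are mine) comparing the final formula against the known $M=2$ PMF and against the recursion from the question for $M=3$:

```python
from math import factorial

def pmf_answer(n, m, u):
    # The final formula derived above (requires m >= 2 and u >= m).
    return (factorial(n - 1) / (n**(u - 1) * factorial(m - 2) * factorial(n - m))
            * ((m - 1)**(u - 2) - (m - 2)**(u - 1) / (n - m + 2)))

def pmf_m2(n, u):
    # Known M=2 PMF: (N-1)/N^(u-1)
    return (n - 1) / n**(u - 1)

def pmf_m3(n, u):
    # M=3 via the recursion from the question.
    return sum(pmf_m2(n, u - i) * (2 / n)**(i - 1) * (1 - 2 / n)
               for i in range(1, u - 1))

print(all(abs(pmf_answer(6, 2, u) - pmf_m2(6, u)) < 1e-12 for u in range(2, 10)))  # True: M=2 matches
print(pmf_answer(6, 3, 3), pmf_m3(6, 3))  # these disagree: M=3 does not match
```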

Travis Reed
  • Your formula in the middle cannot possibly be right: consider the case $u<M$, in which you must have zero. – Ian May 24 '19 at 18:19
  • If you're going to use this approach of reducing $P^N_M$ to the problem of evaluating $X^N_M$, you should instead come up with a recursion for $X^N_M$. This can be done using the total probability formula: the probability that $u$ throws generate $M$ distinct values is the probability that $u-1$ throws generate $M-1$ distinct values multiplied by the probability that the $u$th throw is a new distinct value, plus the probability that $u-1$ throws already generated $M$ distinct values and the $u$th throw generated one of those values again. Then you need a suitable base case. – Ian May 24 '19 at 18:30
  • You defined $X^N_M(u)$ to be the probability that in $u$ throws you get exactly $M$ distinct values. It's not possible to get more than $u$ distinct values out of $u$ throws, so $X^N_M(u)=0$ if $u<M$. – Ian May 24 '19 at 19:33
  • Oh, my mistake. I had that confused with the probability of getting $M$ or less values on $u$, which I didn't actually assign a function to. So, say $$X_M^N(u) = Y_M^N(u) - Y_{M-1}^{N}(u)$$ – Travis Reed May 24 '19 at 19:36
  • Anyway, the even easier way to do the original problem is to simply write it as a sum of independent geometric random variables: the first unique value takes Geo(1) trials to get, the second one takes Geo(1-1/N) trials to get, and so on until the Mth value takes Geo(1-(M-1)/N) trials to get, assuming $M \leq N$ as in the question. – Ian May 24 '19 at 19:36
  • @Ian How do you use the linear combination of geometric random variables to calculate the probability mass function, though? – Travis Reed May 24 '19 at 19:49
  • The PMF of a sum of independent discrete random variables is the convolution of the PMFs. This means that to get $P^N_M(u)$, you sum over all the sequences $(x_1,\dots,x_M)$ of $M$ positive integers that sum to $u$, with the summand being $\prod_{i=1}^M P_i(x_i)$, where $P_i$ is the $i$th PMF. – Ian May 24 '19 at 21:15
  • That seems like too much work. So I've expanded upon my current solution and added the caveat that it only applies for cases where $u \geq M$ – Travis Reed May 28 '19 at 16:03
  • Just that caveat doesn't save the problem. Again, if you want to use a recursion, you can't do it in such a simple way, you have to handle $X^N_M(u)$ recursively too. To do that, you can't restrict attention to the case $u \geq M$, because as I said $X^N_M(u)$ includes the possibility that the $u$th throw got the $M$th distinct value for the first time and also the possibility that the $M$th distinct value was already hit earlier. Looking at the case $u<M$ to begin with just makes the breakdown of your method more apparent. – Ian May 28 '19 at 16:06
  • Also, don't knock the convolution method until you try it, with the geometric series structure there it might very well be easy to simplify. I just haven't tried it yet. – Ian May 28 '19 at 16:09
    Indeed it actually should be somewhat straightforward to do it. Consider $M=3$: for $u \geq 3$ you have $P^N_3(u)=\sum_{n_2=1}^{u-2} f(n_2;1/N) f(u-1-n_2;2/N)$ where $f(k;q)=q^{k-1}(1-q)$ is the geometric PMF with repeat probability $q$, so this is $\sum_{n_2=1}^{u-2} (1/N)^{n_2-1} (2/N)^{u-2-n_2} (1-1/N) (1-2/N)$. That's actually easy to sum up, because it's just a number independent of $n_2$ times $2^{-n_2}$. – Ian May 28 '19 at 16:25
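
For completeness, the convolution approach from these last comments can be sketched as follows (a minimal implementation, names mine): the waiting time for the $k$-th new value is geometric with success probability $1-(k-1)/N$, and the PMF of the total roll count is the convolution of those geometric PMFs.

```python
def pmf_convolution(n, m, u_max):
    """PMF of the total number of rolls needed for m distinct values on a fair
    n-sided die, as a list indexed by roll count 0..u_max (tail truncated)."""
    pmf = [0.0] * (u_max + 1)
    pmf[0] = 1.0                       # zero rolls -> zero distinct values
    for k in range(m):                 # convolve in the wait for the (k+1)-th new value
        p_new = 1 - k / n              # chance a roll shows an unseen value
        nxt = [0.0] * (u_max + 1)
        for total, prob in enumerate(pmf):
            if prob == 0.0:
                continue
            for extra in range(1, u_max - total + 1):
                nxt[total + extra] += prob * (1 - p_new)**(extra - 1) * p_new
        pmf = nxt
    return pmf

# Check against the M=3 closed form from the geometric-series sum,
# (N-1)(N-2)(2^(u-2)-1)/N^(u-1), for N=6:
pmf = pmf_convolution(6, 3, 40)
for u in range(3, 20):
    assert abs(pmf[u] - 5 * 4 * (2**(u - 2) - 1) / 6**(u - 1)) < 1e-12
```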