We present a problem inspired by the work at this MSE link. In particular, we consider a coupon collector scenario with $n$ coupons where an integer $1\le j\le n-1$ is given. We introduce two random variables $T$ and $Q$, where $T$ is the number of draws until all coupons have been collected and $Q$ is the number of distinct coupons that appear in the first $j$ draws. The following conjecture is submitted for your consideration.
$$\mathrm{E}\left[{T\choose Q}\right] = \sum_{k=1}^j \frac{n!}{n^{n-k-1+j}} \times {j\brace k} \sum_{r=0}^k {n+j-k\choose k-r} \\ \times \sum_{p=0}^{n-k-1} \frac{(-1)^{n-k-1-p}}{p! (n-k-1-p)!} \frac{(k+p)^{n-k-1+r}}{(n-k-p)^{r+1}}.$$
I have what I believe to be a proof but it is quite involved. We propose the following list of questions concerning the above identity:
Does it indeed hold, does it perhaps have a straightforward proof using probabilistic methods, and is there a structural simplification?
What are the asymptotics, and are there effective estimates of these terms that match the exact numeric values from the formula without having recourse to a triple sum?
The reader is invited to compare potentially relevant asymptotics to the data from the identity.
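To make such a comparison concrete, the triple sum can be evaluated numerically for small parameters. The following is a minimal sketch, assuming floating point (`long double`) is accurate enough for small $n$ and $j$; the helper names `ipow`, `factorial`, `binom`, `stirling2` and `conjectured` are illustrative choices and not part of the original post.

```c
#include <assert.h>
#include <stdio.h>

/* base^e for an integer exponent e >= 0. */
static long double ipow(long double base, int e)
{
    long double result = 1.0L;
    while (e-- > 0)
        result *= base;
    return result;
}

/* m! as a long double (exact for the small m used here). */
static long double factorial(int m)
{
    long double result = 1.0L;
    for (int i = 2; i <= m; i++)
        result *= i;
    return result;
}

/* Binomial coefficient C(m, k); zero when k is out of range. */
static long double binom(int m, int k)
{
    if (k < 0 || k > m)
        return 0.0L;
    long double result = 1.0L;
    for (int i = 0; i < k; i++)
        result = result * (m - i) / (i + 1);
    return result;
}

/* Stirling number of the second kind {j brace k}, using the
   recurrence S(i,k) = k*S(i-1,k) + S(i-1,k-1) on a single row. */
static long double stirling2(int j, int k)
{
    long double row[j + 1];
    for (int i = 0; i <= j; i++)
        row[i] = 0.0L;
    row[0] = 1.0L;                      /* S(0,0) = 1 */
    for (int i = 1; i <= j; i++) {
        for (int kk = i; kk >= 1; kk--) /* descending: row[kk-1] is still old */
            row[kk] = kk * row[kk] + row[kk - 1];
        row[0] = 0.0L;                  /* S(i,0) = 0 for i >= 1 */
    }
    return row[k];
}

/* Right-hand side of the conjectured identity, term by term. */
long double conjectured(int n, int j)
{
    assert(1 <= j && j < n);
    long double total = 0.0L;
    for (int k = 1; k <= j; k++) {
        int m = n - k - 1;              /* upper limit of the p-sum */
        long double pre = factorial(n) / ipow(n, m + j) * stirling2(j, k);
        long double inner = 0.0L;
        for (int r = 0; r <= k; r++) {
            long double psum = 0.0L;
            for (int p = 0; p <= m; p++) {
                long double sign = ((m - p) % 2 == 0) ? 1.0L : -1.0L;
                psum += sign / (factorial(p) * factorial(m - p))
                      * ipow(k + p, m + r) / ipow(n - k - p, r + 1);
            }
            inner += binom(n + j - k, k - r) * psum;
        }
        total += pre * inner;
    }
    return total;
}
```

Calling `conjectured(n, j)` from a small `main` and printing the result with `%Le` gives values that can be set against simulation output. The inner sum alternates in sign, so for larger $n$ an exact rational evaluation would be the safer choice.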
The following extremely basic (no pun intended) C program is included to clarify which interpretation of the problem is being used. It was compiled with GCC 4.3.2 and the -std=gnu99 option.

    #include <stdlib.h>
    #include <stdio.h>
    #include <assert.h>
    #include <time.h>
    #include <string.h>

    /* Binomial coefficient C(n, k). Dividing at every step keeps the
       intermediate values small; each partial result is itself a
       binomial coefficient, so the division is exact. */
    long choose(long n, long k)
    {
        long result = 1;
        for(long i = 1; i <= k; i++){
            result = result * (n - i + 1) / i;
        }
        return result;
    }

    int main(int argc, char **argv)
    {
        /* Defaults; override with: n j trials */
        int n = 6 , j = 3, trials = 1000;

        if(argc >= 2){
            n = atoi(argv[1]);
        }
        if(argc >= 3){
            j = atoi(argv[2]);
        }
        if(argc >= 4){
            trials = atoi(argv[3]);
        }

        assert(1 <= n);
        assert(1 <= j && j < n);
        assert(1 <= trials);

        srand48(time(NULL));
        long long data = 0;        /* running sum of C(T, Q) over all trials */
        long genstats[n];          /* how often each coupon was drawn */
        memset(genstats, 0, n*sizeof(long));

        for(int tind = 0; tind < trials; tind++){
            int seen = 0; int steps = 0;
            int dist[n], startseg[n];

            for(int cind = 0; cind < n; cind++){
                dist[cind] = 0; startseg[cind] = 0;
            }

            /* Draw until every coupon has been seen; steps is T. */
            while(seen < n){
                int coupon = drand48() * (double)n;
                genstats[coupon]++;

                steps++;
                if(steps <= j)
                    startseg[coupon]++;

                if(dist[coupon] == 0)
                    seen++;
                dist[coupon]++;
            }

            /* stseen is Q: distinct coupons among the first j draws. */
            int stseen = 0;
            for(int stcoup = 0; stcoup < n; stcoup++)
                if(startseg[stcoup] > 0)
                    stseen++;

            data += choose(steps, stseen);
        }

        long double expt = (long double)data/(long double)trials;
        printf("[n = %d, j = %d, trials = %d]: %Le\n",
               n, j, trials, expt);

        /* Empirical frequency of each coupon, as a check on the generator. */
        long long gentotal = 0;
        for(int cind = 0; cind < n; cind++){
            gentotal += genstats[cind];
        }
        for(int cind = 0; cind < n; cind++){
            printf("%02d: %.8Le\n", cind,
                   (long double)genstats[cind]
                   /(long double)gentotal);
        }

        exit(0);
    }

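As a deterministic companion to the simulation, the same interpretation of $T$ and $Q$ can be evaluated by propagating probabilities instead of sampling. This is only a sketch under stated assumptions: it tracks the pair (distinct coupons in the first $j$ draws, distinct coupons so far) and truncates after a hypothetical `max_steps` draws, relying on the geometric decay of the neglected tail. The names `binom_ld` and `expected_choose` are illustrative and not from the original program.

```c
#include <assert.h>
#include <stdio.h>

/* C(t, q) as a long double, built multiplicatively. */
static long double binom_ld(long t, int q)
{
    long double result = 1.0L;
    for (int i = 0; i < q; i++)
        result = result * (t - i) / (i + 1);
    return result;
}

/* Approximates E[C(T, Q)] for n coupons and a prefix of j draws,
   truncated after max_steps draws (assumed large enough). */
long double expected_choose(int n, int j, long max_steps)
{
    assert(1 <= j && j < n);
    /* prob[q][d]: probability that the collection is still incomplete,
       d distinct coupons have been seen so far, and q of them appeared
       within the first j draws. */
    long double prob[n + 1][n + 1], next[n + 1][n + 1];
    for (int q = 0; q <= n; q++)
        for (int d = 0; d <= n; d++)
            prob[q][d] = 0.0L;
    prob[0][0] = 1.0L;

    long double total = 0.0L;
    for (long t = 1; t <= max_steps; t++) {
        for (int q = 0; q <= n; q++)
            for (int d = 0; d <= n; d++)
                next[q][d] = 0.0L;
        for (int q = 0; q < n; q++) {
            for (int d = q; d < n; d++) {
                long double pr = prob[q][d];
                if (pr == 0.0L)
                    continue;
                /* Repeat coupon: the state does not change. */
                next[q][d] += pr * d / n;
                /* New coupon: counts towards Q only within the first j draws. */
                long double move = pr * (n - d) / n;
                int q2 = (t <= j) ? q + 1 : q;
                if (d + 1 == n)
                    total += move * binom_ld(t, q2);   /* here T = t */
                else
                    next[q2][d + 1] += move;
            }
        }
        for (int q = 0; q <= n; q++)
            for (int d = 0; d <= n; d++)
                prob[q][d] = next[q][d];
    }
    return total;
}
```

For $n=3$ and $j=2$ a direct hand computation (conditioning on $Q=1$ with probability $1/3$ and $Q=2$ with probability $2/3$) gives $65/6$, which `expected_choose(3, 2, 2000)` reproduces to within rounding.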
Addendum. As a sanity check when $j=1$ the formula should produce $n H_n$ for $n\ge 2.$ In fact we obtain
$$\frac{n!}{n^{n-1}} \left(n\times \sum_{p=0}^{n-2} \frac{(-1)^{n-2-p}}{p! (n-2-p)!} \frac{(1+p)^{n-2}}{n-1-p} + \sum_{p=0}^{n-2} \frac{(-1)^{n-2-p}}{p! (n-2-p)!} \frac{(1+p)^{n-1}}{(n-1-p)^2}\right).$$
For the first sum we introduce
$$f(z) = \frac{(1+z)^{n-2}}{n-1-z} \prod_{q=0}^{n-2} \frac{1}{z-q}$$
so that the sum is given by (residues sum to zero)
$$\sum_{q=0}^{n-2} \mathrm{Res}_{z=q} f(z) = -\mathrm{Res}_{z=n-1} f(z) - \mathrm{Res}_{z=\infty} f(z).$$
The contribution from the first term is $$\frac{n^{n-2}}{(n-1)!}$$ and from the second
$$\mathrm{Res}_{z=0} \frac{1}{z^2} \frac{(1+1/z)^{n-2}}{n-1-1/z} \prod_{q=0}^{n-2} \frac{1}{1/z-q} = \mathrm{Res}_{z=0} \frac{1}{z^n} \frac{(1+z)^{n-2}}{n-1-1/z} \prod_{q=0}^{n-2} \frac{z}{1-qz} \\ = \mathrm{Res}_{z=0} \frac{1}{z} \frac{(1+z)^{n-2}}{n-1-1/z} \prod_{q=0}^{n-2} \frac{1}{1-qz} = \mathrm{Res}_{z=0} \frac{(1+z)^{n-2}}{z(n-1)-1} \prod_{q=0}^{n-2} \frac{1}{1-qz} = 0.$$
Hence the first sum contributes
$$\frac{n!}{n^{n-1}} \times n \frac{n^{n-2}}{(n-1)!} = n.$$
For the second sum we use
$$g(z) = \frac{(1+z)^{n-1}}{(n-1-z)^2} \prod_{q=0}^{n-2} \frac{1}{z-q} = \frac{(1+z)^{n-1}}{(z-(n-1))^2} \prod_{q=0}^{n-2} \frac{1}{z-q}.$$
We get for the negative of the residue at $n-1$ the value
$$-\left((1+z)^{n-1} \prod_{q=0}^{n-2} \frac{1}{z-q} \right)' _{z=n-1} \\ = -\left((n-1)(1+z)^{n-2} \prod_{q=0}^{n-2} \frac{1}{z-q} - (1+z)^{n-1} \prod_{q=0}^{n-2} \frac{1}{z-q} \sum_{q=0}^{n-2} \frac{1}{z-q}\right)_{z=n-1} \\ = - \left((n-1)n^{n-2} \frac{1}{(n-1)!} - n^{n-1} \frac{1}{(n-1)!} H_{n-1}\right).$$
Multiply by $n!/n^{n-1}$ to get
$$n H_{n-1} - (n-1)n^{n-2} \frac{1}{(n-1)!} \frac{n!}{n^{n-1}} \\ = n H_{n-1} - (n-1)\frac{n}{n} = n H_{n-1} - (n-1).$$
For the negative of the residue at infinity we obtain
$$\mathrm{Res}_{z=0} \frac{1}{z^2} \frac{(1+1/z)^{n-1}}{(n-1-1/z)^2} \prod_{q=0}^{n-2} \frac{1}{1/z-q} = \mathrm{Res}_{z=0} \frac{1}{z^{n+1}} \frac{(1+z)^{n-1}}{(n-1-1/z)^2} \prod_{q=0}^{n-2} \frac{z}{1-qz} \\ = \mathrm{Res}_{z=0} \frac{1}{z^2} \frac{(1+z)^{n-1}}{(n-1-1/z)^2} \prod_{q=0}^{n-2} \frac{1}{1-qz} \\ = \mathrm{Res}_{z=0} \frac{(1+z)^{n-1}}{(z(n-1)-1)^2} \prod_{q=0}^{n-2} \frac{1}{1-qz} = 0.$$
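The two residue evaluations can be cross-checked numerically for small $n$ by comparing each sum over $p$ with its closed form. The following is a minimal sketch; the function names `psum`, `closed_first` and `closed_second` are hypothetical and introduced only for this check.

```c
#include <assert.h>
#include <stdio.h>

/* base^e for an integer exponent e >= 0. */
static long double ipow(long double base, int e)
{
    long double result = 1.0L;
    while (e-- > 0)
        result *= base;
    return result;
}

/* m! as a long double. */
static long double factorial(int m)
{
    long double result = 1.0L;
    for (int i = 2; i <= m; i++)
        result *= i;
    return result;
}

/* Sum over p = 0..n-2 of
   (-1)^(n-2-p) / (p! (n-2-p)!) * (1+p)^(n-2+r) / (n-1-p)^(r+1).
   r = 0 is the first sum of the j = 1 case, r = 1 the second. */
long double psum(int n, int r)
{
    assert(n >= 2 && (r == 0 || r == 1));
    int m = n - 2;
    long double total = 0.0L;
    for (int p = 0; p <= m; p++) {
        long double sign = ((m - p) % 2 == 0) ? 1.0L : -1.0L;
        total += sign / (factorial(p) * factorial(m - p))
               * ipow(1 + p, m + r) / ipow(n - 1 - p, r + 1);
    }
    return total;
}

/* Closed form for the first sum: n^(n-2) / (n-1)!. */
long double closed_first(int n)
{
    return ipow(n, n - 2) / factorial(n - 1);
}

/* Closed form for the second sum:
   (n^(n-1) H_{n-1} - (n-1) n^(n-2)) / (n-1)!. */
long double closed_second(int n)
{
    long double h = 0.0L;               /* harmonic number H_{n-1} */
    for (int i = 1; i <= n - 1; i++)
        h += 1.0L / i;
    return (ipow(n, n - 1) * h - (n - 1) * ipow(n, n - 2))
           / factorial(n - 1);
}
```

Comparing `psum(n, 0)` with `closed_first(n)` and `psum(n, 1)` with `closed_second(n)` for a range of small $n$ is a quick way to catch sign or index slips in the residue calculation.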
Collecting everything we get
$$n H_{n-1} - (n-1) + n = n H_{n-1} + n \frac{1}{n}$$
or alternatively
$$\bbox[5px,border:2px solid #00A000]{n H_n}$$
and the sanity check goes through. Observe that we evidently require something more sophisticated to prove the conjectured identity e.g. when $j=n-1.$ (Remark. We don't need to actually apply the formula for the residues at infinity, it is sufficient when working with rational functions to observe that both $f(z)$ and $g(z)$ have the difference between the degree of the denominator and of the numerator equal to two.)
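For concreteness, here is the same check carried out by hand for $n=3$ (a worked instance added for illustration, not part of the derivation above). The two sums are
$$\sum_{p=0}^{1} \frac{(-1)^{1-p}}{p! (1-p)!} \frac{1+p}{2-p} = -\frac{1}{2} + 2 = \frac{3}{2} \quad\text{and}\quad \sum_{p=0}^{1} \frac{(-1)^{1-p}}{p! (1-p)!} \frac{(1+p)^2}{(2-p)^2} = -\frac{1}{4} + 4 = \frac{15}{4},$$
which agree with $n^{n-2}/(n-1)! = 3/2$ and with $\left(n^{n-1} H_{n-1} - (n-1) n^{n-2}\right)/(n-1)! = (27/2 - 6)/2 = 15/4.$ The formula then yields
$$\frac{3!}{3^2} \left(3\times \frac{3}{2} + \frac{15}{4}\right) = \frac{2}{3} \times \frac{33}{4} = \frac{11}{2} = 3 H_3.$$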