
Our group did a yankee swap yesterday, which got me thinking about a probability question, re: a simplified version without actual "stealing".

Players numbered $1, 2, \ldots, N$ each bring a wrapped gift. Then, in order, players $1, 2, \ldots, N-1$ each uniformly at random pick a wrapped gift which is NOT the one they brought, and unwrap it. (Once unwrapped, a gift cannot be picked by any subsequent player.)

Question: What is the probability $p$ that the last wrapped gift was brought by the last player, i.e. $N$? (Or in yankee swap terminology, the last player would be forced to unwrap the gift he/she brought.)

Further thoughts:

  • Clearly $1/N$ is not the answer since $N=2 \implies p=0$ (player $1$ is forced to pick the gift brought by player $2$).
  • A bit of calculation shows $N = 3 \implies p=1/4$:
    • Player $1$ can pick $2$ or $3$, with prob $1/2$ each.
    • If player $1$ picked $3$, then player $2$ must pick $1$ and player $3$ will pick $2$.
    • If player $1$ picked $2$, then player $2$ can pick $1$ or $3$, with conditional probability $1/2$ each.
    • Therefore, the only case of gift $3$ being the last wrapped gift is $1$ picks $2$ and $2$ picks $1$, which happens with probability $1/2 \times 1/2 = 1/4$.
  • I would think for large $N$ the probability approaches $1/N$, but stays slightly below. I.e. I imagine $p = 1/N - \epsilon(N)$ where the error term $\epsilon(N) > 0$ and approaches $0$ very very fast (much faster than $1/N$).
  • For any $N$, we can solve this exactly using a system of recurrences (involving $3N$ variables, I think...), but I'm wondering if there is a smarter way. E.g. if my second bullet above is correct, is there a good way to calculate or bound the error term?
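The process is also easy to simulate directly. Here is a quick Monte Carlo sketch (the helper names are my own), which reproduces $p(2)=0$ exactly and $p(3)=1/4$ within sampling error:

```python
import random

def last_wrapped_is_own(n, rng):
    """One round: players 1..n-1 each pick a uniformly random wrapped gift
    other than their own and unwrap it; return True iff the last wrapped
    gift is the one player n brought."""
    wrapped = set(range(1, n + 1))
    for player in range(1, n):
        choices = [g for g in wrapped if g != player]
        wrapped.remove(rng.choice(choices))
    return wrapped == {n}

def estimate_p(n, trials=200_000, seed=1):
    """Monte Carlo estimate of the probability that gift n is last."""
    rng = random.Random(seed)
    return sum(last_wrapped_is_own(n, rng) for _ in range(trials)) / trials
```

For $N=2$ the estimate is exactly $0$ (player $1$ is forced to take gift $2$), and for $N=3$ it comes out near $1/4$, matching the hand calculation above.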
antkam
  • @Randall - If you're talking about the problem where each passenger $2, ..., N-1$ takes a random seat ONLY IF his/her original seat is taken, then I think the similarity is quite superficial and certainly the trick for the airplane problem cannot be used here. Unless I'm missing some subtle mapping between them? – antkam Dec 09 '21 at 18:43
  • It seems that no closed form is known, check the OEIS entry A136300. However, I also believe that the probability is of the form $\frac{1}{N}-\epsilon(N)$ for some positive error term $\epsilon(N)=o(1/N)$. – Sangchul Lee Dec 09 '21 at 21:11
  • @SangchulLee - Thanks!! Why am I not surprised OEIS already has this... :) Still, I wonder if a good bound for the error can be found, or approximated in some interesting way. – antkam Dec 09 '21 at 21:22
  • The OEIS entry seems to suggest that $\epsilon(N)$ is of the form $cN^{-\alpha}$ for some $\alpha$ which is strictly less than $2$ (and presumably $\alpha=\frac{3}{2}$). This seems hinting that this problem is quite non-trivial. I'm also interested to see any good (and rigorous) estimates on the probability. – Sangchul Lee Dec 10 '21 at 05:54
  • @SangchulLee - $\alpha < 2$ would be quite surprising to me! E.g. the deleted answers (which wrongly assume every valid sequence is equiprobable) both arrive at $p \approx 1/(N+1)$ which would mean the error term is $1/(N(N+1))$ i.e. $\alpha =2$. TBH my gut feel / original blind guess would be an error of even higher order i.e. $\alpha > 2$. – antkam Dec 10 '21 at 14:15
  • Indeed it is surprising that $\alpha$ seems less than $2$. Moreover, both the exact results for small $N$'s and simulations suggest that the PMF of the 'last pick' demonstrates an unexpected behavior, developing a "peak" at $N-1$. (Honestly, this is quite surprising to me!) Analyzing this behavior might possibly explain the slow decay of the error term. Of course, we can't exclude the possibility that my observation is merely an artifact of the limited amount of data. – Sangchul Lee Dec 10 '21 at 14:22

1 Answer

  1. The key is to formulate the problem in terms of a permanent of a certain matrix. The probability $p(n)$ that the last wrapped gift was brought by the last player is

$$p(n) = \operatorname{perm}\, \begin{pmatrix}0&1/(n-1)&1/(n-2)&\cdots&1/3&1/2\\1/(n-1)&0&1/(n-2)&\cdots&1/3&1/2\\1/(n-1)&1/(n-2)&0&\cdots&1/3&1/2\\\vdots&\vdots&\vdots&&\vdots&\vdots\\1/(n-1)&1/(n-2)&1/(n-3)&\cdots&0&1/2\\1/(n-1)&1/(n-2)&1/(n-3)&\cdots&1/2&0\end{pmatrix}$$

Staring at the definition $\operatorname{perm}\,(A) = \sum\limits_{\sigma \in S_n} \prod\limits_{i=1}^n a_{i, \sigma(i)}$ for long enough makes it clear why the probability can be expressed like this. Each player can choose any gift except their own and the last one (which ensures that the last player does end up with their own gift). The matrix columns give the probabilities available at each step, starting with $1/(n-1)$ for the first player and adjusting according to whether a gift has already been taken. The zeros on the diagonal reflect the fact that the players can't choose their own gifts. Wikipedia's article on the permanent has a section on enumeration which describes how a similar approach can be applied to other problems – to count derangements or the number of arrangements in the ménage problem.

We can also simplify the permanent by multiplying the first column by $n-1$, the second column by $(n-1)(n-2)$, and in general the $k$-th column by $(n-k+1)(n-k)$ for $k \geq 2$, getting

$$p(n) = \frac{1}{(n-1)!^2} \operatorname{perm}\, (A_n),$$ where $$A_n = \begin{pmatrix}0&n-2&n-3&\cdots&2&1\\1&0&n-3&\cdots&2&1\\1&n-1&0&\cdots&2&1\\\vdots&\vdots&\vdots&&\vdots&\vdots\\1&n-1&n-2&\cdots&0&1\\1&n-1&n-2&\cdots&3&0\end{pmatrix}$$ Some numerical values $$p(2) = \frac{1}{1!^2} \operatorname{perm}\,(0) = 0,$$ $$p(3) = \frac{1}{2!^2} \operatorname{perm}\, \begin{pmatrix}0&1\\1&0\end{pmatrix}=\frac14,\quad p(4) = \frac{1}{3!^2} \operatorname{perm}\, \begin{pmatrix}0&2&1\\1&0&1\\1&3&0\end{pmatrix} = \frac{5}{36}$$
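These values are easy to verify numerically for small $n$: the permanent can be evaluated by brute force in exact arithmetic. A sketch (the function names are mine; the matrix-building rule simply reproduces the displayed matrix):

```python
from fractions import Fraction
from itertools import permutations

def pick_matrix(n):
    """The (n-1)x(n-1) matrix displayed above: zero diagonal,
    1/(n-j) below the diagonal and 1/(n-j+1) above it, for columns j = 1..n-1."""
    def entry(i, j):
        if i == j:
            return Fraction(0)
        return Fraction(1, n - j) if j < i else Fraction(1, n - j + 1)
    return [[entry(i, j) for j in range(1, n)] for i in range(1, n)]

def permanent(m):
    """Brute-force permanent: sum over all permutations of row/column products."""
    size = len(m)
    total = Fraction(0)
    for sigma in permutations(range(size)):
        prod = Fraction(1)
        for i in range(size):
            prod *= m[i][sigma[i]]
        total += prod
    return total

def p_exact(n):
    return permanent(pick_matrix(n))
```

Here `p_exact(2)`, `p_exact(3)`, `p_exact(4)` give $0$, $1/4$, $5/36$, matching the values above.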

  1. We can get an explicit expression for this permanent using MacMahon's master theorem, which lets us pass from permanents to determinants; the latter are usually easier to work with:

$$\operatorname{perm}\,(A_n) = [x_1x_2\cdots x_{n-1}]\, \frac{1}{\operatorname{det}\,(I-XA_n)}$$

where the notation $[f]\,g$ means the coefficient of the monomial $f$ in the expanded form of $g$, and $X$ is the diagonal matrix of the $x_i$.

We also have $A_n = u_nv_n^T-B_n$, where

$$u_n = \begin{pmatrix}1\\1\\1\\\vdots\\1\\1\end{pmatrix},\quad v_n = \begin{pmatrix}1\\n-2\\n-3\\\vdots\\2\\1\end{pmatrix}, \quad B_n = \begin{pmatrix}1&0&0&\cdots&0&0\\0&n-2&0&\cdots&0&0\\0&-1&n-3&\cdots&0&0\\\vdots&\vdots&\vdots&&\vdots&\vdots\\0&-1&-1&\cdots&2&0\\0&-1&-1&\cdots&-1&1\end{pmatrix}$$ which allows us to simplify the computation further:

$$\operatorname{det}\,(I-XA_n) = \operatorname{det}\,(I-Xu_nv_n^T + XB_n) = \operatorname{det}\,X \operatorname{det}\,(X^{-1}+ B_n -u_nv_n^T)$$

Using the matrix determinant lemma:

$$= \operatorname{det}\,X \operatorname{det}\,(X^{-1}+ B_n) (1 - v_n^T (X^{-1}+ B_n)^{-1}u_n) = \det(I+XB_n) (1 - v_n^T (X^{-1}+ B_n)^{-1}u_n)$$

Since $I+XB_n$ is lower triangular

$$\operatorname{det}\,(I+XB_n) = (1+x_1)(1+2x_2)\cdots (1+(n-2)x_{n-2})(1+x_{n-1}) = (1+x_{n-1}) \prod\limits_{m=1}^{n-2}(1+mx_m)$$ Here we have also chosen the order in which the $x_i$ appear in $X$, namely $X = \operatorname{diag} (x_{n-1}, x_{n-2}, \ldots, x_1)$; this doesn't affect the answer but results in shorter formulas.

Now we need to find $(X^{-1}+ B_n)^{-1} = (I+XB_n)^{-1} X$. Since $I+XB_n$ is lower triangular, it is easily invertible; one way is to compute its adjugate matrix. Fortunately, we can neglect powers of $x_i$ higher than the first, since in the end we only extract the coefficient of the multilinear monomial $x_1 x_2 \cdots x_{n-1}$:

$$(I+XB_n)^{-1}X = $$ $$=\begin{pmatrix}x_{n-1}&0&0&\cdots&0&0&0\\0&x_{n-2}&0&\cdots&0&0&0\\0&x_{n-2}x_{n-3}&x_{n-3}&\cdots&0&0&0\\\vdots&\vdots&\vdots&&\vdots&\vdots&\vdots\\0&x_3 x_{n-2} \prod_{p=4}^{n-3} (1+x_p)&x_3 x_{n-3} \prod_{p=4}^{n-4} (1+x_p)&\cdots&x_3&0&0\\0&x_2 x_{n-2} \prod_{p=3}^{n-3} (1+x_p)&x_2 x_{n-3} \prod_{p=3}^{n-4} (1+x_p)&\cdots&x_2x_3&x_2&0\\0&x_1 x_{n-2} \prod_{p=2}^{n-3} (1+x_p)&x_1 x_{n-3} \prod_{p=2}^{n-4} (1+x_p) &\cdots&x_1x_3(1+x_2)&x_1x_2&x_1\end{pmatrix}$$

Therefore, $$1 - v_n^T (X^{-1}+ B_n)^{-1}u_n = 1 - x_1 \left(1+x_2+x_3(1+x_2) + \cdots + x_{n-2} \prod_{p=2}^{n-3} (1+x_p)\right) - 2x_2 \left(1+x_3+ \cdots + x_{n-2} \prod_{p=2}^{n-3} (1+x_p)\right) - \cdots - (n-2)x_{n-2} - x_{n-1}$$

We will use the notation $\Omega_k(p_1, \ldots, p_n)$ for the set of all possible ordered products of $p_1, \ldots, p_n$ with at least $k$ factors, where each product is weighted by the index of its first factor. For example $\Omega_1(p_1, p_2, p_3) = \{p_1, p_1p_2, p_1p_3, p_1p_2p_3, 2p_2, 2p_2p_3, 3p_3\}$. Then we can rewrite:

$$1 - v_n^T (X^{-1}+ B_n)^{-1}u_n = 1-\sum \Omega_1(x_1, x_2, \ldots, x_{n-2}) - x_{n-1}$$

Now we have $$\operatorname{perm}\,(A_n) = [x_1x_2\cdots x_{n-1}]\, \left((1+x_{n-1})^{-1} \prod\limits_{m=1}^{n-2}(1+mx_m)^{-1}\right) \left(1-\sum \Omega_1(x_1, x_2, \ldots, x_{n-2}) - x_{n-1}\right)^{-1}$$

We can substitute $\exp(-x)$ for $(1+x)^{-1}$ without affecting the result, since only the multilinear terms matter. We also apply the formula $1/x = \int_0^\infty \exp(-\lambda x)\, d\lambda$:

$$\operatorname{perm}\,(A_n) = [x_1x_2\cdots x_{n-1}]\, \int_0^\infty \exp(-\lambda) \exp\left(-x_{n-1}-\sum_{m=1}^{n-2} mx_m\right) \exp\left(\lambda x_{n-1}+\lambda \sum \Omega_1 (x_1, x_2, \ldots, x_{n-2})\right) d\lambda = $$ $$= [x_1x_2\cdots x_{n-1}]\, \int_0^\infty \exp(-\lambda) \exp(x_{n-1}(\lambda-1)) \exp\left(\sum_{m=1}^{n-2} mx_m(\lambda-1)\right) \exp\left(\lambda \sum \Omega_2(x_1, x_2, \ldots, x_{n-2})\right) d\lambda = $$ $$= [x_1x_2\cdots x_{n-2}]\, \int_0^\infty \exp(-\lambda) (\lambda-1) \exp\left(\sum_{m=1}^{n-2} mx_m(\lambda-1)\right) \exp\left(\lambda \sum \Omega_2(x_1, x_2, \ldots, x_{n-2})\right) d\lambda = $$ $$= [x_1x_2\cdots x_{n-2}]\, \int_0^\infty \exp(-\lambda) (\lambda-1) \prod_{m=1}^{n-2} \left(1+(\lambda-1) mx_m \right) \prod \left(1+\lambda \Omega_2(x_1, x_2, \ldots, x_{n-2})\right) d\lambda$$

In order to proceed we need to compute the polynomial $$f_n(\lambda) = [x_1x_2\cdots x_{n-2}] \prod_{m=1}^{n-2} \left(1+\color{blue}{(\lambda-1)} mx_m \right) \prod \left(1+\color{red}{\lambda} \Omega_2(x_1, x_2, \ldots, x_{n-2})\right)$$ which is essentially a new combinatorial problem. What we are doing above is dividing the product $x_1x_2\cdots x_{n-2}$ into parts where some parts (single elements) can come from the left product and some parts ($\geq 2$ elements) from the right product and giving each such partition an appropriate weight. It includes the products of coefficients before each (product of) $x_i$ and in addition $\color{blue}{(\lambda-1)}$ for the parts from the left product and $\color{red}{\lambda}$ for the right ones. For example, $f_3(\lambda) = \color{blue}{\lambda-1}$, $f_4(\lambda) = 2\color{blue}{(\lambda-1)}^2 + \color{red}{\lambda}$, $f_5(\lambda) = 2\cdot 3\color{blue}{(\lambda-1)}^3 + (2+2+3)\color{red}{\lambda}\color{blue}{(\lambda-1)} + \color{red}{\lambda}$.

The key trick is that the general form of this polynomial can be written as $$f_n(\lambda) = \prod_{m=n-2}^{1} (\color{blue}{(\lambda-1)}m+\color{red}{\lambda} D_\lambda) 1, \quad n > 2$$ where $D_\lambda$ is the differential operator with respect to $\lambda$. This expression can be proved inductively – the first factor of the product $(\color{blue}{(\lambda-1)}(n-2)+\color{red}{\lambda} D_\lambda)$ represents the possible choice for the last element $x_{n-2}$. If it is chosen from the left part, the right part can no longer contain it, so the problem reduces to the solution for $n-1$ players. Otherwise, it is chosen from the right part and the result is obtained by applying $D_\lambda$ to the $n-1$ solution, due to the structure of $\Omega_2$.
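The operator recurrence is also convenient computationally. A sketch (my own encoding of polynomials as coefficient lists $[c_0, c_1, \ldots]$), which reproduces the examples $f_3$, $f_4$, $f_5$ above:

```python
def f_poly(n):
    """Coefficients of f_n(lam), built from f_3(lam) = lam - 1 via the
    operator recurrence f_n = ((lam - 1)(n - 2) + lam * D_lam) f_{n-1}."""
    f = [-1, 1]  # f_3(lam) = -1 + lam
    for m_index in range(4, n + 1):
        m = m_index - 2
        g = [0] * (len(f) + 1)
        for i, c in enumerate(f):
            g[i + 1] += m * c  # the lam * m * f part
            g[i] -= m * c      # the -m * f part
            g[i] += i * c      # lam * D_lam f contributes i*c at degree i
        f = g
    return f
```

For example, `f_poly(4)` gives `[2, -3, 2]`, i.e. $2\lambda^2 - 3\lambda + 2 = 2(\lambda-1)^2 + \lambda$, matching the example above.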

Now we expand $f_n(\lambda)$ into series of $\lambda$:

$$f_{n}(\lambda) = \sum_{k=0}^{n-2} g_{n,k} \lambda^k$$

Since $f_n(\lambda) = ((\lambda-1)(n-2)+\lambda D_\lambda)f_{n-1}(\lambda)$:

$$f_n(\lambda) = (n-2) \sum_{k=0}^{n-3} g_{n-1,k} \lambda^{k+1} - (n-2) \sum_{k=0}^{n-3} g_{n-1,k} \lambda^k + \sum_{k=1}^{n-3} g_{n-1,k} k\lambda^k$$

Equating the coefficients of equal powers of $\lambda$, we obtain the recurrence relations:

$$g_{n,k} = (n-2) g_{n-1, k-1} - (n-k-2) g_{n-1, k}, \quad g_{n, 0} = -(n-2)g_{n-1, 0},\quad g_{n,n-2} = (n-2)g_{n-1, n-3}$$

By change of variables $g_{n, k} = (n-k-2)! q_{n, k}$ we obtain:

$$q_{n, k} = (n-2) q_{n-1, k-1} - q_{n-1, k},\quad q_{n, 0} = -q_{n-1,0}, \quad q_{n,n-2} = (n-2)q_{n-1, n-3}$$

This is the recurrence for the signed Stirling numbers of the first kind. In the common notation:

$$q_{n, k} = \begin{bmatrix}n-1\\n-k-1\end{bmatrix} (-1)^{n-k}$$

Hence we have the following explicit expression for the polynomial: $$f_n(\lambda) = \sum_{k=0}^{n-2} \begin{bmatrix}n-1\\n-k-1\end{bmatrix} (-1)^{n-k} (n-k-2)! \lambda^k = \sum_{k=0}^{n-2} \begin{bmatrix}n-1\\k+1\end{bmatrix} (-1)^{k} k! \lambda^{n-k-2}$$

Going back to the permanent:

$$\operatorname{perm}\, (A_n) = \int_0^\infty \exp(-\lambda) (\lambda-1) \left( \sum_{k=0}^{n-2} \begin{bmatrix}n-1\\k+1\end{bmatrix} (-1)^{k} k! \lambda^{n-k-2} \right) d\lambda = $$ $$= \sum_{k=0}^{n-2} \begin{bmatrix}n-1\\k+1\end{bmatrix} (-1)^{k} k! \int_0^\infty \exp(-\lambda) (\lambda-1) \lambda^{n-k-2} d\lambda$$

Using $\int_0^\infty \exp(-\lambda) \lambda^k d\lambda = k!$:

$$\operatorname{perm}\, (A_n) = \sum_{k=0}^{n-2} \begin{bmatrix}n-1\\k+1\end{bmatrix} (-1)^{k} k! (n-k-2)(n-k-2)!$$

Finally

$$\boxed{p(n) = \frac{1}{(n-1)!^2} \sum_{k=0}^{n-2} \begin{bmatrix}n-1\\k+1\end{bmatrix} (-1)^{k} k! (n-k-2)(n-k-2)!}$$
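The boxed formula is easy to evaluate exactly. A sketch in Python (the Stirling triangle is generated from the standard recurrence; the function names are mine):

```python
from fractions import Fraction
from math import factorial

def stirling1_row(n):
    """Unsigned Stirling numbers of the first kind [n, k] for k = 0..n,
    via the recurrence [m, k] = [m-1, k-1] + (m-1) * [m-1, k]."""
    row = [1]  # [0, 0] = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(m):
            new[k] += (m - 1) * row[k]
            new[k + 1] += row[k]
        row = new
    return row

def p_closed(n):
    """p(n) from the boxed closed form."""
    s = stirling1_row(n - 1)
    total = sum(
        (-1) ** k * factorial(k) * (n - k - 2) * factorial(n - k - 2) * s[k + 1]
        for k in range(n - 1)
    )
    return Fraction(total, factorial(n - 1) ** 2)
```

This reproduces $p(2)=0$, $p(3)=1/4$, and $p(4)=5/36$.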

  1. To find the asymptotics, we start from the expression in which the integral has not yet been evaluated:

$$p(n) = \frac{1}{(n-1)!^2}\sum_{k=0}^{n-2} \begin{bmatrix}n-1\\k+1\end{bmatrix} (-1)^k k! \int_0^\infty \exp(-\lambda)(\lambda-1) \lambda^{n-k-2} d\lambda$$

For the Stirling numbers of the first kind we know that $$\begin{bmatrix}n+1\\k+1\end{bmatrix} \sim \frac{n!}{k!} z^k,$$ where $z = H_n = 1 + \frac12 + \cdots + \frac1n$ is the $n$-th harmonic number.

Then $$p(n) \sim \frac{1}{(n-1)!^2}\sum_{k=0}^{n-2} \frac{(n-2)!}{k!} z^k (-1)^k k! \int_0^\infty \exp(-\lambda)(\lambda-1) \lambda^{n-k-2} d\lambda = $$ $$= \frac{1}{(n-1)(n-1)!} \int_0^\infty \exp(-\lambda) \lambda^{n-2} (\lambda-1) \sum_{k=0}^{n-2} (-1)^k z^k \lambda^{-k} d\lambda = $$

Summing up the geometric series:

$$= \frac{1}{(n-1)(n-1)!} \int_0^\infty \exp(-\lambda) \lambda (\lambda-1) \frac{\lambda^{n-2}-(-1)^{n-2}z^{n-2}}{\lambda+z} d\lambda = $$ $$= \frac{\exp(z)}{(n-1)(n-1)!} \left[ E_{n+1}(z)n! - E_{n}(z)(n-1)! + (-1)^{n-1}z^{n-2} (2E_3(z)-E_2(z))\right],$$ where $E_n$ is the generalized exponential integral, for which we know that

$$\frac{1}{z+n} < \exp(z)E_n(z) \leq \frac{1}{z+n-1}$$ Therefore,

$$p(n) \sim \frac{1}{(n-1)(n-1)!} \left[ \frac{n!}{z+n} - \frac{(n-1)!}{z+n} \right] = \frac{1}{z+n}$$

Using the asymptotics $z \sim \ln n + \gamma$ for the harmonic number, we obtain:

$$\boxed{p(n) \sim \frac{1}{\gamma + n + \ln n}}$$

where $\gamma \approx 0.5772156649$ is Euler's constant.
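The asymptotic can be compared numerically against the exact closed form. A small script of my own (re-deriving $p(n)$ from the boxed formula above):

```python
from fractions import Fraction
from math import factorial, log

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def stirling1_row(n):
    """Unsigned Stirling numbers of the first kind [n, k] for k = 0..n."""
    row = [1]
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(m):
            new[k] += (m - 1) * row[k]
            new[k + 1] += row[k]
        row = new
    return row

def p_closed(n):
    """Exact p(n) from the boxed closed form."""
    s = stirling1_row(n - 1)
    total = sum(
        (-1) ** k * factorial(k) * (n - k - 2) * factorial(n - k - 2) * s[k + 1]
        for k in range(n - 1)
    )
    return Fraction(total, factorial(n - 1) ** 2)

def p_approx(n):
    """The asymptotic approximation 1 / (gamma + n + ln n)."""
    return 1.0 / (GAMMA + n + log(n))

for n in (10, 20, 50):
    print(n, float(p_closed(n)), p_approx(n))
```

The relative gap between the two shrinks as $n$ grows, consistent with the derivation above.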

  • wow, thank you very much for such a detailed answer! i must admit some of the math here is a bit beyond me, but i get the general gist. your final answer also gives an error term $\sim \frac{\ln n}{n^2}$ which aligns with the observation by Sangchul Lee (in the comment thread to my original question) that the error term is strictly greater than $\epsilon n^{-2}$ – antkam Jan 16 '24 at 18:36