
Choose a number $n\ge2$.

You have two boxes, A and B. Each turn, you add to each box a ball randomly numbered from 1 to $n$ (each box gets its own independently numbered ball). What is the probability that, eventually, at some turn both boxes contain exactly the same numbers? (Except, obviously, at the start when both boxes are empty.) I feel like if $n=2$ the probability is 100%, while for $n=3$ or greater it isn't 100%, as this problem looks similar to random walks in $n$ dimensions (which I learnt about from YouTube's PBS Infinite Series). Can anyone help me solve the problem?

Thank you very much.

  • Hi, I'm not that strong in combinatorics, but if I'm understanding your problem correctly, I feel that not even with $n=2$ is the probability $100\%$: for example, if at the first turn you put the ball numbered 1 in A and at the second turn the ball numbered 2 in B, then at the second turn the two boxes contain two different balls. – Turquoise Tilt Sep 03 '22 at 14:52
  • I'm sorry, that's not what I meant. I edited the question so it's more understandable – MATEO BOZZI Sep 03 '22 at 14:53
  • Daniel Mathias, I feel like even if that can happen, the probability that the two boxes never have the same balls tends to 0. – MATEO BOZZI Sep 03 '22 at 14:57
  • @DanielMathias yes, that is why this question is interesting. The question is, "What is the probability that, eventually, at some turn both boxes contain exactly the same numbers?". And I have no idea how to answer. So I upvoted... – Adam Rubinson Sep 03 '22 at 14:57
  • In the case with $n=2$, do you mean that if I put the ball numbered 1 in A, in B I could only put the ball numbered 2, right? – Turquoise Tilt Sep 03 '22 at 14:57
  • @TurquoiseTilt You don't have one of each number. You have infinitely many $1$'s, infinitely many $2$'s etc to put into the boxes... – Adam Rubinson Sep 03 '22 at 14:58
  • @AdamRubinson are you sure? Because OP thinks that with $n=2$ the probability is $100\%$, and with this setting that is possible – Turquoise Tilt Sep 03 '22 at 15:00
  • It may well be 100% though, I'm not sure. This may or may not have to do with $n$-dimensional random walks? – Adam Rubinson Sep 03 '22 at 15:00
  • @TurquoiseTilt Adam is right, you can put the same number in both boxes – MATEO BOZZI Sep 03 '22 at 15:00
  • @AdamRubinson it isn't possible with that setting either, but I still think that you don't have infinitely many balls each turn – Turquoise Tilt Sep 03 '22 at 15:01
  • @TurquoiseTilt what do you mean by "it isn't possible with that setting either"? To give an example of how the game could play out: Turn $1$: $A$ contains one $2$ and $B$ contains one $1$. Turn $2$: $A$ contains two $2$'s and $B$ contains one $1$ and one $2$. Turn $3$: $A$ contains one $1$ and two $2$'s and $B$ contains two $1$'s and one $2$. Turn $4$: $A$ contains two $1$'s and two $2$'s and $B$ contains three $1$'s and one $2$. Had it gone differently on turn $4$, the game could have stopped. But as it happened, this time it didn't, and so you keep going. – Adam Rubinson Sep 03 '22 at 15:06
  • The reason it is $100\%$ when $n=2$ is that the difference between the two boxes is equivalent to a random walk with step probabilities of $\frac14$ for $+1$, $\frac14$ for $-1$, and $\frac12$ for $0$. You can ignore the $0$s and get a standard random walk, and it is well known that you have probability $1$ of eventually returning to the start. My guess is that it is also $100\%$ for larger $n$ – Henry Sep 03 '22 at 15:09
  • @AdamRubinson that's exactly what I meant, but in your example the game would end at turn 3 :) – MATEO BOZZI Sep 03 '22 at 15:09
  • @Henry nice! Thanks! But as far as I know, random walks in dimensions greater than 2 don't have a 100% chance of eventually returning to the start – MATEO BOZZI Sep 03 '22 at 15:10
  • @MATEOBOZZI yes I edited my example now – Adam Rubinson Sep 03 '22 at 15:11
  • For $n=2$ this is (effectively) the same as asking for the probability that, in a sequence of tosses of a fair coin, you eventually have the same number of Heads and Tails. That is indeed $1$, see this – lulu Sep 03 '22 at 15:12
  • This is in effect a 1-dimensional random walk. The reason I think it is $100\%$ for $n>2$ and two boxes is that I think you expect to cross a $0$ difference infinitely often even with the more complicated steps, and you expect some of those to be a difference of exactly $0$. It is possible that it is also $100\%$ for three boxes (a version of a 2-D random walk) - I am not sure. I doubt it is $100\%$ for four or more boxes for precisely the reason you give – Henry Sep 03 '22 at 15:14
  • I'm so happy this problem is getting solved, I've had it hanging for more than a month! – MATEO BOZZI Sep 03 '22 at 15:16
  • I'm fairly sure the answer is that it's 100% for $n \leq 3$ and less than that for $n \geq 4$ -- I'll work on writing up the answer later today if nobody beats me to it. (Also, my "fairly sure" might well be wrong.) – Aaron Montgomery Sep 03 '22 at 19:25

1 Answer


Short answer / sketch of proof: The probability is $100\%$ for $n = 2, 3$ and is less than that for $n \geq 4$, because the question is equivalent to the question of recurrence and transience of random walks on $\mathbb Z^{n-1}$.

You can translate this process to a random walk on $\mathbb Z^n$ as follows:

  • placing ball $i$ into bin $A$ corresponds to taking a step of $+1$ in coordinate $i$
  • placing ball $i$ into bin $B$ corresponds to taking a step of $-1$ in coordinate $i$
  • the random walk does both of the above things at the same time as a single "step"

In the case where the two bins receive balls with the same number, the corresponding step of the random walk doesn't move. Your question is equivalent to asking whether the walk is guaranteed to ever return to the origin (i.e. whether it is recurrent or transient); note that the non-moving steps can be safely ignored, because they do not affect questions of recurrence.
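As a quick sanity check of this reformulation, here is a minimal simulation sketch (the function names, the number of trials, and the cut-off `max_turns` are arbitrary choices of mine, not part of the argument). It tracks only the per-number difference between the two boxes and asks whether that difference vector ever returns to all zeros within the horizon, so it can only underestimate the true probability; in particular the $n = 3$ estimate approaches $1$ very slowly, as expected for a two-dimensional-type recurrence.

```python
import random

def eventually_match(n, max_turns=5_000):
    """Simulate one run of the two-box process with balls numbered 1..n.

    Only diff[i] = (#balls numbered i in A) - (#balls numbered i in B) is
    tracked; the boxes hold identical contents exactly when diff is the
    zero vector.  Returns True if that happens within max_turns turns,
    so this is a truncated (lower-bound) version of the real event.
    """
    diff = [0] * n
    for _ in range(max_turns):
        a = random.randrange(n)  # number on the ball added to box A
        b = random.randrange(n)  # number on the ball added to box B
        diff[a] += 1
        diff[b] -= 1
        if all(d == 0 for d in diff):
            return True
    return False

def estimate(n, trials=1_000, max_turns=5_000):
    """Monte Carlo estimate of P(boxes ever match within the horizon)."""
    hits = sum(eventually_match(n, max_turns) for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"n = {n}: estimated P(eventual match, truncated) = {estimate(n):.3f}")
```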

The random walk is not a true random walk on all of $\mathbb Z^n$, but rather one on the sublattice $\{(x_1, \dots, x_n) : \sum x_i = 0\}$. This effectively reduces the dimension of the walk by $1$, which is the fundamental reason for the answer to the question.


To formalize the argument:

Claim: The aforementioned sublattice of $\mathbb Z^n$ (endowed with the graph structure induced by the random walk described above) is roughly isometric (a.k.a. quasi-isometric or roughly equivalent) to $\mathbb Z^{n-1}$.

If we can prove this claim, then we're done, since recurrence and transience of random walks are preserved under rough isometries (see Theorem 3.2 of Markvorsen, McGuinness, and Thomassen, available here).

Proof of claim: This is one of those cases where the act of establishing notation and writing the proof down is harder than the actual idea itself. The entire idea of quasi-isometry is that we create a mapping between two metric spaces (graphs, in this case) such that the mapping is dense-ish in the target space and doesn't change graph distances too much.

We will define a mapping $f: \mathbb Z^{n-1} \to G := \{(x_1, \dots, x_n) : \sum x_i = 0\} \subset \mathbb Z^n$ -- specifically, we will declare that $f(x_1, \dots, x_{n-1}) = (x_1, \dots, x_{n-1}, -\sum_{i=1}^{n-1} x_i)$. It is not hard to see that this map satisfies the density condition of rough isometry; in fact, it is a bijection onto the vertex set of $G$. Hence, we need only check that distances between points are not distorted too much under this mapping.

First, note that neighboring vertices in the domain $\mathbb Z^{n-1}$ also map to neighboring vertices in $G$; indeed, for $\mathbf x_1, \mathbf x_2$ to be neighboring in $\mathbb Z^{n-1}$ means that they differ by one in exactly one position; that is, $\mathbf x_1 - \mathbf x_2 = \pm \mathbf e_i$ for some $i \in \{1, \dots, n-1\}$, where $\mathbf e_i$ has a $1$ in position $i$ and zeroes elsewhere. The images $f(\mathbf x_1), f(\mathbf x_2)$ then differ only by $\pm \mathbf e_i \mp \mathbf e_n$, which is a valid edge that corresponds to placing balls $i$ and $n$ in boxes $A$, $B$ (either respectively, or reversed, depending on the sign of the expression). Hence, $ d_{G}(f(\mathbf x_1), f(\mathbf x_2)) \leq d_{\mathbb Z^{n-1}}(\mathbf x_1, \mathbf x_2)$.

To obtain a bound in the other direction, consider neighboring vertices $\mathbf y_1, \mathbf y_2 \in G$. Note that every edge in $G$ has the form $\pm \mathbf e_i \mp \mathbf e_j$ for some $i, j \in \{1, \dots, n\}$. In the case where either $i = n$ or $j = n$, the preimages of $\mathbf y_1, \mathbf y_2$ are already neighbors in $\mathbb Z^{n-1}$, separated by a single coordinate step. If neither $i$ nor $j$ is equal to $n$, then the preimages of $\mathbf y_1, \mathbf y_2$ are separated by a path of length 2 (namely, the path consisting of the edges $\pm \mathbf e_i$ and $\mp \mathbf e_j$). Hence, $d_{\mathbb Z^{n-1}}(f^{-1}(\mathbf y_1), f^{-1}(\mathbf y_2)) \leq 2 \cdot d_{G}(\mathbf y_1, \mathbf y_2)$, and the proof is complete.
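As a numerical sanity check of these two inequalities (not needed for the proof), the sketch below compares the two distances on random pairs of points. It uses the standard fact, assumed rather than proven here, that for points of $G$ the graph distance equals the total positive part of the difference vector: each step $\pm\mathbf e_i \mp \mathbf e_j$ moves exactly one unit of surplus onto one unit of deficit. The helper names are mine.

```python
import random

def f(x):
    """The embedding f(x_1, ..., x_{n-1}) = (x_1, ..., x_{n-1}, -sum x_i)."""
    return tuple(x) + (-sum(x),)

def d_domain(x1, x2):
    """Graph distance in Z^{n-1}: the L1 norm of the difference."""
    return sum(abs(a - b) for a, b in zip(x1, x2))

def d_G(y1, y2):
    """Graph distance in G = {sum = 0} with edges +-(e_i - e_j).

    Assumed closed form: for a zero-sum difference vector, the distance
    is the sum of its positive coordinates (one unit transferred per step).
    """
    diff = [a - b for a, b in zip(y1, y2)]
    assert sum(diff) == 0
    return sum(d for d in diff if d > 0)

if __name__ == "__main__":
    random.seed(0)
    n = 5  # balls numbered 1..n, so the domain lattice is Z^{n-1}
    for _ in range(10_000):
        x1 = [random.randint(-20, 20) for _ in range(n - 1)]
        x2 = [random.randint(-20, 20) for _ in range(n - 1)]
        d_image, d_dom = d_G(f(x1), f(x2)), d_domain(x1, x2)
        # the two rough-isometry inequalities from the proof above
        assert d_image <= d_dom <= 2 * d_image
    print("rough-isometry bounds hold on all sampled pairs")
```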


EDIT: Adding a second, purely combinatorial proof of the result, in case it's more satisfying:

Let $p_{n, k}$ be the probability that the two boxes have identical contents after $k$ balls (numbered from $1$ to $n$) have been added to each one. Since we have reformulated the problem as returns of a random walk to the origin, the event will happen infinitely often (with probability $1$) if and only if $\sum_{k=1}^{\infty} p_{n, k} = \infty$. (This is because $\sum_k p_{n, k}$ gives the expected number of returns of the random walk to the origin, and since the number of returns is a geometric random variable, it is infinite with probability $1$ if and only if it has infinite expectation.)
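To see this dichotomy numerically, here is an exact computation sketch (the helpers `compositions`, `multinomial`, `p_match` and the truncation point $K$ are my own; the formula $p_{n,k} = n^{-2k}\sum \binom{k}{a_1,\dots,a_n}^2$ is the general-$n$ version of the $n = 3$ calculation below). The truncated sums keep growing for $n = 2, 3$ but visibly converge for $n = 4$.

```python
from fractions import Fraction
from math import comb

def compositions(k, parts):
    """Yield every tuple of `parts` nonnegative integers summing to k."""
    if parts == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, parts - 1):
            yield (first,) + rest

def multinomial(comp):
    """Multinomial coefficient (a_1 + ... + a_m)! / (a_1! ... a_m!)."""
    total, value = 0, 1
    for a in comp:
        total += a
        value *= comb(total, a)
    return value

def p_match(n, k):
    """Exact p_{n,k}: both boxes hold the same multiset after k turns."""
    total = sum(multinomial(c) ** 2 for c in compositions(k, n))
    return Fraction(total, n ** (2 * k))

if __name__ == "__main__":
    K = 40  # truncation point, only meant to show the trend
    for n in (2, 3, 4):
        partial = float(sum(p_match(n, k) for k in range(1, K + 1)))
        print(f"n = {n}: sum of p_(n,k) for k <= {K} is about {partial:.3f}")
```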

We'll flesh out the boundary cases ($n = 3$ and $n = 4$) for clarity. Starting with $n = 3$, fix $a, b, c$ such that $a + b + c = k$. The probability that a single box has exactly $a$ copies of $1$, $b$ copies of $2$, and $c$ copies of $3$ is $$\binom{k}{a, b, c} \left( \frac 1 3 \right)^a \left( \frac 1 3 \right)^b \left( \frac 1 3 \right)^c = \binom{k}{a, b, c} \left( \frac 1 3 \right)^k$$ where the multinomial coefficient counts the number of ways to choose the order in which the various balls are distributed. It follows from this that $$p_{3, k} = \sum_{a + b + c = k} \left[ \binom{k}{a, b, c} \left( \frac 1 3 \right)^k \right]^2 = \left( \frac 1 3 \right)^{2k} \sum_{a + b + c = k} \binom{k}{a, b, c}^2.$$ By Theorem 4 of this paper by Richmond and Shallit, we see that the sum $\sum_{a + b + c = k} \binom{k}{a, b, c}^2$ is asymptotically equal to $3^{2k + 3/2} (4 \pi k)^{-1}$, so the expected number of returns is $$\sum_{k=1}^{\infty} p_{3, k} \sim \sum_{k=1}^{\infty} \left( \frac 1 3 \right)^{2k}3^{2k + 3/2} (4 \pi k)^{-1} = \sum_{k=1}^{\infty} C k^{-1} = \infty.$$

For $n = 4$, the corresponding sum is $$\sum_{k=1}^{\infty} p_{4, k} \sim \sum_{k=1}^{\infty} \left( \frac 1 4 \right)^{2k}4^{2k + 2} (4 \pi k)^{-3/2} = \sum_{k=1}^{\infty} C k^{-3/2} < \infty,$$ so the expected number of returns is finite and the walk is transient, in agreement with the first approach.
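For what it's worth, the two asymptotics quoted from Richmond and Shallit can also be checked numerically. The sketch below (again my own; `p_match` here is just a recursive re-evaluation of the same multinomial sums) prints the ratio of the exact $p_{3,k}$ and $p_{4,k}$ to $3^{3/2}(4\pi k)^{-1}$ and $4^{2}(4\pi k)^{-3/2}$ respectively; both ratios drift toward $1$ as $k$ grows.

```python
from math import comb, pi

def p_match(n, k):
    """Exact p_{n,k}, using multinomial(k; a, rest) = C(k, a) * multinomial(k - a; rest)."""
    def sum_of_squares(remaining, parts):
        # sum of squared multinomial coefficients over compositions of `remaining` into `parts` parts
        if parts == 1:
            return 1
        return sum(comb(remaining, a) ** 2 * sum_of_squares(remaining - a, parts - 1)
                   for a in range(remaining + 1))
    return sum_of_squares(k, n) / n ** (2 * k)

if __name__ == "__main__":
    for k in (5, 10, 20, 40):
        ratio3 = p_match(3, k) / (3 ** 1.5 / (4 * pi * k))
        ratio4 = p_match(4, k) / (16 / (4 * pi * k) ** 1.5)
        print(f"k = {k:2d}: p_(3,k)/asymptotic = {ratio3:.4f}, p_(4,k)/asymptotic = {ratio4:.4f}")
```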