Expected number of consecutive guesses to get a given sequence of numbers

Question

I have a lock on my dorm door that's really stupid. Basically, it just checks whether or not the sequence of numbers I've put in is the combo, whether or not the lock was reset between guesses. So let's say my combo is 5556. Then I can input 555555556 into my lock and it'll unlock, without having to reset after inputting the first four numbers.

I tried to calculate the expected number of random number guesses to eventually input the right combo by assuming each "guess" was independent. For example, the input of 123454321 has 6 "guesses": 1234, 2345, 3454, 4543, 5432, 4321. Assuming this, the expected length of input required would be 10,000, for 10^4 permutations of a 4 digit combo.

However, to check my work, I created a simulation with a queue object and random number generators and ran 100 trials per experiment over 100 experiments in Python. In every experiment, the average was always above 10,000 by a significant margin, ranging from 500-2000.

I'm wondering, are the guesses really independent? What is the actual expected value?

The "guesses" are clearly not independent. If your first guess is 1234, it's guaranted that the next 3 guesses will fail. That is why the expected value is greater than 10^4. — Saša, Jan 08 '19 at 11:36
The average length is indeed 10,000. If your simulation suggests otherwise, then your simulation is not accurate. — Daniel Mathias, Jan 08 '19 at 12:51
Suggestion: try to think this problem through when the key length is $2$ and there are just two digits allowed. For that and other small cases you may be able to draw probability trees. — Ethan Bolker, Jan 08 '19 at 16:12
Fun related question: what's the shortest input string that is guaranteed to unlock the door? — eyeballfrog, Jan 08 '19 at 16:13

Peter Taylor · Accepted Answer · 2019-01-10T14:57:07.953

You can approach this as a Markov process. You find that the state transition table depends on the structure of the correct solution. To take two extremes, if the solution is $1234$ then your states are

Suffix: $\varepsilon$ goes to $1$ with probability $\frac1{10}$ and back to $\varepsilon$ with probability $\frac9{10}$
Suffix: $1$ goes to $12$ with probability $\frac1{10}$, to $\varepsilon$ with probability $\frac8{10}$, and back to $1$ with probability $\frac1{10}$.
Suffix: $12$ goes to $123$ with probability $\frac1{10}$, to $\varepsilon$ with probability $\frac8{10}$, and to $1$ with probability $\frac1{10}$.
Suffix: $123$ goes to $1234$ with probability $\frac1{10}$, to $\varepsilon$ with probability $\frac8{10}$, and to $1$ with probability $\frac1{10}$.
Suffix: $1234$ is capturing.

OTOH, if your solution is $1111$ then your states are

Suffix: $\varepsilon$ goes to $1$ with probability $\frac1{10}$ and back to $\varepsilon$ with probability $\frac9{10}$
Suffix: $1$ goes to $11$ with probability $\frac1{10}$, and to $\varepsilon$ with probability $\frac9{10}$
Suffix: $11$ goes to $111$ with probability $\frac1{10}$, and to $\varepsilon$ with probability $\frac9{10}$
Suffix: $111$ goes to $1111$ with probability $\frac1{10}$, and to $\varepsilon$ with probability $\frac9{10}$
Suffix: $1111$ is capturing.

Clearly the expected length should be longer for the second case than for the first: in both cases you need four consecutive successes, but in the first case a failure from one sequence can be the first success in another sequence.

In light of the comment

We tried using this line of reasoning to calculate the average, but it got way too convoluted.

here's how to do it without getting too convoluted. Take $1234$ as an example. Let $E_S$ denote the expected number of steps from suffix $S$ to the capturing suffix $1234$. The transitions convert directly into simultaneous equations $$\begin{eqnarray}E_\varepsilon &=& 1 + \frac{1}{10} E_1 + \frac{9}{10} E_\varepsilon \\ E_1 &=& 1 + \frac{1}{10} E_{12} + \frac{8}{10} E_\varepsilon + \frac{1}{10} E_1 \\ E_{12} &=& 1 + \frac{1}{10} E_{123} + \frac{8}{10} E_\varepsilon + \frac{1}{10} E_1 \\ E_{123} &=& 1 + \frac{1}{10} E_{1234} + \frac{8}{10} E_\varepsilon + \frac{1}{10} E_1 \\ E_{1234} &=& 0 \end{eqnarray}$$

We tried using this line of reasoning to calculate the average, but it got way too convoluted. This definitely seems like the way to approach it. Thanks! — Andrew Quoc-Anh Ho, Jan 09 '19 at 00:41
Actually, having seen Mike's answer, I think that's a much more elegant way to approach it. I won't be offended if you move the green tick. — Peter Taylor, Jan 09 '19 at 08:13

score 3 · Answer 2 · answered Jan 08 '19 at 19:42

We can prove the following general result:

Given a code $C$ of $n$ digits, for each $1\le i\le n-1$, let $b_i$ be a number which is $1$ if the first $i$ digits of $C$ equal the last $i$ digits of $C$, and $0$ otherwise. The expected wait time for $C$ is $$10^n+\sum_{i=1}^{n-1}b_i10^i.$$

For example, when $n=4$:

The expected wait time for codes like $aaaa$ is $11,110$.
The expected wait time for codes like $abab$ is $10,100$.
The expected wait time for codes like $abca$ is $10,010$.
The expected wait time for everything else is $10,000$.

To prove this, let us first assume that $b_i=0$ for all $i$, meaning no prefix of $C$ is also a suffix.

Imagine a casino with a ten digit roulette wheel. It spins this wheel once per minute, except that the casino shuts down once the code $C$ appears over the course of $n$ consecutive spins. Players may place an $\$x$ bet on the outcome of the spin; if they are wrong, they lost $\$x$, and if they are right, they win $\$9x$, so the bet is fair.

Imagine that every minute, a new person enters the casino. They first place a $\$1$ bet on the first digit of $C$. If they win, they place a $\$10$ bet on the second digit of $C$, and in general people who have won $k$ times place a $10^k$ bet on the $(k+1)^{st}$ digit of $C$. Note that anyone who does not make it to the end of $C$ will lose exactly $\$1$; for example, if they make it to digit two then lose, their net winnings are $+9+90-100=-1$. Only a person who makes it all the way through to the end of $C$ will win big, a total of $10^n-1$. This can only happen to one person, because we stipulated the casino shuts down once $C$ appears in order.

Since all of these bets are fair, the total expected winnings of all the players is $0$. On the other hand, letting $T$ be the total number of spins, the actual winnings are $10^n-T$, since the first $T-1$ people lose $1$ and the last person wins $10^n-1$. Equating these two, we get that the expected number of spins is $10^n$.

The full result comes from noting that when some of the $b_i$ are nonzero, then there are actually a couple more winners at the end of the game. Namely, the $i^{th}$ player from the end wins $10^i-1$ as long as the first $i$ digits of $C$ are equal to the last $i$ digits.

Wow your cases perfectly captured the averages I tried I'll have to look at this theorem more — Andrew Quoc-Anh Ho, Jan 09 '19 at 00:40

score 1 · Answer 3 · answered Jan 08 '19 at 16:04

Too long for comment. I used the following program to find out the average sequence length when the key is finally found.

import java.util.Random;

public class Competition {
    public static final String KEY = "1111";
    public static final int TOTAL_RUNS = 200000;

    public static int getSequenceLength(String key) {
        Random rnd = new Random();
        String current = "";
        int count = 0;
        while(!current.equals(key)) {
            // skip a few random numbers
            int skip = rnd.nextInt(10);
            for(int i = 0; i < skip; i++) {
                rnd.nextInt();
            }
            String digit = String.valueOf(rnd.nextInt(10));
            current += digit;
            if(current.length() > key.length()) {
                current = current.substring(1);
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long totalLength = 0;
        int totalRuns = 0;
        while(totalRuns < TOTAL_RUNS) { 
            totalLength += getSequenceLength(KEY); 
            totalRuns++;
            if(totalRuns % 1000 == 0) {
                String msg = String.format("Average sequence length after %d runs is %.2f", totalRuns, (totalLength / (double)totalRuns));
                System.out.println(msg);
            }
        }
    }
}

I have run 200.000 experiments (sequencies) for every key tested. It looked like Daniel's comment was correct (expected sequence lenght was about 10,000) for keys like 1234, 1122 or 5556.

But for keys like 3636 or 7474, the average sequence length stayed above 10,100. Maybe, it's just a kind of error that is expected. But for keys like 1111, 2222, 9999... I have consistently obtained sequencies of length well above 10,000, somewhere in 11,000+ range.

It could be that I'm just hitting some "regularity" in random number generator which is supposed to be more "random" but I doubt it. To make the sequence of digits as random as possible the program picks a random digit from a random number generator, then skips a few random numbers and then picks the next one. I doubt that Java random number generator can be so bad to generate a sequence which is always 10% longer than expected.

Skipping numbers from the RNG does not improve the results. That said, I get the same results, and Peter Taylor has explained the difference quite clearly. — Daniel Mathias, Jan 08 '19 at 17:22
@DanielMathias The purpose of my code was to prove that your comment was wrong, not to test the RNG. The expected sequence length is not 10.000 in a general case. — Saša, Jan 08 '19 at 18:44

Expected number of consecutive guesses to get a given sequence of numbers

3 Answers3

Linked