
Say you are receiving a random stream of bits. Each bit is a 1 with probability $p$ and a 0 with probability $1-p$. You also have a fixed string $s$ that you have access to. Let's write $s:=s_{1}\ldots s_{n}$.

You need to calculate the expected number of bits you must receive before the random stream contains the string $s$. The answer can be an algorithm for computing it. Specifically, it can be dynamic programming that first calculates the expected value for substrings of $s$.

My attempts: let's define $X$ as a random variable that takes an infinite sequence of binary values $\left(a_{k}\right)_{k=1}^{\infty}$ and returns the first index $i$ for which $s$ is contained in $a_{1}a_{2}\ldots a_{i}$. We are looking for the expectation of $X$. We know that: $$E\left[X\right]=\sum_{i\in \mathbb{N}}i\cdot P\left(X=i\right)$$ and: $$P\left(X=i\right)=P\left(\left(a_{i-n+1}a_{i-n+2}\ldots a_{i}\right)=s\ \text{and}\ s\notin a_{1}\ldots a_{i-n}\right)=P\left(\left(a_{i-n+1}a_{i-n+2}\ldots a_{i}\right)=s\right)\cdot P\left(s\notin a_{1}\ldots a_{i-n}\right)$$ The first probability is easy to calculate. The second one is tricky: it depends not only on the probability of getting $s$, but also on the structure of $s$ itself. For example, if 1 and 0 are equally likely, you are still more likely to get 01 in a random string of length 3 than 00. I'm not sure how I would calculate this probability.
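
The 01 versus 00 example can be checked by brute force. The snippet below is only a sanity check, assuming 1 and 0 are equally likely (so every string of a given length has the same probability); the function name `fraction_containing` is made up for illustration.

```python
from itertools import product

def fraction_containing(pattern, length):
    """Fraction of all binary strings of the given length that contain pattern."""
    # Enumerate every binary string of the requested length.
    strings = ["".join(bits) for bits in product("01", repeat=length)]
    # Count the strings in which the pattern occurs as a contiguous substring.
    hits = sum(pattern in t for t in strings)
    return hits / len(strings)

print(fraction_containing("01", 3))  # 0.5   (001, 010, 011, 101)
print(fraction_containing("00", 3))  # 0.375 (000, 001, 100)
```

Both patterns have the same probability of appearing at any fixed position, so the difference comes purely from how each pattern overlaps with itself.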

I have also tried calculating the expectation using conditional expectation, but I couldn't get it to work. For example, with $p=\frac{1}{2}$, let's define $Y$ to be analogous to $X$ but for the string $s_{2}\ldots s_{n}$. We have: $$E\left[X\right]=E\left[X|s_{1}=a_{1}\right]\cdot\dfrac{1}{2}+E\left[X|s_{1}\neq a_{1}\right]\cdot\dfrac{1}{2}=E\left[X|s_{1}=a_{1}\right]\cdot\dfrac{1}{2}+\left(1+E\left[Y\right]\right)\cdot\dfrac{1}{2}$$ but I am not sure how to express the first expectation using information about a substring.

BinyaminR
  • I'd do it with Markov chains. Let $E_i$ be the expected number of further bits given that you have a current streak of $i$ letters (that is, your last $i$ letters are $s_1\cdots s_i$). Of course, the state $S_i$ can move on to $S_{i+1}$, or fall back to $S_0$ or to $S_j$ for some $j≤i$, depending on the nature of the special string $s$. – lulu Aug 19 '20 at 20:49
  • @lulu are the 'last letters' defined here? (since we are working with an infinite sequence) – BinyaminR Aug 19 '20 at 20:54
  • Yes. Say your special string is $11011101$ and your test string is $0010011\cdots$. Then I start in $S_0$ of course, and I stay there since I get a $0$. Then I stay there again because I got another $0$. But then I get a $1$, so I transition to $S_1$, and then I get a $0$, so I'm back in $S_0$. Wherever I am in the string, I am in one of those states. At the end of what I wrote I find myself in $S_2$ because I just got $11$, which are the first two characters of my special string. If I get a $0$ next, I'll be in $S_3$, but if I get another $1$ I will stay in $S_2$. – lulu Aug 19 '20 at 20:58
  • @lulu okay, so I understand how you can define the Markov chain, but can you explain how I can get the expectation from it? – BinyaminR Aug 19 '20 at 21:08
  • The transition probabilities give you a (potentially messy) system of linear equations to solve. As an exercise, do this for a simple string, like $1110$. That should be doable by hand. – lulu Aug 19 '20 at 21:20
  • @lulu understood, thanks! – BinyaminR Aug 19 '20 at 21:49
  • there is an example of a similar calculation in one of the answers of: https://math.stackexchange.com/questions/521130/expected-value-of-flips-until-ht-consecutively (the one by ely) – BinyaminR Aug 19 '20 at 21:54
  • but that computation is needlessly complicated. In the situation where you want $HT$ there are only two active states, $E_0$ before you get the first $H$ and $E_1$ after. And then we see $E_0=1+\frac 12\times (E_0+E_1)$ and $E_1=1+\frac 12\times E_1$ which instantly gives $E_0=4$ as desired. – lulu Aug 19 '20 at 23:24
  • @lulu I agree. There is a nice explanation about the technique in general at: http://www.aquatutoring.org/ExpectedValueMarkovChains.pdf – BinyaminR Aug 21 '20 at 12:53
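
The Markov chain approach from the comments can be sketched in code. This is a minimal sketch, not a definitive implementation: it assumes the bits are independent and that $0<p<1$, and the names `fallback_state` and `expected_wait` are invented here. State $i$ means the last $i$ bits received equal $s_1\cdots s_i$, and the unknowns satisfy $E_i = 1 + p\,E_{j_1} + (1-p)\,E_{j_0}$ with $E_n=0$, where $j_1$ and $j_0$ are the states reached from state $i$ after receiving a 1 or a 0.

```python
from fractions import Fraction

def fallback_state(s, i, bit):
    """State reached from state i after receiving bit: the length of the
    longest prefix of s that is a suffix of s[:i] + bit."""
    t = s[:i] + bit
    for k in range(min(len(t), len(s)), -1, -1):
        if t.endswith(s[:k]):  # k = 0 (empty prefix) always matches
            return k

def expected_wait(s, p):
    """Expected number of bits until s first appears (assumes 0 < p < 1)."""
    p = Fraction(p)
    n = len(s)
    # Linear system A * E = b for the unknowns E_0 .. E_{n-1}; E_n = 0.
    A = [[Fraction(0)] * n for _ in range(n)]
    b = [Fraction(1)] * n
    for i in range(n):
        A[i][i] += 1
        for bit, prob in (("1", p), ("0", 1 - p)):
            j = fallback_state(s, i, bit)
            if j < n:  # reaching state n means s has appeared, E_n = 0
                A[i][j] -= prob
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = 1 / A[col][col]
        A[col] = [x * inv for x in A[col]]
        b[col] *= inv
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return b[0]  # E_0: expected wait starting with no matched bits

print(expected_wait("10", Fraction(1, 2)))    # 4, the HT case above
print(expected_wait("1110", Fraction(1, 2)))  # 16, the suggested exercise
```

Exact fractions are used so that small cases can be compared directly with hand calculations; for long strings, floating point or a linear algebra library would likely be the more practical choice.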
