
I have asked this question before, but I think it wasn't clear what I meant by my succinct question, so I will be a bit more verbose this time.

Let's set up the following example: Bernoulli trials with streak length K = 17, success probability p = 0.525, and N = 20,000 trials.

The probability of a streak of at least 17 consecutive successes in 20,000 trials is 15.3%. With N ten times larger, the probability of a streak of at least 17 consecutive successes in 200,000 trials is 81.01%.
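The two figures above can be checked exactly, assuming a standard first-passage recurrence (not necessarily how they were originally obtained). Let $A_n$ be the probability that $n$ trials contain no run of $K$ successes. For $n > K$, the first such run ends exactly at trial $n$ only when trial $n-K$ is a failure, the $K$ trials after it are all successes, and the first $n-K-1$ trials contain no run, so $A_n = A_{n-1} - (1-p)\,p^K A_{n-K-1}$, with $A_n = 1$ for $n < K$ and $A_K = 1 - p^K$. A minimal Python sketch (the function name `prob_run_at_least` is only illustrative):

```python
def prob_run_at_least(n: int, k: int, p: float) -> float:
    """Probability that n Bernoulli(p) trials contain a run of >= k successes."""
    q = 1.0 - p
    pk = p ** k
    # a[m] = probability that the first m trials contain no run of k successes
    a = [1.0] * (n + 1)
    if n >= k:
        a[k] = 1.0 - pk  # with exactly k trials, a run needs all k to be successes
    for m in range(k + 1, n + 1):
        # subtract the probability that the FIRST run of length k ends exactly at trial m
        a[m] = a[m - 1] - a[m - k - 1] * q * pk
    return 1.0 - a[n]


if __name__ == "__main__":
    for n in (20_000, 200_000):
        print(f"N = {n:>7,}: {prob_run_at_least(n, 17, 0.525):.4f}")
```

Run as a script, it should print values close to 0.153 and 0.810. The loop is linear in N, so even N = 200,000 is quick in plain Python.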

So my question is the following: is the probability of getting 17 consecutive successes still 81.01% if I do 10 independent runs of N = 20,000 trials each?

If the law of large numbers is correct, nothing should change, since N is simply incrementing: 20,000 today and 20,000 tomorrow is the same as running 40,000 straight, right? So what happens when I run 20,000 on the tenth day? Does that last block of 20,000 really have an 81% chance of producing a 17-streak just because the total is now 200,000? That definitely sounds like the gambler's fallacy. If each trial is random and independent, a block of 20,000 should always represent 15.3%, regardless of how many times we run it...

Tossing the coin 200,000 times nonstop should be indistinguishable from tossing 10 groups of 20,000. How on Earth would pausing and resuming the tosses change anything? Right? On the other hand, each group of 20,000 tosses is independent and random, so there is no way its probability of producing a 17-streak should increase.

So what is the right answer?

gt6989b
  • This is not clear. Suppose my first block of $20000$ ends with $10$ H's in a row, and the next starts with $7$ H's in a row. Does that count? If it does, then you are back in the $200000$ case. If it doesn't, you aren't. – lulu Jul 17 '16 at 12:41
  • To make the contrast more plain: suppose you split your $200000$ trials into groups of ten. Then there is $0$ probability of getting seventeen $H's$ in a single block. – lulu Jul 17 '16 at 12:42
  • I guess that makes sense. I wasn't considering a streak being split between the groups, so there are 10 opportunities for breaking the streaks. But why are you saying there is 0 probability of getting seventeen H's in a single block? Isn't it still 15.3% according to the Bernoulli trials? – Dionysious Jul 17 '16 at 12:47
  • If I have blocks of length ten, then it is impossible to get $17$ H's in a row within a single block, obviously. – lulu Jul 17 '16 at 12:50
  • My extreme example just amplifies your observation. Using groups of length ten, it is impossible to get the desired streak within a single block, yet of course there is a very high probability that we get the streak if we ignore the separation into blocks. All that means is that, as the block size decreases, the probability that a favorable streak spans multiple blocks increases. – lulu Jul 17 '16 at 12:53
  • Oh I see, you were talking about groups of 10 trials, not 10 groups of 20,000. Okay, yes. So if I keep doing 20,000 trials each day for a whole year, totaling 7,300,000, the probability would still be fixed at 15.3%. – Dionysious Jul 17 '16 at 12:54
  • Per block, yes. Not for the entire year though. For completeness: suppose you ignore streaks which cross blocks. Then you have $10$ independent trials each with a win probability of $p=.153$. As @gt6989b correctly remarks, the probability that they all fail is $(1-p)^{10}=.847^{10}=0.190035222$ so the probability that at least one succeeds is $1-(1-p)^{10}=0.809964778$, so with your large blocks the probability that the only winning streaks span two blocks is, as expected, quite low. – lulu Jul 17 '16 at 12:58
  • If you do it every day for a year, the numbers are (naturally) even more stark. The probability that you fail every day over that year is now $.847^{365}$ which is effectively $0$. Thus the probability that you will have seen the desired streak at least once over that year is effectively $1$. – lulu Jul 17 '16 at 13:08
  • So we are back to an 80.99% probability of getting a 17-streak after running all 10 blocks of 20,000 trials each. – Dionysious Jul 17 '16 at 13:43
  • Which is not the same as $81.01\%$! The difference is due to the possibility of straddling the blocks. Not a huge impact with such big blocks, but it grows as the block size shrinks. – lulu Jul 17 '16 at 14:32
  • Then my original question is still unanswered. Using $1-(1-p)^{\text{days}}$, I made the following table of days vs. probability of getting a 17-streak:
    1. 15.30%
    2. 28.26%
    3. 39.24%
    4. 48.53%
    5. 56.41%
    6. 63.08%
    7. 68.73%
    8. 73.51%
    9. 77.56%
    10. 81.00%

    If each block (each day) has a 15.3% chance, why does this table say I should expect a higher probability of getting the streak as the days pass? How is that possible if each group is independent and random? Running 20K on the tenth day should be the same as running 20K on the fifth or the first day, shouldn't it?

    – Dionysious Jul 17 '16 at 22:25
  • I am not following you at all. If an event occurs with probability $p$, and I make $n$ independent observations, then the probability that I never see the event is $(1-p)^n$. Obviously you have a higher chance of seeing the event if you have more trials; that's just common sense. If I throw $10$ dice I have a very low probability of getting all $6's$, but if I throw those dice millions of times then eventually I certainly expect to see it. – lulu Jul 17 '16 at 23:03
  • You appear to be confused by the basics of Bernoulli trials. Take a look at this introduction. You can see from this that, even if $p$ is quite low, if $n$ is large enough then you have a high probability of seeing what you want at least once. – lulu Jul 17 '16 at 23:05
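Lulu's point about block size can be checked numerically. If streaks are not allowed to cross block boundaries, $200{,}000$ tosses split into blocks of length $b$ behave like $200{,}000/b$ independent blocks, so the chance of at least one 17-streak is $1-(1-P_b)^{200000/b}$, where $P_b$ is the single-block probability. This Python sketch computes $P_b$ exactly with the first-passage recurrence $A_n = A_{n-1} - (1-p)\,p^K A_{n-K-1}$ (an assumption of this sketch, not something stated in the thread); the helper names and the list of block sizes are illustrative choices.

```python
def prob_run_at_least(n: int, k: int, p: float) -> float:
    """Probability that n Bernoulli(p) trials contain a run of >= k successes."""
    q, pk = 1.0 - p, p ** k
    a = [1.0] * (n + 1)  # a[m]: P(no run of k successes in the first m trials)
    if n >= k:
        a[k] = 1.0 - pk
    for m in range(k + 1, n + 1):
        a[m] = a[m - 1] - a[m - k - 1] * q * pk
    return 1.0 - a[n]


def prob_split(total: int, block: int, k: int, p: float) -> float:
    """Chance of a k-run somewhere when runs may NOT cross block boundaries."""
    blocks = total // block  # assumes block divides total evenly
    return 1.0 - (1.0 - prob_run_at_least(block, k, p)) ** blocks


TOTAL, K, P = 200_000, 17, 0.525
print(f"continuous:       {prob_run_at_least(TOTAL, K, P):.4f}")
for block in (20_000, 2_000, 200, 20, 10):
    print(f"blocks of {block:>6,}: {prob_split(TOTAL, block, K, P):.4f}")
```

With blocks of 20,000 the split probability should sit just below the continuous value; it drops as the blocks shrink and is exactly 0 for blocks of 10, as lulu says.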

1 Answer


Not quite the same: you could get 7 successes at the end of one block and 10 at the beginning of the next, but that is a minor effect. What you are essentially doing is running one Bernoulli trial (one block of 20k tosses) with success probability $p = 15.3\%$, and then repeating it 10 times, so the chance of getting no success at all is $$(1-p)^{10},$$ which is indeed a reasonably small number ($0.847^{10} \approx 0.19$).
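A small simulation of this model may make the distinction concrete. The Python sketch below treats each day's block of 20,000 tosses as a single Bernoulli trial with success probability 0.153 (ignoring streaks that straddle two days, as the answer does) and estimates three quantities: the chance that day 10 alone produces a streak, the same chance given that days 1 to 9 all failed, and the chance of at least one streak over the 10 days. The seed and the number of repetitions are arbitrary choices for this sketch.

```python
import random

P_DAY = 0.153    # per-day chance of a 17-streak, taken from the question
DAYS = 10
RUNS = 100_000   # number of simulated 10-day experiments (arbitrary)

rng = random.Random(12345)  # fixed seed so the sketch is reproducible

day10_hits = 0   # experiments where day 10 itself produced a streak
any_hits = 0     # experiments with a streak on at least one day
late_tries = 0   # experiments where days 1-9 all failed
late_hits = 0    # ...and day 10 then succeeded

for _ in range(RUNS):
    days = [rng.random() < P_DAY for _ in range(DAYS)]
    day10_hits += days[-1]
    any_hits += any(days)
    if not any(days[:-1]):
        late_tries += 1
        late_hits += days[-1]

print("closed form, at least one streak by day d:")
for d in range(1, DAYS + 1):
    print(f"  day {d:>2}: {1 - (1 - P_DAY) ** d:.2%}")
print(f"simulated P(day 10 has a streak)            = {day10_hits / RUNS:.3f}")
print(f"simulated P(day 10 | days 1-9 had none)     = {late_hits / late_tries:.3f}")
print(f"simulated P(at least one streak in 10 days) = {any_hits / RUNS:.3f}")
```

The first two estimates should both come out near 0.153 and the third near 0.81: no single day is luckier than the others, but the event "at least one streak so far" keeps collecting chances as days are added.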

gt6989b
  • But then, using $1-(1-p)^{\text{days}}$, I made the following table of days vs. probability of getting a 17-streak: 1) 15.30% 2) 28.26% 3) 39.24% 4) 48.53% 5) 56.41% 6) 63.08% 7) 68.73% 8) 73.51% 9) 77.56% 10) 81.00%. If each block (each day) has a 15.3% chance, why does this say I should expect a higher probability of getting the streak as the days pass? How is that possible if each group is independent and random? Running 20K on the tenth day should be the same as running 20K on the fifth or the first day, shouldn't it? – Dionysious Jul 17 '16 at 22:28
  • @Dionysious Yes, getting the streak has the same chance each day, but getting nothing over a whole sequence of days is a different story. Intuitively, when throwing a fair coin, getting 5 heads in 5 throws is a much less likely event than getting one head in one throw, although each throw is equivalent. – gt6989b Jul 17 '16 at 23:14
  • The first thing we know is that 200,000 trials certainly have an 81% probability of containing a 17-streak, just because of the sheer amount. No problem understanding that. But what would the outcome be if we split it into 20,000 trials a day? Bear with me, please. On day one we know there is a 15.3% probability of getting a 17-streak and a 99.97% probability of getting a streak of 11, and in effect we get a streak of 11 and no longer. On day 2, the accumulated N should be 40,000, with a 99.98% probability of getting a streak of 12. By day 10, there is a 100% probability of getting a streak of 14. But it doesn't happen that way in my tests: the longest streak is consistently 11. – Dionysious Jul 18 '16 at 20:34
  • So how should I interpret that? – Dionysious Jul 18 '16 at 20:34
  • Theoretically, am I supposed to see the maximum streak increase each day? Shouldn't it be a random occurrence? – Dionysious Jul 18 '16 at 20:39
  • @Dionysious I don't understand what you are asking – gt6989b Jul 19 '16 at 13:28
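The follow-up about the maximum streak per day can be explored by simulation. The Python sketch below uses the question's p = 0.525 and 20,000 tosses per day, with an arbitrary seed, and records two numbers each day: the longest run of successes within that day's block alone, and the longest run in all tosses so far treated as one continuous sequence. Only the second can never decrease; the first is a fresh random draw each day, since a new block has no memory of earlier ones. The helper `longest_run` is a hypothetical name for this illustration, and the exact values printed depend on the seed.

```python
import random


def longest_run(tosses):
    """Length of the longest run of consecutive successes (True values)."""
    best = current = 0
    for t in tosses:
        current = current + 1 if t else 0  # extend the run or reset it on a failure
        best = max(best, current)
    return best


P, BLOCK, DAYS = 0.525, 20_000, 10
rng = random.Random(2016)  # arbitrary seed for reproducibility

all_tosses = []
daily_max, cumulative_max = [], []
for day in range(1, DAYS + 1):
    today = [rng.random() < P for _ in range(BLOCK)]
    all_tosses.extend(today)
    daily_max.append(longest_run(today))            # this day's block on its own
    cumulative_max.append(longest_run(all_tosses))  # all tosses so far, run continuously
    print(f"day {day:>2}: longest today = {daily_max[-1]:>2}, "
          f"longest so far = {cumulative_max[-1]:>2}")
```

The "longest so far" column can also exceed every daily value when a run straddles two days, which is exactly the boundary effect discussed in the comments on the question.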