# Q: What’s the chance of getting a run of K or more successes (heads) in a row in N Bernoulli trials (coin flips)? Why use approximations when the exact answer is known?

The original question was: Recently I’ve come across a task to calculate the probability that a run of at least K successes occurs in a series of N (K≤N) Bernoulli trials (weighted coin flips), i.e. “what’s the probability that in 50 coin tosses one has a streak of 20 heads?”. This turned out to be a very difficult question and the best answer I found was a couple of approximations.

So my question to a mathematician is: “Why is this task so difficult compared e.g. to finding the probability of getting 20 heads in 50 coin tosses solved easily using binomial formula? How is this task in fact solved – is there an exact analytic solution? What are the main (preferably simplest) approximations and why are they used instead of an analytic solution?”

Physicist: What follows was not obvious. It was the result of several false starts. It’s impressive in the same sense that a dude with seriously bruised legs, who can find anything in his house with the lights off, is also impressive. If you’re looking for the discussion of the gross philosophy, and not the detailed math, skip down to where you find the word “Jabberwocky” in bold. If you want specific answers for fixed, small numbers of coins, or you want sample computer code for calculating the answer, go to the bottom of the article.

The short answer: the probability, S, of getting K or more heads in a row in N independent attempts (where p is the probability of heads and q=1-p is the probability of tails) is:

$S(N,K) = p^K\sum_{T=0}^\infty {N-(T+1)K\choose T}(-qp^K)^T-\sum_{T=1}^\infty {N-TK\choose T}(-qp^K)^T$

Note that here ${A\choose B}$ is the choose function (also called the binomial coefficients) and we are applying a non-standard convention that ${A\choose B}= 0$ for $A < B$ which makes the seemingly infinite sums always have only a finite number of terms. In fact, for N and K fixed, the answer is a polynomial with respect to the variable p.

Originally this was a pure math question that didn’t seem interesting to a larger audience, but we both worked for long enough on it that it seems a shame to let it go to waste. Plus, it gives me a chance to show how sloppy (physics type) mathematics is better than exact (math type) mathematics.

Define {Xi}j as the list of results of the first j trials. e.g., if j=4, then {Xi}j might be “{Xi}4=HTHH” or “{Xi}4=TTTH” or something like that, where H=heads and T=tails. In the second case, X1=T, X2=T, X3=T, and X4=H.

Define “Ej” as the event that there is a run of K successes (heads) in the first j trials. The question boils down to finding P(EN).

Define “Aj” as the event that the last K terms of {Xi}j are T followed by K-1 H’s ({Xi}j = X1X2X3X4…THHH…HHH). That is to say, if the next coin (Xj+1) is heads, then you’ve got a run of K.

Finally, define p = P(H) and q = P(T) = 1-p. Keep in mind that a “bar” over an event means “not”. So “$\overline{H}=T$” reads “not heads equals tails”

The probability of an event is the sum of the probabilities of the different (disjoint) ways that event can happen. So:

$\begin{array}{ll}&P(E_{j+1})\\i)&=P(E_{j+1}\cap E_j\cap A_j)+P(E_{j+1}\cap E_j\cap \overline{A_j})+P(E_{j+1}\cap \overline{E_j}\cap A_j)+P(E_{j+1}\cap \overline{E_j}\cap \overline{A_j})\\ii)&=\left[P(E_{j+1}\cap E_j\cap A_j)+P(E_{j+1}\cap E_j\cap \overline{A_j})\right]+P(E_{j+1}\cap \overline{E_j}\cap A_j)+P(E_{j+1}\cap \overline{E_j}\cap \overline{A_j})\\iii)&=\left[P(E_{j+1}\cap E_j)\right]+P(E_{j+1}\cap \overline{E_j}\cap A_j)+P(E_{j+1}\cap \overline{E_j}\cap \overline{A_j})\\iv)&=P(E_j)+P(E_{j+1}\cap \overline{E_j}\cap A_j)+P(E_{j+1}\cap \overline{E_j}\cap \overline{A_j})\\v)&=P(E_j)+P(E_{j+1}\cap \overline{E_j}\cap A_j)+0\\vi)&=P(E_j)+P(E_{j+1}|\overline{E_j}\cap A_j)P(\overline{E_j}\cap A_j)\\vii)&=P(E_j)+pP(\overline{E_j}\cap A_j)\\viii)&=P(E_j)+pP(\overline{E_j}| A_j)P(A_j)\\ix)&=P(E_j)+pP(\overline{E_j}| A_j)qp^{K-1}\\x)&=P(E_j)+qp^KP(\overline{E_{j-k}})\\xi)&=P(E_j)+qp^K\left[1-P(E_{j-k})\right]\\xii)&=P(E_j)+qp^K-qp^KP(E_{j-k})\end{array}$

iv) comes from the fact that $E_j \subset E_{j+1}$. If you have a run of K heads in the first j trials, of course you’ll have a run in the first j+1 trials. v) The zero comes from the fact that if the first j terms don’t have a run of K heads and the last K-1 terms are not all heads, then it doesn’t matter what the j+1 coin is, you can’t have a run of K heads (you can’t have the event Ej+1 and not Ej and not Aj). vii) is because if there is no run of K heads in the first j trials, but the last K-1 of those j trials are all heads, then the chance that there will be a run of K in the first j+1 trials is just the chance that the next trial comes up heads, which is p. ix) the chance of the last K trials being a tails followed by K-1 heads is qpK-1. x) If the last K (of j) trials are a tails followed by K-1 heads, then whether a run of K heads does or doesn’t happen is determined in the first j-K trials.
The other steps are all probability identities (specifically $P(C)=P(C\cap D)+P(C\cap \overline{D}), \, P(\overline{C})=1-P(C),$ and Bayes’ theorem: $P(C\cap D)=P(C|D)P(D)$).

Rewriting this with some N’s instead of j’s, we’ve got a kick-ass recursion: $P(E_N)=P(E_{N-1})+qp^K-qp^KP(E_{N-K-1})$

And just to clean up the notation, define S(N,K) as the probability of getting a string of K heads in N trials (up until now this was P(EN)).

S(N,K)=S(N-1,K)+qpK-qpKS(N-K-1,K). We can quickly figure out two special cases: S(K,K) = pK, and S(l,K) = 0 when l<K, and that there’s no way of getting K heads without flipping at least K coins. Now check it:

$\begin{array}{ll}&S(N,K)\\i)&=S(N-1,K)+qp^K-qp^KS(N-K-1,K)\\ii)&=\left[S(N-2,K)+qp^K-qp^KS(N-K-2,K)\right]+qp^K-qp^KS(N-K-1,K)\\iii)&=S(N-2,K)+2qp^K-qp^K\left[ S(N-K-1,K)+S(N-K-2,K)\right]\\iv)&=S(N-3,K)+3qp^K-qp^K\left[ S(N-K-1,K)+S(N-K-2,K)+S(N-K-3,K)\right]\\v)&=S(N-(N-K),K)+(N-K)qp^K-qp^K\left[ S(N-K-1,K)+\cdots+S(N-K-(N-K),K)\right]\\vi)&=S(K,K)+(N-K)qp^K-qp^K\left[ S(N-K-1,K)+\cdots+S(0,K)\right]\\vii)&=p^K+(N-K)qp^K-qp^K\sum_{r=0}^{N-K-1} S(r,K)\\viii)&=p^k+(N-K)qp^K-qp^K\sum_{r=K}^{N-K-1} S(r,K)\end{array}$

ii) Plug the equation for S(N,K) in for S(N-1,K). iii-vi) is the same thing. vii) write the pattern as a sum. viii) the terms up to K-1 are all zero, so drop them.

Holy crap! A newer, even better recursion! It seems best to plug it back into itself!

$\begin{array}{ll}i)&S(N,K)=p^K+(N-K)qp^K-qp^K\sum_{r=K}^{N-K-1} S(r,K)\\ii)&=p^K+(N-K)qp^K-qp^K\sum_{r=K}^{N-K-1} \left[p^k+(r-K)qp^K-qp^K\sum_{\ell=K}^{r-K-1} S(\ell,K)\right]\\iii)&=p^K+(N-K)qp^K-qp^K\sum_{r=K}^{N-K-1}p^k-qp^K\sum_{r=K}^{N-K-1}(r-K)qp^K+\left(qp^K\right)^2\sum_{r=K}^{N-K-1}\sum_{\ell=K}^{r-K-1} S(\ell,K)\\iv)&=p^K+(N-K)qp^K-qp^K\sum_{r=1}^{N-2K}p^k-\left(qp^K\right)^2\sum_{r=0}^{N-2K-1}r+\left(qp^K\right)^2\sum_{r=2K+1}^{N-K-1}\sum_{\ell=K}^{r-K-1} S(\ell,K)\\v)&=p^K+(N-K)qp^K-p^k(N-2K)qp^K-\left(qp^K\right)^2\frac{(N-2K)(N-2K-1)}{2}+\left(qp^K\right)^2\sum_{r=2K+1}^{N-K-1}\sum_{\ell=K}^{r-K-1} S(\ell,K)\\vi)&=p^K+{N-K\choose 1}qp^K-p^K{N-2K\choose 1}qp^K-{N-2K\choose 2}\left(qp^K\right)^2+\left(qp^K\right)^2\sum_{r=2K+1}^{N-K-1}\sum_{\ell=K}^{r-K-1} S(\ell,K)\end{array}$

You can keep plugging the “newer recursion” back in again and again (it’s a recursion after all). Using the fact that $\sum_{i=1}^Ni={N+1\choose 2}$ and $\sum_{i=D}^M {\ell \choose D}={M+1 \choose D+1}$ you can carry out the process a couple more times, and you’ll find that:

$\begin{array}{l}S(N,K)=p^K\left[1-{N-2K\choose 1}qp^K+{N-3K\choose 2}\left(qp^K\right)^2\cdots\right]+\left[{N-K\choose 1}qp^K-{N-2K\choose 2}\left(qp^K\right)^2+{N-3K\choose 3}\left(qp^K\right)^3\cdots\right]\\=p^K\sum_{T=0}^\infty {N-(T+1)K\choose T}(-qp^K)^T-\sum_{T=1}^\infty {N-TK\choose T}(-qp^K)^T\end{array}$

In the approximate (useful) case:

Assume that N is pretty big compared to K. A string of heads (that can be zero heads long) starts with a tails, and there should be about Nq of those. The probability of a particular string of heads being at least K long is pk so you can expect that there should be around E=Nqpk strings of heads at least K long. When E≥1, that means that it’s pretty likely that there’s at least one run of K heads. When E<1, E=Nqpk is approximately equal to the chance of a run of at least K showing up.

Jabberwocky: And that’s why exact solutions are stupid.

Mathematician: Want an exact answer without all the hard work and really nasty formulas? Computers were invented for a reason, people.

We want to compute S(N,K), the probability of getting K or more heads in a row out of N independent coin flips (when there is a probability p of each head occurring and a probability of 1-p of each tail occurring). Let’s consider different ways that we could get K heads in a row. One way to do it would be to have our first K coin flips all be heads, and this occurs with a probability $p^{K}$. If this does not happen, then at least one tail must occur within the first K coin flips. Let’s suppose that j  is the position of the first tail, and by assumption it satisfies $1 \le j \le K$. Then, the probability of having K or more heads in a row in the entire set of coins (given that the first tail occurred at $j \le K$) is simply the probability of having K or more heads in a row in the set of coins following the jth coin (since there can’t be a streak of K or more heads starting before the jth coin due to j being smaller or equal to K). But this probability of having a streak of K or more after the jth coin is just S(N-j,K). Now, since the probability that our first tail occurs at position j is the chance that we get j-1 heads followed by one tail, so it is $p^{j-1} (1-p)$ . That means that the chance that the first tail occurs on coin j AND there is a streak of K or more heads is given by $p^{j-1} (1-p) S(N-j,K)$. Hence, the probability that the first K coins are all heads, OR coin one is the first tails and the remainder have K or more heads in a row, OR coin two is the first tails and the remainder have K or more heads in a row, OR coin three is the first tails and…, is given by:

$S(N,K) = p^{K} + \sum_{j=1,K} p^{j-1} (1-p) S(N-j,K)$

Note that what this allows us to do is to compute S(N,K) by knowing the values of S(N-j,K) for  $1 \le j \le K$. Hence, this is a recursive formula for S(N,K) which relates harder solutions (with larger N values) to easier solutions (with smaller N values). These easier solutions can then be computed using even easier solutions, until we get to S(A,B) for values of A and B so small that we already know the answer (i.e. S(A,B) is very easy to calculate by hand). These are known as our base cases. In particular, we observe that if we have zero coins then there is a zero probability of getting any positive number of heads is zero, so S(0,K) = 0, and the chance of getting more heads than we have coins is zero, so S(N,K) = 0 for K>N.

All of this can be implemented in a few lines of (python) computer code as follows:

An important aspect of this code is that every time a value of S(N,K) is computed, it is saved so that if we need to compute S(N,K) again later it won’t take much work. This is essential for efficiency since each S(N,K) is computed using S(N-j,K) for each j with $1 \le j \le K$ and therefore if we don’t save our prior results there will be a great many redundant calculations.

For your convenience, here is a table of solutions for S(N,K) for $1 \le N \le 10$ and $1 \le K \le 10$ (click to see it enlarged):

This entry was posted in -- By the Mathematician, -- By the Physicist, Combinatorics, Equations, Math, Probability. Bookmark the permalink.

### 36 Responses to Q: What’s the chance of getting a run of K or more successes (heads) in a row in N Bernoulli trials (coin flips)? Why use approximations when the exact answer is known?

1. Boris says:

Guys, I’m the author of the original question. I already lost hope for an answer.

Sorry for asking such a difficult question that made you spend so much time on solving it. But your answer is really useful. Thanks a bunch!

2. SW VandeCarr says:

Why wouldn’t the following work?

The probability of a run of 15 is 0.5^15

There are n-k+1 ways for a run of length k to occur in a series of length n (think about it)

Therefore 26(0.5^-15)=.000793457

3. SW VandeCarr says:

That is for a n=40 and k=15 (see previous)

4. Mathematician says:

Your approach does not work because probability(A or B) = probability(A) + probability(B) only when A and B are mutually exclusive (i.e. they can’t both happen at the same time). The problem here is that having a run of 15 in a row starting at position 1 does not rule out the possibility of a run of 15 in a row starting at position 2, and so forth.

5. SW VandeCarr says:

Thanks.

6. Physicist says:

Your approach is a good low-probability approximation. If you had p=0.01 instead of p=0.5, for example, then then chance of runs greater than k, and especially the chance of two independent runs of k, would be very small.

7. Marc van Gestel says:

Hi,

I am trying to work this out in Basic programming language and therefore writing the sequence down for example S(3,2) which I know gives P = 0.3750, but I don’t succeed.

Now what I do is:

S(3,2) = p^2 + sum_{1,2} p^(j-1)*(1-p)*S(2,2)

with p = 0.5 and S(2,2) = 0.5^2

I get:

S(3,2) = 0.5^2 + sum_{1,2} 0.5^(j-1)*0.5*0.5^2

the sum works out for j = 1: (0.5^0)*0.5*0.5^2 = 0.5^3 and
for j = 2: (0.5^1)*0.5*0.5^2 = 0.5^4
totalling 0.5^2 + (0.5^3 + 0.5^4) = 0.4375??

what am I doing wrong?

thanks

Marc

8. Physicist says:

In the second term of the sum you should have S(1,2), which is zero. Instead you have S(2,2).

9. Marc van Gestel says:

Hi Physicist,

You are absolutely right! Thank you.

Marc

10. bennym says:

wow Physicist and guys, you are amazing.
So is there a simple way to work these out? Yes I mean a CALCULATOR

Also one more question that is killing me,
If we take a coin flip with p=0.5, so based on the above we know the probabliity of k success in a row over n trials. thats fine.

But what happens if there is another person tossing a different coin, and they toss it in turns, one toss person A and one toss person B.
Now- what is the probablity that both together will have k success in a raw, for example if k=10 than each one should have 5 success in a raw (or more) at the same time.
Now to be precise instead of coin tosses take two independent different events, each one with its own probability (can be same or different), the point is that each event has a different set of rules and hence a different sequences of win/loss.

Thnaks.

11. Physicist says:

Although different approximations apply for different values of p, K, and N, in general the first term from the sums should be fine (p^K+(N-K)qp^K). If not, since they’re alternating sums, you can add up terms until they’re smaller than the error you’re looking for.
In the first case, if the two coins have the same probability, then it doesn’t matter who’s flipping what. If you imagine someone writing down a list of flips, they don’t care where the flips are coming from. The equations in the post still apply to that list.
When you start talking about different events with different probabilities, then things get uncomfortably tricky. The equations in the post are already pretty nasty. The equations for this solution should be about twice as nasty. It takes less time (usually) to write a computer program to run through a couple million samples than to derive an exact solution. And really, who cares about the difference between 62.1% and 62.13%?

12. Neo says:

would you be able to post the complete python code?
I would like to run it for numCoins = 31.556.926, minHeads=21, headProb=.5
thank you!

13. schalila says:

the source code can be found at http://pastebin.com/cPWHQXUF and below (hopefully):

 #!/usr/bin/env python

 import sys def probOfStreak(numCoins, minHeads, headProb, saved=None): if saved == None: saved = {} ID = (numCoins, minHeads, headProb) if ID in saved: return saved[ID] else: if minHeads > numCoins or numCoins <= 0: result = 0 else: result = headProb ** minHeads for firstTail in xrange(1, minHeads+1): pr = probOfStreak(numCoins-firstTail, minHeads, headProb, saved) result += (headProb ** (firstTail-1)) * (1 - headProb) * pr saved[ID] = result return result 

if __name__ == "__main__": sys.setrecursionlimit(2000) print probOfStreak(int(sys.argv[1]), int(sys.argv[2]), float(sys.argv[3])) 

14. MJC says:

Mathematician says that “If this does not happen, then at least one tail must occur within the first K coin flips.” Your recursive relationship is built on the assumption that there is a single failure within the string of k successes. How do your take into account the possibilities where multiple failures occur?

15. Andrew says:

Guys I found this post because I’m reading “The Drunkard’s Walk” now and cannot understand one story from it.

Here it goes:

Imagine that George Lucas makes a new Star Wars film and in one test market decides to perform a crazy experiment. He releases the identical film under two titles: “Star Wars: Episode A” and “Star Wars: Episode B”. Each film has its own marketing campaign and distribution schedule, with the corresponding details identical except that the trailers and ads for one film say “Episode A” and those for the other, “Episode B”.

Now we make a contest out of it. Which film will be more popular? Say we look at the first 20,000 moviegoers and record the film they choose to see (ignoring those die-hard fans who will go to both and then insist there were subtle but meaningful differences between the two). Since the films and their marketing campaigns are identical, we can mathematically model the game this way: Imagine lining up all the viewers in a row and flipping a coin for each viewer in turn. If the coin lands heads up, he or she sees Episode A; if the coin lands tails up, it’s Episode B. Because the coin has an equal chance of coming up either way, you might think that in this experimental box office war each film should be in the lead about half the time.

But the mathematics of randomness says otherwise: the most probable number of changes in the lead is 0, and it is 88 times more probable that one of the two films will lead through all 20,000 customers than it is that, say, the lead continuously seesaws”

I must say I fail to see the reason why the leader won’t seesaw! Can you explain please?

16. The Physicist says:

The proof of that statement is a bit subtle, but (this is a simplification) it’s a little like saying that it’s unlikely to flip heads, tails, heads, tails, … forever.
This is actually an experiment you can try yourself by flipping coins and keeping track of the score. You’ll find that overall it will tend to wander away from 0. The exact walk is closely modeled by the “Wiener process” (did not make up that name).

17. zXakari says:

Ok, so given that we all now know how to calculate the probability of a run of K in N trials. Can we use that knowledge to calculate the expected number of times we will see a run of K for N trial? e.g. there is an 96% of getting a run of six (heads or tails) or more in 200 trials. So what is the expected number of times we will se a run of 6 or more in 200 trials?

18. Max says:

In case anyone is interested, I’ve written JavaScript code to implement the algorithm on this site. It’s at
http://maxgriffin.net/CalcStreaks.shtml
I’d be glad to share it; just drop me an email.

19. T. Elem says:

Max you are a wonderful human being.

20. Jeff says:

In case anyone else is interested, I translated Max’s Javascript code into a MATLAB function which is posted on the Mathworks file exchange here:
http://www.mathworks.com/matlabcentral/fileexchange/39713

It and it returns the same results as “the mathematician’s” Python function and Max’s Javascript function.

I hope others will find it useful, as I found the code here and at Max’s site.

Thanks for working out the solution!

21. Fintan Costello says:

Thank you, nice work!

S(N,K)=S(N-1,K)+q p^k – q p^k S(N-K-1,K)

and the fact that S(N,K)=0 when N<K you get an upper bound

S(N,K) < (N-K+1) q p^k

(because q p^k S(N-K-1,K) is never negative, and N-K recursions bring you to N<K.

22. Brian says:

Can this be reversed? For example if a baseball team plays 162 games per year and has a 10 game winning streak, what is the most likely winning percentage for that team?

23. Yaniv Tenenbaum Katan says:

Thanks for the great answer. This saved me a lot of time.

I’m using this result in a project I’m working on, and will be published. I therefore want to akcnolwedge the author of this solution.

I would be thankful if you could let me know who to akcnolwedge

24. Raj says:

I implemented the explicit (non-recursive) form in Mathematica in response to a question on Math Stackexchange, in case anyone is interested:

25. egbert says:

If you are looking for a streak of 6 in a population and you have a streak of 7, that would be 2 streaks of 6.

I worked the problem out once in a way that is similar to what Mathematician came up with, but I arrived at a slightly different recurrence relation, and I’m wondering if it’s simply a matter of some fancy algebra to make them equivalent.

My logic was that the chance of a streak of K or greater in a sequence of length N is the sum of the chances of a streak first occurring starting at position i, over all values of i where a streak can start ( i <= N-K+1 ). The first term for i=1 is of course p^K. For each subsequent valid value of i, the chance of a streak starting at toss i is (1-p)p^k since a flip of tails must occur at position (i-1) with a chance of (1-p) in addition to the streak of K heads in a row represented by p^K. For this streak to be the first streak in the trials, there cannot be a previous streak of K occurring anywhere in the first i-2 tosses leading up to the tails toss just prior to the start of the streak at i, for which the chance is (1-S(K,i-2)). So the final recurrence relation I get is this:

S(N,K) = p^K + SUM[i=2,N-K+1](1-p)p^K(1-S(i-2,K))

In LaTeX notation:

$$S(N,K) = p^{K} + \sum_{i=2}^{N-K+1}(1-p)p^{K}(1-S(i-2,K))$$

27. MarcP says:

Thanks guys. Your solution solves a real-world problem I have. Just to connect everyone with reality. My product is different, but imagine this here: I want to sell chains and I have a manufacturer of chains supplying them. The chains I sell have to be 100 parts long without quality issues but the machine producing them is making mistakes at a certain low probability. Question: How likely is it that in a 200-part chain I buy from my supplier there is the needed high-quality 100-part chain that I want to trade (the remainder I’ll dump)? I’ll need Excel code actually

28. MarcP says:

Here’s the VBA code for all those wanting to process it in e.g. Excel

 Public Function pStreak(nTries, nHits, p) Dim c As Collection Set c = New Collection pStreak = pS(nTries, nHits, p, c) Set c = Nothing End Function Private Function pS(nTries, nHits, p, c As Collection) On Error GoTo err pS = c(CStr(nTries)) 'if nonexisting the exception jumps to err label Exit Function err: If nHits > nTries Or nTries <= 0 Then result = 0 Else result = p ^ nHits For j = 1 To nHits px = pS(nTries - j, nHits, p, c) result = result + ((p ^ (j - 1)) * (1 - p) * px) Next End If c.Add result, CStr(nTries) 'we take the string of nTries so the collection index is hashed pS = result 

End Function