# Q: What is Bayes’ rule and how do I use it to improve my life?

Mathematician: Bayes’ theorem is one of the most practically useful equations coming from the field of probability. If you take its implications to heart, it will make you better at figuring out the truth in a variety of situations. What makes the rule so useful is that it tells you what question you need to ask to evaluate how strong a piece of evidence is. In my own life, I apply this concept nearly every week.

Let’s consider an example. Suppose you have a cough that’s been going on for days, and you’re not sure what’s causing it. You believe it could be caused by allergies (we’ll call this hypothesis A), or by bronchitis (which we’ll call hypothesis B).

Now, let’s suppose that you take an anti-allergy medication, and for the next hour your cough disappears. We can view this occurrence as evidence, which we should expect to have a bearing on how much more likely A is than B. But how strong is this evidence, and how do we rigorously show which of our hypotheses it supports? Bayes’ rule tells us that the answer lies in the Bayes Factor, which is the answer to the following question:

“How much more likely would it have been for this evidence to occur if A were true than if B were true?”

This question completely captures how strongly the evidence supports A compared to B. It must lead to one of three conclusions:

1. The evidence would have been more likely to occur if A were true than if B were true. This implies the evidence supports A rather than B.
2. The evidence would be just as likely to occur if A were true as if B were true. This implies the evidence has no bearing on the question of whether A or B is more probable. Hence, the “evidence” isn’t really evidence at all when it comes to evaluating the relative likelihood of these two hypotheses.
3. The evidence would have been more likely to occur if B were true than if A were true. This implies the evidence supports B rather than A.

In our cough example, the Bayes factor becomes the answer to the question:

“How much more likely would it be for my cough to disappear for an hour after taking an anti-allergy medication if I was suffering from allergies compared to if I had bronchitis?”

You can estimate this value in a rough way. Anti-allergy medication should have essentially no effect on coughs from bronchitis. It is possible that you have bronchitis and your cough just happened to go away by chance during this particular hour, but that is unlikely (certainly less likely than 1 in 10 for a long-lasting, consistent cough). On the other hand, allergy medication tends to be fairly effective against coughs caused by allergies, so we should expect at least a 1 in 3 chance that your cough would go away after taking the medication if you did in fact have allergies. Hence, with just a couple of minutes of thought, we can produce a conservatively low estimate for the Bayes factor. We put the probability that the cough would stop given A at no less than 1/3, and the probability that it would stop given B at no more than 1/10. So the Bayes Factor, which is how likely the cough was to stop given A compared to how likely it was to stop given B, is at least (1/3) / (1/10) = 10/3 ≈ 3.3.
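The bound is plain arithmetic on two rough guesses. Here is a minimal Python sketch; the variable names are illustrative, and the inputs are only the estimates above (at least 1/3 given allergies, at most 1/10 given bronchitis). Exact fractions keep the result from being muddied by rounding.

```python
from fractions import Fraction

# Rough estimates from the text (guesses, not measured data)
p_stop_given_allergy_min = Fraction(1, 3)      # P(cough stops | A) is at least 1/3
p_stop_given_bronchitis_max = Fraction(1, 10)  # P(cough stops | B) is at most 1/10

# A smallest-possible numerator over a largest-possible denominator
# gives a lower bound on the Bayes factor.
bayes_factor_min = p_stop_given_allergy_min / p_stop_given_bronchitis_max

print(bayes_factor_min, float(bayes_factor_min))  # 10/3 3.3333333333333335
```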

This should be interpreted as saying that we should now believe A is at least 3.3 times more likely compared to B than we used to think it was. In other words, the Bayes Factor tells us how much our new evidence should cause us to update our belief about the relative likelihood of our two hypotheses. However many times more likely we thought A was than B before evaluating this anti-allergy medication evidence, we should multiply that number by the Bayes factor. The result tells us how much more likely A is than B, taking into account both our prior belief and the new information.

Now suppose that (since you get coughs caused by allergies a lot more often than you get bronchitis) you thought it was 4 times more likely that A was true than B before you took the anti-allergen. You now have more evidence, since you saw that the cough disappeared after taking the medicine. You’ve already calculated that the Bayes Factor is at least 3.3. All you have to do now is adjust your prior belief (that A was 4 times more likely than B) by multiplying by the Bayes Factor. That means you should now believe A is at least 4 × (10/3) ≈ 13.3 times more likely than B.
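The update itself is one multiplication. The sketch below also turns the resulting odds into a probability; that last step is an added assumption, namely that A and B are the only two possible causes, which the example does not strictly claim.

```python
from fractions import Fraction

prior_odds = Fraction(4)                             # A judged 4 times as likely as B beforehand
bayes_factor_min = Fraction(1, 3) / Fraction(1, 10)  # at least 10/3, from the rough estimates

# Posterior odds = prior odds x Bayes factor (a lower bound, because the Bayes factor is one)
posterior_odds_min = prior_odds * bayes_factor_min
print(posterior_odds_min, round(float(posterior_odds_min), 2))  # 40/3 13.33

# Assumption: A and B are mutually exclusive and exhaustive, so P(A) = odds / (1 + odds)
p_allergy_min = posterior_odds_min / (1 + posterior_odds_min)
print(round(float(p_allergy_min), 3))  # 0.93
```

Under that assumption, you should now be at least about 93% sure the cough is allergic.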

Bayes’ rule is remarkably useful because it tells us the right question to ask ourselves when evaluating evidence. If we are considering two hypotheses, A and B, we should ask “how much more likely would this evidence have been to occur if A were true than if B were true?” On the other hand, if we are just evaluating one hypothesis, and we want to know whether evidence makes it more or less likely, we can replace B with “not A” and phrase the question as “how much more likely would this evidence have been to occur if A were true compared to if A were not true?” The answer to this question, which is the Bayes Factor for this problem, completely captures the strength of the evidence with regard to A. If the answer is much greater than 1, then the evidence strongly supports A. If the Bayes Factor is slightly bigger than 1, it slightly supports A. If it is precisely 1, the evidence has no bearing on A (i.e. the “evidence” doesn’t actually provide evidence with respect to A). If it is slightly below 1, it should slightly reduce your credence in A. If it is substantially below 1, it should substantially decrease your belief in A.
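To make the single-hypothesis case concrete, here is a small sketch. The function name `update` and the sample likelihoods are hypothetical; it applies the odds form of Bayes’ rule with B taken to be “not A” and shows a Bayes factor above 1 raising P(A), one equal to 1 leaving it alone, and one below 1 lowering it.

```python
def update(prior_a: float, p_e_given_a: float, p_e_given_not_a: float) -> float:
    """Return P(A|E) via the odds form of Bayes' rule, taking B to be "not A".

    Assumes 0 < prior_a < 1 and p_e_given_not_a > 0.
    """
    bayes_factor = p_e_given_a / p_e_given_not_a
    prior_odds = prior_a / (1 - prior_a)
    posterior_odds = bayes_factor * prior_odds
    # Convert odds back to a probability: P = odds / (1 + odds)
    return posterior_odds / (1 + posterior_odds)


# Hypothetical evidence with Bayes factors above, equal to, and below 1,
# each applied to the same prior belief P(A) = 0.5
for p_e_a, p_e_not_a in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    bf = p_e_a / p_e_not_a
    print(f"Bayes factor {bf:.2f}: P(A|E) = {update(0.5, p_e_a, p_e_not_a):.2f}")

# Bayes factor 9.00: P(A|E) = 0.90
# Bayes factor 1.00: P(A|E) = 0.50
# Bayes factor 0.11: P(A|E) = 0.10
```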

Unfortunately, the human brain does not always deal with evidence properly. Our intuition about what is or is not evidence, and what is strong versus weak evidence, can be terribly wrong (see, for instance, the base rate fallacy). However, by thinking in terms of the Bayes factor, we can check our intuition and use evidence much more effectively. We can avoid many thinking errors and biases. We simply need to get in the habit of asking, “How much more likely would this evidence have been to occur if A were true than if B were true?”

Worried that someone doesn’t like you because he hasn’t returned your phone call in two days? Ask, “how much more likely would this be to occur if he liked me than if he didn’t like me?”

Believe that “an absence of evidence for A is not evidence of absence of A”? Ask, “how much more likely would this absence of evidence for A be to occur if A were not true versus if A were true?”
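The question cuts both ways. As a hypothetical illustration (the rates are invented purely to show the arithmetic), suppose a search would turn up evidence for A 80% of the time if A were true, and would raise a false alarm 5% of the time if A were false. Finding nothing then favors “not A”:

```python
# Hypothetical search: how often it finds evidence for A
p_found_given_a = 0.80      # assumed detection rate if A is true
p_found_given_not_a = 0.05  # assumed false-alarm rate if A is false

# The evidence we actually have is that the search found nothing.
p_nothing_given_a = 1 - p_found_given_a          # 0.20
p_nothing_given_not_a = 1 - p_found_given_not_a  # 0.95

# Bayes factor for "not A" versus "A", given that nothing was found
bf_not_a = p_nothing_given_not_a / p_nothing_given_a
print(f"{bf_not_a:.2f}")  # 4.75
```

An empty search is uninformative only when it would have been equally likely either way, that is, when this ratio is exactly 1.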

Think that the stock you bought that went up 30% is strong evidence for you having skill as an investor? Ask, “how much more likely is it that my highest returning stock pick would go up 30% if I was an excellent investor compared to if I was just picking stocks at random?”
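To see why the answer may be close to 1, here is a sketch under an entirely hypothetical model: you picked 10 stocks, each stock’s yearly return is normally distributed with a standard deviation of 30%, a random pick averages a 7% return, and an excellent investor’s picks average 15%. None of these figures come from the post; they only show how to frame the calculation.

```python
from math import erf, sqrt


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


def p_best_pick_at_least(threshold: float, mean: float, sd: float, n_picks: int) -> float:
    """P(the best of n independent picks returns at least `threshold`)."""
    # All picks fall below the threshold with probability cdf ** n; take the complement.
    return 1 - normal_cdf(threshold, mean, sd) ** n_picks


# Hypothetical model parameters (assumptions, not data)
N_PICKS, SD, THRESHOLD = 10, 0.30, 0.30
p_if_excellent = p_best_pick_at_least(THRESHOLD, 0.15, SD, N_PICKS)
p_if_random = p_best_pick_at_least(THRESHOLD, 0.07, SD, N_PICKS)

print(f"P(best pick >= 30% | excellent) = {p_if_excellent:.3f}")
print(f"P(best pick >= 30% | random)    = {p_if_random:.3f}")
print(f"Bayes factor = {p_if_excellent / p_if_random:.2f}")
```

Under these made-up parameters both probabilities come out above 0.9, so the Bayes factor is only slightly above 1: one 30% winner among ten picks is weak evidence of skill.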

Proof:

Now, let’s break out some math to prove that what we’ve said is right. We’ll use P(A) to represent the probability you assigned to hypothesis A being true before you saw the latest evidence. We’ll use E to refer to the new evidence. We’ll let P(A|E) be the probability of A given that you’ve seen the evidence E, and P(A,E) will be the probability of both A and E occurring. Now, by definition, we have that:

$P(A|E) = \frac{P(A, E)}{P(E)}$.

The intuitive explanation behind this definition comes from observing that the probability of A when we know E (the left hand side) should be the same as how often both A and E are true compared to how often just E is true (the right hand side). Now, we can reuse this definition, but this time for P(E|A). This gives us:

$P(E|A) = \frac{P(A, E)}{P(A)}$.

Rearranging so that the last two expressions give P(A,E) alone on one side of the equation, and then setting them equal to each other, we get:

$P(A|E) P(E) = P(E|A) P(A)$.

Dividing both sides by P(E) yields the typical representation of Bayes’ rule:

$P(A|E) = \frac{P(E|A) P(A)}{P(E)}$.
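As a sanity check that the two definitions and the rule agree, here is a computation on a small made-up joint distribution (the four probabilities are arbitrary and sum to 1). Exact fractions avoid rounding noise.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over (hypothesis, evidence)
joint = {
    ("A", "E"): F(3, 10), ("A", "not E"): F(1, 10),
    ("not A", "E"): F(2, 10), ("not A", "not E"): F(4, 10),
}

p_a = joint[("A", "E")] + joint[("A", "not E")]  # P(A) = 2/5
p_e = joint[("A", "E")] + joint[("not A", "E")]  # P(E) = 1/2
p_a_and_e = joint[("A", "E")]                    # P(A, E) = 3/10

p_a_given_e = p_a_and_e / p_e  # definition: P(A|E) = P(A,E) / P(E)
p_e_given_a = p_a_and_e / p_a  # definition: P(E|A) = P(A,E) / P(A)

# Bayes' rule should reproduce P(A|E) from P(E|A), P(A) and P(E)
assert p_a_given_e == p_e_given_a * p_a / p_e
print(p_a_given_e)  # 3/5
```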

Now, we can write the same expression but replace our first hypothesis A with our second hypothesis B. This yields Bayes’ rule for B:

$P(B|E) = \frac{P(E|B) P(B)}{P(E)}$.

Dividing the expression for P(A|E) by the expression for P(B|E) we get:

$\frac{P(A|E)}{P(B|E)} = (\frac{P(E|A)}{P(E|B)}) (\frac{P(A)}{P(B)})$

In words, this says that how much more likely A is than B after evaluating our evidence, which is the left side of our equation, is equal to the product of the two factors on the right. The second factor on the right is how much more likely A was than B before we saw our evidence. This reflects our “prior” belief about A and B. The first factor on the right is the Bayes Factor, which “updates” our prior belief to incorporate the new evidence. The Bayes Factor just says how much more likely the evidence would be to occur if A were true than if B were true. To summarize: how much more likely A is than B now is just equal to how much more likely A was than B before we saw our new evidence, times how much more likely this evidence would be to occur if A were true than if B were true.
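A small check with exact fractions and made-up numbers: even with a third hypothesis C in play (so B is not simply “not A”), the ratio of the posteriors for A and B equals the Bayes factor times the prior odds, because P(E) cancels.

```python
from fractions import Fraction as F

# Hypothetical priors for three mutually exclusive, exhaustive hypotheses
prior = {"A": F(1, 2), "B": F(1, 3), "C": F(1, 6)}
# Hypothetical likelihoods P(E | hypothesis)
likelihood = {"A": F(1, 4), "B": F(1, 2), "C": F(9, 10)}

# P(E) by the law of total probability, then each posterior by Bayes' rule
p_e = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: likelihood[h] * prior[h] / p_e for h in prior}

# Posterior odds of A vs B = Bayes factor x prior odds; P(E) cancels and C never enters
posterior_odds = posterior["A"] / posterior["B"]
bayes_factor = likelihood["A"] / likelihood["B"]
prior_odds = prior["A"] / prior["B"]
assert posterior_odds == bayes_factor * prior_odds
print(posterior_odds, bayes_factor, prior_odds)  # 3/4 1/2 3/2
```

Here the evidence halves the odds of A relative to B, even though A started out 1.5 times as likely.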

If, rather than comparing two hypotheses, we want to just update our belief about the single hypothesis A, we can do this by substituting the event “not A”, written $\overline A$, for the event B. Then, our formula reads:

$\frac{P(A|E)}{P(\overline A|E)} = (\frac{P(E|A)}{P(E|\overline A)}) (\frac{P(A)}{P(\overline A)})$.
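Because A and “not A” exhaust the possibilities, $P(\overline A) = 1 - P(A)$ and $P(\overline A|E) = 1 - P(A|E)$. Writing $O$ for the right-hand side above (the posterior odds) and solving for the probability gives:

$\frac{P(A|E)}{1 - P(A|E)} = O \implies P(A|E) = \frac{O}{1 + O}$.

Expanding $O$ and clearing fractions recovers the familiar form:

$P(A|E) = \frac{P(E|A) P(A)}{P(E|A) P(A) + P(E|\overline A) (1 - P(A))}$.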

This entry was posted in -- By the Mathematician, Equations, Math, Probability.

### 8 Responses to Q: What is Bayes’ rule and how do I use it to improve my life?

1. Russ Abbott says:

You might find my wiki page on Bayes’ theorem useful.

2. micha says:

Shouldn’t that last formula be further simplified, using P(not-A) = 1 – P(A) to eliminate one unknown from the last term?

3. The Mathematician says:

Sure, if you’re into that sort of thing! There is no “right” way to simplify an expression. It’s just a matter of whatever is convenient or useful.

I understand the mathematics behind Bayes’s Theorem, but am still confused on how to apply it to real life problems.

For example, if I wanted to figure out the answer to the second example in this post:

“Worried that someone doesn’t like you because he hasn’t returned your phone call in two days? Ask, ‘how much more likely would this be to occur if he liked me than if he didn’t like me?'”

I would have to compute:

P(doesn’t like me | no call)/P(likes me | no call) = (P(no call | doesn’t like me)/P(no call | likes me))*(P(doesn’t like me)/P(likes me)).

Your post says “You can estimate this value in a rough way”, but any rough estimate seems to me not to help. One could argue any of the four probabilities on the right side of the equation to be higher or lower, just as one could argue the final answer without going through Bayes. (Of course, the base rate fallacy would successfully be dodged.)

Since the procedure is solid, my question essentially is: How do you find accurate values to use in the equation, when you are working on real problems and not some idealized textbook examples?

5. shivam sharma says:

How can calculus be used to improve life?

6. Norma says:

This technology is NOT less than 5 yrs old.

The Ancient Egyptians knew it and used it. It’s all over their texts.
The great Teacher, Yeshua ben Joseph, taught it, and used it.

It was taught in the Ancient Schools of Wisdom.
It was taught by the Cathars.

THAT’s why the State and Churches destroyed the Schools and slaughtered its adherents.

7. k says:

The thing is that you may estimate the probabilities from your previous experience:

* First, p1 = P(no call | doesn’t like me): Take someone who doesn’t like you. How often does this person fail to return your calls? Take this as your value for p1 (or you can do even better by guesstimating how often, “on average”, people who don’t like you do not return your calls over two days). E.g., say that of the 10 examples I can think of, only one returned my calls and nine didn’t. So: p1 = 9/10 = 0.9

* Next, p2 = P(no call | likes me): Same approach. Does your typical friend (or, to be more precise, the typical person who likes you) often remain silent over two days? What is your estimate of this probability? Say p2 = 70% = 0.7 (people around me are very busy; it is not rare that my friends call back only after two or three days).

* Now for p3 = P(doesn’t like me) and p4 = P(likes me): That’s a bit more tricky. How do you get along with people? Pretty well? Would you say that 4 out of 5 of the new people you meet like you? Then p4 = 80% = 0.8 and p3 = 20% = 0.2.

Now we can compute the odds that this person who didn’t return your calls *doesn’t like you*, which are the Bayes Factor (.9/.7) times the prior odds (.2/.8):
odds = (.9/.7)*(.2/.8) ≈ 0.32
Good! These odds are below 1. You shouldn’t worry.

In fact, if you compute the inverse, namely the odds that this person who didn’t return your calls (actually) *does like you*, you get:
odds ≈ 1/0.321 ≈ 3.11
Meaning that, even if he or she didn’t return your calls, it’s actually about 3 times more probable that this person likes you. (Statisticians often describe a Bayes Factor between 3 and 10 as “substantial evidence”, but 3.11 here is posterior odds, not a Bayes Factor: the missed call by itself slightly favors “doesn’t like me”, and it is your good prior that carries the day.)

Aren’t you a much happier person (and less confused reader) now?
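The commenter’s numbers can be checked in a few lines of Python. This sketch keeps the Bayes factor and the prior odds separate, matching the formula in the post; the final conversion to a probability assumes “likes me” and “doesn’t like me” are the only two options.

```python
# The commenter's estimates (personal guesses, not data)
p1 = 0.9  # P(no call | doesn't like me)
p2 = 0.7  # P(no call | likes me)
p3 = 0.2  # P(doesn't like me)
p4 = 0.8  # P(likes me)

bayes_factor = p1 / p2  # how strongly "no call" favors "doesn't like me"
prior_odds = p3 / p4    # odds of "doesn't like me" before the missed call
posterior_odds = bayes_factor * prior_odds
# Assumes the two hypotheses are exhaustive: P(likes me) = 1 / (1 + odds against)
p_likes_me = 1 / (1 + posterior_odds)

print(f"Bayes factor:          {bayes_factor:.2f}")        # 1.29
print(f"Odds doesn't like you: {posterior_odds:.2f}")      # 0.32
print(f"Odds likes you:        {1 / posterior_odds:.2f}")  # 3.11
print(f"P(likes me | no call): {p_likes_me:.2f}")          # 0.76
```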