3

I am trying to learn conditional probability, and wherever I go to learn about Bayes' theorem there is always this example of some rare disease for which the test isn't perfectly accurate.

These examples usually show that because the probability of having the disease in the first place is so low, a positive test result isn't by itself a strong indicator of having the disease, and so the test should be repeated.

However, I wonder whether this actually reflects reality, because it always assumes (or at least that's how I understand it) that healthy people are tested at the same frequency as sick people.

I don't see why that should be the case: how many healthy people actually get tested and so even have a chance of producing a false positive? Shouldn't the equation also include the probability of getting tested given that you are in fact sick or healthy, to be more accurate?

Slajni
  • 43
  • 1
    Why do you think it assumes healthy people and sick people are tested with equal frequency? You seem to be assuming that people are only given the test if they have symptoms of the disease, or something like that, but that isn't necessarily the case. There have been actual examples in the US where the CDC has recommended against routine testing for certain diseases because of this. Prostate cancer is one example if I recall correctly. – saulspatz Aug 27 '20 at 07:10
  • Related: https://math.stackexchange.com/questions/32933/describing-bayesian-probability – Henry Oct 08 '20 at 09:25

3 Answers

3

I would argue that the "rare disease" example is indeed a good one because of the base rate fallacy: a tendency of people who are not familiar with Bayes' Theorem to ignore the prior probability that someone has the disease and focus on the sensitivity and specificity of the test. What Bayes' Theorem tells us is how to combine all three sources of information to draw correct inferences. Your question is a very natural one, however, given the unfortunate tendency of introductory probability and statistics texts to avoid thinking carefully about the meaning of the prior probability. This example may help.

Let $S$ be the event that you have Covid, $S^c$ be the event that you do not have Covid, $P$ be the event that you test positive and $N$ be the event that you test negative. If our lab test has specificity $1 - \alpha$ and sensitivity $\beta$, then:

$$\mathbb{Pr}(P|S) = \beta, \quad \mathbb{Pr}(P|S^c)=\alpha$$

If we define $\pi \equiv \mathbb{Pr}(S)$ then $\mathbb{Pr}(S^c) = 1 - \pi$ and by Bayes' Theorem:

$$\mathbb{Pr}(S|P) = \frac{\mathbb{Pr}(P|S)\mathbb{Pr}(S)}{\mathbb{Pr}(P|S)\mathbb{Pr}(S) + \mathbb{Pr}(P|S^c)\mathbb{Pr}(S^c)} = \frac{\beta \pi}{\beta \pi + \alpha(1 - \pi)} = \frac{1}{1 + \displaystyle \frac{\alpha}{\beta} \cdot \frac{(1 - \pi)}{\pi}}$$

From this expression we see that the probability of your having Covid depends on only two quantities: the ratio of the false positive rate to the true positive rate, namely $\alpha / \beta$, and the prior odds that you have Covid, namely $\pi / (1 - \pi)$.

Textbook examples typically give you values for $\alpha, \beta, \pi$ and simply ask you to turn the crank to calculate $\mathbb{Pr}(S|P)$. But a better way of thinking about the preceding formula is as a recipe for updating our initial belief about whether or not you have Covid, $\pi/(1 - \pi)$, after observing a positive test result. This recipe applies regardless of the values of $\pi$, $\alpha$, and $\beta$ although the precise conclusion will vary.

Let's suppose that the characteristics of the test, $\alpha$ and $\beta$, are fixed. For a test with excellent sensitivity and specificity, perhaps $\alpha / \beta \approx 1/100$. If Covid is rare, you haven't shown any symptoms, and you don't know anyone who has been infected, then perhaps the prior odds that you have Covid are around $1/1000$. But what if you have shown symptoms and know someone who was infected? In this case, perhaps the odds could be even. For a test with $\alpha/\beta \approx 1/100$, the range of prior odds $\pi/(1-\pi) \in [0.001, 1]$ gives $\mathbb{Pr}(S|P) \in [0.09, 0.99]$ approximately.
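
To make the arithmetic concrete, here is a minimal Python sketch of the formula above (the function name and the choice $\beta = 1$ are for illustration only; all that enters the formula is the ratio $\alpha/\beta$):

```python
# Posterior probability of disease given a positive test, in terms of the
# false/true positive rates and the prior odds, as in the formula above.
def posterior_given_positive(alpha, beta, prior_odds):
    """alpha = P(positive | no disease), beta = P(positive | disease),
    prior_odds = pi / (1 - pi)."""
    return 1.0 / (1.0 + (alpha / beta) * (1.0 / prior_odds))

alpha, beta = 0.01, 1.0          # only the ratio alpha/beta = 1/100 matters here
for prior_odds in (0.001, 1.0):  # "no symptoms, rare disease" vs. "even odds"
    p = posterior_given_positive(alpha, beta, prior_odds)
    print(f"prior odds {prior_odds}: P(S|P) = {p:.2f}")
# prints roughly 0.09 and 0.99, matching the range quoted above
```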

So what's the answer? The key point here is that it depends on our initial beliefs. These beliefs, in turn, should depend on what I know about you before carrying out the test. Textbook examples tend to tacitly assume an initial position of complete ignorance: if I know nothing about you whatsoever, then it's reasonable to set $\pi$ equal to the base rate of Covid in the population as a whole. If I know that you've shown symptoms, then perhaps I should try to set $\pi$ based on prior information about the share of people with Covid among those with symptoms. But regardless of how I arrive at my choice of $\pi$, through outside information, clinical intuition, or pure speculation, I should always update my beliefs in the same way, using Bayes' Theorem and the test characteristics.

  • Wow, that is a great answer. I couldn't pin down what was bothering me, but your phrase "tendency of introductory probability and statistics texts to avoid thinking carefully about the meaning of the prior probability" basically sums it up. Thank you for your answer, now everything is clear to me. – Slajni Aug 27 '20 at 11:00
1

You ask two different questions: "is it a good example" and "is it a realistic example". My answers would be "yes" and "no".

It is a good example because it shows the main use of Bayes' theorem: reversing the conditioning. We know the probabilities of the results given the health conditions but are asked about the health condition given the result. What the theorem gives us is a way to transform the question into one whose conditioning we know how to handle. For that purpose, the example is good. In this context, it is also a good example of the effect the prior probability has on the end result.

As always, the model presented here is a partial model of reality. You can always improve it and add, for example, some probability of taking the test in the first place given the health condition, or other factors to your liking. The point is usually not to get an accurate calculation about some specific test, but rather to understand Bayes' theorem. What I usually do when I teach it is to ask the class about improvements. For example, to improve the test you can ask the positive subjects to take another one, test only people with symptoms that are related to the disease, and so on.
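
As a rough sketch of one such extension (the numbers and the testing probabilities `p_test_sick` and `p_test_healthy` are hypothetical, chosen only for illustration), you could fold "who gets tested" into the prior before updating on the positive result:

```python
# Sketch: adjust the prior for the fact that sick people are more likely
# to seek a test, then update on the positive result as usual.
def posterior(prior, sensitivity, false_pos_rate, p_test_sick, p_test_healthy):
    # Step 1: condition on "this person got tested" -- the prior odds are
    # multiplied by how much more likely a sick person is to take the test.
    odds = (prior / (1 - prior)) * (p_test_sick / p_test_healthy)
    prior_given_tested = odds / (1 + odds)
    # Step 2: condition on the positive result (the usual Bayes update).
    num = sensitivity * prior_given_tested
    den = num + false_pos_rate * (1 - prior_given_tested)
    return num / den

# Hypothetical numbers: 0.1% base rate, sick people 50x more likely to get tested.
print(posterior(0.001, 0.99, 0.01, 0.50, 0.01))  # roughly 0.83 instead of ~0.09
```

With these made-up numbers, the selection effect alone lifts the posterior from roughly 9% to roughly 83%, which is exactly the kind of refinement the question is asking about.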

YJT
  • 4,621
  • So in general, assuming that I got tested for some rare disease because I had some symptom of it, this doesn't mean that the actual probability of me having the disease is based only on the sensitivity/specificity of the test, right? The fact that I have at least some of the symptoms should make me suspicious. – Slajni Aug 27 '20 at 07:18
  • 1
    The fact that you have some symptoms increases your prior probability, so given the symptoms, the disease for you is no longer "very rare" but only "rare", and so the probability of being sick given a positive result increases. – YJT Aug 27 '20 at 08:51
1

That would just be reflected in the prior probability. Symptoms are just another piece of evidence (just like a test), after which you should update your belief using Bayes’ theorem. For instance, if you have a cough, it might be Covid, but there are many reasons why you might have a cough. So the symptom of a cough should increase the probability of Covid, but not by a lot without any further evidence.

On the other hand, if you feel a lot of pain in your ankle every time you put pressure on it, the probability of you having a broken ankle increases dramatically. So the prior probability of you having a broken ankle before a doctor looks at it has increased by a lot as well.
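
A minimal Python sketch of this sequential updating, in odds form, with made-up likelihood ratios for the symptom and the test:

```python
def update_odds(odds, likelihood_ratio):
    # One Bayes update in odds form: posterior odds = prior odds * LR,
    # where LR = P(evidence | disease) / P(evidence | no disease).
    return odds * likelihood_ratio

odds = 0.001                    # rare disease: prior odds of roughly 1 in 1000
odds = update_odds(odds, 3)     # a cough: weak evidence (hypothetical LR)
odds = update_odds(odds, 100)   # a positive test: strong evidence (hypothetical LR)
print("posterior probability:", odds / (1 + odds))  # about 0.23
```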

molarmass
  • 2,014
  • 13
  • 16