3

A COVID test has an accuracy of 99% for a sick person (meaning a sick person will get a COVID+ result with a chance of 99%) and an accuracy of 93% for a non-sick person (meaning a non-sick person will get a COVID- result with a chance of 93%). Also suppose that 1 person in a million is sick. If a person gets $n$ COVID+ results, what is the chance that they are sick? Note that the test results do not affect each other, so they are independent.


My attempt: I solved this for 1 test result using conditional probability and got this: $$P(A|B)=\frac{P(A\cap B)}{P(B)} = \frac{99\%\times 10^{-6}}{99\%\times 10^{-6} + 7\%\times 999{,}999\times10^{-6}}$$ Then I tried to extend the result to two tests, that is, to find the chance of being sick given that both the first and the second test are positive. I tried to use the three-event extension of Bayes' theorem: $$P(A|B,C) = \frac{P(B|A,C)P(A|C)}{P(B|C)}$$ where $A$ is "being sick", $B$ is the first positive test and $C$ is the second positive test. But I couldn't find $P(B|A,C)$, and that is where I am stuck and cannot go further. I would be thankful for any help.

  • Can you please edit your post to include what you found for 2 tests? – Anne Bauval Mar 13 '23 at 14:43
  • You seem to be assuming repeated tests are independent. That is not obviously correct in reality, so you should at least state it. – Henry Mar 13 '23 at 14:46
  • @AnneBauval I just added my idea for the two-test case and the reason I couldn't go further: all other terms except the one mentioned are obvious. – Jacob Martina Mar 13 '23 at 16:01
  • @Henry You are right. I assumed test results won't affect each other so they are independent. Added this statement to the question – Jacob Martina Mar 13 '23 at 16:02
  • For 2 tests, why not simply write $P(A|B\cap C)=\frac{P(A\cap B\cap C)}{P(B\cap C)}$ and compute it like you did for 1 test? – Anne Bauval Mar 13 '23 at 16:08
  • @AnneBauval B and C are independent but I have no idea how to calculate $P(A\cap B\cap C)$. I wanted to use the inclusion-exclusion principle, but I cannot find $P(A\cup B \cup C)$ either. I would appreciate any idea on how to calculate any of these terms. – Jacob Martina Mar 13 '23 at 16:22

4 Answers

2

A COVID test has an accuracy of $\textbf v$ for a sick person (meaning a sick person will get a COVID+ result with a chance of $\textbf v$) and an accuracy of $\textbf f$ for a non-sick person (meaning a non-sick person will get a COVID- result with a chance of $\textbf f$). Also suppose that the disease prevalence is $\textbf p.$ Note that test results won't affect each other and so they are independent.

[Figure: probability tree diagram with numbered branches; image not available]

Legend. $$\begin{aligned} P(\text{actually sick} \mid \text{all $n$ results are positive}) &=\frac{P(\text{actually sick and all $n$ results are positive})}{P(\text{all $n$ results are positive})}\\ &=\frac{pv^n}{pv^n+(1-p)(1-f)^n}. \end{aligned}$$ (The denominator corresponds to Branches 1 and 3 in the above diagram.)
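The closed form above can be evaluated directly. The following is only a minimal sketch: the function name `posterior_sick` is made up for illustration, and it relies on the same assumption as the formula, namely that results are independent given the person's true status.

```python
def posterior_sick(p: float, v: float, f: float, n: int) -> float:
    """P(sick | n positive results) = p v^n / (p v^n + (1-p)(1-f)^n).

    p: prevalence, v: P(positive | sick), f: P(negative | not sick).
    Assumes the n results are independent given the true status.
    """
    sick_and_all_positive = p * v ** n                   # n true positives
    healthy_and_all_positive = (1 - p) * (1 - f) ** n    # n false positives
    return sick_and_all_positive / (sick_and_all_positive + healthy_and_all_positive)


if __name__ == "__main__":
    # Numbers from the question: p = 1e-6, v = 0.99, f = 0.93.
    for n in range(1, 8):
        print(n, posterior_sick(1e-6, 0.99, 0.93, n))
```

With the question's numbers and $n=1$ this gives about $1.4\times10^{-5}$, which agrees with the single-test expression in the question.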

P.S. The independence assumption above is not true in general: for example, the second of two successive COVID swab tests on the same subject is more likely than the first to register negative, due to a lower concentration of antigens in the second reagent tube.

ryang
1

Addendum-1 added to respond to the comment/question of Jacob Martina.


This is a follow-up to the comment already given by Anne Bauval, since no one else has explicitly shown the calculations. I am also responding because I try to give Covid questions answers as clear as I can make them.

There are two ways of getting $n$ consecutive positive results. Either:

  • because the patient has the disease and has gotten $n$ consecutive true positives.
    The probability of this happening is

    $\displaystyle A = \left[ ~10^{-6} ~\right] \times \left[ ~(0.99)^n ~\right].$

  • because the patient does not have the disease and has gotten $n$ consecutive false positives.
    The probability of this happening is

    $\displaystyle B = \left[ ~1 - 10^{-6} ~\right] \times \left[ ~(1 - 0.93)^n ~\right].$

Then, the conditional probability that the person has Covid, given the premises of the problem, including the premise that the person had $~n~$ consecutive positive results is

$$\frac{A}{A + B} $$

$$= \frac{\left[ ~10^{-6} ~\right] \times \left[ ~(0.99)^n ~\right]}{\left\{ ~\left[ ~10^{-6} ~\right] \times \left[ ~(0.99)^n ~\right] ~\right\} + \left\{ ~\left[ ~1 - 10^{-6} ~\right] \times \left[ ~(1 - 0.93)^n ~\right] ~\right\}}$$

$$= \frac{\left[ ~10^{-6} ~\right] \times \left[ ~(0.99)^n ~\right]}{\left\{ ~\left[ ~10^{-6} ~\right] \times \left[ ~(0.99)^n ~\right] ~\right\} + \left\{ ~\left[ ~999999 \times 10^{-6} ~\right] \times \left[ ~(0.07)^n ~\right] ~\right\}}$$
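To see how quickly this ratio grows with $n$, here is a small sketch that evaluates $A/(A+B)$ exactly with rational arithmetic. The helper names `posterior_exact` and `first_n_above` are hypothetical, and the independence premise of the problem is taken at face value.

```python
from fractions import Fraction

PREVALENCE = Fraction(1, 10**6)        # 1 person in a million is sick
TRUE_POSITIVE = Fraction(99, 100)      # P(positive | sick)
FALSE_POSITIVE = Fraction(7, 100)      # P(positive | not sick) = 1 - 0.93


def posterior_exact(n: int) -> Fraction:
    """Exact value of A / (A + B) for n consecutive positive results."""
    a = PREVALENCE * TRUE_POSITIVE ** n           # sick, n true positives
    b = (1 - PREVALENCE) * FALSE_POSITIVE ** n    # not sick, n false positives
    return a / (a + b)


def first_n_above(threshold: Fraction) -> int:
    """Smallest n for which the posterior exceeds the threshold."""
    n = 1
    while posterior_exact(n) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"n={n}: {float(posterior_exact(n)):.6g}")
    print("first n with posterior > 1/2:", first_n_above(Fraction(1, 2)))
```

Under these numbers the posterior is roughly 0.36 after five positives and roughly 0.89 after six, so six consecutive positives are needed before being sick becomes more likely than not.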


Addendum

The above approach merely represents the Mathematical attack given by the problem constraints. In reality, it is highly unlikely that Covid tests are independent of each other. So, you have to take the above computations with a huge grain of salt.


Addendum-1
Responding to the comment/question of Jacob Martina.

How can we know the wanted probability is $~\dfrac{A}{A+B}~$ and the denominator is not squared, (as in) $~\dfrac{A}{(A+B)^2}~$?

To answer this question, I will need to stretch your intuition. First, consider the following diagram.

In the diagram, the enclosed rectangle represents the Universe, and the two circles represent events $~E_1~$ and $~E_2.~$ Now, consider two questions:

  • Based on the diagram, what is the physical representation of the probability of event $E_2$ occurring?

    Answer : It is the fraction $~~\displaystyle \frac{\text{Area of the circle}~E_2}{\text{Area of rectangle} ~\textit{Universe}}.$

  • Based on the diagram, what is the physical representation of the probability of event $(E_2|E_1)$ occurring, which represents the probability of event $~E_2~$ occurring, under the assumption that event $~E_1~$ has occurred?

    Answer : It is the fraction $~~\displaystyle \frac{\text{Area of the intersection}~E_1 \cap E_2}{\text{Area of the circle}~E_1}.$

The point of the bullet point questions/answers is to stretch your intuition, around these ideas.

The mathematical equation that represents the answer to second bullet point question above is

$$\displaystyle p(E_2|E_1) = \frac{p(E_2 \cap E_1)}{p(E_1)}.$$

Before you proceed further in this section of my answer, I suggest you read the Wikipedia article on conditional probability. Note that the Wikipedia article uses the variable $~A~$ to represent event $~E_2~$ and the variable $~B~$ to represent $~E_1.$

Now, please refer back to the diagram. Using my variables $~E_1~$ and $~E_2$:

  • Let $~A~$ denote the probability of the events $~E_1~$ and $~E_2~$ both occurring.
    That is, $~A = p(E_1 \cap E_2).~$

  • Let $~B~$ denote the probability of the events $~E_1~$ and $[\neg E_2]~$ both occurring.
    That is, $~B = p(E_1 \cap [\neg E_2]).~$

This implies that $~p(E_1) = A + B.$

This explains why

$$p(E_2 ~| ~E_1) = \frac{p(E_1 \cap E_2)}{p(E_1)} = \frac{A}{A+B}.$$

Now, it remains to interpret the problem, so as to find a convenient choice, for events $~E_1~$ and $~E_2.$

You are given that $~n~$ consecutive positive results have occurred. You are then asked for the probability that the patient has the virus.

So, I let :

  • $~E_1~$ denote the event that $~n~$ consecutive positive results occurred.

  • $~E_2~$ denote the event that the patient has the virus.

Therefore, I am being asked to compute $~p(E_2 ~| ~E_1).$

If you review the start of my answer, you will see that I am (in effect) letting:

  • $~A~$ denote $~p(E_1 \cap E_2).$

  • $~B~$ denote $~p(E_1 \cap [\neg E_2]).$

This explains my computation $~\dfrac{A}{A + B}.$
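The "area" picture can also be checked by simulation: among simulated patients for whom $~E_1~$ occurs, count the fraction for whom $~E_2~$ also occurs. This is a rough sketch only. The function name `simulate` is made up, and the prevalence is set to 0.1 instead of $10^{-6}$ purely as an assumption to keep the simulation small; the test accuracies are those of the problem.

```python
import random


def simulate(p: float, v: float, f: float, n: int, trials: int, seed: int = 0) -> float:
    """Estimate p(E2 | E1) as (count of E1 and E2) / (count of E1).

    E1: all n tests are positive. E2: the patient is sick.
    Assumes E1 occurs at least once in the simulated trials.
    """
    rng = random.Random(seed)
    count_e1 = 0        # plays the role of the area of circle E1
    count_e1_e2 = 0     # plays the role of the area of the intersection
    for _ in range(trials):
        sick = rng.random() < p
        positive_rate = v if sick else 1 - f
        all_positive = all(rng.random() < positive_rate for _ in range(n))
        if all_positive:
            count_e1 += 1
            if sick:
                count_e1_e2 += 1
    return count_e1_e2 / count_e1


if __name__ == "__main__":
    p, v, f, n = 0.1, 0.99, 0.93, 2     # p = 0.1 is an illustrative assumption
    a = p * v ** n                      # p(E1 and E2)
    b = (1 - p) * (1 - f) ** n          # p(E1 and not E2)
    print("simulated      :", simulate(p, v, f, n, trials=200_000))
    print("A / (A + B)    :", a / (a + b))
    print("A / (A + B)^2  :", a / (a + b) ** 2)
```

With these illustrative parameters the simulated fraction should land close to $~A/(A+B) \approx 0.957,~$ while $~A/(A+B)^2~$ comes out larger than $1$, so it cannot be a probability at all.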

user2661923
  • I just didn't understand one part and am confused. How can we know the wanted probability is $A/(A+B)$ and the denominator is not squared ($\frac{A}{(A+B)^2}$)? I would be thankful if you could provide some more hints or a visualization of this part. – Jacob Martina Mar 14 '23 at 12:57
  • @JacobMartina See the Addendum-1 that I have just added to the end of my answer. – user2661923 Mar 14 '23 at 18:45
0

There is a form of Bayes' theorem that is helpful for such problems: $\frac{P(A|B)}{P(\bar A | B)} = \frac{P(A)}{P(\bar A)} \cdot \frac{P(B | A)}{P(B | \bar A)}$ (the so-called "odds form").

Taking $A$ as "person is sick", $B$ as "person got positive result $n$ times" and $C$ as "person got positive result $1$ time", we know $P(A)$, $P(C | A)$, $P(C | \bar A)$ and, if we assume $P(B | A) = (P(C|A))^n$ and $P(B | \bar A) = (P(C|\bar A))^n$ (that is, tests are independent given the person's true status), we can find $\frac{P(A|B)}{P(\bar A | B)}$.

Then, we can find $P(A | B)$ as $\frac{\frac{P(A|B)}{P(\bar A | B)}}{1 + \frac{P(A|B)}{P(\bar A | B)}}$.
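The three steps above (prior odds, likelihood ratio raised to the $n$-th power, conversion back to a probability) can be sketched as follows. The function name `posterior_from_odds` is hypothetical, and the sketch assumes the tests are independent given the person's true status.

```python
def posterior_from_odds(prior: float, p_pos_sick: float, p_pos_healthy: float, n: int) -> float:
    """P(A | B) via the odds form of Bayes' theorem.

    prior: P(A), p_pos_sick: P(C | A), p_pos_healthy: P(C | not A).
    Assumes P(B | A) = P(C | A)^n and P(B | not A) = P(C | not A)^n.
    """
    prior_odds = prior / (1 - prior)                       # P(A) / P(not A)
    likelihood_ratio = (p_pos_sick / p_pos_healthy) ** n   # P(B | A) / P(B | not A)
    posterior_odds = prior_odds * likelihood_ratio         # P(A | B) / P(not A | B)
    return posterior_odds / (1 + posterior_odds)           # back to a probability


if __name__ == "__main__":
    # Numbers from the question: P(A) = 1e-6, P(C|A) = 0.99, P(C|not A) = 0.07.
    for n in range(1, 8):
        print(n, posterior_from_odds(1e-6, 0.99, 0.07, n))
```

One convenience of this form is that each additional positive result simply multiplies the odds by the same factor $P(C|A)/P(C|\bar A)$, here $0.99/0.07 \approx 14.1$.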

mihaild
0

For 2 tests, why not simply write $$P(A\mid B\cap C)=\frac{P(A\cap B\cap C)}{P(B\cap C)}$$ and compute it like you did for 1 test?

$P(A\cap B\cap C)=(0.99)^2\cdot10^{-6},$ and $P(B\cap C)$ is the sum of that and of $P(\bar A\cap B\cap C)=(0.07)^2\cdot999999\cdot10^{-6}.$

You will easily generalize to $n$ tests.
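
As a worked instance of the two-test computation above (under the stated independence assumption), multiplying numerator and denominator by $10^{10}$ gives whole numbers:

$$P(A\mid B\cap C)=\frac{(0.99)^2\cdot10^{-6}}{(0.99)^2\cdot10^{-6}+(0.07)^2\cdot999999\cdot10^{-6}}=\frac{9801}{9801+48999951}=\frac{9801}{49009752}\approx 2.0\times10^{-4}.$$

So two positive results still leave the chance of being sick at only about $0.02\%$.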

Anne Bauval