The scenario given by the problem is as follows:

We are testing for a disease D that we think is present, D+, with probability 0.4, and absent, D-, with probability 0.6. We believe that a test has sensitivity P{T+|D+}=0.75 and specificity P{T-|D-}=0.8.

Q1: What is our probability that the disease is present if we perform the test and it is positive, T+, and our probability the disease is absent if that test is negative, T-?

My ans:

We can apply Bayes' formula to calculate P{D+|T+} and P{D-|T-}.

P{D+|T+} = 5/7

P{D-|T-} = 24/29
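
As a check on these two values, here is a minimal Python sketch of the Bayes' formula step. The helper name `posterior` and the variable names are illustrative only and are not part of the exercise; exact fractions are used so the results can be compared directly with 5/7 and 24/29.

```python
from fractions import Fraction

def posterior(prior, likelihood_h, likelihood_not_h):
    """Bayes' formula: P(H | E) from P(H), P(E | H) and P(E | not H)."""
    joint_h = prior * likelihood_h
    joint_not_h = (1 - prior) * likelihood_not_h
    return joint_h / (joint_h + joint_not_h)

prior_present = Fraction(2, 5)  # P(D+) = 0.4
sensitivity = Fraction(3, 4)    # P(T+ | D+) = 0.75
specificity = Fraction(4, 5)    # P(T- | D-) = 0.8

# P(D+ | T+): hypothesis D+, evidence T+
p_present_given_pos = posterior(prior_present, sensitivity, 1 - specificity)
# P(D- | T-): hypothesis D-, evidence T-
p_absent_given_neg = posterior(1 - prior_present, specificity, 1 - sensitivity)

print(p_present_given_pos, p_absent_given_neg)
```

The two printed fractions should agree with the values stated above.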

Q2: Suppose we perform three tests, conditionally independent given D. Given each possible number of positive test results, 0, 1, 2, or 3, what is our probability that the disease is present?

My ans:

Let k denote the number of positive test results.

We know that P{ [exactly] k successes in n trials | p } = (nCk)x(p^k)x((1−p)^(n−k)), hence we can calculate P{k=0|D+} and P{k=0|D-}.

We can then apply Bayes' formula to calculate P{D+|k=0}.

P{D+|k=0} = 0.692

Using the same approach for k = 1, 2, 3, we derive:

P{D+|k=1} = 0.5

P{D+|k=2} = 0.308

P{D+|k=3} = 0.165

Would really appreciate it if someone can verify whether my reasoning and answers are correct. Thank you for your help in advance.

RobPratt
  • 45,619
  • Your answer for $(2)$ is definitely wrong. As the number of positive tests for the disease increases, the probability of having the disease should also increase, which is opposite to the trend your calculations show. – Cathedral Jul 24 '22 at 04:43
  • Welcome to MathSE. This tutorial explains how to typeset mathematics on this site. – N. F. Taussig Jul 24 '22 at 10:18

1 Answer

Suppose exactly $k$ of the $n$ tests show positive (+ve) results. The probability that a randomly chosen person gets exactly $k$ +ve results is

$$P(k \text{ +ve's})=P(\text{D}^+)\cdot {n\choose k}\cdot p^k\cdot(1-p)^{n-k}+P(\text{D}^-) \cdot {n\choose k}\cdot (1-q)^k\cdot q^{n-k}$$

where $p$ represents the sensitivity and $q$ represents the specificity of the test.

Now, the probability that the chosen person has the disease given that $k$ of $n$ tests showed up positive is

$$P(\text{D}^+| \space k \text{+ve's})=\frac{P(\text{D}^+)\cdot {n\choose k}\cdot p^k\cdot(1-p)^{n-k}}{P(\text{D}^+)\cdot {n\choose k}\cdot p^k\cdot(1-p)^{n-k}+P(\text{D}^-) \cdot {n\choose k}\cdot (1-q)^k\cdot q^{n-k}}$$
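
As a side note, the factor ${n\choose k}$ appears in every term of this fraction and cancels, so the same posterior can be written as

$$P(\text{D}^+| \space k \text{+ve's})=\frac{P(\text{D}^+)\cdot p^k\cdot(1-p)^{n-k}}{P(\text{D}^+)\cdot p^k\cdot(1-p)^{n-k}+P(\text{D}^-)\cdot (1-q)^k\cdot q^{n-k}}$$

For instance, taking the values from the question ($P(\text{D}^+)=0.4$, $p=0.75$, $q=0.8$) with $n=3$ and $k=2$:

$$\frac{0.4\cdot 0.75^2\cdot 0.25}{0.4\cdot 0.75^2\cdot 0.25+0.6\cdot 0.2^2\cdot 0.8}=\frac{0.05625}{0.05625+0.0192}\approx 0.746$$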

You can now enter $n=3$ and $k=0,1,2,3$ to get the required answers:

$$P(\text{D}^+|\space k=0)\approx 0.02$$

$$P(\text{D}^+|\space k=1)\approx 0.196$$

$$P(\text{D}^+|\space k=2)\approx 0.746$$

$$P(\text{D}^+|\space k=3)\approx 0.972$$

These values also follow the expected trend: the probability of disease increases with the number of positive tests.
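
For anyone who wants to reproduce these numbers, the posterior formula can be sketched in a few lines of Python. The function name `posterior_present` and its parameter names are my own choices for illustration; the default values are the prior, sensitivity and specificity from the question, and the tests are assumed conditionally independent given both $\text{D}^+$ and $\text{D}^-$.

```python
from math import comb

def posterior_present(k, n=3, prior=0.4, p=0.75, q=0.8):
    """P(D+ | exactly k of n tests positive).

    p is the sensitivity P(T+ | D+), q is the specificity P(T- | D-).
    Assumes the tests are conditionally independent given disease status.
    """
    # Binomial likelihood of k positives when the disease is present
    like_present = comb(n, k) * p**k * (1 - p)**(n - k)
    # Binomial likelihood of k positives (false positives) when it is absent
    like_absent = comb(n, k) * (1 - q)**k * q**(n - k)
    numerator = prior * like_present
    return numerator / (numerator + (1 - prior) * like_absent)

for k in range(4):
    print(k, round(posterior_present(k), 3))
```

With $n=1$ and $k=1$ the same function reduces to the single-test case and returns $5/7\approx 0.714$, which is a useful sanity check.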

Cathedral
  • 1,185
  • The exercise took care to specify that the three tests are conditionally independent given $D^+.$ But it is unclear whether they are also conditionally independent given $D^-.$ We'd have to assume the latter (equivalently, assume that successive tests are always independent of one another) to use that $P(T_2^-|D^-T_1^+)=P(T_2^-|D^-T_1^-)=0.8.$ – ryang Jul 24 '22 at 06:48
  • @ryang It says "Suppose we perform three tests, conditionally independent given D", not "given D$^+$", so I think it's safe to assume that no test affects the result of any other. – Cathedral Jul 24 '22 at 07:36
  • @Cathedral Technically, "given E" means that E is an event, in which case "given D" is reasonably interpreted as "given (the event) that the patient is diseased" (I mean, the event isn't the disease itself, but the patient being afflicted or not afflicted). Of course, I do agree with your interpretation of what the exercise is trying to say. – ryang Jul 24 '22 at 07:40
  • That does make a fair amount of sense; however, given the low level of the problem, I'd go with my interpretation. Also, kudos on the extremely well-written and well-researched answer (the link you provided). – Cathedral Jul 24 '22 at 07:49