Need help understanding conditional probability of continuous random variable conceptually

Question

I need help understanding some stuff about conditional density. Here is an example from my book.

A bird lands in a grassy region described as follows: $x ≥ 0$, and $y ≥ 0$, and $x + y ≤ 10$. Let $X$ and $Y$ be the coordinates of the bird’s landing. Assume that $X$ and $Y$ have the joint density: $f(x, y) = 1/50$ for $0 ≤ x$ and $0 ≤ y$ and $x + y ≤ 10$.

Given that the bird's landing $y$-coordinate is $2$, what is the probability that the $x$-coordinate is between $0$ and $5$?

$f(y) = \int_0^{10-y}\cfrac{1}{50}dx = \cfrac{10-y}{50}$ for $0\leq y \leq 10$

so $f(x|y) = \cfrac{f(x,y)}{f(y)} = \cfrac{1}{10-y}$ for $0 ≤ x$ and $0 ≤ y$ and $x + y ≤ 10$

The probability is $P(0 \leq X \leq 5| Y=2) = \int_0^5f(x|2)\,dx = \int_0^5\cfrac{1}{8}\,dx = 5/8$

I know the answer above is correct, and I know how to calculate conditional probability. My question is: why doesn't the conditioning event $Y=2$ have probability $0$? That is the probability that a continuous random variable equals an exact value, i.e. $\int_2^2f(y)dy = 0$, which would make the answer undefined. Why is this not the case? What am I misunderstanding conceptually about the $Y=2$ in $P(0 \leq X \leq 5| Y=2)$, and about conditional density in general?

Also, the conditional pdf will always have the same bounds as the joint pdf, correct?

score 2 · Accepted Answer · answered Apr 06 '21 at 17:40

For a continuous random variable, although any specific outcome of that variable may have a probability of $0$ of being observed, that does not mean that once a realization is observed, it was an impossible event.

Continuous random variables are convenient abstractions for modeling real-world stochastic phenomena that theoretically occur on a continuous scale, but are rarely measured with infinite precision. For instance, we might model temperature or position or time as a continuous random variable, but we rarely measure (hence observe) these quantities as if they are truly continuous, because those measurements are limited by the precision of the measuring instrument.

Once you are told that the bird landed somewhere on a line at exactly $Y = 2$, the outcome of that event was determined. You cannot assign a probability of $0$ because it has occurred. You already wrote $$f(x \mid y) = \frac{1}{10-y}, \quad 0 \le x \le 10-y, \quad 0 \le y < 10.$$ That means the conditional density of $x$ is uniform on $[0, 10-y]$.

Another way to think about the question is to look at the geometry. Once you are told that the bird lands on a line at $Y = 2$ within the triangle, you know that the length of this line is $10 - 2 = 8$. The probability that the bird's horizontal position on this line is between $0$ and $5$ is simply $5/8$.

So if we don't know what $Y$ is, the probability that $Y$ equals any specific value is $0$. But if we do know what $Y$ is, the probability that $Y$ equals that specific value is non-zero. Have I got this right? — , Apr 06 '21 at 18:44
Please take a look at my answer and see if you agree this is more subtle than it may seem. — nanoman, Apr 07 '21 at 05:39

score 2 · Answer 2 · answered Apr 06 '21 at 18:18

Indeed $P[Y=2]=0$. And so, indeed, we cannot define $P[0\leq X \leq 5 |Y=2]$ under the basic definition $P[A|B] = \frac{P[A \cap B]}{P[B]}$. We can only define it if we extend the definition of conditional probability to treat this new and special case of conditioning on $Y=2$ when $Y$ has PDF $f_Y(y)$ that is continuous and positive at $y=2$.

You can think of it being defined as $$ P[0\leq X \leq 5|Y=2] = \lim_{\delta \searrow 0} P[0\leq X \leq 5 |Y\in [2, 2+\delta]]$$

Indeed you can compute the right-hand side, because for $0<\delta<1$ we get \begin{align} &P[\{0\leq X \leq 5\} \cap \{Y \in [2, 2+\delta]\}] = \frac{5\delta}{50}\\ &P[Y \in [2, 2+\delta]] = \frac{\delta(8 - \delta/2)}{50} \end{align} and you can verify that the limit of their ratio as $\delta\searrow 0$ is $5/8$.
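
To see the limit numerically, here is a minimal sketch in Python using exact rational arithmetic. The function name `strip_conditional` is just an illustrative choice; the marginal term comes from integrating $f(y) = (10-y)/50$ over $[2, 2+\delta]$, and the formula for the joint term assumes $\delta \le 3$ so that the rectangle stays inside the triangle.

```python
from fractions import Fraction

def strip_conditional(delta):
    """P(0 <= X <= 5 | 2 <= Y <= 2 + delta) for the density 1/50 on the triangle.

    Assumes 0 < delta <= 3, so the rectangle [0, 5] x [2, 2 + delta]
    lies entirely inside the region x + y <= 10.
    """
    delta = Fraction(delta)
    joint = 5 * delta / 50                    # rectangle area times the density 1/50
    marginal = delta * (8 - delta / 2) / 50   # integral of (10 - y)/50 over [2, 2 + delta]
    return joint / marginal

# Shrink the strip and watch the conditional probability approach 5/8.
for k in range(1, 6):
    delta = Fraction(1, 10 ** k)
    p = strip_conditional(delta)
    print(f"delta = {delta}: {p} ~ {float(p):.8f}")
```

The ratio simplifies to $5/(8 - \delta/2)$, so each value is slightly above $5/8$ and the gap shrinks with $\delta$.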

score 1 · Answer 3 · answered Apr 06 '21 at 17:49

The answer is that the conditional probability $P(A | B)$ may be nonzero even when the unconditional probability $P(B)$ vanishes. This can happen with discrete as well as continuous random variables. Let's say that any day that Elliott drives to work, he has a 7% chance of being pulled over (so event $A$ = "being pulled over") and that Elliott does not have a car, so he has a 0% chance of driving to work that day and a 100% chance of taking public transit (so event $B$ = "driving to work that day"). Clearly it still makes sense to say $P(A | B) = 0.07$ despite the fact that $P(B) = 0$.

score 0 · Answer 4 · answered Apr 07 '21 at 03:16

There are several ways to answer this. There's the limit argument given in another answer. There's also an answer in terms of hyperreal numbers: the probability isn't zero, it's simply a lower order of hyperreal number. Another way of thinking about it is that we can use probability densities to calculate conditional probabilities (remember that when you graph a probability density function, the function has finite values, even though the probability of the variable being exactly one particular number is zero).

But for an intuitive understanding, I think it's best to think of conditional probabilities as the ratio between the "size" of different event spaces. $P(0 \leq X \leq 5| Y=2)$ is the "size" of the set $\{(x,y) :(0 \leq x \leq 5 )\land (y=2)\}$ divided by the "size" of the set $\{(x,y) :(y=2)\}$. Both of these sets are line segments, so the relevant "size" is their lengths. So the conditional probability is simply the ratio of their lengths. You're thinking of probability corresponding to the area of the sets, and seeing that their areas are zero, and so you're getting $\frac 0 0$. But area is only one type of "size", and conditional probabilities can be calculated by comparing any type of "size", as long as it's the same type for both sets. If we divide a volume by a volume, we get a finite number. If we divide an area by an area, we get a finite number. If we divide a length by a length, we get a finite number. If we divide a length by an area, we get zero. If we divide an area by a length, we get infinity.

The problem is that a 2D pdf is inherently based on area. Length is not a valid basis of comparison here, because it doesn't transform the same way as area under a change of variables. To describe a 1D subset by something that scales like area, it needs to also have a "relative thickness" function as mentioned in my answer. In other words, it matters how the 1D subset was obtained as a limit of 2D subsets. — nanoman, Apr 07 '21 at 05:50

score 0 · Answer 5 · answered Apr 07 '21 at 05:37

You are correct that this is a very subtle question, known as the Borel–Kolmogorov paradox. As Kolmogorov wrote:

The concept of a conditional probability with regard to an isolated hypothesis whose probability equals $0$ is inadmissible.

An indication that the naively computed conditional probability is ill-defined is that a nonlinear transformation of the variables may give inconsistent answers.

The resolution is that conditional probability should be defined in terms of a condition with finite probability. A limit may then be taken where the probability goes to zero, but the exact choice of condition may affect this limit, just as the value of a general indeterminate form $0/0$ may depend on how the limit is taken.

Intuitively, when we condition on a 1D subset of a 2D space, we need to specify not merely a curve but a "relative thickness" along the curve. This is a necessary input for the problem to be well-defined. Then, if we properly account for this thickness during a nonlinear transformation of variables, the conditional probability remains consistent.

In your example, the bird landing on the line $Y = 2$ can only be physically observed to some finite precision. The nontrivial implicit assumption being made in your example is that the precision with which $Y$ is observed is independent of the value of $X$. (Michael's answer denotes this precision by $\delta$ but does not explain the importance of the assumption.)

If the assumption holds for this choice of variables (Cartesian coordinates), it will not hold if we reparametrize $Y$ with a different coordinate, say $Y' = (Y - 2)/(X + 1)$. It is true that $Y = 2 \Leftrightarrow Y' = 0$, i.e., both describe the same line, but the densities along that line are non-trivially different: $f_{X,Y}(x, 2) = 1/50$ and $f_{X,Y'}(x, 0) = (x + 1)/50$.
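
The second density follows from the usual change-of-variables rule. As a sketch, assuming we hold $x$ fixed and write $y = 2 + y'(x+1)$, the Jacobian factor is $\partial y / \partial y' = x + 1$ (positive on the region), so

$$f_{X,Y'}(x, y') = f_{X,Y}\bigl(x,\, 2 + y'(x+1)\bigr)\,\left|\frac{\partial y}{\partial y'}\right| = \frac{x+1}{50}$$

wherever $(x,\, 2 + y'(x+1))$ lies in the triangle. Setting $y' = 0$ gives $(x+1)/50$ for $0 \le x \le 8$.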

A physical motivation for this $Y'$ is to imagine that you are standing outside the grassy region, at $(x, y) = (-1, 2)$. Then $Y'$ represents your line of sight to the bird. It's plausible that you have a fixed angular acuity in your observation, so you'd actually have an $X$-independent precision in $Y'$ rather than in $Y$. So in this case, we get a well-defined but different answer: Conditional on observing $Y' = 0$, the probability that $0 \le X \le 5$ is $7/16$, which is less than $5/8$. This makes sense: There's a lower probability for the bird to be at small $X$, because if it's at large $X$ (farther from you) it can more easily "appear" to be on the line $Y = 2 \Leftrightarrow Y' = 0$.
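
The $7/16$ figure can be checked by hand. This sketch assumes that conditioning on $Y' = 0$ with $X$-independent precision in $Y'$ makes the conditional density of $X$ proportional to $f_{X,Y'}(x, 0) = (x+1)/50$ on $0 \le x \le 8$:

$$P(0 \le X \le 5 \mid Y' = 0) = \frac{\int_0^5 (x+1)\,dx}{\int_0^8 (x+1)\,dx} = \frac{25/2 + 5}{32 + 8} = \frac{35/2}{40} = \frac{7}{16}.$$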

Of course, we can also solve the "line of sight" version in the original Cartesian coordinates $(X, Y)$ if we account for the "thickness" factor mentioned above. We are conditioning on a wedge rather than a rectangle around $Y = 2$, and this non-uniformity has an effect on the conditional probability that persists even when we take the limit of an extremely precise observation where the wedge and rectangle both shrink to a line.
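
The difference between the two conditioning schemes can also be seen in a simulation. The following is a rough Monte Carlo sketch, not a proof: the helper names, the tolerance `eps = 0.05`, the sample size and the seed are all arbitrary choices for illustration. It samples landing points uniformly in the triangle, then conditions once on a rectangle $|Y - 2| \le \varepsilon$ and once on a wedge $|Y - 2| \le \varepsilon (X + 1)$, which corresponds to $|Y'| \le \varepsilon$.

```python
import random

def sample_triangle(rng):
    """Uniform point in the triangle x >= 0, y >= 0, x + y <= 10 (rejection sampling)."""
    while True:
        x, y = rng.uniform(0, 10), rng.uniform(0, 10)
        if x + y <= 10:
            return x, y

def estimate(n, eps, seed=0):
    """Estimate P(X <= 5 | condition) for a rectangle and for a wedge around y = 2."""
    rng = random.Random(seed)
    strip_hits = strip_total = wedge_hits = wedge_total = 0
    for _ in range(n):
        x, y = sample_triangle(rng)
        if abs(y - 2) <= eps:               # fixed precision in Y (rectangle)
            strip_total += 1
            strip_hits += (x <= 5)
        if abs(y - 2) <= eps * (x + 1):     # fixed precision in Y' = (Y - 2)/(X + 1) (wedge)
            wedge_total += 1
            wedge_hits += (x <= 5)
    return strip_hits / strip_total, wedge_hits / wedge_total

strip_p, wedge_p = estimate(400_000, 0.05)
print("rectangle around Y = 2:", strip_p)   # should land near 5/8 = 0.625
print("wedge around Y' = 0:  ", wedge_p)    # should land near 7/16 = 0.4375
```

Up to sampling noise and a small bias from the finite `eps`, the two estimates stay apart as `eps` shrinks, even though both regions collapse onto the same line.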
