
I have some difficulty understanding the Radon-Nikodym derivative and linking it to the ordinary way of obtaining the probability density function, namely by differentiating the cumulative distribution function (c.d.f.). How can one intuitively link the Radon-Nikodym derivative to the non-measure-theoretic definition of the probability density function?

I think I need to be clearer here. If we have a c.d.f., then even if it is discontinuous, one can take the derivative, and at the points of discontinuity we get Dirac delta functions. How can one see this from the Radon-Nikodym derivative point of view? It might also be possible that the c.d.f. is almost nowhere differentiable, I guess; then we won't have a density. Is this case directly visible from the Radon-Nikodym derivative point of view?

(Radon-Nikodym Theorem):

Let $(\Omega,\mathcal{F})$ be a measurable space carrying two $\sigma$-finite measures $\mu$ and $\nu$, where $\nu$ is absolutely continuous with respect to the measure $\mu$; that is, for every set $A\in\mathcal{F}$, $\mu(A)=0\Longrightarrow \nu (A)=0$. Then there exists a nonnegative measurable function $f:\Omega\rightarrow [0,\infty)$, unique up to $\mu$-a.e. equality, such that

$$\nu(A)=\int_A f \mathrm{d} \mu\quad\forall A\in\mathcal{F}$$
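(To fix ideas with a standard example which I believe I understand: if $\mu$ is the counting measure on $\mathbb{N}_0$ and $\nu$ is the law of a Poisson($\lambda$) random variable, then the only $\mu$-null set is the empty set, so trivially $\nu\ll\mu$, and the theorem gives $f(k)=e^{-\lambda}\lambda^k/k!$, i.e. the probability mass function, since $$\nu(A)=\sum_{k\in A}f(k)=\int_A f\,\mathrm{d}\mu.)$$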

Question 1: My first question is about the definition of absolute continuity of one measure w.r.t. another. It is only required that whenever one of them gives zero for some set $A$, the other must also give zero, and it seems this is enough for the density function to exist. How is this absolute continuity linked to the absolute continuity of functions? I know that an absolutely continuous function is differentiable almost everywhere and is the integral of its derivative, but I cannot link this to the definition used in the Radon-Nikodym theorem. Does the definition in terms of measures fill the gap?

Question 2: Consider the set of probability measures $$\mathcal{Q}=\{Q:Q=(1-\epsilon)P+\epsilon H,\, H\in\mathcal{H}\}$$ where $\mathcal{H}$ is the set of all probability measures and $P$ is a probability measure which has a density $p$. Then it seems there are some $Q\in\mathcal{Q}$ which are not absolutely continuous with respect to $P$. As far as I understand, this only says that those $Q$ cannot have a density with respect to $P$, but they can have a density function with respect to another measure, right? In any case, since $H$ can be any probability measure, I think there must be some $Q$ which do not admit any density function at all, especially if $H$ has some abrupt changes. Am I wrong?

Question 3: Now consider the following set $$\mathcal{G}=\{g:D(g,f)\leq \epsilon\}\quad D(g,f)=\int g\log(g/f)\mathrm{d}\mu$$ where $f$ and $g$ are density functions. For this set it is known that every density exists, and therefore the corresponding probability measures have Radon-Nikodym derivatives. Is it then true to say that every measure $G_1$ corresponding to some $g_1\in \mathcal{G}$ is absolutely continuous with respect to any other measure $G_2$ corresponding to some $g_2\in \mathcal{G}$, and therefore that any $G$ corresponding to $g\in \mathcal{G}$ is also absolutely continuous w.r.t. the measure $F$ corresponding to the density $f$? How does the set given in Question 3 compare to the set given in Question 2 in terms of the existence of densities and absolute continuity?

My last question is a notational issue. In the papers I read, they assume that $F$ and $G$ are absolutely continuous w.r.t. some dominating measure, e.g. $\mu=F+G$. Then I know that $f$ and $g$ exist, but is it legitimate to use the same $\mu$ for every $G$? For example, when they define $D(g,f)=\int g\log(g/f)\mathrm{d}\mu$, there is only one $\mu$ but uncountably many $G$, and apparently each of them has a density with respect to some measure, say $\phi_G$; but then what guarantees that $\mu=\phi_G$ for all $G$?

It seems I have a lot of confusion. I hope you can help me clarify these issues. Thank you very much for reading this post; any comment or answer will be highly appreciated.

1 Answer


I learnt about the RN derivative from "Real Analysis" by Folland, and would advise you to check it out (Chapter 3), as it may answer your further questions. In particular, Theorem 3.5 answers your Q1. It states that

If $\nu$ is a finite signed measure and $\mu$ is a positive measure, then $\nu\ll \mu$ iff for any $\varepsilon >0 $ there exists $\delta > 0$ such that $\mu(E)<\delta$ implies $|\nu(E)|<\varepsilon $ for any measurable $E$.

Now, if $\mu$ is our probability measure and $F$ is the corresponding CDF, then choosing $E = \bigcup_{k=1}^n(a_k,b_k]$ for finitely many disjoint intervals shows that $\mu\ll \lambda$ implies that $F$ is absolutely continuous (as a function). Here $\lambda$ denotes the Lebesgue measure.
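Spelling that step out (my own elaboration of the remark above): for such an $E$ with disjoint intervals $(a_k,b_k]$ we have $$\lambda(E)=\sum_{k=1}^n (b_k-a_k),\qquad \mu(E)=\sum_{k=1}^n \big(F(b_k)-F(a_k)\big),$$ so the $\varepsilon$-$\delta$ condition of Theorem 3.5 reads: $\sum_k (b_k-a_k)<\delta$ implies $\sum_k \big(F(b_k)-F(a_k)\big)<\varepsilon$, which is exactly the definition of absolute continuity of the function $F$.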

Regarding Q2: a density is always defined relative to another measure. Whatever measure $Q$ you take, it always has a density w.r.t. itself, namely the constant function $1$ - please tell me if this fact is not clear to you. Furthermore, indeed if $P = \lambda$ and $H = \delta_0$, then $Q$ does not admit a density w.r.t. $P$; however, it clearly admits a density w.r.t. $Q$ itself.
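To make the latter example concrete (my own elaboration, taking $P$ to be the uniform distribution on $[0,1]$ so that it is a probability measure): for $Q=(1-\epsilon)P+\epsilon\delta_0$ and the dominating measure $\psi := P+\delta_0$ we get $$\frac{\mathrm dQ}{\mathrm d\psi}(x)=\begin{cases}1-\epsilon, & x\in(0,1],\\ \epsilon, & x=0,\end{cases}$$ since for any Borel $A\subseteq[0,1]$ $$\int_A \frac{\mathrm dQ}{\mathrm d\psi}\,\mathrm d\psi=(1-\epsilon)P(A)+\epsilon\,\mathbf 1_{\{0\in A\}}=Q(A).$$ So the point mass at $0$ is invisible to $P$, but $Q$ does admit a density once the reference measure is enlarged to $P+\delta_0$.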

In probability theory it may be confusing that most of the time we are talking about densities w.r.t. $\lambda$, so that we do not even mention $\lambda$ and just say "density". For that reason you may forget that we are always talking about a relative density; there is no "absolute" density, at least not in measure theory. There, a density is exactly an RN derivative, hence it requires specifying the "denominator" measure.

Q3: I am not sure what exactly you mean here. If $\nu\ll\mu$, we can define the KL divergence by $$ D(\nu,\mu) := \int \log\left(\frac{\mathrm d\nu}{\mathrm d\mu}\right)\mathrm d\nu = \int \frac{\mathrm d\nu}{\mathrm d\mu}\log\left(\frac{\mathrm d\nu}{\mathrm d\mu}\right)\mathrm d\mu \tag{1} $$ and this is defined purely in terms of measures, so it does not depend on their representation through densities.
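As a quick numerical sanity check of $(1)$ (my own sketch, not part of the argument; it assumes $\nu=\mathcal N(0,1)$ and $\mu=\mathcal N(1,4)$ with densities $g,f$ w.r.t. Lebesgue measure):

```python
# Minimal sketch: evaluate (1) by numerical integration for two Gaussians
# and compare with the well-known closed form for the Gaussian KL divergence.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

m1, s1 = 0.0, 1.0    # nu = N(m1, s1^2)
m2, s2 = 1.0, 2.0    # mu = N(m2, s2^2)

def integrand(x):
    # (dnu/dmu) log(dnu/dmu) dmu, rewritten via densities w.r.t. Lebesgue: g log(g/f)
    g = norm.pdf(x, m1, s1)
    f = norm.pdf(x, m2, s2)
    return g * np.log(g / f)

kl_numeric, _ = quad(integrand, -15, 15)
kl_closed = np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5
print(kl_numeric, kl_closed)   # the two agree up to integration error
```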

Regarding your title question, please check out this and that. I expect you'll reconsider and/or reformulate your question after reading this answer, unless everything has already become clear to you. Just come back and we can proceed. And I encourage you to check out Folland's book in general.

Added: let's agree on the following - since there is some confusion regarding the notion of a density, we will only use the terms "function" and "RN derivative". We can define the KL divergence $D(\nu,\mu)$ for measures $\nu\ll\mu$ as in $(1)$. We can also fix some reference measure $\psi$ and define a similar map for functional arguments, that is, let $$ \bar D_\psi(g,f):= \int g \log\left(\frac gf\right)\mathrm d\psi \tag{1'} $$ for which to be well-defined we assume that $$ \{f = 0\} \subseteq \{g = 0\} \tag{2}. $$ Now, these two notions are related as follows: $\bar D_\psi(g,f) = D(\bar\nu,\bar\mu)$ where $$ \bar\nu(\cdot) := \int_{(\cdot)}g\,\mathrm d\psi\qquad \bar\mu(\cdot) := \int_{(\cdot)}f \,\mathrm d\psi $$ and of course $(2)$ implies that $\bar\nu\ll\bar\mu$. So indeed, to talk about the set $\mathcal G$ of all functions $g$ you need to assume that every function from this set satisfies $(2)$; but even if you don't assume that, the KL divergence would be infinite for those $g$ that violate $(2)$ (you integrate the $\log$ of infinity over a set of positive measure), so it is certainly greater than $\epsilon$.
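A small numerical illustration of the identity $\bar D_\psi(g,f) = D(\bar\nu,\bar\mu)$ (my own sketch; the weight $w$ below is an arbitrary positive function I chose to define a second reference measure $\mathrm d\psi_2 = w\,\mathrm d\lambda$): the value depends only on the measures $\bar\nu,\bar\mu$, not on which reference measure $\psi$ is used to represent them.

```python
# Sketch: D depends only on the measures, not on the reference measure used
# to represent them through RN derivatives.  Take nu = N(0,1), mu = N(1,4);
# psi1 = Lebesgue measure, psi2 has density w > 0 w.r.t. Lebesgue measure.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

g = lambda x: norm.pdf(x, 0.0, 1.0)            # d nu / d lambda
f = lambda x: norm.pdf(x, 1.0, 2.0)            # d mu / d lambda
w = lambda x: 0.25 * np.exp(-np.abs(x) / 2.0)  # d psi2 / d lambda (strictly positive)

# D computed with psi1 = Lebesgue: integrate g log(g/f) d(lambda)
D1, _ = quad(lambda x: g(x) * np.log(g(x) / f(x)), -15, 15)
# D computed with psi2: the representing functions become g/w and f/w,
# and the integration is against psi2, i.e. against w d(lambda)
D2, _ = quad(lambda x: (g(x) / w(x)) * np.log((g(x) / w(x)) / (f(x) / w(x))) * w(x), -15, 15)
print(D1, D2)  # the two values coincide up to numerical error
```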


Let me also summarize some relations in the one-dimensional case. The basic object is the probability measure $\mu:\mathscr B(\Bbb R) \to [0,1]$. Its CDF is a function of real numbers $F_\mu:\Bbb R\to [0,1]$, given by $F_\mu(x):=\mu((-\infty,x])$; hence, to each probability measure there corresponds a unique CDF. Vice versa, from any function satisfying a couple of properties (monotonicity, right-continuity, and the limits $0$ and $1$ at $\mp\infty$) we can construct a probability measure whose CDF is the given function, see e.g. here. Thus, probability measures on the real line and CDFs are in one-to-one correspondence; only the former is a function of sets, whereas the latter is a function of real numbers.

If $\mu \ll \lambda$ then its RN derivative $f_\mu := \frac{\mathrm d\mu}{\mathrm d\lambda}:\Bbb R \to \Bbb R_+$ is commonly referred to as a density function of $\mu$; however, it would be more formal to say that $f_\mu$ is the density of $\mu$ w.r.t. $\lambda$. Notice that $$ F_\mu(x) = \int_{-\infty}^x\mu(\mathrm dt) = \int_{-\infty}^x f_\mu(t)\, \lambda(\mathrm dt), $$ hence if $\mu\ll\lambda$, then by the Lebesgue differentiation theorem $F'_\mu(x)$ exists $\lambda$-a.e. and $F'_\mu(x) = f_\mu(x)$ ($\lambda$-a.e.). For example, if $F_\mu\in C^1(\Bbb R)$ then $F'_\mu$ is a version of the RN derivative $\frac{\mathrm d\mu}{\mathrm d\lambda}$, and by changing $F'_\mu$ on $\lambda$-null sets in any way we obtain other versions of that RN derivative (since the RN derivative is only defined uniquely $\lambda$-a.e.). In fact, in most practical cases we compute RN derivatives using usual derivatives; there are not many other methods to compute them.
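Following that last remark, here is a minimal numerical sketch (mine, assuming $\mu=\mathrm{Exp}(1)$, so $F_\mu(x)=1-e^{-x}$) of recovering the RN derivative $\frac{\mathrm d\mu}{\mathrm d\lambda}$ by ordinary differentiation of the CDF:

```python
# Sketch: differentiate the CDF numerically and compare with the density
# (the RN derivative of mu w.r.t. Lebesgue measure) for mu = Exp(1).
import numpy as np
from scipy.stats import expon

x = np.linspace(0.01, 8.0, 1000)
F = expon.cdf(x)                  # CDF of mu: F(x) = 1 - exp(-x)
f_numeric = np.gradient(F, x)     # numerical derivative F'
f_true = expon.pdf(x)             # d mu / d lambda: exp(-x)
print(np.max(np.abs(f_numeric - f_true)))  # small, up to discretization error
```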

SBF
  • I had a look at the book. Thanks again. Let me ask you first about the KL-divergence part, since it must be the simplest part. Yes, as you said, if $\nu$ is absolutely continuous w.r.t. $\mu$, then we can define the KL divergence. Similarly we can define it over the density functions, say $f$ and $g$. Here we will need a measure $\mu$ with respect to which both measures $F$ and $G$ are absolutely continuous. – Seyhmus Güngören Nov 25 '14 at 19:21
  • So far no problem. When we define the set $\mathcal{G}$, it is composed of all densities $g$ which satisfy $D(g,f)<\epsilon$, for $f$ a known density function. At this point, every element $g$ of this set is different from the others, although they all carry the same symbol. If this is the case, then either there must be some measure $\mu$ with respect to which all the $G$'s (the probability measures of the $g$'s) are absolutely continuous, or, in my opinion, the definition given in the question would be wrong? – Seyhmus Güngören Nov 25 '14 at 19:21
  • @SeyhmusGüngören: edited a bit more – SBF Nov 25 '14 at 20:53
  • Okay, so basically we still go through the distribution function to get the density for practical purposes. Do you know if there exists a single measure $\psi$ for which we could call $D_\psi$ simply $D$? I mean, is every measure $\nu$ whose density $g$ is in $\mathcal{G}$ absolutely continuous with respect to the same measure $\psi$? – Seyhmus Güngören Nov 25 '14 at 22:05
  • @SeyhmusGüngören: can you carefully reformulate your last question? I don't get it. In particular, a function can't be absolutely continuous w.r.t. a measure. Also, as I asked, please don't use the word "density" for now, as it causes confusion; instead say "RN derivative" and mention both measures. – SBF Nov 26 '14 at 06:05
  • Okay. I am talking about the probability measures $G$ whose density functions $g$ are in the set $\mathcal{G}=\{g:D(g,f)\leq \epsilon\}$. Suppose I claim that there exists a single $\psi$ with respect to which all $G$ are absolutely continuous. Is there such a dominating measure $\psi$? For example, say I have $3$ measures $G_0,G_1,G_2$ and the RN derivatives of all $G_i$, namely $g_i\in \mathcal{G}$, $i\in\{0,1,2\}$. Is it possible to have a single measure $\psi$ such that $\mathrm{d}G_i/\mathrm{d}\psi=g_i$ for all $i$? Now if we have uncountably many $G_i$, is there such a single $\psi$? – Seyhmus Güngören Nov 26 '14 at 10:17
  • @SeyhmusGüngören: countably many - yes, just sum them up with coefficients $2^{-n}$; uncountably many - no: think of the Dirac measures $G_x = \delta_x$ for each $x\in \Bbb R$ – SBF Nov 26 '14 at 10:19
  • Dirac measure? Is it the reference measure $\psi$ in $\mathrm{d}G/\mathrm{d} \psi$? I came across this post: http://mathoverflow.net/questions/130470/existence-of-dominating-measure-for-weak-compact-set-of-measures – Seyhmus Güngören Nov 26 '14 at 13:00