I've been attempting to understand the proof of the Donsker-Varadhan dual form of the Kullback-Leibler divergence, defined by $$ \operatorname{KL}(\mu \| \lambda) = \begin{cases} \int_X \log\left(\frac{d\mu}{d\lambda}\right) \, d\mu, & \text{if $\mu \ll \lambda$ and $\log\left(\frac{d\mu}{d\lambda}\right) \in L^1(\mu)$,} \\ \infty, & \text{otherwise,} \end{cases} $$ whose Donsker-Varadhan dual form is $$ \operatorname{KL}(\mu \| \lambda) = \sup_{\Phi \in \mathcal{C}} \left(\int_X \Phi \, d\mu - \log\int_X \exp(\Phi) \, d\lambda\right). $$
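(As an aside, not part of the question: here is a quick numerical sanity check of the duality on an arbitrarily chosen pair of discrete distributions, where the supremum is attained at $\Phi = \log(d\mu/d\lambda)$ and every other $\Phi$ gives a lower bound.)

```python
import numpy as np

# Two (arbitrarily chosen) discrete distributions on {0, 1, 2}, with mu << lambda.
mu = np.array([0.5, 0.3, 0.2])
lam = np.array([0.2, 0.3, 0.5])

# KL(mu || lam) computed directly from the density d mu / d lam = mu / lam.
kl = np.sum(mu * np.log(mu / lam))

def dv(phi):
    """Donsker-Varadhan objective: int Phi d mu - log int exp(Phi) d lam."""
    return np.dot(mu, phi) - np.log(np.dot(lam, np.exp(phi)))

# Phi = log(d mu / d lam) attains the supremum ...
phi_opt = np.log(mu / lam)
assert abs(dv(phi_opt) - kl) < 1e-12

# ... while arbitrary test functions Phi stay below KL.
rng = np.random.default_rng(0)
for _ in range(1000):
    assert dv(rng.normal(size=3)) <= kl + 1e-12
```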
Many of the steps in the proof are helpfully outlined in "Reconciling Donsker-Varadhan definition of KL divergence with the 'usual' definition", and I can follow along readily.
However, a crucial first step is establishing that, for any measurable function $\Phi$, $$\tag{1}\label{ineq} \operatorname{KL}(\mu\|\lambda)\ge \int \Phi \, d\mu-\log\int e^{\Phi}\,d\lambda,$$ which is said to be an immediate consequence of Jensen's inequality. I can prove this easily in the case when both $\mu \ll \lambda$ and $\lambda \ll \mu$ hold:
$$ \operatorname{KL}(\mu\|\lambda) - \int \Phi \, d\mu = \int \left[ -\log\left(\frac{e^{\Phi}}{d\mu / d\lambda}\right) \right] d\mu \ge -\log \int \frac{e^{\Phi}}{d\mu / d\lambda} \, d\mu = -\log\int\exp(\Phi)\,d\lambda.$$ However, the last step appears to rely crucially on the existence of $d\lambda/d\mu$, and hence on $\lambda \ll \mu$, which is not assumed by the overall theorem. In the proofs I have found in the machine learning literature, this assumption seems to be made implicitly, but I don't believe it is necessary, and it is quite restrictive.
My question is: how can we prove $\eqref{ineq}$ without assuming $\lambda \ll \mu$?