Samples of colored noise (taken at different times) generally
are correlated random variables because the
autocorrelation function of the noise process is not a delta function
as it is in the case of white noise. Thus, if we assume a zero-mean
process (noise is generally assumed to be zero-mean regardless of its color),
then the covariance of two samples separated in time by $\tau$ seconds is
$R(\tau)$, where $R(t) = \mathcal F^{-1}\{S(f)\}$ is the autocorrelation
function of the process (the inverse
Fourier transform of the power spectral density). Note that it is possible
for $R(t)$ to be zero for some values of $t$ (e.g. $R(t) = \operatorname{sinc}(t)$ is a valid autocorrelation function),
but it cannot be zero
for all nonzero $t$.
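To make the correlation concrete, here is a minimal numerical sketch (the one-pole filter and its coefficient $a = 0.9$ are my own illustrative choices, not anything from the discussion above): coloring white Gaussian noise with $y[n] = a\,y[n-1] + w[n]$ gives $R(\tau) \propto a^{|\tau|}$, so samples $\tau$ apart have correlation $a^{|\tau|}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# White Gaussian noise: samples at different times are uncorrelated.
w = rng.standard_normal(N)

# Color it with a one-pole lowpass filter: y[n] = a*y[n-1] + w[n].
a = 0.9
y = np.empty(N)
y[0] = w[0]
for n in range(1, N):
    y[n] = a * y[n - 1] + w[n]

# Empirical correlation between samples separated by lag tau
# (the "-tau or None" slice handles the lag-0 case).
for tau in (0, 1, 2, 5):
    r = np.corrcoef(y[: -tau or None], y[tau:])[0, 1]
    print(f"lag {tau}: correlation ~ {r:.3f} (theory: {a**tau:.3f})")
```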
As for the density function of any sample: if the process is
Gaussian, the sample is Gaussian even if the process has been filtered
with a linear filter before sampling. But if the process is not
Gaussian (it is, let us say, Laplacian), then while each sample will be
Laplacian, the same cannot be said generally of samples of the process
after filtering of any kind. In other words, Gaussianity
survives linear filtering; Laplacianity generally does not.
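A quick numerical illustration of that (the moving-average filter and the use of excess kurtosis as the yardstick are my choices here): a Laplacian density has excess kurtosis $3$, a Gaussian has $0$, and linear filtering pulls the output toward the Gaussian value.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.laplace(size=1_000_000)   # Laplacian white noise, excess kurtosis = 3

# A simple linear filter: length-8 moving average (illustrative choice).
y = np.convolve(x, np.ones(8) / 8, mode="valid")

def excess_kurtosis(v):
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

print(excess_kurtosis(x))   # ~3.0, as a Laplacian density should have
print(excess_kurtosis(y))   # ~3/8: the output is no longer Laplacian
```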
So, how does maximum-likelihood estimation work when samples have
correlated noise? Consider the case when we wish to estimate the
unknown mean of a $\mathcal N(\mu, 1)$ random variable, and we
have two observations $x$ and $y$. In the standard case of
independent observations, the likelihood function is
$$L(\mu) = \frac{1}{2\pi}\exp\left(-\frac{1}{2}\left[(x-\mu)^2+(y-\mu)^2\right]\right).$$
The _maximum-likelihood estimator_ for $\mu$ is the number $\hat{\mu}$
that maximizes $L(\mu)$, which works out to be the number $\hat{\mu}$
that minimizes $(x-\mu)^2+(y-\mu)^2$. This is a quadratic in $\mu$
and the maximum-likelihood estimate turns out to be
$\hat{\mu}=\frac{x+y}{2}$. When the observations are correlated with
correlation coefficient $\rho$, then
$$L(\mu) = \frac{1}{2\pi\sqrt{1-\rho^2}}
\exp\left(-\frac{1}{2(1-\rho^2)}
\left[(x-\mu)^2-2\rho(x-\mu)(y-\mu)+(y-\mu)^2\right]\right).$$
Once again we need to find the $\hat{\mu}$ where
$(x-\mu)^2-2\rho(x-\mu)(y-\mu)+(y-\mu)^2$ has a minimum.
We still have a quadratic in $\mu$ but now we get terms
like $xy$ in the coefficients. What $\hat{\mu}$ works out to
be is left as an exercise.
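If you want to check your answer numerically, here is a minimal sketch (the observation values and $\rho$ are made up); maximizing $L(\mu)$ is the same as minimizing the bracketed quadratic, since the factor in front of it is a positive constant.

```python
from scipy.optimize import minimize_scalar

x, y, rho = 1.3, 0.2, 0.7   # made-up observations and correlation

# Maximizing L(mu) is minimizing the quadratic inside the exponent.
def q_indep(mu):
    return (x - mu) ** 2 + (y - mu) ** 2

def q_corr(mu):
    return (x - mu) ** 2 - 2 * rho * (x - mu) * (y - mu) + (y - mu) ** 2

print(minimize_scalar(q_indep).x)   # (x + y)/2 = 0.75, the sample mean
print(minimize_scalar(q_corr).x)    # compare with your pencil-and-paper answer
```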
And what if we have $n$ observations where $n > 2$? All the above
still applies. For independent identically distributed Gaussian noise
in the samples, the sample mean
$n^{-1}\sum_i x_i$ is the maximum-likelihood estimate of $\mu$, but
in the case of correlated Gaussian random variables we get a messier
minimization problem: the quadratic that we are trying to minimize
depends on the inverse of the covariance matrix, and the minimizing
$\hat{\mu}$ is a weighted combination of the data (with weights set by
that inverse) rather than a simple easy-to-remember result like the
sample mean.
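For a concrete sketch of that minimization with known covariance (the AR(1)-style covariance structure and the numbers below are my own assumptions): setting the derivative of $(\mathbf{x}-\mu\mathbf{1})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\mu\mathbf{1})$ with respect to $\mu$ to zero gives $\hat{\mu} = \mathbf{1}^{\mathsf T}\Sigma^{-1}\mathbf{x}\,/\,\mathbf{1}^{\mathsf T}\Sigma^{-1}\mathbf{1}$.

```python
import numpy as np

# Hypothetical setup: n = 5 unit-variance Gaussian samples with
# correlation rho**|i-j| between samples i and j (AR(1)-style covariance).
rho = 0.8
x = np.array([1.1, 0.4, 0.9, 1.6, 0.7])   # made-up observations
n = len(x)
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# mu_hat = (1^T Sigma^{-1} x) / (1^T Sigma^{-1} 1)
ones = np.ones(n)
w = np.linalg.solve(Sigma, ones)           # Sigma^{-1} 1
mu_hat = (w @ x) / (w @ ones)

print(mu_hat)      # weighted combination of the data
print(x.mean())    # generally different from the plain sample mean
```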
What if the noise is not Gaussian? The same principles apply -- set up the
likelihood function and find where it attains its maximum value -- but
the calculations are quite a bit different, depending on
what you assume or know about the joint density of the observations.
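For example (the Laplacian choice here is mine): with i.i.d. Laplacian noise the log-likelihood is $-\sum_i |x_i - \mu|$ plus a constant, so the ML estimate of $\mu$ is the sample median rather than the sample mean. A quick numerical check:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.laplace(loc=2.0, size=1001)   # i.i.d. Laplacian samples, true mean 2

# Maximizing the Laplacian likelihood = minimizing sum_i |x_i - mu|.
nll = lambda mu: np.sum(np.abs(x - mu))

print(minimize_scalar(nll, bounds=(x.min(), x.max()), method="bounded").x)
print(np.median(x))   # the closed-form ML estimate for i.i.d. Laplacian noise
print(x.mean())       # the Gaussian answer; close, but not the ML estimate here
```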
A comment from robert bristow-johnson (Jun 18 '14) adds a concrete example: if you difference uniform white noise samples (generated with, say, a `rand()` function) as so: $$y[n] = x[n] - x[n-1]$$ that is colored noise (less spectrum at low frequencies), but it ain't Gaussian p.d.f., it's triangular p.d.f.
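And a quick check of that comment (the array size and histogram binning are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1_000_000)   # uniform white noise, a la rand()
y = x[1:] - x[:-1]                       # the comment's first-difference filter

# The difference of two independent uniforms has a triangular density on (-2, 2).
hist, _ = np.histogram(y, bins=9, range=(-2, 2), density=True)
print(np.round(hist, 3))   # rises linearly to a peak at 0, then falls off
```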