7

I have the following question about maximum likelihood (ML) estimation in the presence of inter-symbol interference (ISI) and colored noise.

Assume the communication system is as follows: information source, modulator, transmit pulse filtering, channel, AWGN, matched receive pulse filtering, sampler, and then ML estimator.

The received signal after the sampler is given as

$$y = \left( h\star h\star c \right)\star a + n\star h$$

where $y$ is the vector of received samples (containing transmit symbols + ISI + noise), $a$ is the vector of transmit symbols, $h$ is the transmit/receive pulse-shaping filter, $c$ is the channel impulse response, $n$ is the AWGN, and $\star$ denotes convolution. This can be further written as

$$y = A\:a + \eta$$

where $A$ is a convolution matrix whose elements contain $h\star h\star c$ and $\eta$ is colored noise.
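For concreteness, here is a minimal numerical sketch of this model; the filter taps, channel taps, symbol alphabet, and lengths below are illustrative assumptions, not values from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

h = np.array([0.5, 1.0, 0.5])           # hypothetical pulse-shaping filter taps
c = np.array([1.0, 0.3, 0.1])           # hypothetical channel impulse response
m = np.convolve(np.convolve(h, h), c)   # combined response h * h * c

N = 20                                   # number of transmit symbols
a = rng.choice([-1.0, 1.0], size=N)      # BPSK symbols (illustrative choice)

# Convolution matrix A such that A @ a equals the full convolution m * a
A = np.zeros((len(m) + N - 1, N))
for k in range(N):
    A[k:k + len(m), k] = m

n = rng.standard_normal(A.shape[0])      # AWGN entering the receive filter
eta = np.convolve(h, n)[:A.shape[0]]     # colored noise n * h, truncated to match y

y = A @ a + eta
print(np.allclose(A @ a, np.convolve(m, a)))   # sanity check: True
```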

  • I wonder what the ML estimator is in this case?
  • And how do I derive/calculate it?

Thanks.

Noor

3 Answers

3

Let's have a look at the following model:

$$ y \left[ n \right] = \left( h \ast x \right) \left[ n \right] + \left( g \ast w \right) \left[ n \right] $$

Where $ x \left[ n \right] $ is the signal of interest and $ w \left[ n \right] $ is the AWGN with unit Variance.

In Matrix form it is written as:

$$ \boldsymbol{y} = H \boldsymbol{x} + G \boldsymbol{w} $$

Where $ H $ and $ G $ are the convolution matrices of the model.

Let's define $ \boldsymbol{v} = G \boldsymbol{w} $; then $ \boldsymbol{v} \sim \mathcal{N} \left( \boldsymbol{0}, G {G}^{T} \right) $, which implies (if the input to a Linear Operator is a Gaussian Variable, then the output is also a Gaussian Variable):

$$ \boldsymbol{y} \sim \mathcal{N} \left( H \boldsymbol{x}, G {G}^{T} \right) $$
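A quick numerical sanity check of this step (with an arbitrary, illustrative coloring filter $g$ and vector length): the sample covariance of $G\boldsymbol{w}$ matches $G G^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
g = np.array([1.0, 0.6, 0.2])            # illustrative coloring filter taps
L = 8                                    # length of the noise vector (illustrative)

# Square (truncated) convolution matrix G
G = np.zeros((L, L))
for k in range(L):
    taps = g[:L - k]
    G[k:k + len(taps), k] = taps

W = rng.standard_normal((L, 200_000))    # many realizations of unit-variance AWGN
V = G @ W                                # colored noise realizations
sample_cov = V @ V.T / W.shape[1]
print(np.max(np.abs(sample_cov - G @ G.T)))   # small -> covariance is indeed G G^T
```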

Then the Maximum Likelihood estimator is given by:

$$ \arg \max_{ \boldsymbol{x} } \det \left( 2 \pi G {G}^{T} \right)^{-\frac{1}{2}} {e}^{ -\frac{1}{2} {\left( H \boldsymbol{x} - \boldsymbol{y} \right)}^{T} {\left( G {G}^{T} \right)}^{-1} \left( H \boldsymbol{x} - \boldsymbol{y} \right) } $$

It is easy to see you only need to deal with the term in the Exponent, so it is equivalent to:

$$ \arg \min_{ \boldsymbol{x} } {\left( H \boldsymbol{x} - \boldsymbol{y} \right)}^{T} {\left( G {G}^{T} \right)}^{-1} \left( H \boldsymbol{x} - \boldsymbol{y} \right) $$

This is nothing more than a Weighted Least Squares problem, which matches the result of Linear Regression for Colored Noise (as expected for Gaussian Noise).
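As a small numerical check of this equivalence (with made-up $H$, $G$, and $\boldsymbol{y}$; `scipy` is assumed to be available): whichever of two candidate vectors has the higher Gaussian likelihood also has the smaller weighted quadratic form.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(5)
H = rng.standard_normal((8, 3))                             # illustrative model matrix
G = np.tril(rng.standard_normal((8, 8))) + 4 * np.eye(8)    # illustrative coloring matrix
K = G @ G.T
y = H @ rng.standard_normal(3) + G @ rng.standard_normal(8)

def quad_form(x):
    """Weighted quadratic form (H x - y)^T (G G^T)^{-1} (H x - y)."""
    r = H @ x - y
    return r @ np.linalg.solve(K, r)

def likelihood(x):
    """Gaussian likelihood of y for mean H x and covariance G G^T."""
    return multivariate_normal(mean=H @ x, cov=K).pdf(y)

x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
print((likelihood(x1) > likelihood(x2)) == (quad_form(x1) < quad_form(x2)))   # True
```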

The optimal solution is given by:

$$ \hat{\boldsymbol{x}} = {\left( {H}^{T} {\left( G {G}^{T} \right)}^{-1} H \right)}^{-1} {H}^{T} {\left( G {G}^{T} \right)}^{-1} \boldsymbol{y} $$
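A minimal `numpy` sketch of this closed-form solution; the matrices below are arbitrary placeholders, and `np.linalg.solve` is used instead of explicit inversion for numerical stability.

```python
import numpy as np

def wls_ml_estimate(H, G, y):
    """ML / Weighted Least Squares estimate for y = H x + G w, with w ~ N(0, I)."""
    K = G @ G.T                                   # noise covariance
    Kinv_H = np.linalg.solve(K, H)                # (G G^T)^{-1} H
    Kinv_y = np.linalg.solve(K, y)                # (G G^T)^{-1} y
    return np.linalg.solve(H.T @ Kinv_H, H.T @ Kinv_y)

# Illustrative usage with random matrices and low-power colored noise
rng = np.random.default_rng(2)
H = rng.standard_normal((30, 10))
G = 0.1 * (np.tril(rng.standard_normal((30, 30))) + np.eye(30))   # invertible coloring
x_true = rng.standard_normal(10)
y = H @ x_true + G @ rng.standard_normal(30)
print(np.round(x_true, 2))
print(np.round(wls_ml_estimate(H, G, y), 2))      # close to x_true for low noise
```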

Royi
  • Hi: I had only read up to the argmax determinant and I guess missed the argmin part below it. My mistake and my apologies. I would think that your expression must reduce to Aitken since it's also closed form. Note that, if the variance is not known (in this case it's one but in real life, probably not), your likelihood expression becomes 2-dimensional, in which case FGLS is probably preferred. – mark leeds Jul 26 '18 at 16:58
  • @markleeds, You seem to misunderstand what you call the Aitken method. It is exactly the same as the above. Nothing more than the case of Weighted Least Squares. Read closely the model you linked to. – Royi Jul 26 '18 at 20:28
  • I really don't understand your remarks. In the case above the noise doesn't have Variance in the sense of a scalar (not even necessarily a Scalar Matrix). It has a Covariance Matrix which is assumed to be known, as in the paper you attached. If the Covariance Matrix isn't known, the ML becomes a completely different story. – Royi Jul 27 '18 at 05:43
  • Hi Royi: I was referring to the case where the matrix is diagonal but the variances on the diagonal are all different. You are correct that the paper I linked to does not explain FGLS. There are so many ways that the assumption of known variance can be false. So, I've included three links. Obviously, you don't need to read them, but my point was the assumption of known variance is unrealistic and, when one doesn't make that assumption, Aitken and your approach do not apply. Three links are in the next comment. The last link is long-winded and only section 5 of it is really relevant. – mark leeds Jul 27 '18 at 11:53
  • https://projecteuclid.org/download/pdf_1/euclid.ss/1177012408 https://courses.cit.cornell.edu/econ620/Lec11.pdf http://www.univ-orleans.fr/deg/masters/ESA/CH/Chapter5_GLS.pdf – mark leeds Jul 27 '18 at 11:54
  • The case you mention is different from the question above. Moreover, in case the noise samples are independent yet with different variances, as in your example, the above works and becomes much easier to calculate. I'm not sure what FGLS is and your remarks are vague. Could you summarize your point? You started by declaring my solution is more complicated than Regression for the Generalized Noise (which it is not). I'm not sure what your point is now. – Royi Jul 27 '18 at 11:57
  • Hi Royi: I'm not clear on what is not clear. All I am saying is that the assumption of known variance is unrealistic and, when the assumption is not true, your approach AND Aitken both won't work. If you're not interested in this case, that's fine. Originally, I didn't notice the argmin part of your answer and so I didn't know that your approach reduced to Aitken. My bad there. – mark leeds Jul 27 '18 at 12:03
  • That's not true. In many cases the variance is known very well (for instance, data coming from a sensor). Your remark, while it is true for many cases, is not the case for the above question (assuming $ h $ is known). Indeed, in the case where the parameters of the noise aren't known, we can either solve the whole ML problem at once or use a 2-step approach where the first step is estimating the noise parameters. There are many choices in either approach. – Royi Jul 27 '18 at 12:12
  • When I said it works for a Diagonal Matrix I meant a known diagonal matrix. Again, your comments, while true, add no information to the discussion. I could even generalize more than your "addition": who said the noise is Gaussian to begin with? Then the whole Least Squares approach doesn't hold. When we solve a problem we start with a model. In the case of the problem above the Covariance Matrix can be inferred from the filter applied to the noise. – Royi Jul 27 '18 at 12:16
  • maybe known very well in dsp. my bad there. see halbert white's material for estimation of diagonal covariance matrix where variances are assumed to be different. if you think that you have your own methodology, that's fine with me. I don't want to argue about white's contributions. you can see for yourself if you want. all the best. – mark leeds Jul 27 '18 at 12:17
  • @markleeds, I may be wrong but it seems you are missing the point here and mixing things. Do you know the ML estimator for the mean of a Gaussian R.V. given some samples? Do you know the ML estimator for the variance when the mean is known? Do you know the ML estimator for the Mean and Variance when both aren't known? The above is just the same for the case of Multivariate Gaussian Random Variables. The OP asked for the Mean Estimator given that the variance is known. – Royi Jul 27 '18 at 14:18
2

Royi's answer is excellent. I'd like to discuss a different way of arriving at the same answer, one that tells you how to find the matrix $G$ if you don't know it. In many applications this matrix will be unknown. This involves something known as a 'whitening' operation. Let's say $\eta$ has a covariance matrix $K_\eta$ (which is PSD by definition). It can be written in terms of its eigenvalues and eigenvectors as $$K_\eta = Q\Lambda Q^\top$$

Now, the whitening operation involves transforming the covariance matrix into the identity. To do this, we will find a linear transformation $\nu = G\eta$ such that $\nu$ is white noise with unit variance. The covariance of $\nu$ is $K_\nu = G K_\eta G^\top$:
$$K_\nu = G(Q\Lambda Q^\top)G^\top = (GQ \Lambda^{\frac{1}{2}})(\Lambda^{\frac{1}{2}}Q^\top G^\top) = (GQ \Lambda^{\frac{1}{2}})(GQ \Lambda^{\frac{1}{2}})^\top$$
If we want $K_\nu = I$, then $GQ\Lambda^{\frac{1}{2}}$ must be orthogonal:
$$(GQ \Lambda^{\frac{1}{2}})^{-1} = (GQ \Lambda^{\frac{1}{2}})^{\top} \implies \Lambda^{-\frac{1}{2}}Q^{-1}G^{-1} = \Lambda^{\frac{1}{2}}Q^\top G^\top$$
We know that $Q^\top = Q^{-1}$. Post-multiplying both sides by $G$ gives
$$\Lambda^{-\frac{1}{2}}Q^{\top} = \Lambda^{\frac{1}{2}}Q^\top G^\top G$$
and pre-multiplying both sides by $Q\Lambda^{-\frac{1}{2}}$ gives
$$G^\top G = Q \Lambda^{-\frac{1}{2}} \Lambda^{-\frac{1}{2}}Q^\top = Q\Lambda^{-1}Q^\top = K_\eta^{-1}$$
$G$ is known as the whitening matrix, and you can transform your equation to
$$Gy = GAx + G\eta \implies \tilde{y} = Hx + \nu$$
where $H = GA$.
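Here is a minimal numerical illustration of this construction, assuming an arbitrary PSD covariance $K_\eta$; the specific choice $G = \Lambda^{-1/2}Q^\top$ used below is one valid whitening matrix satisfying $G^\top G = K_\eta^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Arbitrary illustrative PSD covariance matrix for the colored noise
B = rng.standard_normal((6, 6))
K_eta = B @ B.T + 1e-3 * np.eye(6)

# Eigendecomposition K_eta = Q Lambda Q^T (eigh handles the symmetric case)
lam, Q = np.linalg.eigh(K_eta)

# One valid whitening matrix: G = Lambda^{-1/2} Q^T
G = np.diag(lam ** -0.5) @ Q.T

print(np.allclose(G @ K_eta @ G.T, np.eye(6)))       # whitened covariance is the identity
print(np.allclose(G.T @ G, np.linalg.inv(K_eta)))    # G^T G = K_eta^{-1}
```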

where $\tilde{y} \sim \mathcal{N}(Hx, I)$, and the maximum likelihood estimate is $$\hat{x} = (H^\top H)^{-1} H^\top \tilde{y}\\ \hat{x} = (A^\top G^\top G A)^{-1} A^\top G^\top G y \\ \hat{x} = (A^\top K_\eta^{-1} A)^{-1} A^\top K_\eta^{-1} y$$
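A short sketch (with made-up $A$ and $K_\eta$) verifying that ordinary least squares on the whitened model gives the same estimate as the direct weighted formula:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((12, 5))                    # illustrative system matrix
B = rng.standard_normal((12, 12))
K_eta = B @ B.T + 1e-3 * np.eye(12)                 # illustrative noise covariance

lam, Q = np.linalg.eigh(K_eta)
G = np.diag(lam ** -0.5) @ Q.T                      # whitening matrix as derived above

x_true = rng.standard_normal(5)
eta = np.linalg.cholesky(K_eta) @ rng.standard_normal(12)   # colored noise sample
y = A @ x_true + eta

# Ordinary least squares on the whitened model: G y = (G A) x + nu
x_whitened, *_ = np.linalg.lstsq(G @ A, G @ y, rcond=None)

# Direct weighted least squares with K_eta^{-1}
Kinv = np.linalg.inv(K_eta)
x_wls = np.linalg.solve(A.T @ Kinv @ A, A.T @ Kinv @ y)

print(np.allclose(x_whitened, x_wls))               # True: both estimators coincide
```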

orchi_d
1

First, I think that the expression $ A = h\star h\star c $ is not correct. Actually, $ A $ is a convolution matrix whose elements contain $ h\star h\star c $. The dimensions of $ A $ will depend on the dimensions of $ h $ and $ c $. If $ h $ has $ N_{h} $ coefficients and $ c $ has $ N_{c} $ coefficients, and we let $ m = h\star h\star c $, then $ m $ will have $ N_{m} = 2N_{h} + N_{c} - 2 $ coefficients.

For example, if $ h = [h_{1} \ h_{2} \ h_{3}]^{T} $ has 3 coefficients and $ c = [c_{1} \ c_{2} \ c_{3} \ c_{4}]^{T} $ has four coefficients, then $ m $ will have 8 coefficients.

The matrix $ A $ will have $ N_{m} + N - 1 $ rows and $ N $ columns, where $ N $ is the length of your $ a $, so as to make the product $ Aa $ equivalent to the (full) convolution operation $ m\star a $.
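A small sketch (reusing the three-tap $h$ / four-tap $c$ sizes from the example, with made-up values) checking the length formula and the $Aa \equiv m\star a$ equivalence:

```python
import numpy as np

h = np.array([1.0, 2.0, 3.0])                  # N_h = 3, illustrative values
c = np.array([1.0, -1.0, 0.5, 0.25])           # N_c = 4, illustrative values
m = np.convolve(np.convolve(h, h), c)

N_h, N_c, N_m = len(h), len(c), len(m)
print(N_m, 2 * N_h + N_c - 2)                  # both are 8

N = 10                                         # length of the symbol vector a
A = np.zeros((N_m + N - 1, N))                 # full-convolution matrix
for k in range(N):
    A[k:k + N_m, k] = m

a = np.arange(1.0, N + 1)                      # arbitrary test symbols
print(np.allclose(A @ a, np.convolve(m, a)))   # True: A a reproduces m * a
```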

Pay attention: the colored noise is the result of AWGN passing through a linear system, hence it is still additive Gaussian noise. So, the ML problem can be written as:

$$ \hat{a} = \arg \min_{a} {\left\| A a - y \right\|}_{2}^{2} $$

The solution to this ML problem can be found in any book on estimation theory and in other materials such as In Jae Myung's "Tutorial on Maximum Likelihood Estimation". The analytical solution will depend on the statistics involved. The implementation of the ML detector can be done using the Viterbi algorithm, for example.
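For reference, a minimal sketch of the unweighted least-squares solution written in this answer (using `np.linalg.lstsq`); as the other answers and the comments below discuss, the ML estimator for colored noise additionally weights by the inverse noise covariance.

```python
import numpy as np

def ls_estimate(A, y):
    """Solve min_a ||A a - y||_2^2 (treats the noise as white).

    For colored noise, the ML estimate instead minimizes
    (A a - y)^T K_eta^{-1} (A a - y); see the other answers.
    """
    a_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    return a_hat
```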

JohnMarvin
  • Thank you for your insight. You are correct. $A$ is a convolution matrix whose elements contain $h\star h\star c$; I will update the question accordingly. Are you sure that $ \hat{a} = \arg \min_{a} \left\| Aa - y \right\|^{2} $ is the correct expression of the ML (given that $\eta$ is colored noise)? In the meantime, I will look into books on estimation theory. – Noor Apr 17 '16 at 01:41
  • Hi Noor and JohnMarvin: Just a heads up that minimizing the squared deviation only results in an ML estimator in the case of the Gaussian likelihood assumption. I'm not sure what the likelihood is in this case because the terminology is different from what I'm used to, since my background is statistics-econometrics. Interesting to read nonetheless. – mark leeds Jan 25 '18 at 14:23
  • That is right mark leeds. He has that noise $ n $ that is AWGN. However, it is filtered by the receive filter, generating the coloured noise $ \eta $. It is important to notice that although the decorrelation between the samples is not guaranteed, the noise will still have a Gaussian distribution. So, as you said, the ML is still achieved when you minimize the squared deviation. :) – JohnMarvin Jan 28 '18 at 15:39
  • @JohnMarvin. Thanks. It's really appreciated, but even things like AWGN are foreign to me. If you don't mind, do you know of a reasonable-level EE book that covers this type of material? I know that DSP books don't, because I have a lot of those. My background is stat-econometrics but I'm trying to become more knowledgeable about EE topics. In some ways, the two fields are not so different. In other ways, EXTREMELY different. Thanks again. – mark leeds Feb 24 '18 at 17:53
  • @JohnMarvin, I'm not so sure about your solution. When you write the ML function you'll have to deal with the Covariance Matrix of the colored noise. – Royi Jul 25 '18 at 04:45
  • @Royi: I see what you did with the likelihood but there's an easier equivalent way which results in what is referred to as the Aitken estimator in statistics. See page 4 of this document for a nice discussion of it if you're interested. https://eml.berkeley.edu/~powell/e240b_sp06/glsnotes.pdf – mark leeds Jul 25 '18 at 06:57
  • @markleeds, Why do you find it easier? It is equivalent. He used Regression (Least Squares) while I used ML, which for Gaussian Noise coincide. – Royi Jul 25 '18 at 09:41
  • @royi: Hi: I agree it's the same. But in your case, with real data, you'd have to optimize over a possibly complicated likelihood. Aitken OLS, as you know, is a closed-form solution. Just thought I'd point it out in case you were unaware. Sounds like you already were aware, so my apologies for the noise. – mark leeds Jul 25 '18 at 21:22
  • The above is also a closed form solution. Read my answer again. It is the exact same term to minimize. It was just derived from different perspective (Regression vs. ML). For Gaussian Noise Regression (By Least Squares) and ML are the same. It's better to have this discussion as comments to my answer. – Royi Jul 26 '18 at 06:06