
I observe a noisy complex-valued signal that has passed through some linear time-invariant filter: $$y(t) = (h * x)(t) + n(t)$$ where $h(t)$ is a causal and finite impulse response, and $n(t)$ is additive Gaussian white noise, $n \sim \mathcal{CN}(0, \sigma^2)$.

I also have a noisy approximation of the filter: $\hat{h}(t) = h(t) + \epsilon(t)$. You can assume $\epsilon(t)$ is also AWGN.

The goal is to recover $x(t)$ as best as possible given only $y(t)$ and $\hat{h}(t)$.

What is the best deconvolution scheme to use in this context? Additionally, how would you avoid overfitting (i.e., is there a nice way to regularize the solution for cases when the signal-to-noise ratio is quite low)?

Most of the deconvolution literature talks specifically about 2D images or stochastic processes. I can't find much literature on deconvolution in a 1D signal-processing context.


To add some context and answer some questions from the comments:

I encountered this problem in the context of radio astronomy. I have a radio source that emits a signal $x(t)$. As the signal propagates through the interstellar medium, it can take many different paths on its way to us - this is the filter, $h(t)$. The noise $n(t)$ primarily comes from terrestrial sources and the receiver at the radio telescope (hence why no noise actually passes through the filter).

I only have noisy approximations of the filter because the radio source I am observing happens to randomly emit incredibly bright "impulses" (few nanoseconds wide) every now and then, which lets me directly but noisily measure $h(t)$ (literally, the impulse response).

In practice, the length of $h[n]$ is $\sim\!10{,}000$, and the length of $x[n]$ is $\sim\!30{,}000{,}000$.

XYZT
  • You might have a look at this good old paper: G.K. Wertheim, Deconvolution and Smoothing: Applications in ESCA, J. Electron Spectroscopy and Related Phenomena, 6 (1975) 239-251. I do not have an online link to this paper, just an old scanned copy. It will provide some insight into the issues that arise, particularly the problem of division by either zero or tiny numerical values. Good luck! – Ed V Nov 29 '20 at 14:52
  • I skimmed the paper and it mostly talks about real data and phaseless response functions? In my case, I have a complex signal and a complex filter (with arbitrary phases). It feels like the methods mentioned in the paper don't apply to my case? – XYZT Nov 29 '20 at 23:12
  • I don’t know: this is the first I am hearing about the complex signal, etc. Perhaps someone else here can help. – Ed V Nov 29 '20 at 23:33
  • When I say complex, I just mean a baseband signal centered at zero frequency (so, you end up with a complex-valued signal). – XYZT Nov 30 '20 at 22:30
  • @XYZT, Your problem matches the Total Least Squares problem. See my solution below. – Royi Mar 28 '23 at 09:10
  • One way I can think of is that if h(t) is an FIR filter with zeros inside the unit circle, then the output of the filter response can be fed into an auto-regressive process with h(t) as feedback weights. It works with no noise cases, need to check for noisy cases. – learner Mar 28 '23 at 15:55
  • @learner I am not sure what you mean by "zeros inside the unit circle" - but it sounds like that is almost certainly not true in this case. – XYZT Mar 28 '23 at 16:14
  • In your problem formulation, you are not also passing the noise through the LTI filter. Instead, I think you need to say that $x(t)=s(t)+n(t)$ and then apply $x(t)$ to the filter. Your goal then is to recover $s(t)$. – Jonathan Williams Mar 30 '23 at 12:46
  • @JonathanWilliams This is not true in my actual problem. Noise is only applied after the filter. – XYZT Mar 31 '23 at 09:11
  • I assume the noise on $\hat{h}(t)$ is independent of, but identically distributed to, the noise on $y(t)$? Meaning these were two separate measurements? And is this determined using multiple captures (with the same $x(t)$ and $h(t)$, where only the noise varies), or must the solution be determined from a single capture? – Dan Boschen Mar 31 '23 at 21:17
  • @DanBoschen I added some context to the question that should answer your question (and many others). – XYZT Apr 02 '23 at 00:10
  • I assume your channel is not stationary, meaning you can’t average subsequent readings to improve your estimate of h(t)? – Dan Boschen Apr 02 '23 at 02:13
  • @DanBoschen I actually do some form of what you said, so the noise in $\hat{h}(t)$ has a much smaller variance than the noise term in $y(t)$. – XYZT Apr 02 '23 at 05:51
  • Ok, given what you described, I don't think you can do better than the FFT approach as I detailed in my answer. Just be sure you don't have deep channel nulls, which can occur if the delay spread significantly exceeds the inverse of the signal bandwidth. (In that case you can manually limit the spectral nulls in the FFT to not go below -25 dB or so for the frequency band within the signal bandwidth; adjust -25 dB based on your own test results.) – Dan Boschen Apr 02 '23 at 11:17

3 Answers


Since we're dealing with AWGN, we can formulate the problem as a kind of Least Squares problem.

We model the problem exactly as the Total Least Squares (TLS) problem is modeled:

$$ \boldsymbol{y} \approx H \boldsymbol{x} \implies \boldsymbol{y} = \left( H + E \right) \boldsymbol{x} + \boldsymbol{n} $$

where $E$ and $\boldsymbol{n}$ are composed of AWGN.
Since the noise is zero-mean Gaussian, minimizing its variance is equivalent to Maximum Likelihood estimation, hence we can solve the following problem:

$$ \begin{align} \arg \min_{\boldsymbol{x}, E, \boldsymbol{n}} \quad & {\left\| \begin{bmatrix} E & \boldsymbol{n} \end{bmatrix} \right\|}_{F}^{2} \\ \text{subject to} \quad & \begin{aligned} \boldsymbol{y} & = \left( H + E \right) \boldsymbol{x} - \boldsymbol{n} \end{aligned} \end{align} $$

The solution is given by $ \boldsymbol{x} = -{V}_{H \boldsymbol{y}} {V}_{\boldsymbol{y} \boldsymbol{y}}^{-1} $. Note that negating $ \boldsymbol{n} $ has no effect; I only did it to match Wikipedia's derivation, so if you want to go deeper into the derivation, the notation will match.

Here the blocks are defined by the SVD of the augmented matrix $\begin{bmatrix} H & \boldsymbol{y} \end{bmatrix}$:

$$ \begin{bmatrix} H & \boldsymbol{y} \end{bmatrix} = \begin{bmatrix} {U}_{H} & {U}_{\boldsymbol{y}} \end{bmatrix} \begin{bmatrix} {\Sigma}_{H} & 0 \\ 0 & {\Sigma}_{\boldsymbol{y}} \end{bmatrix} {\begin{bmatrix} {V}_{H H} & {V}_{H \boldsymbol{y}} \\ {V}_{\boldsymbol{y} H} & {V}_{\boldsymbol{y} \boldsymbol{y}} \end{bmatrix}}^{*} $$

In the above, $H$ is the matrix form of the convolution operator. You may build it according to Generate the Matrix Form of 1D Convolution Kernel.
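As a concrete illustration, here is a minimal MATLAB sketch of building $H$ (the kernel and sizes are placeholders; `convmtx` requires the Signal Processing Toolbox, and the `toeplitz` line is an equivalent toolbox-free construction):

% Build H so that mH * vX == conv(vH, vX) (full linear convolution).
vH = [1; 2; 3];               % example kernel, stand-in for the measured h-hat
numSamples = 5;               % length of x
mH = convmtx(vH, numSamples); % (length(vH) + numSamples - 1) x numSamples

% Equivalent construction without the toolbox:
mH2 = toeplitz([vH; zeros(numSamples - 1, 1)], [vH(1), zeros(1, numSamples - 1)]);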

So the recipe is quite simple in your case.

How to Solve

Given that we know how to generate the $H$ matrix (see Generate the Matrix Form of 1D Convolution Kernel, and the sketch above), the solution for the case above, where $ \boldsymbol{y} $ is a vector, is pretty straightforward:

function [ vX ] = TlsRegression( mH, vY )
% Total Least Squares solution of y ~ (H + E) x,
% via the SVD of the augmented matrix [H, y].

numCols = size(mH, 2);

% Right singular vectors of the augmented matrix
[~, ~, mV] = svd([mH, vY]);

% TLS solution: x = -V_{Hy} / V_{yy} (here V_{yy} is a scalar)
vX = -mV(1:numCols, numCols + 1) / mV(numCols + 1, numCols + 1);

end
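For example, a toy run (a sketch; the sizes and noise levels are arbitrary, and `convmtx` is used to build the convolution matrix from the noisy channel estimate):

% Toy demo: recover a short complex x from y = conv(h, x) + noise,
% given only a noisy channel estimate and the observation.
vX = randn(50, 1) + 1i * randn(50, 1);                         % ground truth (demo only)
vH = randn(10, 1) + 1i * randn(10, 1);                         % true channel
vHHat = vH + 0.01 * (randn(10, 1) + 1i * randn(10, 1));        % noisy channel estimate
vY = conv(vH, vX) + 0.01 * (randn(59, 1) + 1i * randn(59, 1)); % noisy observation

mH = convmtx(vHHat, 50);        % 59 x 50 convolution matrix built from h-hat
vXEst = TlsRegression(mH, vY);  % TLS estimate of x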

For instance, for linear regression of a quadratic polynomial, we can compare the Least Squares and the Total Least Squares solutions:

[Figure: Least Squares vs. Total Least Squares fits of a quadratic polynomial]

The difference lies in which error is minimized:

[Figure: geometry of the error minimized by LS (vertical) vs. TLS (perpendicular)]

Total Least Squares (TLS) minimizes the perpendicular error, while Least Squares (LS) minimizes the vertical distance; that is, LS assumes the model is perfect.

Royi
  • Can you please elaborate on how the matrices in your answer relate to $y(t)$, $x(t)$, and $\hat{h}(t)$ from my problem formulation? – XYZT Mar 28 '23 at 14:37
  • @XYZT, I wrote in the answer: $ H $ is built by your coefficients as shown in the link. The vector $ \boldsymbol{y} $ is just the samples in time you have. – Royi Mar 28 '23 at 14:40
  • Royi I see what you mean, and I've no doubt your answer is correct, but I have to agree with @XYZT (to some extent) that your notation might not be as clear as it could be. – Gillespie Apr 01 '23 at 03:31
  • @Gillespie, I linked to a question which shows how to build $H$ from a kernel, isn't that clear? Please let me know what should be done to make it clearer. I'd be happy to do so. – Royi Apr 01 '23 at 09:37
  • For me at least, that's not a problem (though it could be for others). I think the main thing is that one has to read most of the Total Least Squares Wikipedia article to get most of the variable definitions, and the definitions you do include in the answer are at the end, after they are used. Also, jumping straight into the augmented SVD with little explanation may be difficult for some people. Like I said, it's a good answer, and I understand it better after studying it more carefully. Some definitions up front and a little explanation of the augmented matrix SVD step might help though. – Gillespie Apr 01 '23 at 15:04
  • @Gillespie, Building $H$ has nothing to do with the SVD. Given $H$, all the operations above are a few MATLAB lines. – Royi Apr 01 '23 at 17:24
  • @Royi I know building $H$ has nothing to do with SVD. My point was that $H$ wasn't the confusing part. Rather, it's a lack of definitions up front, and relying a lot on people having read all of the total least squares article. – Gillespie Apr 01 '23 at 19:07
  • @Royi It appears the problem of deconvolution is better described as "structured total least squares" since $H$ and $E$ are Toeplitz. Would you be able to provide some guidance in your answer on how to tackle this specific form of TLS? Also, it seems that in many cases, $\boldsymbol{x} = -{V}_{H \boldsymbol{y}} {V}_{\boldsymbol{y} \boldsymbol{y}}^{-1}$ does not give the TLS solution. – XYZT Apr 01 '23 at 23:57
  • @XYZT, Indeed. For the case of regression, it might be that mV(numCols + 1, numCols + 1); from the code is zero and there is no TLS solution. But then it means the solution is just a combination of $H$ itself. Indeed for deconvolution we need to take into consideration that the noise is correlated along $ H $. Hence we can use methods as in Maximum Likelihood Analysis of the Total Least Squares Problem with Correlated Errors. In this question you asked for the framework, which is the Total Least Squares as I suggested. – Royi Apr 02 '23 at 00:48
  • @XYZT, If you want a numerical implementation, please open a new question, I'd be happy to answer it as well. – Royi Apr 02 '23 at 00:49
  • @Royi Thanks! I am sure I'll do that once I have digested your answer here properly :) – XYZT Apr 02 '23 at 01:04
  • @XYZT, Great. Ping me here with a link when you open the question. It will be interesting to tackle it. – Royi Apr 02 '23 at 18:26

I used $h[n]$, $\hat{h}[n]$, and $x[n]$ in my answer below to represent samples of $h(t)$, $\hat{h}(t)$, and $x(t)$, since I am proposing a discrete-time solution using FFTs.

Proposed Approach

If $h[n]$ is significantly shorter than $x[n]$ and $y[n]$ (as would typically be expected), my solution would be to zero-pad $\hat{h}[n]$ and $y[n]$ to the same length, at least double the length of the longer of the two (to avoid time-domain aliasing), and then use the FFT of each as $\hat{H}[k]$ and $Y[k]$. $x[n]$ is then estimated by taking the inverse FFT of the ratio:

$$x[n] = \text{ifft}(Y[k] / \hat{H}[k])$$

This approach is effective as long as there are no deep nulls in the channel within the occupied bandwidth of the signal (both of which can be confirmed via the FFTs performed). Deep nulls in the channel can occur when the delay spread (the length of the dominant portion of the channel impulse response) significantly exceeds the inverse of the signal bandwidth. Such nulls will lead to noise enhancement that degrades the result if not compensated for. By deep nulls, I mean channel nulls of -25 dB or more; I show an example simulation result at the end of this post with an in-band channel null of -10 dB, which as demonstrated is not an issue. This is one motivation for decision feedback equalization in communication systems, but for this application, the nulls can be compensated for by limiting the minimum value of $\hat{H}[k]$ over the portion of the channel response that is within the occupied bandwidth of the signal (I suggest -25 dB, but confirm in simulation what works best for the characteristics of your waveform and channel).

Further, if the energy in $h[n]$ is known to be confined to a constrained portion of the spectrum, additional benefit can be gained by averaging or filtering $\hat{h}[n]$ to reduce the noise, improving the estimate of $\hat{h}[n]$ used in the processing above.
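A minimal MATLAB sketch of the above (variable names are illustrative; for simplicity, the -25 dB floor is applied across the whole FFT rather than only within the occupied bandwidth):

% Zero-pad both sequences to a common length, at least double the longer
% one, to avoid time-domain aliasing from the circular division.
nFft = 2^nextpow2(2 * max(length(vY), length(vHHat)));
vYF = fft(vY, nFft);
vHF = fft(vHHat, nFft);

% Limit deep channel nulls: floor |H[k]| at -25 dB below its peak,
% preserving phase, so that 1/H[k] does not enhance the noise.
minGain = 10^(-25 / 20) * max(abs(vHF));
vNull = abs(vHF) < minGain;
vHF(vNull) = minGain * exp(1i * angle(vHF(vNull)));

% Spectral division and return to the time domain.
vXEst = ifft(vYF ./ vHF);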

As Royi has clarified in the comments, the above approach is a least squares solution, computed in a very straightforward and efficient way!

A simulation confirming the proposed approach is shown in the plot below, where $x[n]$ was estimated from $\hat{h}[n]$ and $y[n]$ alone:

[Figure: simulated result of the FFT-based deconvolution]

Dan Boschen
  • But the least squares solution doesn't fit the model of the question, since the noise in the filter coefficients will make the solution worse. Your proposed operation in the frequency domain is the least squares solution. – Royi Apr 01 '23 at 18:28
  • @Royi yes exactly- Simply using the FFT and inverse FFT rather than the Wiener Hopf equations (the cross correlation vector and autocorrelation matrix). I'm hoping that will be clearer for the OP. – Dan Boschen Apr 01 '23 at 19:34
  • The Wiener Deconvolution is not the Least Squares solution. You argue against the Least Squares solution, but the division in the frequency domain is the Least Squares solution. The Wiener Deconvolution is the MMSE estimator (while LS is a parametric method). – Royi Apr 01 '23 at 19:53
  • If you build the matrix $ H $ from my solution above, then your solution is just $ {\left( {H}^{T} H \right)}^{-1} {H}^{T} \boldsymbol{y} $ (Assuming everything is real, just for simplicity). Assuming $ H $ is built for the cyclic convolution. – Royi Apr 01 '23 at 19:55
  • I agree with @Royi that the straightforward frequency domain solution needs to be regularized in some way or you'll get some unstable results. – Gillespie Apr 01 '23 at 21:36
  • Yes, I mention that specifically - those are limited to conditions with deep nulls, and the "unstable" result is noise enhancement (you can't have zeros near the unit circle, as the reciprocal results in high gain of what is low signal content at that frequency, plus high noise). – Dan Boschen Apr 01 '23 at 21:59
  • @Royi thanks- I confused “Wiener Deconvolution” with the “Wiener Hopf Equations” which is what I referenced in the answer at my related posts— in any event I am not advocating that approach here as long as the channel doesn’t have deep nulls (and it is much shorter than x and y as I suspect). – Dan Boschen Apr 01 '23 at 22:01
  • But your method is the Least Squares method; it assumes $\boldsymbol{h}$ is known with no errors, which doesn't fit the model of the OP. – Royi Apr 01 '23 at 22:05
  • @Royi - no that is not true. My method uses the noisy h and the received y. Please reread my first two paragraphs: $\hat{h}[n]$ refers to the noisy channel response the OP has. – Dan Boschen Apr 01 '23 at 22:59
  • @DanBoschen, I read it. My argument is that your solution is the regular least squares solution. Care to open a question on the real-world meaning of $x[n] = \text{ifft}(Y[k]/\hat{H}[k])$? I will show you it is just the Least Squares solution to the equation $ \boldsymbol{y} = H \boldsymbol{x} $ where $ H $ is the cyclic convolution matrix. Hence, if you apply the inverse DFT using the "dirty" channel coefficients, you're basically solving the Least Squares model assuming the coefficients you have are the real ones. – Royi Apr 01 '23 at 23:12
  • @Royi- I see! Interesting. I need to think that through and will open a question as you suggest. Thanks! – Dan Boschen Apr 02 '23 at 00:57
  • @Royi I posted a separate question as to the difference between the FFT approach and least squares as done by the Wiener-Hopf equations. It will be interesting to see what additional insights come up from you and others: https://dsp.stackexchange.com/questions/87326/least-squares-solution-using-the-dft-vs-wiener-hopf-equations Thanks for your thoughts and input! – Dan Boschen Apr 02 '23 at 12:40
  • @DanBoschen, In case the length of $\boldsymbol{h}$ is much shorter than $\boldsymbol{x}$, then solving the problem in the time domain, under the periodic convolution model, might be faster and yield the same results. – Royi Apr 03 '23 at 13:01

One practical method would be to use the matched filter. Matched filters are commonly thought of as useful for finding the scale and delay of a known signal buried in noise, but if you think about it, the same principle allows the matched filter to perform channel estimation (which is essentially what your problem is).

This is why matched filters are used in remote sensing/image formation. For example, synthetic aperture radar attempts to form an image of an unknown scene by convolving a known "training sequence" (i.e., the transmitted radar waveform, $h(t)$ in your case) with the unknown scene/channel impulse response ($x(t)$ in your case). Of course, $x(t)$ has some noise added to it, and perhaps you don't know the waveform the radar actually transmitted due to hardware distortion, so you have a noisy estimate of the waveform, $\hat{h}(t)$, rather than $h(t)$.

This problem is solved by cross-correlating the received signal, $y(t)$, with the transmitted signal, $\hat{h}(t)$ (matched filtering), or equivalently, convolving $y(t)$ with the conjugated and time-reversed version of $\hat{h}(t)$, namely $\hat{h}^*(-t)$. For Gaussian noise, the matched filter maximizes SNR.
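In discrete time this is a one-liner. A minimal sketch, where `vY` and `vHHat` hold samples of $y(t)$ and $\hat{h}(t)$:

% Matched filter: convolve y with the conjugated, time-reversed channel
% estimate (equivalent to cross-correlating y with h-hat).
vXEst = conv(vY, conj(flip(vHHat)));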

Another Method

One could also use regularized deconvolution. In the frequency domain, deconvolution just means dividing the observed spectrum $Y(f)$ by the filter spectrum $H(f)$. But this division can blow up at frequencies where $H(f)$ is near zero, so typically $H(f)$ is regularized to have some moderate minimum value first.

A similar solution is Wiener deconvolution.
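For reference, a minimal sketch of Wiener deconvolution, assuming a flat noise-to-signal power ratio `nsr` (a simplification; in practice, estimate it from the measured SNR):

% Wiener deconvolution: attenuates, rather than amplifies, frequencies
% where the channel response is weak. nsr = noise power / signal power.
nsr = 0.1;                                         % example value; tune to the data
nFft = 2^nextpow2(length(vY) + length(vHHat) - 1); % linear-convolution length
vHF = fft(vHHat, nFft);
vYF = fft(vY, nFft);
vXEst = ifft(conj(vHF) .* vYF ./ (abs(vHF).^2 + nsr));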

Gillespie
  • but $\hat{h}(t)$ is not the "transmitted signal", it is the channel. The OP wants to determine the transmitted signal, which is $x(t)$. – Dan Boschen Apr 01 '23 at 17:25
  • I don't think it matters which one you call the channel and which you call the "training sequence" (in comms)/"transmitted signal" (in remote sensing), because $h(t) * x(t) = x(t) * h(t)$. The fundamental thing is that $h(t)$ is the known signal, and $x(t)$ is the unknown. – Gillespie Apr 01 '23 at 21:32
  • Typically the channel is much shorter than the training sequence; this is what leads to the overdetermined solution for the channel when using least squares solutions. – Dan Boschen Apr 01 '23 at 21:57
  • That's fair Dan. Though I suppose the OP didn't say what his specific application was. – Gillespie Apr 01 '23 at 23:11
  • @Gillespie I have added additional context to the question now. The "impulse response" is indeed orders of magnitude smaller than the "input signal". – XYZT Apr 02 '23 at 00:21