
Least squares channel estimation (or equalization) can be accomplished using the Wiener-Hopf equations, or using the Discrete Fourier Transform. Both appear to be least squares solutions. How do the two compare and contrast? Under what conditions would one approach be preferred over the other, and given the same conditions, will there be a performance difference between the two? I would like to see a clear mathematical explanation of where the differences lie for this application (or a demonstration that they are identical, if that is the case). My simulation results, included as an answer here, conclude they are not the same and show a condition where the Discrete Fourier Transform approach as detailed below is inferior for channel estimation. This is in contrast to the answers already provided, so my question as posted is still open. As I posted originally, together with the math I am looking for practical insight as to when it would make sense to use one over the other, assuming they are both least squares solutions to the channel estimation problem.

Background Details

In general, the channel estimation problem involves an input, a channel, and an output, where the time domain output is the convolution of the input with the channel, plus additive noise:

$$y[n] = (x[n]*h[n]) + N[n]$$

Where:

  • $x[n]$: time domain transmitted waveform
  • $y[n]$: time domain received waveform
  • $h[n]$: channel impulse response
  • $N[n]$: additive random stationary noise
  • $*$: convolution operator

block diagram

For channel estimation, $x[n]$ and $y[n]$ are known ($x[n]$ is a spectrally rich training or sounding sequence and $y[n]$ is the received signal for that sequence), and $h[n]$ is the unknown to be estimated. Use of the Wiener-Hopf equations is a common approach for channel estimation (or for channel compensation as an equalizer, when we instead swap the input and output in the block diagram above: in that case, given $y[n]$, we solve for the equalizer filter that reproduces $x[n]$). This results in a least-squares solution (minimizing the squared error) for the estimate of the channel. The derivation is detailed in this post and summarized here:

$$\tilde{h} = (A^TA)^{-1}A^Ty$$

Where:

  • $\tilde{h}$: vector containing the estimate of the channel
  • $A$: convolution matrix of $x$ (a Toeplitz matrix whose columns are shifted copies of $x$, with $x[0]$ on the main diagonal)
  • $y$: vector of received values
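As a minimal sketch of computing the estimate above (assuming NumPy/SciPy, with hypothetical arrays `x` and `y`; note `y` must span the full convolution length `len(x) + ntaps - 1`):

import numpy as np
from scipy import linalg

def wiener_hopf_estimate(x, y, ntaps):
    # Toeplitz convolution matrix A such that A @ h equals x * h
    A = linalg.toeplitz(np.hstack([x, np.zeros(ntaps - 1)]),
                        np.hstack([x[0], np.zeros(ntaps - 1)]))
    # lstsq solves the same normal equations (A^H A) h = A^H y,
    # with better numerical behavior than an explicit inverse
    h_est, *_ = np.linalg.lstsq(A, y, rcond=None)
    return h_est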

Alternatively, given $x[n]$ and $y[n]$, we can use the DFT to solve for $h[n]$ as I propose below:

Zero pad $x[n]$ and $y[n]$ to the same length, at least double the length of the longer of the two, to minimize the effect of time domain aliasing. (The padding may need to be even longer depending on the duration of the actual impulse response; review the solution and confirm the resulting response has decayed to the noise floor, or otherwise increase the length with additional padding.) Then take the DFT of each as $X[k]$ and $Y[k]$, with $h[n]$ estimated by the inverse DFT of the ratio:

$$h[n] = \text{ifft}(Y[k]/X[k])$$

The resulting $h[n]$ should exceed the time duration of the channel's impulse response (otherwise it will have been distorted by time-domain aliasing in the processing), and can then be truncated and windowed to the dominant length of the response (taking care not to use symmetric windows for minimum phase results, where the dominant energy exists near the start of the sequence).
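A minimal sketch of this procedure (assuming NumPy, with hypothetical arrays `x` and `y` and an assumed truncation length `ntaps`):

import numpy as np

def dft_channel_estimate(x, y, ntaps):
    # Zero pad both to the same length, at least double the longer
    # of the two, to reduce time-domain aliasing of the result
    nfft = 2 * max(len(x), len(y))
    X = np.fft.fft(x, nfft)
    Y = np.fft.fft(y, nfft)
    h = np.fft.ifft(Y / X)
    # Truncate to the dominant portion of the impulse response;
    # confirm the tail has decayed to the noise floor first
    return h[:ntaps]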

Note that both approaches suffer when the delay spread of the channel far exceeds the inverse of the channel bandwidth, as this is the recipe for deep spectral nulls. Deep nulls within the signal's occupied bandwidth mean extremely small values in the divisor spectrum $X[k]$, leading to noise enhancement when dividing. With the DFT method, such nulls can be compensated for by limiting the minimum magnitude allowed in $X[k]$.
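One illustrative way to implement that limiting (a sketch only; the floor ratio is an assumed tuning parameter, not from the original post):

import numpy as np

def safe_divide(Y, X, floor_ratio=1e-3):
    # Clamp |X[k]| to a fraction of its peak so spectral nulls
    # do not blow up the quotient (noise enhancement)
    floor = floor_ratio * np.max(np.abs(X))
    mag = np.maximum(np.abs(X), floor)
    X_reg = mag * np.exp(1j * np.angle(X))   # preserve the phase of X
    return Y / X_reg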

Recap of Primary Question

Given these two different approaches: one using a matrix of autocorrelation values and a cross correlation vector in the Wiener-Hopf equations, and the other using the DFT of the input and output, how do they compare and contrast? Will they lead to the same result or is one preferred over the other in certain situations? Note that this question applies to solving for any of the three when the other two are known (tx, rx and channel).


I will summarize a main point since the complete answer I am looking for hasn't been provided yet:

Channel estimation with the Wiener-Hopf equations:

$$h = (A^TA)^{-1} A^Ty$$

Where:

  • $(A^TA)$: the autocorrelation matrix using the known transmitted sequence
  • $A^Ty$: the cross correlation vector of the received sequence and the known transmitted sequence

Channel estimation using DFTs:

$$h[n] = \text{ifft}(Y[k]/X[k])$$

Where:

  • $Y[k]$ is the DFT of $y[n]$; as a matrix operation, $Y=Dy$
  • $X[k]$ is the DFT of $x[n]$; as a matrix operation, $X=Dx$
  • $\text{ifft()}$ is an inverse DFT operation

I understand that $\text{ifft}(\text{fft}(a)\text{fft}(b))$ is a circular convolution of $a$ and $b$, and how mathematically the DFT matrix and inverse DFT matrix products as used lead to a circular convolution vector in time. I am looking for the further direct connection to the autocorrelation matrix and cross correlation vector used in the Wiener-Hopf expression.
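That identity is easy to confirm numerically; a toy check, assuming NumPy:

import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([0.5, -1.0, 0.25, 2.0])
via_dft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real
# Direct circular convolution: c[n] = sum_m a[m] b[(n - m) mod N]
direct = np.array([sum(a[m] * b[(n - m) % len(a)] for m in range(len(a)))
                   for n in range(len(a))])
assert np.allclose(via_dft, direct)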

So mathematically how do the samples in $h[n]$ above relate to the vector $h$ between the two methods? Under condition of noise, is one approach preferred to the other or are they equivalent? What is the practical insight here?

Dan Boschen
  • I verified my assertion. If code is needed to show it numerically, let me know. – Royi Apr 02 '23 at 18:25
  • $A$ is the convolution matrix of $x$, correct? Then $(A^TA)$ would be the autocorrelation matrix of the input ($x$), not the received sequence ($y$), right? – Gillespie Apr 03 '23 at 16:57
  • Ah I see you are referring to a mistake in my text – will fix that, thanks! – Dan Boschen Apr 03 '23 at 20:01
  • The question is very long, but the bottom line of your question is a one liner: what are the different results for the problem given different models of the discrete convolution? One more thing to take into account is that the model above is not the model of Wiener. In Wiener's model the input is stochastic. This is the reason that in his work the least squares is on the correlation function (which for wide-sense stationary signals is deterministic). Then both models are just equivalent (LS on the 2nd moment vs LS on the data itself). – Royi Apr 05 '23 at 09:41
  • @Royi See my post I provided, this had no noise, no stochastics and used the least squares formulation that is used in the Wiener-Hopf equations just as I provided above. If calling it a Wiener Filter is adding to confusion I can remove that. My simulation is showing they are not equivalent, but if they would be identical under the case of such deterministic signals, then I am missing something somewhere. Perhaps there is one more averaging step required or something like that, that isn't yet showing up in the math, or some time offset I am supposed to be doing etc. – Dan Boschen Apr 05 '23 at 12:21
  • I may try to make a simpler case that would be easier for me to share--- what I have is from a comm simulation model I already had for demonstrating the Wiener-Hopf equations for channel equalization. (which I also couldn't do as nicely just using zero-padded FFTs) – Dan Boschen Apr 05 '23 at 12:22
  • @DanBoschen, The whole idea behind the Wiener Hopf is converting the operation to a deterministic one by working the Auto Correlation instead of the samples. I suggest a simple question with ground truth model you create and then we can compare many solutions. By the way I am not sure about the assertion for zero padding in this context. I wrote in the chat that the way to utilize FFT is in an overlap and save manner. – Royi Apr 05 '23 at 13:19
  • Let me take out any reference to Wiener Hopf and refer to the formula as I am using it, effectively, with deterministic signals - I think my reference to it is causing a distraction. I use either formula as written to determine the channel. Your overlap and save comment is very good as that may be the averaging effect I was looking for. I will try that case later. Can you expand your answer to show, mathematically how it then becomes equivalent? I think that is what is missing (looking for practical insight). What overlap and how are they to be combined that makes the two exactly identical? – Dan Boschen Apr 05 '23 at 13:27

4 Answers


Following Royi's derivation, we want to show that,

$$\begin{align} \hat{h} = \arg \min_h||Xh - y||^2 = (X^H X)^{-1} X^H y = IDFT(Y \oslash X) \end{align}$$ where $X$ is a circular convolution matrix, which is circulant.

Deriving the Wiener-Hopf solution is simple, $$\begin{align} \hat{h} &= \arg \min_h||Xh - y||^2 \\ & = \arg \min_h (Xh - y)^H (Xh - y) \\ &= \arg \min_h h^HX^HXh + y^Hy - 2h^HX^Hy \\ &= \arg \min_h h^HX^HXh - 2h^HX^Hy \end{align} $$ Taking the derivative of the cost function with respect to $h$, we get $2X^HX\hat{h} - 2X^Hy = 0 \implies \hat{h} = (X^HX)^{-1} X^Hy$.

Now, to show the equivalence of the Wiener-Hopf equations to the DFT convolution theorem. Using the symbol $\mathscr{D}$ to denote the DFT matrix, we can write $X = \mathscr{D}^H \Lambda_X \mathscr{D}$, since the eigenvectors of any circulant matrix are the columns of the DFT matrix, and its eigenvalues are the DFT of the first column of the matrix $X$. We will use the following facts in proving the equivalence:

  • We can write $\Lambda_X = \text{Diag}(\mathscr{D} x)$.
  • $\text{Diag}(a) b = a \odot b$, i.e., pre-multiplication by a diagonal matrix is equivalent to taking the Hadamard product of the two vectors.
  • $\mathscr{D}$ is unitary, i.e., $\mathscr{D}^{-1} = \mathscr{D}^H, \ \mathscr{D}^H \mathscr{D} = \mathscr{D} \mathscr{D}^H = I$.

In the Wiener-Hopf solution, replace $X = \mathscr{D}^H \Lambda_X \mathscr{D}$, and you get $$\begin{align} \hat{h} &= \left((\mathscr{D}^H \Lambda_X^H \mathscr{D}) (\mathscr{D}^H\Lambda_X \mathscr{D})\right)^{-1} \mathscr{D}^H \Lambda_X^H \mathscr{D} y \\ &= (\mathscr{D}^H |\Lambda_X|^2 \mathscr{D})^{-1} \mathscr{D}^H \Lambda_X^H \mathscr{D} y \\ &= (\mathscr{D}^H |\Lambda_X|^{-2} \mathscr{D}) \mathscr{D}^H \Lambda_X^H \mathscr{D} y \\ &= \mathscr{D}^H |\Lambda_X|^{-2} \Lambda_X^H \mathscr{D} y \\ &= \mathscr{D}^H \Lambda_X^{-1} \mathscr{D} y \\ &= \mathscr{D}^H \text{Diag}(\mathscr{D}x)^{-1} \mathscr{D} y \\ &= \mathscr{D}^H (\mathscr{D}x)^{-1} \odot ( \mathscr{D} y) \\ &= \mathscr{D}^H ( \mathscr{D} y) \oslash (\mathscr{D}x) \\ &= \text{IDFT}\left(\text{DFT}(y) \oslash \text{DFT}(x)\right) \end{align}$$

Now, if $X$ is a Toeplitz matrix and we have performed linear convolution, not circular convolution, then $X$ is no longer diagonalized by the DFT matrix: writing $X = Q \tilde{\Lambda}_x Q^{H}$ (when such a unitary diagonalization exists), the eigenvectors $Q$ are unitary but are no longer the DFT matrix. The proof above no longer holds, and $\hat{h} = Q \tilde{\Lambda}_x^{-1} Q^H y$. However, in the process of zero padding $x, y$ and ensuring the DFT length is more than twice their lengths, you are essentially performing linear convolution.
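A quick numerical check of the circulant case (a sketch assuming NumPy/SciPy; `scipy.linalg.circulant` builds the circular convolution matrix, and the random $x$ is assumed to have no DFT zeros):

import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)
h_true = rng.standard_normal(N)
Xc = circulant(x)        # Xc @ h is the circular convolution of x and h
y = Xc @ h_true

h_ls = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)               # Wiener-Hopf / normal equations
h_dft = np.fft.ifft(np.fft.fft(y) / np.fft.fft(x)).real   # DFT division
assert np.allclose(h_ls, h_dft)   # identical under circular convolution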

orchi_d

My assertion is the following:

Given a model: $ \boldsymbol{y} = \boldsymbol{h} \circledast \boldsymbol{x} $ where $ \circledast $ is the periodic / circular convolution operation the following are equivalent: $$ {\mathscr{D}}^{-1} \left( \left( \mathscr{D} \boldsymbol{y} \right) \oslash \left( \mathscr{D} \boldsymbol{x} \right) \right) = \arg \min_{\boldsymbol{h}} \frac{1}{2} {\left\| X \boldsymbol{h} - \boldsymbol{y} \right\|}_{2}^{2} $$ Where $ \mathscr{D} $ is the DFT Operator (Matrix), $ \oslash $ is the element wise division and $ X $ is the cyclic / periodic convolution matrix of the data. Namely $ X \boldsymbol{h} = \boldsymbol{h} \circledast \boldsymbol{x} $.

Some remarks:

  • The $ \mathscr{D} $ is the Unitary Form of the DFT Matrix.
  • $ \mathscr{x} = \mathscr{D} \boldsymbol{x} $ is the DFT transform of $ \boldsymbol{x} $.
  • $ \mathscr{y} = \mathscr{D} \boldsymbol{y} $ is the DFT transform of $ \boldsymbol{y} $.
  • We're assuming $ \mathscr{D} \boldsymbol{x} $ has no zeros. Basically this means that solving the problem in the original domain is much more stable, as there is always a solution.
  • We could build the matrix $ X $ as a cyclic / periodic convolution matrix (see Generate the Matrix Form of 1D Convolution Kernel), then use the classic solution of Linear Least Squares.
  • Since $ X $ is a periodic / cyclic convolution operator it has the structure of a Circulant Matrix.

The best way to derive the equivalence is using the circulant property. Namely we can write $ X = \mathscr{D}^{H} \operatorname{Diag} \left( \mathscr{x} \right) \mathscr{D} $ where $ \mathscr{x} = \mathscr{D} \boldsymbol{x} $ is the DFT transform of $ \boldsymbol{x} $.

$$ \begin{aligned} \hat{\boldsymbol{h}} & = \arg \min_{\boldsymbol{h}} \frac{1}{2} {\left\| X \boldsymbol{h} - \boldsymbol{y} \right\|}_{2}^{2} \\ & = \arg \min_{\boldsymbol{h}} \frac{1}{2} {\left\| \mathscr{D}^{H} \operatorname{Diag} \left( \mathscr{x} \right) \mathscr{D} \boldsymbol{h} - \boldsymbol{y} \right\|}_{2}^{2} && \text{$X$ is a circulant matrix} \\ & = \arg \min_{\boldsymbol{h}} \frac{1}{2} {\left\| \mathscr{D} \mathscr{D}^{H} \operatorname{Diag} \left( \mathscr{x} \right) \mathscr{D} \boldsymbol{h} - \mathscr{D} \boldsymbol{y} \right\|}_{2}^{2} && \text{$\mathscr{D}$ is a unitary matrix} \\ & = \arg \min_{\boldsymbol{h}} \frac{1}{2} {\left\| \operatorname{Diag} \left( \mathscr{x} \right) \mathscr{D} \boldsymbol{h} - \mathscr{D} \boldsymbol{y} \right\|}_{2}^{2} && \text{$\mathscr{D}$ is a unitary matrix} \\ & = \arg \min_{\boldsymbol{h}} \frac{1}{2} {\left\| \operatorname{Diag} \left( \mathscr{x} \right) \mathscr{h} - \mathscr{y} \right\|}_{2}^{2} && \text{DFT transform} \\ \end{aligned} $$

The term inside the norm can be brought to zero by solving the diagonal linear system:

$$ \mathscr{h} = \operatorname{Diag}^{-1} \left( \mathscr{x} \right) \mathscr{y} = \mathscr{y} \oslash \mathscr{x} = \left( \mathscr{D} \boldsymbol{y} \right) \oslash \left( \mathscr{D} \boldsymbol{x} \right) $$

where $ \mathscr{h} = \mathscr{D} \hat{\boldsymbol{h}} $ denotes the DFT of the time domain estimate.

Hence $ \hat{\boldsymbol{h}} = {\mathscr{D}}^{-1} \left( \left( \mathscr{D} \boldsymbol{y} \right) \oslash \left( \mathscr{D} \boldsymbol{x} \right) \right) $ as required.

In addition we know $ \hat{\boldsymbol{h}} = {\left( {X}^{T} X \right)}^{-1} {X}^{T} \boldsymbol{y} $.

Royi
  • Thanks @Royi! I understand well what you assert is equivalent (fast convolution or correlation with DFT's depending on using complex conjugate). I was more interested in how this compared to the Wiener-Hopf equations and the applications those are used for (channel estimation). You state this in the last line but I was hoping for more detail showing that they are exactly the same (or not). Would you assert that the two approaches I gave are equivalent? If not, how do they differ and what would be the considerations to use one or the other for purposes of channel estimation? – Dan Boschen Apr 02 '23 at 21:37
  • (Wiener-Hopf convolution is not circular) – Dan Boschen Apr 02 '23 at 21:40
  • @DanBoschen, My assertion was simply about your answer to https://dsp.stackexchange.com/questions/71660. Your answer states that the Least Squares Solution doesn't fit, hence you gave the option of division in frequency domain. All I said is that under the assumption of cyclic convolution it is the Least Squares solution. Pay attention that for discrete convolution the difference between convolution types is basically how you handle the boundaries. – Royi Apr 03 '23 at 07:06
  • @DanBoschen, The classic Wiener Hopf for an FIR filter just uses specific boundary conditions. I have to say that most of Wiener's work was when the model is also stochastic. This is not the LS model. – Royi Apr 03 '23 at 07:38
  • @DanBoschen, The Wiener Hopf model is about the data and the noise being stochastic. It coincides with the Least Squares method since the classic assumption is that the distribution of the data is Gaussian / we only care about the 1st and 2nd moments. Hence you can practically solve it just like I did above. Regarding how to define the discrete convolution, it is a hyper parameter of the problem. My point being the solution you offered matches the LS solution under the assumption of periodic convolution. Hence it doesn't make sense to both say the LS is wrong while offering it as a solution. – Royi Apr 03 '23 at 08:41
  • You are answering a question I didn't ask here. Your referenced question inspired this, and I agreed there with your comment that both can be least squares solutions and removed all part of my answer regarding "LS is wrong". Still my question is given they are both least squares, are they the same. I like what you started with here and upvoted, but still my questions as asked here remain. I am hoping you or someone can offer further insight into the use of one or the other for channel estimation and if they are equivalent or not. When to use either if different. – Dan Boschen Apr 03 '23 at 11:44
  • They differ only in the first and last few rows of the matrix $X$ in my answer. That's it. The only difference in the matrix is the rows handling the boundary conditions; this is what I wrote. – Royi Apr 03 '23 at 11:46
  • So what happens if we choose to use one or the other approach for channel estimation, given they are different? Is one better over the other? This is what I am getting at. What is the impact or practical consideration for use? Thanks! – Dan Boschen Apr 03 '23 at 11:48
  • @DanBoschen, You're implying about adaptive filters (LMS). They have 2 motivations: Compute resources (Usually not relevant in our days) and being adaptive to changing models (Not stationary). For changing models Wiener Hopf is not relevant, as it was developed for the stationary case. We can say that if there is no change for long enough time it will converge, but the idea is just being adaptive (And there are much faster convergence methods than just LMS). – Royi Apr 03 '23 at 12:57
  • Your formula near the bottom says $ \mathscr{h} = \left( \mathscr{D} \boldsymbol{y} \right) \oslash \left( \mathscr{D} \boldsymbol{x} \right) $ and then says $ \hat{\boldsymbol{h}} = {\mathscr{D}}^{-1} \left( \left( \mathscr{D} \boldsymbol{y} \right) \oslash \left( \mathscr{D} \boldsymbol{x} \right) \right) $. So $\hat{\mathscr{h}} = \mathscr{D}^{-1}\mathscr{h}$? I had trouble following that and its conclusion. – Dan Boschen Apr 05 '23 at 12:35
  • I guess the difference between \mathscr{h} ($\mathscr{h}$ – the DFT samples) and \boldsymbol{h} ($\boldsymbol{h}$ – the time domain samples) isn't clear enough. But anywhere you see $\mathscr{D}$ you can replace it with DFT(). So $ \hat{\boldsymbol{h}} = {\mathscr{D}}^{-1} \left( \left( \mathscr{D} \boldsymbol{y} \right) \oslash \left( \mathscr{D} \boldsymbol{x} \right) \right) $ means $ \hat{\boldsymbol{h}} = IDFT \left( DFT(\boldsymbol{y}) \oslash DFT(\boldsymbol{x}) \right) $. – Royi Apr 05 '23 at 13:20
  • Ah so $\mathscr{h} = DFT(\boldsymbol{h})$? If so that is slightly confusing. – Dan Boschen Apr 05 '23 at 13:23
  • @DanBoschen, I guess it is :-). But the derivation is full. – Royi Apr 05 '23 at 13:30

This is not so much an answer as some observations that may help. Using Dan's notation, let:

  • $A = $ the convolution matrix of $x$ s.t. $Ah = x*h$
  • $A^H = $ Hermitian transpose (or "adjoint") of $A$ (for real numbers, $A^H = A^T$)
  • $A^\dagger$ = pseudo-inverse of $A$
  • $X = DFT(x[n])$
  • $H = DFT(h[n])$
  • $Y = DFT(y[n])$
  • $\oslash = $ element division (./ in MATLAB)
  • $\odot = $ element (Hadamard) product
  • $* = $ convolution operator

Then,

$$ y = Ah + n $$

$$ \hat{h} = A^{\dagger} y $$

Ignoring aliasing etc. for now (which could be a big deal!),

$$ Y = X \odot H + N $$

$$ \hat{H} = Y \oslash X $$

If the last equation is truly the inverse of convolution in the frequency domain, then: $$ \hat{h} = IDFT(Y \oslash X) = A^\dagger y $$

The un-regularized solution using the Moore-Penrose definition of $A^\dagger$ is $$ \hat{h} = (A^HA)^{-1}A^H y $$

But this can be unstable, so it is better to use the Tikhonov Regularized Least Squares solution which essentially adds some diagonal loading: $$ \hat{h} = (A^H A + \sigma I)^{-1}A^H y $$
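A minimal sketch of that diagonally loaded solve (assuming NumPy, with hypothetical inputs `A`, `y`, and load factor `sigma`):

import numpy as np

def tikhonov_ls(A, y, sigma=1e-3):
    # Tikhonov / ridge regularized least squares:
    # solve (A^H A + sigma I) h = A^H y
    AhA = A.conj().T @ A
    Ahy = A.conj().T @ y
    return np.linalg.solve(AhA + sigma * np.eye(AhA.shape[0]), Ahy)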

I believe that Dan's frequency domain solution ($\hat{h} = IDFT(Y \oslash X)$) after compensating for nulls in $X$ (which I've typically heard referred to as regularized de-convolution) is roughly equivalent to the regularized least squares solution given above.

Note on Matched Filtering

It's worth noting that the matched filter approximates the solution to $Ah = y$ by ignoring the auto-correlation portion of the least squares solution:

  • Least squares: $\hat{h} = (A^HA)^{-1}A^H y$
  • Matched filter: $\hat{h} = A^H y$

This amounts to assuming that the adjoint of convolution ($A^H$) approximates the inverse of convolution ($A^\dagger$). For imaging applications at least, this is often a useful approximation. This is what I was trying to explain in my answer here (perhaps unsuccessfully).
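As an illustrative side-by-side (a sketch with hypothetical `A` and `y`):

import numpy as np

def mf_vs_ls(A, y):
    h_mf = A.conj().T @ y                          # matched filter: drops (A^H A)^{-1}
    h_ls, *_ = np.linalg.lstsq(A, y, rcond=None)   # full least squares
    return h_mf, h_ls

For a spectrally flat (white) $x$, $A^HA$ is close to a scaled identity, so the two agree up to a scale factor.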

Edit

I think the heart of the matter is that convolution is diagonalized by the Fourier Transform: $D A D^{-1} = \Lambda$, where $D$ is the DFT matrix, and $\Lambda$ is a diagonal matrix with values $\lambda_1, \lambda_2,\ldots, \lambda_N$ on the diagonal. The inverse of a diagonal matrix is another diagonal matrix with the reciprocal values ($1/\lambda_k$) on the diagonal.

Thus, it is easier to invert $A$ after taking the DFT. Once we do that, we can take the inverse DFT to go back to the time domain. You could evaluate this numerically to show that $D^{-1} \Lambda^{-1} D = (A^H A)^{-1} A^H$ for an invertible circulant $A$.
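For example (a sketch assuming NumPy/SciPy, with a random circulant $A$ assumed to have no DFT zeros):

import numpy as np
from scipy.linalg import circulant, dft

rng = np.random.default_rng(1)
N = 8
a = rng.standard_normal(N)
A = circulant(a)              # circulant convolution matrix with first column a
D = dft(N, scale='sqrtn')     # unitary DFT matrix
lam = np.fft.fft(a)           # eigenvalues of A
lhs = D.conj().T @ np.diag(1 / lam) @ D
rhs = np.linalg.solve(A.T @ A, A.T)   # (A^H A)^{-1} A^H for real A
assert np.allclose(lhs, rhs)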

The devil is in the details though, and you have to take into account circular convolution vs. linear, etc.

Edit 2

Looks like @Orchi_d has developed this same point more fully in his answer.

Gillespie
  • Thanks Gillespie, both interesting and useful. Could you add a key to your variance symbols? I am still hoping to see the mathematical bridge that ties the auto correlation and cross correlation operations in one approach to the inverse DFT and DFT product in the other, which we know is just a convolution. I am going to resort to a simulation to confirm if they are indeed the same (or not). – Dan Boschen Apr 03 '23 at 19:57
  • 1
    Thanks for the update. I meant various symbols, not “variance symbols” but I think you figured that out. All clear now. – Dan Boschen Apr 03 '23 at 21:25
  • @Dan, I think if we accept that $A^\dagger y$ and $Y \oslash X$ invert the convolution operation, then that shows qualitatively that they are equivalent. But that's not perfectly rigorous I guess. – Gillespie Apr 03 '23 at 21:31
  • yes it does suggest that - it looks like Orchi_d provided the rigor but I appreciate the efforts / contributions from all of you. On equivalency as I commented just now to Orchi_d- if we zero pad the DFT out 2x to be linear convolution then it is truly equivalent… is that how you see it? – Dan Boschen Apr 03 '23 at 21:33
  • I think that's correct, because zero-padding enough allows you to effectively perform linear convolution via FFT circular convolution. – Gillespie Apr 03 '23 at 22:20
  • @Gillespie, I don't see the added value of the answer (by Orchi_d as well) over what I did. My last sentence said exactly that: the solution is the same under the assumption of periodic convolution. It is better to ask the poster to derive things more explicitly, or to edit. – Royi Apr 05 '23 at 09:37

This is not a complete answer but a companion simulation to the other answers, showing that the two approaches, when used for a practical application, are not identical as concluded in some of the other answers. Further details are needed on how to make them identical (if they can be) for such a practical case:

I have expanded on an existing simulation / demonstration I already had showing the use of the Wiener-Hopf equations for equalization of a QPSK Modulated signal under a multi-path (dispersive) channel condition. There is no additional noise added, only the inter-symbol interference of the channel. (If I saw the approaches did match, my next step was to add noise to see how they each held up; I didn't get that far). As described in the post I linked in the OP, we can use the equations for channel estimation (instead of compensation) by swapping Tx and Rx in the equation.

That said, I introduce the QPSK signal used, with its spectrum shown below:

QPSK Spectrum

The frequency response of the channel used is as follows (the channel would be the unknown we seek, so this is "the reveal" of truth):

Channel Frequency Response

The spectrum of the Rx signal as received is shown below:

Rx Spectrum

Here is the Python code I used for channel estimation:

import numpy as np
from scipy import linalg

def convmtx(h, n):
    # creates the convolution (Toeplitz) matrix
    # h is the input array
    # n is the length of the array to convolve
    return linalg.toeplitz(np.hstack([h, np.zeros(n-1)]), np.hstack([h[0], np.zeros(n-1)]))

omit = 100           # initial samples to exclude
ntaps = 25           # number of coefficients to capture impulse response
depth = ntaps * 80   # length of waveform to use for Least Squares solution
delay = ntaps//2
A = convmtx(tx_shaped[omit:omit+depth], ntaps)
R = np.dot(np.conj(A).T, A)
X = np.concatenate([np.zeros(delay), rx_distort[omit:omit+depth], np.zeros(int(np.ceil(ntaps/2)-1))])
ro = np.dot(np.conj(A).T, X)
channel_coeff = np.dot(linalg.inv(R), ro)

Below is the frequency magnitude response of the recovered channel. Both magnitude and phase are very important for channel estimation and equalization, but the magnitude alone will give us an initial indication of possible matching. Note, as I would expect, that where there is signal energy in the spectrum we get a very good estimate of the actual channel, but not elsewhere. (It is for this reason that we use sounding waveforms that are spectrally rich for channel estimation.)

Wiener-Hopf Channel Estimation

Proceeding with the alternate FFT approach, using the same waveform segments as done above, I get a much noisier result. The Python code I used for the FFT method was as follows:

X = fft.fft(tx_shaped[omit:omit+depth], 2*depth)
Y = fft.fft(rx_distort[omit:omit+depth], 2*depth)
channel = fft.ifft(Y/X)

FFT Approach

Conclusion: The two approaches do correctly estimate the channel where signal energy exists, but the results are clearly not identical, even though the same waveform segments were used in both cases. I think there is a lot more "devil in the details" in using the direct FFTs for channel estimation to ensure the variance in the estimated result is minimized. However, I do not yet see a flaw in Orchi_d's derivation, nor in the helpful details provided by Royi and Gillespie. Before accepting an answer, I need to reconcile the conclusions given with my simulation of a practical application.

I assume some subsequent block averaging of smaller FFT blocks (somewhat similar to Welch's method) would bring the FFT method closer to the Wiener-Hopf result; but on its own it is interesting how this apparent averaging of the noise takes place, which still isn't clear to me between the methods as presented.
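A hedged sketch of that averaging idea (assuming SciPy, reusing `tx_shaped` and `rx_distort` from the code above; the block size is an assumed tuning parameter), via the standard cross-spectral transfer function estimate $\hat{H} = P_{yx}/P_{xx}$:

import numpy as np
from scipy import signal

nperseg = 256   # illustrative block size; trades frequency resolution for averaging
_, Pxx = signal.welch(tx_shaped, nperseg=nperseg, return_onesided=False)
_, Pxy = signal.csd(tx_shaped, rx_distort, nperseg=nperseg, return_onesided=False)
H_est = Pxy / Pxx            # averaged transfer function estimate
h_est = np.fft.ifft(H_est)   # impulse response estimate of length nperseg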


Additional details upon request:

from orchi_d: If x is N samples and y is N+M-1 samples, you can zero pad x, and take the first M points from the IDFT (since the length of h is M samples), and compare that to the result of the least squares solution. Can you share that plot?

Let me first add the plots that include the actual channel response I used and estimated channel response (at least the magnitude of those):

For Wiener Hopf: Wiener Hopf Actual and Estimated Channel Responses

For FFT approach:

FFT Actual and Estimated Channel Responses

Notice all the energy in the estimated channel response at 2000 samples and 4000 samples (the duration of x was 4000 samples in this case, this was used for y as well, and both were padded out to twice the length). The length of h used in the Wiener-Hopf approach was 25 samples. So I believe Orchi_d is suggesting I try x as 3975 samples and y as 4000 samples, still zero pad both to 8000 samples, and then use only the first 25 samples of the inverse DFT result. This results in the following:

modified FFT Actual and Estimated Channel Responses

It is interesting in the plot above what it does to the energy at 2000 and 4000 samples; such smearing just by removing the last 25 samples of x! Using just the first 25 samples of the IFFT as Orchi_d suggested did improve the estimate (still nowhere near as good as Wiener-Hopf, and perhaps not as good as if I averaged the adjacent bins in the original FFT result). An obvious question is how the channel estimate compares if using just the first 25 samples from the original IFFT processing; this is below, confirming that it is significantly worse:

channel response with first 25 samples using original FFT method

Dan Boschen
  • Not really relevant here, but I'm curious to see how this fares when using standard transfer function estimates (so instead of $\texttt{IFFT}(\frac{\texttt{FFT}(Y)}{\texttt{FFT}(X)})$, use $\texttt{IFFT}(\frac{P_{yx}}{P_{x}})$). – Jdip Apr 05 '23 at 03:07
  • @Jdip Yes! I'm curious too, good comment; maybe I can add that to my sim in my next spare time cycles, with $P_{yx}$ and $P_x$ computed using FFTs. Note too, importantly for the $P_{yx}$ consideration, that in what was done here NO noise was added. – Dan Boschen Apr 05 '23 at 03:13
  • Ah! That is important to note, then might not be particularly useful... I'll still be looking out! – Jdip Apr 05 '23 at 03:46
  • I think this comparison requires a different question. As a side note, numerically, this is not a good way to estimate the parameters. I am talking about the code which, at its end, estimates channel_coeff using the direct Wiener method. – Royi Apr 05 '23 at 14:24
  • @Royi the code done using the “direct Wiener method” resulted in an excellent match where the occupied signal was. My issue is in trying to do it with the “equivalent” FFT approach; I think it is somewhere there that wasn’t a good way to estimate the channel. This comparison was done to confirm the conclusions in the other answers here, articulating the use case for me. However improving the numerical stability in my approach to using W-H would be interesting to me - that would be a good other question, but not the question I am asking here, nor do I think that suggests the mismatch here – Dan Boschen Apr 05 '23 at 14:39
  • All I said that the code is not the proper way, numerically, to solve the problem. Even if you get in this specific case a good solution. That's all I'm saying. – Royi Apr 05 '23 at 14:48
  • I always get a good solution, but I think it is because my sounding waveform is whitened over spectrum of interest so the autocorrelation matrix is dominant on the main diagonal. Regardless as you said, that is a side detail; a different question if I wanted to ask it. Thanks for your thoughts and attempts to answer my original question. I tried to clarify what is missing and hoping someone can answer. – Dan Boschen Apr 05 '23 at 17:20
  • @DanBoschen, I don't think anything is missing. You're building 2 different models and surprised they yield 2 different answers. I don't get why. I think I wrote many times. The convolution model you use in the code is different from the convolution model in my derivation of the LS solution. My suggestion for Overlap and Add was a completely different context. It was in the context of solving the LS problem using iterative method which requires the convolution operator and its adjoint (Correlation). – Royi Apr 06 '23 at 05:57
  • @Royi well I am left confused. I realize now they are not the same but I don't see that in your answer (we can zero pad the FFT such that both are linear convolutions, no?). It's not only that they are not the same, but one approach (W-H) works quite well for channel estimation in my application, where the FFT approach, which you show is a least squared estimate of the channel under circular correlation, does not estimate my channel well at all. In the answer as it stands, there is no real explanation of this. – Dan Boschen Apr 06 '23 at 06:26
  • @DanBoschen, It works well because you generate the data using a different model of convolution. If you generate the data using the cyclic convolution, results will be different. It is all about the model, not about the solution. – Royi Apr 06 '23 at 06:29