
I haven't done any signal processing in a while, but I'm completely stymied by the instability and bias in my implementation of a normalized least mean squares (NLMS) filter.

I checked the usual things:

  1. Am I actually near the convergence lower bound (variance of AWGN in test signal)? No.
  2. Is my step size below $\frac{2}{\operatorname{trace}[\mathbf{R}]}$ (where $\mathbf{R}$ is estimated instantaneously as the outer product of the current input)? Yes.
  3. Does my dot product over the window of past samples use the convention of increasing delay indices ("reverse" order)? Yes.
  4. Am I using the previous signal index, not the current one, when I compute the predicted value? Yes. (The exact one-tap recursion I'm checking against is written out just after this list.)
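
For concreteness, the one-tap NLMS recursion my code below is meant to implement (writing $w$ for filt_coef, $u$ for the signal, $\mu$ for step_size, and $\epsilon$ for the small regularizer) is

$$\hat{u}[n] = w[n]\,u[n-1], \qquad e[n] = u[n] - \hat{u}[n], \qquad w[n+1] = w[n] + \frac{\mu\, u[n-1]\, e[n]}{u[n-1]^2 + \epsilon}.$$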

I narrowed down my issue by implementing a simple 1-tap filter. My reference is the algorithm on page 324 of Haykin's Adaptive Filter Theory, and I'm using the test conditions on page 280 of the same:

  • Input: first-order AR process, $a = -0.99$ with noise variance $\sigma_u^2 = 0.93627$
  • Trial count: 100 trials
  • Signal length: 500 samples

Error

According to Haykin, I should be below $10^{-1}$ MMSE after less than 100 samples. That's not what I'm seeing, but I'm convinced it has to be something basic.

Here are MMSE and average absolute filter coefficient error, visualized:

error plot

The MMSE doesn't decrease exponentially, and the filter coefficient never comes closer than a distance of 2 from the actual AR process value.

And in fact, the MMSE is still much larger than the noise variance near the end of the signal:

[10.260998119102881, 16.060967587632792, 41.72963567568045, 49.232814770283696, 50.56247908944245, 84.29056731939356, 99.74962522053087, 74.94509263319398, 61.55708377260941, 43.79321013379245, 47.68084407515727, 51.84551183735604, 23.977556992678352, 23.691841969230886, 15.292586007185905, 12.710425182987175, 14.689863202271942, 15.03357374836029, 16.305144820997945, 21.770075252230768, 16.501486407946132, 21.62300216817481, 15.461623713499273, 26.354991476964795, 28.001121132610393, 21.632068009342056, 15.812522324768985, 15.522990356558456, 53.24514663841896, 9.36613637386399, 52.24856602614516, 98.50333720012102, 251.59105479441695, 193.85612892514402, 124.53671103656986, 159.16459102635278, 82.2544192379602, 50.25510472853462, 67.03420998875504, 9.59758344071307, 9.770861267272064, 21.975670899135086, 28.84699022842896, 29.070465314026674, 15.05395348320232, 20.20596174124456, 30.609319675498526, 14.396998393433874, 17.654986587869196, 9.664719322902583]

Filter code

To produce the above results, I tried implementing the simplest possible 1-tap NLMS filter I could, which is given below.

extern crate rand;
extern crate gnuplot;

use std::f64::EPSILON;
use rand::distributions::{IndependentSample, Normal};
use gnuplot::{Figure, Caption, Color};


fn main() {
  let trial_count = 100;
  let signal_len = 500;

  let noise_std_dev = 0.96761;
  // AR process coefficient [n-1]
  let process = [-0.99];

  let mut rand_gen = rand::OsRng::new().unwrap();

  let mut mmse: Vec<f64> = vec![0.0; signal_len];
  let mut avg_filt_err: Vec<f64> = vec![0.0; signal_len];

  let noise_dist = Normal::new(0.0, noise_std_dev);

  for _ in 0..trial_count {

    let mut sig_vec: Vec<f64> = Vec::with_capacity(signal_len);

    // Zero padding to start noise process
    sig_vec.push(0.0);

    for n in 1..signal_len {
      let noise = noise_dist.ind_sample(&mut rand_gen);
      let p1 = if n > 0 {
        -process[0]*sig_vec[n-1]
      } else {
        0.0
      };
      sig_vec.push(noise + p1);
    }

    let step_size = 0.05;

    let mut pred_vec: Vec<f64> = Vec::with_capacity(signal_len);

    let mut filt_coef: f64 = 0.0;    

    // Zero padding to start filter process
    pred_vec.push(0.0);

    for sig_ind in 1..signal_len {
      let pred = filt_coef * sig_vec[sig_ind - 1];
      pred_vec.push(pred);
      let err = sig_vec[sig_ind] - pred;
      mmse[sig_ind] += err.powi(2) / trial_count as f64;
      filt_coef += step_size * sig_vec[sig_ind - 1] * err /
        (sig_vec[sig_ind - 1].powi(2) + EPSILON);
      avg_filt_err[sig_ind] += (process[0] - filt_coef).abs() / trial_count as f64;
    }
  }

  let x: Vec<usize> = (0..mmse.len()).collect();
  let mut fg = Figure::new();
  fg.axes2d()
    .lines(&x, &mmse, &[Caption("MMSE"), Color("black")]);

  let mut fg2 = Figure::new();
  fg2.axes2d()
     .lines(&x, &avg_filt_err, &[Caption("Average filter error"), Color("red")]);
  fg.show();
  fg2.show();
}
  • Will reference code assist you? I have code which runs perfectly. – Royi Apr 29 '17 at 07:23
  • I suggest you check your signal processing algorithm in Octave/Matlab before converting it into a more specific language... Which language is this, C++ 2011? – Fat32 Apr 29 '17 at 12:45
  • @Fat32 This is Rust. I was hoping to avoid Octave, but I might do that. – bright-star Apr 29 '17 at 15:30
  • @Royi Maybe? I am not sure if my problem is something conceptual or just a tiny bug in the code. – bright-star Apr 29 '17 at 15:30
  • Ok. Even though Rust seems to be a memory- and thread-safe language, you should still first check your algorithm in an interpreted language like Octave. If that step proves OK, then it's certain that the Rust coding stage creates the error, and the question is off-topic as a consequence. If the Octave step does not work, however, then you should come and ask it here... Or else someone willing to answer your Rust-specific question can still help. – Fat32 Apr 29 '17 at 17:40
  • I'll replace the code snippets with the Octave port I just wrote, and leave the links. – bright-star Apr 29 '17 at 18:05
  • I tried running the code in Octave 4.0.2 and got the following error: error: lms: operator *: nonconformant arguments (op1 is 500x1, op2 is 500x1) error: called from lms at line 11 column 15, nlms at line 24 column 16.... Do you have any idea what that error is? And why did it happen in my testing but not in yours? – Fat32 Apr 29 '17 at 19:35
  • @Fat32 I apologize, a change I had made to the slice_window function was missing. If you replace that code, you should be able to replicate my results (Octave 4.0.3) – bright-star Apr 29 '17 at 19:55
  • The LMS.m script (implementing the linear predictor) is returning lms_filter = 1.0 always. So this line is problematic, as it should return (I assume) the AR coefficient, which is -0.98 in your test code. The problem is either in the LMS predictor or, more probably, in the redundant slice window script. Also, your understanding of filter coefficient and filter order is different. For example, an AR(1) process will have two coefficients, one of them being 1 and the other being $a$. But it somewhat depends on your coding style. As a matter of fact, the window should include 2 samples, not 1. – Fat32 Apr 29 '17 at 21:06
  • All of this Octave stuff is a red herring. I'll replace the question code with a basic simple implementation of a first-order LMS filter which shows the problem. – bright-star Apr 30 '17 at 01:06
  • I don't know Rust, but as I interpret your filter code it looks like an NLMS FIR filter with one tap, an MA(1) model, but your system is an AR(1). To be able to model the AR process reasonably well as an MA process you will need at least 100 taps (if you have $a_1 = -0.99$). – Claes Rolen May 04 '17 at 20:24
  • @ClaesRolen Yes... there's that, too, but I don't believe the implementation is correct either. – Peter K. May 05 '17 at 10:55

1 Answer


TL;DR summary:

  • Your code is in error because of the minus sign on process[0] in the signal-generation if statement.
  • Once that is corrected, the adaptive filter seems to converge in all cases.
  • The reason you're not seeing what Haykin says regarding the MMSE is that you are not using the desired signal $d[n]$ to form the error. All bets are off if you don't implement the algorithm the way the analysis of the MMSE assumes; you'll have to redo the analysis.

So I've tried to translate your original code to R (because that's what I'm currently using mostly), and my first guess seems to be close to correct.

If I plot avg_filt_err at the end of the run using the original code, then I get the first picture below. If I change the sign when setting up the signal (as my first guess suggested) then the error reduces significantly (see second picture below).

This means I changed

  p1[n] <-  if (n > 1) -process[1]*sig_vec[n-1] else 0.0

to

  p1[n] <-  if (n > 1) process[1]*sig_vec[n-1] else 0.0

i.e. just the sign on process[1] was changed (R is 1-indexed, hence the [1] instead of [0] from your code).

The third and fourth pictures show the actual coefficient changing. It appears to converge close to -0.99 in the second case and diverges in the first.

Original code error

Error after modification #1

Coefficient change in original code

Coefficient change after modification #1

If I make my second suggestion and use $d(n)$ instead of the received signal, I need to change

 err <- sig_vec[sig_ind] - pred

to

 err <- p1[sig_ind] - pred

(after saving p1 when it is produced). With that change I get what I would expect; the error and coefficient-convergence plots follow.

Error after both modifications #1 and #2.

Coefficient convergence after both modifications #1 and #2.

Finally, to use the noisy received signal (your suggestion below in the comments), the error needs to be changed to:

err <- sig_vec[sig_ind] - pred

In that case, you can see why this is not a great solution: it works, but noise will keep throwing the estimate off and it has to re-converge. The top diagram is the error; the bottom diagram is the coefficient value.

Error for using the noisy received signal

Coefficient estimate
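
If it helps to see the two changes back in your original language, below is a minimal, dependency-free Rust sketch of the corrected run (mine, not tested against your toolchain: rand and gnuplot are replaced by a tiny inline xorshift/Box-Muller generator and a printout, it runs a single trial rather than 100, and desired is a name I'm introducing for the noiseless part of the signal).

// Minimal, dependency-free sketch of the corrected 1-tap NLMS run.
// Differences from the question's code: `rand` and `gnuplot` are replaced by a
// small inline xorshift/Box-Muller generator and a printout, one trial is run,
// and `desired` holds the noiseless part of the signal so it can serve as d[n].

struct Rng { state: u64 }

impl Rng {
    // xorshift64 giving a uniform sample in [0, 1)
    fn uniform(&mut self) -> f64 {
        self.state ^= self.state << 13;
        self.state ^= self.state >> 7;
        self.state ^= self.state << 17;
        (self.state >> 11) as f64 / (1u64 << 53) as f64
    }
    // Box-Muller transform for a zero-mean Gaussian sample
    fn gaussian(&mut self, std_dev: f64) -> f64 {
        let u1 = self.uniform().max(1e-12);
        let u2 = self.uniform();
        std_dev * (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

fn main() {
    let signal_len = 500;
    let noise_std_dev = 0.96761;
    let process = [-0.99_f64]; // coefficient the 1-tap predictor should find
    let step_size = 0.05;

    let mut rng = Rng { state: 0x9E3779B97F4A7C15 };

    // Change #1: no minus sign on process[0] when generating the signal.
    // `desired[n]` stores the noiseless part d[n]; the received signal adds noise.
    let mut sig_vec = vec![0.0_f64; signal_len];
    let mut desired = vec![0.0_f64; signal_len];
    for n in 1..signal_len {
        desired[n] = process[0] * sig_vec[n - 1];
        sig_vec[n] = desired[n] + rng.gaussian(noise_std_dev);
    }

    // 1-tap NLMS. Change #2: the error is formed against d[n], not against
    // the noisy received signal.
    let mut filt_coef = 0.0_f64;
    for sig_ind in 1..signal_len {
        let pred = filt_coef * sig_vec[sig_ind - 1];
        let err = desired[sig_ind] - pred;
        filt_coef += step_size * sig_vec[sig_ind - 1] * err
            / (sig_vec[sig_ind - 1].powi(2) + std::f64::EPSILON);
    }
    println!("final coefficient estimate: {}", filt_coef); // ends up near -0.99
}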


R Code Only Below

# extern crate rand;
#
#use std::f64::EPSILON;
EPSILON <- 10^(-6)
#use rand::distributions::{IndependentSample, Normal};
#
#extern crate gnuplot;
#
#use gnuplot::{Figure, Caption, Color};

#
#fn main() {
#  let trial_count = 100;
trial_count <- 100
#let signal_len = 500;
signal_len <- 500

#  let noise_std_dev = 0.96761;
noise_std_dev <- 0.96761
#  // AR process coefficient [n-1]
#let process = [-0.99];
process <- rep(-0.99, signal_len)

#let mut rand_gen = rand::OsRng::new().unwrap();

#let mut mmse: Vec<f64> = vec![0.0; signal_len];
mmse <- rep(0.0, signal_len)
#let mut avg_filt_err: Vec<f64> = vec![0.0; signal_len];
avg_filt_err <- rep(0.0, signal_len)

#let noise_dist = Normal::new(0.0, noise_std_dev);

 # for _ in 0..trial_count {
for (idx in 1:trial_count)
{
#    let mut sig_vec: Vec<f64> = Vec::with_capacity(signal_len);
    sig_vec <- rep(0.0, signal_len)

#    // Zero padding to start noise process
#    sig_vec.push(0.0);
    sig_vec[1] <- 0.0

    p1 <- rep(0.0, signal_len)

#    for n in 1..signal_len {
    for (n in 1:signal_len)
    {
#      let noise = noise_dist.ind_sample(&mut rand_gen);
      noise <- rnorm(1, 0, noise_std_dev)
#      let p1 = if n > 0 {
#        -process[0]*sig_vec[n-1]
#      } else {
#        0.0
#      };
      # was: 
      #p1[n] <-  if (n > 1) -process[1]*sig_vec[n-1] else 0.0
      p1[n] <-  if (n > 1) process[1]*sig_vec[n-1] else 0.0

#      sig_vec.push(noise + p1);
      sig_vec[n] <- noise + p1[n]
    }

#    let step_size = 0.05;
    step_size <- 0.05

#    let mut pred_vec: Vec<f64> = Vec::with_capacity(signal_len);
    pred_vec <- rep(0, signal_len)

#    let mut filt_coef: f64 = 0.0;    
    filt_coef <-  0.0

#    // Zero padding to start filter process
#    pred_vec.push(0.0);
    pred_vec[1] <- 0.0

#    for sig_ind in 1..signal_len {
    for (sig_ind in seq(2,signal_len))
    {
      #let pred = filt_coef * sig_vec[sig_ind - 1];
      pred <-  filt_coef * sig_vec[sig_ind - 1]
      #pred_vec.push(pred);
      pred_vec[sig_ind] <- pred
      #let err = sig_vec[sig_ind] - pred;
      err <- sig_vec[sig_ind] - pred
      #mmse[sig_ind] += err.powi(2) / trial_count as f64;
      mmse[sig_ind] <-  mmse[sig_ind] + err^2 / trial_count
      #filt_coef += step_size * sig_vec[sig_ind - 1] * err /
      #  (sig_vec[sig_ind - 1].powi(2) + EPSILON);
      filt_coef <- filt_coef + step_size * sig_vec[sig_ind - 1] * err /
        (sig_vec[sig_ind - 1]^2 + EPSILON)
      #avg_filt_err[sig_ind] += (process[0] - filt_coef).abs() / trial_count as f64;
      avg_filt_err[sig_ind] <-  avg_filt_err[sig_ind] + abs((process[1] - filt_coef)) / trial_count ;
    }

#  let x: Vec<usize> = (0..mmse.len()).collect();
#  let mut fg = Figure::new();
#  fg.axes2d()
#  .lines(&x, &mmse, &[Caption("MMSE"), Color("black")]);
#  
#  let mut fg2 = Figure::new();
#  fg2.axes2d()
#  .lines(&x, &avg_filt_err, &[Caption("Average filter error"), Color("red")]);
#  fg.show();
#  fg2.show();
}

plot(avg_filt_err)

Next guess: you don't seem to have a noiseless "desired" signal. You are forming err using the noisy received signal; you should be using a noiseless one.

See figure 6.1 from Haykin, "Adaptive Filter Theory," Fourth Edition, Prentice-Hall, p321.

[Figure 6.1 from Haykin, Adaptive Filter Theory, 4th edition, p. 321]

Nowhere in that diagram is

...the desired signal is just the actual signal at the time step the LMS is attempting to predict...

Can you please let me know where it says that?

The reason is that the LMS algorithm and its variants are often used in channel identification for mobile systems: the system sends a known sequence of data at the beginning of the transmission, so that the $d(n)$ in the diagram can be known, noiselessly, by both ends of the transmission.

Using the actual received data is fraught with peril.
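
To make that concrete, here is a toy Rust sketch of the identification setup (mine, purely illustrative: the "unknown" coefficient h and the sinusoidal training input are made up). The input u[n] is known, the desired response d[n] is the output of the unknown one-tap system, and because d[n] is available exactly, the same NLMS update drives the weight straight to h.

// Toy system-identification sketch: a known training input u[n] passes through
// an "unknown" 1-tap system with coefficient h; because the desired response
// d[n] is available exactly, the NLMS weight converges cleanly to h.
// (h, the input, and all constants here are illustrative only.)

fn main() {
    let h = 0.42_f64;    // "unknown" system coefficient to be identified
    let step_size = 0.1;

    // A known, deterministic training sequence (anything persistently exciting).
    let u: Vec<f64> = (0..200).map(|n| ((n as f64) * 0.7).sin()).collect();

    let mut w = 0.0_f64; // adaptive estimate of h
    for n in 1..u.len() {
        let d = h * u[n - 1];    // desired response, known noiselessly
        let y = w * u[n - 1];    // adaptive filter output
        let e = d - y;           // error is driven to zero as w approaches h
        w += step_size * u[n - 1] * e / (u[n - 1].powi(2) + std::f64::EPSILON);
    }
    println!("estimated coefficient: {} (true value: {})", w, h);
}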


I'm thinking that the process is not being calculated correctly. The "difference of 2" sounds suspiciously like the NLMS is getting an answer of 0.99 instead of -0.99.

Try changing this line:

 -process[0]*sig_vec[n-1]

to

 process[0]*sig_vec[n-1]

in the setup of sig_vec and let us know the results.

Peter K.
  • The MMSE is still much larger than the noise variance with the change in sign, even at the end of the signal: 861.9797497569391, 484.4895182735712, 255.13432649759076. – bright-star May 01 '17 at 23:07
  • re: your second guess, I can't agree with the assumption that a "noiseless" signal is needed. As in Haykin, the desired signal is just the actual signal at the time step the LMS is attempting to predict, namely, the signal at sample n if the filter is tapping n-1, n-2.... For this reason the MMSE is bounded below by the variance of the distribution sampled to drive the AR process under examination. – bright-star May 03 '17 at 19:22
  • Haykin uses the desired, noiseless signal.... – Peter K. May 04 '17 at 00:01
  • Can you provide a citation for the claim that the signal used to compute the error is noiseless in Haykin? I have the 4th edition, so I can review whatever you point me to. – bright-star May 04 '17 at 01:06
  • @bright-star : Updated. Can you please tell me where Haykin says the desired signal is just the actual signal at the time step the LMS is attempting to predict I can't see it anywhere. – Peter K. May 04 '17 at 12:14
  • @PeterK Haven't fully read the question yet (and I tend to use Widrow, rather than Haykin) but if LMS is configured as an adaptive line enhancer, it's essentially trying to learn a prediction filter, so the "desired" signal comes from the signal itself (in the "future"). It's not noiseless, but the non-noisy part of the signal will be more predictable, and the output of the filter will be denoised. – datageist May 06 '17 at 13:25
  • @datageist Understood... but that's not the way this question was posed: the raw, unprocessed signal is being used in the implementation code so I figured I'd start with the direct approach before any enhancements to deal with the more realistic problem. – Peter K. May 06 '17 at 17:16
  • @PeterK. Gotcha, carry on :) – datageist May 06 '17 at 17:50
  • Now that I've had time to look, my question is actually the same as @datageist's: What if you're just trying to decorrelate the signal process? You're right that Haykin just specifies some $d(n)$, but is there some specific problem with $d(n) := y(n)$ outside of my implementation issues? – bright-star May 08 '17 at 16:12
  • @bright-star : See section 5.3 of Haykin to get a bit more detail for what datageist is talking about. The "desired" signal is now the predicted signal, and this is used to form the error. – Peter K. May 08 '17 at 16:44
  • You're right, I don't have a sinusoidal signal included, but my question is still, what's wrong with $d(n) := y(n)$? (N.B.: I have an OpenCL implementation that is running under this assumption and converging to the filter weight of the noise process.) – bright-star May 08 '17 at 21:08
  • Well, from the diagram in my answer, $d(n) = y(n)$ means $e(n) = 0$... so the adaptive filter won't change as there will be no driving error. – Peter K. May 09 '17 at 02:26
  • I'll be more clear and define $\hat{y}(n)$ to be the output of the adaptive filter, and $y(n)$ to be the output of the noise process, and $e(n) := \hat{y}(n) - y(n)$. How about now? – bright-star May 09 '17 at 14:24
  • This is making less and less sense to me. How do you get the noise process? Or do you mean that $e(n) = \hat{y}(n) - u(n)$ ? I think the best thing will be for me to do an implementation and post the code. I don't have the time now, but will try over the weekend. – Peter K. May 12 '17 at 12:57
  • Ok, implementation of all options done. Let me know if you need any more information. – Peter K. May 14 '17 at 15:19