Implementation of an IIR filter with integer coefficients

Question

I'm trying to figure out the implementation of IIR filters with integer coefficients, but I've run into a problem that I can't figure out.

There is an excellent post on this topic, and I took it as the basis for my implementation. To explain my whole problem, I will try to be consistent.

I needed to make two filters: a 150Hz low-pass filter of 6 orders and a 0.5Hz high-pass filter of 2 orders. Sampling frequency 4 kHz. The input signal has a capacity of 12 bits.

Using the FDAtool package I generated the coefficients for these filters:

150Hz low-pass filter of 6 orders
Section 1:
B0=0.01302789311609
B1=0.02605578623218
B2=0.01302789311609
A1=-1.833933390648
A2= 0.8860449631123
Section 2:
B0=0.01185768264324
B1=0.02371536528648
B2=0.01185768264324
A1=-1.669203142931
A2=0.7166338735042
Section 3:
B0=0.01127306594753
B1=0.02254613189506
B2=0.01127306594753
A1=-1.586906790832
A2=0.6319990546216
0.5Hz high-pass filter of 2 orders
Section 1:
B0=0.9994447938167
B1=0.02605578623218
B2=-1.9988895876334
A1=-1.99888927938
A2=0.9988898958874

Next, I converted these coefficients into Q2.30 format:

int32_t ilp500_sek1_a[2]={-1969170942,  951383551},ilp500_sek1_b[3]={13988593,  27977187,  13988593},
        ilp500_sek2_a[2]={-1792293247,  769479743},ilp500_sek2_b[3]={12732089,  25464179,  12732089},
        ilp500_sek3_a[2]={-1703928191,  678603839},ilp500_sek3_b[3]={12104361,  24208723,  12104361},
        ihp5_sek1_a[2]={-2146291019,  1072549857},ihp5_sek1_b[3]={1073145674, -2146291348,  1073145674};

Calling filters:

lp_filter_int(ilp500_sek1_x,ilp500_sek1_y,ilp500_sek1_a,ilp500_sek1_b,ilp500_sek1_se_out,((int64_t)res_adc));
lp_filter_int(ilp500_sek2_x,ilp500_sek2_y,ilp500_sek2_a,ilp500_sek2_b,ilp500_sek2_se_out,ilp500_sek1_y[0]);
lp_filter_int(ilp500_sek3_x,ilp500_sek3_y,ilp500_sek3_a,ilp500_sek3_b,ilp500_sek3_se_out,ilp500_sek2_y[0]);
itemp[0]=ilp500_sek3_y[0];
lp_filter_int(ihp5_sek1_x,ihp5_sek1_y,ihp5_sek1_a,ihp5_sek1_b,ihp5_sek1_se_out,ilp500_sek3_y[0]);
itemp[1]=ihp5_sek1_y[0];

Next, I tried several options for implementing these filters on a microcontroller. The first option, when I always discarded 30 bits after the decimal point:

void lp_filter_int (int64_t *x ,int64_t *y,int32_t *aa ,int32_t *bb, int64_t *seout ,int64_t in)
{
    x[0]=in;
    y[0]=(x[0]*(int64_t)bb[0] + x[1]*(int64_t)bb[1] + x[2]*(int64_t)bb[2] - (int64_t)aa[0]*y[1] - (int64_t)aa[1]*y[2]);
if (y[0] &gt; 0x1fffffffffffffff)
{
    y[0] = 0x1fffffffffffffff;
}
if (y[0] &lt; -0x2000000000000000)
{
    y[0] = -0x2000000000000000;
}
if (y[0]&lt;0)
{
    seout[0]=-(y[0] &amp; 0x3FFFFFFF);
    y[0]=-y[0];
    y[0]=y[0]&gt;&gt;30;
    y[0]=-y[0];
}
else
{
    seout[0]=(y[0] &amp; 0x3FFFFFFF);
    y[0]=y[0]&gt;&gt;30;
}
x[2]=x[1];
x[1]=x[0];
y[2]=y[1];
y[1]=y[0];

}

Here is the result. This graph shows the output of the low pass filter. For comparison, on the left there will always be a graph of the same filters, but with a float implementation. As you can see, the signal is similar, but due to discarding 30 bits, some information is lost.

High-pass filter output graph: Here you can see that the slight change in the signal is completely lost. After this, I concluded that this type of filter operation does not suit me. Next, I turned to the post to which I referred at the very beginning and took from there an example where the bits after the decimal point (in my case it’s 30 bits) are not simply discarded, but saved and then added with each new calculation of y[0].

void lp_filter_int (int64_t *x ,int64_t *y,int32_t *aa ,int32_t *bb, int64_t *seout ,int64_t in)
{
    x[0]=in;
    y[0]=(seout[0] + x[0]*(int64_t)bb[0] + x[1]*(int64_t)bb[1] + x[2]*(int64_t)bb[2] - (int64_t)aa[0]*y[1] - (int64_t)aa[1]*y[2]);
if (y[0] &gt; 0x1fffffffffffffff)
{
    y[0] = 0x1fffffffffffffff;
}
if (y[0] &lt; -0x2000000000000000)
{
    y[0] = -0x2000000000000000;
}
if (y[0]&lt;0)
{
    seout[0]=-(y[0] &amp; 0x3FFFFFFF);
    y[0]=-y[0];
    y[0]=y[0]&gt;&gt;30;
    y[0]=-y[0];
}
else
{
    seout[0]=(y[0] &amp; 0x3FFFFFFF);
    y[0]=y[0]&gt;&gt;30;
}
x[2]=x[1];
x[1]=x[0];
y[2]=y[1];
y[1]=y[0];

}

Here is the result. This graph shows the output of the low pass filter. As you can see, the low-pass filter graph completely repeats the float implementation.

Several graphs of high-pass filter output: This is where the problems started. The high-pass filter must cut out the DC component from the signal. But with such an implementation, there is some kind of constant component, and also some influence can lead to random behavior of the signal.

Next, I again turned to the post to which I referred at the very beginning and it described an interesting trick to subtract 1 from the coefficient a1. I tried to do the same, but the filter did not work at all. I didn't quite understand how this trick should work, but I tried to do everything as in the example.

void lp_filter_int (int64_t *x ,int64_t *y,int32_t *aa ,int32_t *bb, int64_t *seout ,int64_t in)
{
    aa[0]=aa[0]+0x40000000;
    x[0]=in;
    y[0]=((seout[0]+(y[1]<<30)) + x[0]*(int64_t)bb[0] + x[1]*(int64_t)bb[1] + x[2]*(int64_t)bb[2] - (int64_t)aa[0]*y[1] - (int64_t)aa[1]*y[2]);
if (y[0] &gt; 0x1fffffffffffffff)
{
    y[0] = 0x1fffffffffffffff;
}
if (y[0] &lt; -0x2000000000000000)
{
    y[0] = -0x2000000000000000;
}
if (y[0]&lt;0)
{
    seout[0]=-(y[0] &amp; 0x3FFFFFFF);
    y[0]=-y[0];
    y[0]=y[0]&gt;&gt;30;
    y[0]=-y[0];
}
else
{
    seout[0]=(y[0] &amp; 0x3FFFFFFF);
    y[0]=y[0]&gt;&gt;30;
}
x[2]=x[1];
x[1]=x[0];
y[2]=y[1];
y[1]=y[0];

}

Here is the result. This graph shows the output of the low pass filter. High-pass filter output graph As a result, I'm completely confused and don't understand what I'm doing wrong. I realized that if I discard the part after the decimal point (30 bits) every time, I lose a lot of information. If these 30 bits are saved and then added, then the low-pass filter works fine, but something incomprehensible happens to the high-pass filter... And the implementation of subtracting coefficient 1 from a1 is not clear to me at all. I want to understand these examples, what am I doing wrong?

Or maybe there are any other ways to implement an integer filter as similar as possible to a filter with float coefficients?

If the floating-point IIR filter has poles near the unit circle, the fixed-point conversion may end up shifting those poles to be outside the unit circle, leading to instabilities. One potential solution would be to use a multistage FIR filter instead. — Ash, Dec 14 '23 at 19:12

Implementation of an IIR filter with integer coefficients

0 Answers0