
I assume that fixed-point arithmetic can handle most traditional linear DSP tasks. As far as I know, there is a restriction on the FFT length with respect to the fixed-point bit depth.

Is there any common knowledge regarding other DSP applications that fixed point cannot handle? I assume that non-linear applications such as kurtosis might become sensitive to rounding error, but I am trying to gather more solid knowledge on this topic.

– Gideon Genadi Kogan

2 Answers


I don't think that there are any 'DSP applications that fixed point cannot handle.' Digital designers carefully take the accuracy and range of fixed-point numbers into consideration.

Fixed-point numbers are continually scaled up to avoid bit loss, and down to avoid overflow. This happens after every single arithmetic operation to maximize the number of bits in use. Throughout, the designer keeps track of the cumulative scaling. It is a very meticulous, thoughtful, and intentional process.
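As a minimal sketch of what that per-operation bookkeeping can look like, assuming a Q15 format (the format choice and helper name are my own illustration, not the answer's):

```c
#include <stdint.h>

/* Q15 fixed point: value = raw / 2^15, range roughly [-1, +1). */
typedef int16_t q15_t;

/* Multiply two Q15 numbers. The full 32-bit product lands in Q30,
 * so we shift right by 15 to scale back down to Q15. Adding 2^14
 * before the shift rounds to nearest instead of truncating.
 * (The designer must still remember that -1 * -1 overflows Q15.) */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* Q30, exact     */
    return (q15_t)((product + (1 << 14)) >> 15); /* rescale to Q15 */
}
```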

At the end of the algorithm, whatever scaling remains uncorrected can either be undone with one final scaling operation, or the results can be kept in block floating-point notation.
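A hedged sketch of the block floating-point idea (the struct and function names here are my own): one exponent is shared by a whole buffer, so each sample keeps all of its bits for the mantissa.

```c
#include <stdint.h>

#define BLOCK_LEN 64

/* Block floating point: the value of sample i is data[i] * 2^exponent,
 * with one exponent shared by the whole block. */
typedef struct {
    int16_t data[BLOCK_LEN];
    int     exponent;
} bfp_block_t;

/* Halve every sample and bump the shared exponent to compensate,
 * e.g. to create headroom before an accumulation stage. The values
 * represented are unchanged; only the scaling bookkeeping moves. */
static void bfp_make_headroom(bfp_block_t *b)
{
    for (int i = 0; i < BLOCK_LEN; i++)
        b->data[i] >>= 1;        /* arithmetic shift: divide by 2 */
    b->exponent += 1;
}
```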

Fixed-point is very powerful. It can lead to greater resource efficiency and can even decrease total error. However, it requires much more careful planning and design than floating point, which can be slow and costly for a project.

This is why most companies break the job into two positions: algorithm development and digital design. Then there are people like myself who live in the middle between the two.

In conclusion, I don't think your assumption is correct. Not in my experience, at least. And I have worked on some very complex algorithms on very resource-restricted platforms.

Hope this helps! Let me know if I can clarify anything.

  • you're right as long as all you want to ensure is that the dynamic range of a signal is preserved. However, if you want to transport the scale while still preserving the dynamic range, you'd have to lug around a fixed-point number and the factor by which e.g. the AGC scaled the received signal. So that's pretty much the description of a floating-point number! So, yes, floating-point numbers can do more with the same number of bits – for the right problem class. – Marcus Müller Oct 24 '22 at 18:38
  • Hi, Marcus. Perhaps I was not clear in my answer. I did not mean to convey that scaling factors are stored. It would be something hard-coded into the design. A very simple example of this is if you know that two unsigned numbers are being multiplied, you can bit-shift left by 1. This gains one bit of accuracy but distorts the final result by a factor of 2. At the end of the algorithm you could shift right by one to restore scaling. –  Oct 24 '22 at 21:40
  • Now when it comes to operations such as accumulation (like a single term of the DFT), the bits grow at a log scale. This information can be used to increase accuracy while preserving a fixed number of bits. Then the final result can be shifted back using this log factor (see the sketch after these comments). –  Oct 24 '22 at 21:45
  • Yes, but if you have a signal that might span a large range, then having all of the bits to represent both the strongest as well as the weakest with sufficient signal-to-quantization-noise ratio takes very many bits in fixed point, and few in floating point. – Marcus Müller Oct 24 '22 at 21:46
  • While that is technically true, I always assume the signal is normalized by the AGC, and I never keep the AGC scaling factor because I'm more interested in its relative strength than its true strength. Even if the AGC scaling factor was kept, it could just be carried to the final result.

    I suppose that my point here is that fixed-point doesn't necessarily have these hard limits. It just requires a much more thoughtful design.

    –  Oct 24 '22 at 22:03
  • I disagree, as explained above. QSNR matters, and thus, for large-range applications, AGC or not, fixed point is not necessarily the optimal choice, after you put in enough deliberation. – Marcus Müller Oct 24 '22 at 22:05
  • Fair enough. I perhaps haven't processed enough signals outside of my area of expertise to have the scope that you do. –  Oct 24 '22 at 22:06
  • Didn't mean to insult you! Sorry if I did; with "after you put in enough deliberation" I meant that you can look at the bit lengths necessary for a fixed-point implementation fulfilling your QSNR requirements and conclude that you can achieve the same or better with fewer bits in a floating-point representation. Example: – Marcus Müller Oct 24 '22 at 22:16
  • Not at all, my friend! I am human and only have so much experience. I am always willing to push my understanding. Sorry if it came off as passive-aggressive; I meant it sincerely. It's just hard to convey sincerity in text. –  Oct 24 '22 at 22:20
  • You receive a signal and an interferer. You know the SINR is at most 40 dB, so you decide that 49 dB SQNR must be enough. For a full-scale signal with a sufficiently well-behaved amplitude distribution, that means 6-bit fixed point is enough. However, you don't know the strength of the signal that you receive – is it close to using your whole dynamic range, or is it 80 dB weaker due to free-space path loss? If it is weak, scaling the output of the band-selection filter will amplify the quantization noise as well as the thermal noise, and thus comes with a QSINR degradation. You don't want to! – Marcus Müller Oct 24 '22 at 22:21
  • So you keep that filter output as it is – the absolute amplitude doesn't matter anyway, you're doing 8-PSK on that, only phase matters. In order to still get the good QSNR, you represent the signal within the filter as a prefactor power of two, times a fixed-point number. That is pretty much the idea behind IEEE 754 floats. – Marcus Müller Oct 24 '22 at 22:24
  • Ah don't worry you didn't come across aggressive at all! All good! – Marcus Müller Oct 24 '22 at 22:24
  • Any book you suggest looking into? I have found https://www.amazon.com/Fixed-Point-Signal-Processing-Synthesis-Lectures/dp/1598292587 but it seems to be pretty shallow – Gideon Genadi Kogan Oct 25 '22 at 08:32
  • Not really, to be honest. It appears in several of my texts throughout the years but only briefly. A lot of what I learned was online. The only source, in particular, that I would recommend is this: https://zipcpu.com/dsp/2017/07/21/bit-growth.html –  Oct 25 '22 at 22:24
  • If you have an extra $40 I say check out the book. I like the description. –  Oct 25 '22 at 22:26
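A sketch of the bit-growth point raised in the comments above (the Q15 format and names are my assumptions): summing N values grows the result by up to ceil(log2(N)) bits, so a wider accumulator absorbs the growth and one final shift restores the scale.

```c
#include <stdint.h>

/* Mean of n Q15 samples, with n a power of two. The sum of n 16-bit
 * values can grow by up to log2(n) bits, so a 32-bit accumulator is
 * safe here for any n up to 2^16. */
static int16_t q15_mean(const int16_t *x, int n, int log2_n)
{
    int32_t acc = 0;                 /* wide accumulator, no overflow */
    for (int i = 0; i < n; i++)
        acc += x[i];
    /* Shift the tracked log2(n) growth back out to restore Q15. */
    return (int16_t)(acc >> log2_n);
}
```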

There are two things to consider here: signal-to-noise ratio (SNR) and dynamic range. Floating point offers a constant SNR over a very wide dynamic range. Fixed point has a very limited dynamic range, and its SNR is a direct function of the signal level itself.
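To make that concrete, here is a small experiment of my own (not from the answer): quantize a sine to 16 bits at full scale and at 60 dB below full scale, and measure the SQNR. In fixed point, the SQNR falls roughly dB-for-dB with the signal level, which is exactly the behavior floating point avoids.

```c
#include <stdio.h>
#include <math.h>
#include <stdint.h>

#define PI 3.14159265358979323846

/* Measure the SQNR of a sine quantized to 16-bit (Q15) fixed point. */
static double sqnr_q15_db(double amplitude)
{
    double sig = 0.0, err = 0.0;
    for (int i = 0; i < 4096; i++) {
        double x = amplitude * sin(2.0 * PI * i / 64.0);
        int16_t q = (int16_t)lrint(x * 32767.0);   /* quantize to Q15   */
        double  e = x - q / 32767.0;               /* quantization error */
        sig += x * x;
        err += e * e;
    }
    return 10.0 * log10(sig / err);
}

int main(void)
{
    printf("full scale : %.1f dB\n", sqnr_q15_db(1.0));
    printf("-60 dBFS   : %.1f dB\n", sqnr_q15_db(0.001));
    return 0;
}
```

Built with `cc sqnr.c -lm`, this prints roughly 98 dB at full scale but only about 38 dB at -60 dBFS: same converter, same bit width, 60 dB less SQNR.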

Fixed point is problematic wherever SNR and dynamic range are tricky to manage. A good linear example is IIR filters, especially those with poles close to the unit circle (which is very common in audio, for example).

Fixed-point IIR filters require very careful management of second-order sections (pole/zero pairing, ordering, gain staging, section topology, etc.), coefficient quantization, rounding strategies that minimize noise while avoiding limit cycles, clipping prevention, headroom, and so on.
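A hedged sketch of what one such second-order section might look like in Q15 Direct Form I (the coefficient format, saturation choice, and 64-bit accumulator are my assumptions; a production design would budget guard bits instead):

```c
#include <stdint.h>

/* One Direct Form I biquad. Coefficients are Q14 so that magnitudes
 * up to +/-2 fit (a2 and b2 approach 2 for poles near the unit
 * circle); the feedback coefficients are stored pre-negated. */
typedef struct {
    int16_t b0, b1, b2, na1, na2;  /* Q14 coefficients */
    int16_t x1, x2, y1, y2;        /* Q15 state        */
} biquad_q15_t;

static int16_t biquad_step(biquad_q15_t *s, int16_t x)
{
    /* Q15 * Q14 products land in Q29; a 64-bit accumulator sidesteps
     * the intermediate overflow analysis a real design would do. */
    int64_t acc = 0;
    acc += (int64_t)s->b0  * x;
    acc += (int64_t)s->b1  * s->x1;
    acc += (int64_t)s->b2  * s->x2;
    acc += (int64_t)s->na1 * s->y1;
    acc += (int64_t)s->na2 * s->y2;

    /* Round, rescale Q29 -> Q15, then saturate instead of wrapping:
     * wraparound in the feedback path causes large overflow
     * oscillations, one flavor of the limit cycles mentioned above. */
    acc = (acc + (1 << 13)) >> 14;
    if (acc >  32767) acc =  32767;
    if (acc < -32768) acc = -32768;

    s->x2 = s->x1;  s->x1 = x;
    s->y2 = s->y1;  s->y1 = (int16_t)acc;
    return s->y1;
}
```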

For that reason there are some "hybrid" algorithms and data formats that sit somewhere between fixed point and full floating point and can be optimized for a specific application.

– Hilmar
  • Since you have given it as an example, it seems that this was solved using biquad cascade IIR filters in Direct Form I. I am struggling to understand the other scenarios: do all the DSP cards that do not support floating point assume that I will implement an in-house floating-point type? – Gideon Genadi Kogan Oct 25 '22 at 07:26
  • Of course you use cascaded second-order sections in either Direct Form I or transposed Form II, but there is way more to be sorted out. You need to prevent each section from clipping and manage noise accumulation, which both depend on section ordering. There is also limit-cycle prevention, noise shaping, etc. If you can live with regular fixed point, you use that. If you can't, you can use double-precision states, block floating point, etc. to make it work. – Hilmar Oct 25 '22 at 12:54