Let's take an algorithm as an example: the DCT-II used in JPEG. The formula is as follows:

[Two formula images: the DCT-II definition in terms of the matrix M]

Now, the values in M are not integers, and many of them have a magnitude less than 1. We have two options: 1) fixed-point maths and 2) floating-point maths. We can also see that the input data is multiplied by M and also by its transpose. This implies a lot of multiplication and addition operations.

Let's assume that we use a 16-bit fixed-point representation for M. This means that all values in M are scaled up by 2^16 and rounded to integers for the arithmetic, and the results are then scaled back down. We therefore get rounding errors from representing M as fixed-point numbers, and further rounding errors each time we complete one half of the calculation (either M*V or V*Mt) and rescale, since the output of each matrix multiplication is supposed to be an integer. How exactly can one calculate the precise amount of error that fixed-point maths, with a given number of bits and a given rounding method, introduces into the result?
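To make the setup concrete, here is a minimal Python sketch of the two-stage scheme described above, next to a floating-point reference. Several things are assumptions that the post does not fix: the orthonormal scaling of M, the 8x8 block size, round-half-up when shifting back down, and all function names. Python integers do not overflow, so the accumulator width of real hardware is not modelled.

```python
import math

N = 8            # JPEG block size (assumed)
FRAC_BITS = 16   # fractional bits used to scale M, as in the question

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: M[k][i] = c(k) * cos(pi/n * (i + 0.5) * k)."""
    rows = []
    for k in range(n):
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([c * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def round_shift(value, bits):
    """Divide an integer by 2**bits, rounding half up (add half, then floor shift)."""
    return (value + (1 << (bits - 1))) >> bits

def dct2_fixed(block, frac_bits=FRAC_BITS):
    """Integer-only M*V*Mt with rounding after each of the two stages."""
    m_int = [[round(v * (1 << frac_bits)) for v in row] for row in dct_matrix()]
    stage1 = [[round_shift(x, frac_bits) for x in row]
              for row in matmul(m_int, block)]
    stage2 = matmul(stage1, transpose(m_int))
    return [[round_shift(x, frac_bits) for x in row] for row in stage2]

def dct2_float(block):
    """Floating-point reference for the same transform."""
    m = dct_matrix()
    return matmul(matmul(m, block), transpose(m))

if __name__ == "__main__":
    # Arbitrary level-shifted test block with values in [-128, 127].
    block = [[((r * N + c) * 37 % 256) - 128 for c in range(N)] for r in range(N)]
    fixed, ref = dct2_fixed(block), dct2_float(block)
    worst = max(abs(fixed[i][j] - ref[i][j]) for i in range(N) for j in range(N))
    print("largest absolute error for this block:", worst)
```

Changing `FRAC_BITS` or the body of `round_shift` (for example to truncation) and rerunning shows how the error for a given block responds to those choices.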

This question may be more suitable for a mathematician, but I have posted it here.

quantum231
  • If you have a closed form, why don't you directly calculate the error? – Gideon Genadi Kogan Jun 30 '23 at 08:26
  • I have absolutely no idea how to calculate the error based on the fixed-point bit length and the chosen rounding method. The calculation is applied in two steps: first we compute M*V, then we multiply the result by Mt. The output of each of these two stages is rounded. M and Mt are both scaled up and rounded to fit integer values. There is so much going on that I don't know where to start or how to do this. – quantum231 Jun 30 '23 at 09:45
  • Are you interested in the final value or in the analytic expression? – Gideon Genadi Kogan Jul 01 '23 at 16:51
  • I am interested in the process of calculating the error for something like this. I am not sure whether this is taught in DSP courses, electronic engineering, computer science or mathematics, but I really don't have any idea how to do this properly and, step by step, arrive at a final value for the error. Then I want to change the assumptions, e.g. increase the fixed-point bit length or change the rounding method, and see what happens to the final error. I am willing to pay someone to teach me how to do this. – quantum231 Jul 01 '23 at 16:59
  • Man, I am a mechanical engineer :). See my suggestion and tell me if this approach works for you :) – Gideon Genadi Kogan Jul 03 '23 at 06:28

1 Answer

In general, you might want to look into round-off error. For your case, you can estimate the error by defining the 2D DCT as $$X_{k_1,k_2}=\sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1}x_{n_1,n_2}W_{n_1,n_2}^{k_1,k_2}$$ where $$W_{n_1,n_2}^{k_1,k_2}=\cos\left(\frac{\pi}{N_1}\left(n_1+0.5\right)k_1\right)\cos\left(\frac{\pi}{N_2}\left(n_2+0.5\right)k_2\right)$$ To estimate the error, use the fixed-point representations $\left(x_{n_1,n_2}\right)_{fix}$ and $\left(W_{n_1,n_2}^{k_1,k_2}\right)_{fix}$ and calculate the error directly as $$\epsilon_{k_1,k_2}=\sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1}x_{n_1,n_2}W_{n_1,n_2}^{k_1,k_2}-\sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1}\left(x_{n_1,n_2}\right)_{fix}\left(W_{n_1,n_2}^{k_1,k_2}\right)_{fix}$$
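The direct calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a square block, round-to-nearest onto a grid of `frac_bits` fractional bits, and hypothetical helper names (`quantize`, `kernel`, `dct_error`). It evaluates the single-sum expression for the error, so it does not model any intermediate rounding between two matrix stages.

```python
import math

def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (the assumed fixed-point grid)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def kernel(n1, n2, k1, k2, size):
    """Unnormalised 2D DCT-II weight W for a size x size block."""
    return (math.cos(math.pi / size * (n1 + 0.5) * k1)
            * math.cos(math.pi / size * (n2 + 0.5) * k2))

def dct_error(x, frac_bits):
    """Return eps[k1][k2] = exact sum minus the sum over quantised x and W."""
    size = len(x)
    eps = [[0.0] * size for _ in range(size)]
    for k1 in range(size):
        for k2 in range(size):
            exact = 0.0
            approx = 0.0
            for n1 in range(size):
                for n2 in range(size):
                    w = kernel(n1, n2, k1, k2, size)
                    exact += x[n1][n2] * w
                    approx += quantize(x[n1][n2], frac_bits) * quantize(w, frac_bits)
            eps[k1][k2] = exact - approx
    return eps

if __name__ == "__main__":
    # Arbitrary integer test block with values in [-128, 127].
    x = [[((r * 8 + c) * 37 % 256) - 128 for c in range(8)] for r in range(8)]
    for bits in (8, 12, 16):
        eps = dct_error(x, bits)
        print(bits, "bits, largest |eps|:", max(abs(e) for row in eps for e in row))
```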

Now you can see that the error is a function of the input signal. In the analytical direction, you can look for upper bounds on this expression. Another direction is to look at the variance of the error, since in many cases the expected error is 0. Both directions are legitimate; which one to use depends on your application.
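As a rough sketch of both directions, assume the inputs are integers that are represented exactly, so $\left(x_{n_1,n_2}\right)_{fix}=x_{n_1,n_2}$, and that the weights are rounded to nearest with $b$ fractional bits ($b$ and $\delta$ are notation introduced here). Writing $\delta_{n_1,n_2}^{k_1,k_2}=W_{n_1,n_2}^{k_1,k_2}-\left(W_{n_1,n_2}^{k_1,k_2}\right)_{fix}$, each $\left|\delta\right|\le 2^{-(b+1)}$, which gives the bound $$\epsilon_{k_1,k_2}=\sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1}x_{n_1,n_2}\,\delta_{n_1,n_2}^{k_1,k_2},\qquad \left|\epsilon_{k_1,k_2}\right|\le 2^{-(b+1)}\sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1}\left|x_{n_1,n_2}\right|$$ For the variance, a common modelling assumption (not an exact statement, since the coefficients are fixed numbers) treats the $\delta$ terms as independent and uniform on $\left[-2^{-(b+1)},2^{-(b+1)}\right]$, so that $$\mathrm{E}\left[\epsilon_{k_1,k_2}\right]=0,\qquad \mathrm{Var}\left(\epsilon_{k_1,k_2}\right)\approx\frac{2^{-2b}}{12}\sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1}x_{n_1,n_2}^{2}$$ This covers only the quantisation of the weights. Every intermediate rounding of a result to an integer contributes a further error of at most $1/2$ (variance about $1/12$ under the same model), which is then propagated through whatever multiplications follow it.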

Obviously, you can also use computational methods, such as Monte Carlo simulation, or direct evaluation for a specific case.
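A Monte Carlo estimate might look like the sketch below. To keep it short it uses a 1D, length-8, unnormalised DCT-II with a single rounding step, which is a simplification of the 2D case. The uniform input distribution on [-128, 127], the round-half-up rule and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics

N = 8  # transform length (assumed)

def dct_row(k, n=N):
    """Unnormalised DCT-II basis row k."""
    return [math.cos(math.pi / n * (i + 0.5) * k) for i in range(n)]

def fixed_dot(coeffs, samples, frac_bits):
    """Integer dot product with quantised coefficients, rounded half up to an integer."""
    scale = 1 << frac_bits
    acc = sum(round(c * scale) * s for c, s in zip(coeffs, samples))
    return (acc + (scale >> 1)) >> frac_bits

def monte_carlo(frac_bits, trials=2000, seed=0):
    """Return (mean, standard deviation, worst absolute value) of the output error."""
    rng = random.Random(seed)
    rows = [dct_row(k) for k in range(N)]
    errors = []
    for _ in range(trials):
        samples = [rng.randint(-128, 127) for _ in range(N)]
        for row in rows:
            exact = sum(c * s for c, s in zip(row, samples))
            errors.append(fixed_dot(row, samples, frac_bits) - exact)
    worst = max(abs(e) for e in errors)
    return statistics.mean(errors), statistics.pstdev(errors), worst

if __name__ == "__main__":
    for bits in (4, 8, 12, 16):
        mean, std, worst = monte_carlo(bits)
        print(f"{bits:2d} bits: mean={mean:+.4f} std={std:.4f} worst={worst:.4f}")
```

Swapping the rounding rule inside `fixed_dot`, or the distribution used to draw `samples`, is how one would test a different set of assumptions.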

Gideon Genadi Kogan