
This is in reference to the responses to the following post, which seeks an efficient algorithm for comparing the magnitudes of bounded fixed-point complex numbers:

Efficient Magnitude Comparison for Complex Numbers

See the details of that post for the objectives of the problem. I am now working out my approach to ranking the algorithms to decide who best met the objectives I was seeking, and I welcome debate on the ranking approach before I dive in.

Key qualifying factors:

I will baseline a standard CORDIC approach (rotate both vectors to the real axis and compare absolute magnitudes), as well as what can be done with the use of multiplier operations. The algorithm must be more efficient than either of these approaches (using the equivalent score for the multipliers - see below).
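
For reference, here is a minimal sketch of that CORDIC baseline (my own illustration, not an entry; the function names are mine): vectoring-mode CORDIC drives Q toward zero, and since the CORDIC gain is identical for both inputs it cancels in the comparison. The iteration count sets the achievable error bound e.

def CordicMagnitude( I, Q, nIter ):
    # Vectoring mode: rotate (x, y) toward the real axis.
    x, y = abs( I ), abs( Q )
    for n in range( nIter ):
        if y > 0:
            x, y = x + ( y >> n ), y - ( x >> n )
        else:
            x, y = x - ( y >> n ), y + ( x >> n )
    return x    # magnitude scaled by the CORDIC gain (~1.65)

def CordicCompare( I1, Q1, I2, Q2, nIter=8 ):
    return 1 if CordicMagnitude( I1, Q1, nIter ) >= CordicMagnitude( I2, Q2, nIter ) else -1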

The algorithm must be 100% correct for magnitude differences $|z_2 - z_1| \gt e$ for any $e$, where $z_n = \sqrt{I_n^2 + Q_n^2}$, with $I$ and $Q$ as bounded signed integers and $e$ any positive real number. (It is understood that it will likely take more operations as $e$ decreases; in fact, it would be attractive to be more efficient for larger $e$.) The CORDIC is a good example of this: you can select any error bound at the expense of the number of iterations needed.

Acceptable answers may return incorrect results for $|z_2- z_1| \le e$, but a bonus score is included for implementations that provide equivalence resolution as given by the following definitions, with a higher score for tight equivalence:

Loose Equivalence:

$|z_1| \gt |z_2| + e$ returns 1

$|z_1| \lt |z_2| -e$ returns -1

$|z_2- z_1| \le e$ returns 0

Tight Binary Equivalence:

$|z_1| > |z_2|$ returns 1

$|z_1| < |z_2|$ returns -1

Tight Ternary Equivalence:

$|z_1| > |z_2|$ returns 1

$|z_1| < |z_2|$ returns -1

$|z_1| = |z_2|$ returns 0

The function prototype is

result = CompareMagnitudes( I1, Q1, I2, Q2 )

With return values of either $-1, 0, 1$ corresponding to $<, =, >$ of the first compared to the second (and $0, 1$ for $<, \ge$ for binary solutions).
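
For scoring and verification, a floating-point reference model of this prototype with the loose-equivalence behavior could look as follows (a sketch only; the name CompareMagnitudesRef is mine, and it uses square roots and multiplies, so it is not itself a candidate):

import math

def CompareMagnitudesRef( I1, Q1, I2, Q2, e=0.0 ):
    # Ground-truth comparison; e is the loose-equivalence threshold.
    d = math.hypot( I1, Q1 ) - math.hypot( I2, Q2 )
    if d > e:
        return 1
    if d < -e:
        return -1
    return 0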

Test cases will be run with bit ranges from $b = 8$ to $b = 32$ bits for I and Q but nothing in the algorithm should prevent implementation with any size b.

Consider the following closely spaced complex points A, B, C, D as depicted in the diagram below ($A=3+j4$, $B=4+j4$, $C=3+j5$, $D=4+j5$).

(Figure: the four points A, B, C, D plotted on a Cartesian grid)

The true radii are $A = 5$, $B = 5.66$, $C = 5.83$ and $D = 6.403$. In this case, the algorithm should resolve all four with 100% confidence (setting $e \le 0.17$, corresponding to the minimum magnitude difference, which is between B and C); however, it would be acceptable and even beneficial if the algorithm allowed for a larger $e$ with an associated efficiency gain.

For example if $e=0.5$ then the following results must be returned using the format $f(z_1,z_2)$ with relation to the function prototype given above:

$f(A,B) \rightarrow -1$

$f(C,A) \rightarrow 1$

$f(A,D) \rightarrow -1$

$f(B,D) \rightarrow -1$

These results are required since all of those pairs have magnitudes that differ by more than 0.5.

However the following would be acceptable:

$f(B,C) \rightarrow X$

Where X can be 1, 0, or -1, since the magnitudes of B and C differ by less than 0.5.
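
As a sanity check, the reference model sketched earlier reproduces these cases:

A, B, C, D = (3, 4), (4, 4), (3, 5), (4, 5)

assert CompareMagnitudesRef( *A, *B, e=0.5 ) == -1   # f(A,B)
assert CompareMagnitudesRef( *C, *A, e=0.5 ) ==  1   # f(C,A)
assert CompareMagnitudesRef( *A, *D, e=0.5 ) == -1   # f(A,D)
assert CompareMagnitudesRef( *B, *D, e=0.5 ) == -1   # f(B,D)

# f(B,C) is unconstrained: | |B| - |C| | = 0.174 < 0.5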

The algorithm must be implementable with only the equivalent of standard Boolean operations, binary shifts, and compares. Look-up tables, if used, would add to the buffer size considerations in the scoring.

QUESTION: Please suggest and justify alternative metrics (including alternate scoring of the starting numbers I list in my answer) or completely different approaches. It is understood that ultimately there is a trade space and there can't be a one-size-fits-all simple score, so consideration is weighted toward the objectives in the original question.

Dan Boschen
  • Hi Dan! Hot topic! Aware of the last few days like a tornado... May I kindly ask you this: without doubt Electrical Engineering (DSP) is the best place to ask for efficient computations, however, you know efficient computing is also a central topic of computer science and engineering, applied math etc... don't you think that you should also ask this question in stackoverflow or some related place to get the most comprehensive answer? In any case I just wonder what their response would be :-)) – Fat32 Jan 02 '20 at 05:18
  • It would probably be easier to suggest how to rank solutions if we had some more detail about your target implementation. You mentioned it will go in an ASIC, but it's not clear what kind of resources are available (e.g. how cumbersome is a LUT ROM? do you want a combinatorial solution or something that is iterative?, etc.). – Jason R Jan 02 '20 at 05:20
  • @Fat32 Yes, I had mentioned (buried in all the comments) that I would likely also post on the mathematics site. I think it would be a good idea – Dan Boschen Jan 02 '20 at 05:26
  • @JasonR size and power dissipation are the bigger constraints. Certainly lower iterations are more attractive for higher speed applications but there will be the lowest power applications where update rate will suffer (the typical trade). I see in explaining this to you that there will be a benefit toward being able to iterate (and re-use operations) which isn't really reflected below-- I will add a line to cover that detail. – Dan Boschen Jan 02 '20 at 05:32
  • @JasonR But I generally wanted to avoid a Look-up-table solution in search of the best algorithm ----- We could ultimately just make a $b^4$ address 1 bit look-up table to get the 1 shot answer, right? – Dan Boschen Jan 02 '20 at 05:38
  • @JasonR Iteration of interest would be similar to the in-place buffer operation of the FFT, not just re-using adders etc., which means simply doing the algorithm one step at a time and buffering results---so not really sure how to score that. – Dan Boschen Jan 02 '20 at 05:44
  • Thanks @Olli I will fix that to be clearer --- ideally any algorithm does detect the difference but I wanted to allow for the possibility that the result has an error that is less than the quantization of precision and not penalize such a solution (if you need higher precision in that case simply increase the number of bits). It is clear too that the scoring for a software versus hardware implementation can be very different so perhaps we should have two scores for the Total Processing Steps. Adding CLZ thanks. – Dan Boschen Jan 02 '20 at 14:13
  • @JasonR I thought about your first point and ideal targets would be either lowest cost microcontrollers or lowest cost FPGAs each with no dedicated multiplier resources. Ideally there could be two scores with two winners since one algorithm may be better suited in each case, but updated my answer below to break out the consideration for each separately so 25 points if better for software and 25 points if better firmware and 50 points if it can be structured to be better in each case. – Dan Boschen Jan 02 '20 at 14:40
  • @OlliNiemitalo and Dan, I don't see how this accommodation is in any way useful. The return value is either correct or it isn't. You aren't going to be able to include this adjustment in your final solution, so why score with it? $$ $$ Notice that the "Empty" case only scores 0.6% correct as it returns equal every time. My encoding of Olli's is getting 5.6% wrong, so most of it by far is from error and not from ignoring equal cases. $$ $$ Also, if adds are twice as fast as shifts according to your timing, why are they still costing twice as much? – Cedron Dawg Jan 02 '20 at 14:47
  • @CedronDawg In practical application we don't care if the difference is less than the quantization. So I would much rather have speed and efficiency over resolving the difference between 10.0000005 and 10.000002 for example. This is specific to my original direction that the error must be less than e for any given e (user increases the precision required to get to e). – Dan Boschen Jan 02 '20 at 15:59
  • So if someone has a solution that just simply doesn't work when the vectors are in very close proximity, but the solution is significantly faster---this would be the approach of choice if those error cases are smaller than the precision used. That said, if one solution with all things being equal is able to resolve those cases- then that would be the better solution, hence I covered it as bonus points but not to exclude a potential solution that doesn't do that. – Dan Boschen Jan 02 '20 at 16:02
  • I understand. This is why I think that a routine should return a 0 for too close to call cases. It is effectively equal. This would be most meaningful and a -1 or 1 would be a firm (always correct) answer. In your testing program, you should calculate the true difference, and if it is less than your threshold accept a 0 as a correct answer. That way if Olli insists on guessing and gets it right he won't be penalized. The price he pays is potentially wrong answers. – Cedron Dawg Jan 02 '20 at 16:12
  • @CedronDawg yes that is my bonus "Loose Equivalence". Again even if it returned bad answers in those cases, those bad answers would be part of the quantization noise and of no consequence, so didn't want to overly restrict the possible solution space. – Dan Boschen Jan 02 '20 at 16:19
  • @OlliNiemitalo Ah yes, thank you. Need to go to work but will think about how to update that. The intention is if the vectors are within a difference less than the quantization to not disqualify a solution that makes incorrect results in those cases in case it is significantly more efficient. – Dan Boschen Jan 02 '20 at 16:22
  • @OlliNiemitalo Your graphic is very nice- just looking at that if we shift one 1/2 bit it makes the error ranges consistent---- will think more later. – Dan Boschen Jan 02 '20 at 16:23
  • @OlliNiemitalo I have implemented a tolerance threshold in the case where a routine returns a zero. See the code. I think it fits your criteria. I do think that a wrong call of $<$ or $>$ should be penalized. It could have major consequences in a real life application. – Cedron Dawg Jan 02 '20 at 16:31
  • @CedronDawg ultimately my interest is use under actual noise conditions so clearly in this case there would not be major consequences. The precision is chosen to exceed the expected minimum SNR such that the errors are inconsequential (the data itself will already be in error much of the time) – Dan Boschen Jan 02 '20 at 16:38
  • But I include the bonus points to cover high SNR applications where a valid result is desired. So there is an award for that just didn’t want to disqualify a solution that doesn’t offer that (if I can factor out the steps to provide that feature separately like thresholding I won’t include it in the baseline if not needed to meet the minimum requirements) – Dan Boschen Jan 02 '20 at 16:41
  • Any higher algorithm that needs to operate on actual signals (with noise) and that requires absolutely correct decisions in those cases (or would otherwise have major consequences) would be a poor algorithm, since with noise that is exactly what would occur regardless of how well the algorithm resolves it. – Dan Boschen Jan 02 '20 at 16:44
  • I think you are misunderstanding my point. Suppose I have a control sensor. It fires the rockets one way on less than, the other on greater than, and idles on zero. Firing in the wrong direction is way worse of an error than not firing perhaps when you maybe should have if you were sensitive enough. In other words calling a 0 or a 1 as -1 or 0 or a -1 as 1 is potentially way worse than calling a close 1 or -1 a 0. A little noise shouldn't give you a directly opposite answer. – Cedron Dawg Jan 02 '20 at 16:59
  • A little noise will give you an exactly opposite answer in the tight resolution case, which is why I have the loose resolution one as well. In either case, my overarching point is that this feature of thresholding is less important and can be neglected in most applications I am thinking of; hence I didn’t want it to be a requirement but rather a bonus – Dan Boschen Jan 02 '20 at 17:04
  • Does that make sense? – Dan Boschen Jan 02 '20 at 17:05
  • A binary response instead of trinary can be attractive to higher algorithms calling this so don’t want to discount that solution if it is significantly more elegant/efficient. – Dan Boschen Jan 02 '20 at 17:11
  • But if your point is that the looser equivalence (what I meant in my last comment) should be more valuable than the tight equivalence, I could see that; just not at the expense of another test just for that condition (I would rather have the scoring work out to leave it out if that was the case). But if it came out that way naturally in the processing, well, that would be a cool feature! – Dan Boschen Jan 02 '20 at 17:15
  • Maybe we should delete all the transitory back and forth discussion when we are done and have the content bottom lined in the posts and answers - I see how it really makes a mess of the post and why they encourage the chat room for discussion – Dan Boschen Jan 02 '20 at 17:20
  • Personally, I don't mind long comment sections. They can be easily scrolled by. You are still missing my point. The testing program knows the truth about how far an answer is off, an application that will be using the routine won't. Tolerating a slight error, calling equals less than, or less than equal, or the vice versa is not a problem. Tolerating a diametrically wrong answer is. Big time. When the routine gets a -1, it has no idea of really close or way less than, but getting that value when it is actual greater than is the biggest error you can make. – Cedron Dawg Jan 02 '20 at 17:34
  • I totally get your point— there are many applications where this doesn’t matter or can be separately tested for if it does. (Think Sigma Delta Converter for example). Under noise conditions with independent samples you can test multiple times. So just didn’t want to make it a qualifying requirement but rather a bonus feature. There are indeed big advantages to binary solutions for use in fixed point implementations that may call this – Dan Boschen Jan 02 '20 at 17:44
  • This is why I think it is imperative that a routine only return -1 or 1 when it is sure, and return a 0 if it finds equality or can't resolve it. This is the most accurate way to go. My preference is to use a return value of 2 for the latter case. That way a calling program can truly trust a -1, 0, and 1. If it gets a 2, it can decide whether considering that zero is okay or if it needs further determination. It will never get it wrong! – Cedron Dawg Jan 02 '20 at 17:47
  • Do you understand my Sigma Delta Converter analogy: it only provides results in +/-1 and not imperative to return 0. There are many similar applications so I was not making it a requirement. Specifically I would have more use out of a routine that only produces a binary result. – Dan Boschen Jan 02 '20 at 17:52
  • Honestly, I am not that familiar with the details of ADC and DAC. If you have an application that is only expecting +/-1 and doesn't care if a few are wrong, yeah, you can tolerate a few mistakes. But if I am flying a 737 max with one sensor and a range of values, I want it to be right 100%. A calling routine expecting a +/-1 that gets a zero can simply ignore it, pick a random +/-1, default to either, or whatever. When it gets a 1 that should be -1, that's just wrong. – Cedron Dawg Jan 02 '20 at 18:03
  • If you are not familiar with the ADC and if my example of averaging multiple results of independent samples to any precision desired doesn’t help you see it, then trust me that a binary solution would be of high value and understand that I am framing it this way to include the possibility of a more efficient algorithm. Just consider other elegant algorithms out there that operate on binary results and don’t require trinary results. There is an elegance in this with consideration to calling it (multiple times) from a higher binary algorithm. – Dan Boschen Jan 02 '20 at 18:13
  • So I hope you see that I am not discounting what you are saying and that a trinary result is indeed useful- my point specifically is that it is not necessary (in many applications where the algorithm can be used) and it would be of great interest if there was a more elegant solution even though it doesn’t have that feature of a trinary result hence the structure of the scoring and the initial guidance on error less than e for any e – Dan Boschen Jan 02 '20 at 18:31
  • I'm shooting for an efficient comparator algorithm to be used in any context. If you pass it equal values, or values with equal magnitudes, I don't think it should be its responsibility to pick a +/-1 for that answer. You can write a binary wrapper that makes that determination on whatever dithering criteria you got. If you are scoring so that cases of equality aren't penalized for being called +/-1, that's fine. And if someone wants to code their comparator to never return zero, that's fine too. A "don't care" is quite different than a "wrong". – Cedron Dawg Jan 02 '20 at 18:32
  • Yes that’s all fine— Just If one is significantly more efficient without considering this case, that would be of great interest. If they are equally efficient and one provides this case too- that would be the winner hence the approach of having it be a bonus and not a requirement. If you have a way to make it be a lot more efficient without considering this 0 case I would be interested. – Dan Boschen Jan 02 '20 at 18:43
  • I don't. Finding the equality is an inherent part of my original. Cutting out the equality check in my pure cordic only resulted in a very slight improvement in time. No penalty in correctness with the relaxed quant scoring. There are many, many applications where a comparison is really being used for an equality test. Yours just isn't one of them. I don't think an equality testing algorithm could be any slimmer than a comparator. In other words, with a correct ternary comparator used as an equality checker, the distinction of +/- is free. (I have to run errands, be back later). – Cedron Dawg Jan 02 '20 at 19:06

3 Answers


Here are some of the latest results:

Algorithm         Correct    Time      Score    Penalties  Eggs
---------------   -------    ------    -------  ---------  ----
  Empty Economy    49.86     2.8104     472849    2378650    0
   Empty Deluxe     0.05     2.8311       1944  474168000  243
Starter Economy    89.75     2.9663     851367     486060    0
 Starter Deluxe    90.68     2.9764    1663118     441920  151

    Dan Beast 4    99.85     3.2622    1750076       7130  151
Cedron Unrolled   100.00     3.2721    1898616          0  243
  Cedron Deluxe   100.00     3.3255    1898616          0  243
 Cedron Revised   100.00     3.2128    1898616          0  243
   Olli Revised    99.50     3.1893    1728065      23880    0
  Olli Original    99.50     3.2464    1728065      23880    0

Cedron Multiply   100.00     3.2042    1898616          0  243
  Matt Multiply   100.00     3.3146    1898616          0  243

The timing for the contenders is too close and too noisy to show a clear favorite. Benchmarking on the target platform would be much more useful now.

The code has been updated. It is as it is.


import numpy as np
import timeit


# The passed arguments to a running horse.
#
#   ( I1, Q1 ) First  Complex Value (or Point)
#   ( I2, Q2 ) Second Complex Value (or Point)

# Its return values are
#
#   ( rc ) Comparison Result (Return Code)
#   (  l ) Locus of the Exit

# The return value can be one of these
#
#    -2  The first is for sure less than the second
#    -1  The first is likely less than the second
#     0  The two are equal for sure
#     1  The first is likely greater than the second
#     2  The first is for sure greater than the second
#
# Routines that only return {-1,1} can be called Economy
# Routines that only return {-2,0,2} can be called Deluxe
#

# How Scoring works
#
#   S Score
#   P Penalties
#   E Egg Count
#   W Wrong
#
#                    Correct      Marginal     Wrong
#       {-1,1}        S+=2         S+=1         W+=1,P+=10
#       {-2,0,2}      S+=4(E+=1)   S+=2,P+=10   W+=1,P+=1000
#
#

#====================================================================
#====================================================================
#    W A L K    O N S
#====================================================================
#====================================================================
def WalkOnOne( I1, Q1, I2, Q2 ):

        return 1, 0

#====================================================================
def WalkOnTwo( I1, Q1, I2, Q2 ):

        return 1, 0

#====================================================================
def WalkOnThree( I1, Q1, I2, Q2 ):

        return 1, 0

#====================================================================
#====================================================================
#    S T A R T E R    C O D E
#====================================================================
#====================================================================
def EmptyEconomy( I1, Q1, I2, Q2 ):

        return 1, 0

#====================================================================
def EmptyDeluxe( I1, Q1, I2, Q2 ):

        return 0, 0

#====================================================================
def StarterEconomy( I1, Q1, I2, Q2 ):

#---- Ensure the Points are in the First Quadrant WLOG

        x1 = abs( I1 )
        y1 = abs( Q1 )

        x2 = abs( I2 )
        y2 = abs( Q2 )

#---- Ensure they are in the Lower Half (First Octant) WLOG

        if y1 > x1:
           x1, y1 = y1, x1

        if y2 > x2:
           x2, y2 = y2, x2

#---- Return Results

        if x1 < x2:
           return -1, 0

        return 1, 0

#====================================================================
def StarterDeluxe( I1, Q1, I2, Q2 ):

#---- Ensure the Points are in the First Quadrant WLOG

        x1 = abs( I1 )
        y1 = abs( Q1 )

        x2 = abs( I2 )
        y2 = abs( Q2 )

#---- Ensure they are in the Lower Half (First Octant) WLOG

        if y1 > x1:
           x1, y1 = y1, x1

        if y2 > x2:
           x2, y2 = y2, x2

#---- Primary Determination

        if x1 > x2:
           if x1 + y1 >= x2 + y2:
              return 2, 0
           thePresumedResult = 1
        elif x1 < x2:
           if x1 + y1 <= x2 + y2:
              return -2, 0
           thePresumedResult = -1
        else:
           if y1 > y2:
              return 2, 1
           elif y1 < y2:
              return -2, 1
           else:
              return 0, 1

#---- Return the Presumed Result

        return thePresumedResult, 2

#====================================================================
#====================================================================
#    C E D R O N ' S
#====================================================================
#====================================================================
def CedronRevised( I1, Q1, I2, Q2 ):

#---- Ensure the Points are in the First Quadrant WLOG

        x1 = abs( I1 )
        y1 = abs( Q1 )

        x2 = abs( I2 )
        y2 = abs( Q2 )

#---- Ensure they are in the Lower Half (First Octant) WLOG

        if y1 > x1:
           x1, y1 = y1, x1

        if y2 > x2:
           x2, y2 = y2, x2

#---- Primary Determination with X Absolute Differences

        if x1 > x2:

           if x1 + y1 >= x2 + y2:
              return 2, 0

           thePresumedResult = 2
           dx = x1 - x2

        elif x1 < x2:

           if x1 + y1 <= x2 + y2:
              return -2, 0

           thePresumedResult = -2
           dx = x2 - x1

        else:

           if y1 > y2:
              return 2, 1
           elif y1 < y2:
              return -2, 1
           else:
              return 0, 1

#---- Sums and Y Absolute Differences

        sx = x1 + x2
        sy = y1 + y2

        dy = abs( y1 - y2 )

#---- Bring Factors into 1/2 to 1 Ratio Range

        while dx <  sx:
              dx += dx

              if dy <  sy:
                 dy += dy
              else:
                 sy += sy

        while dy <  sy:
              dy += dy

              if dx <  sx:
                 dx += dx
              else:
                 sx += sx

#---- Use Double Arithmetic Mean as Proxy for Geometric Mean

        cx = sx + dx
        cy = sy + dy

        cx16 = cx << 4
        cy16 = cy << 4

        if cx16 - cx > cy16:
           return thePresumedResult, 2

        if cy16 - cy > cx16:
           return -thePresumedResult, 2

#---- X Multiplication

        px = 0

        while sx > 0:
           if sx & 1:
              px += dx

           dx += dx
           sx >>= 1

#---- Y Multiplication

        py = 0

        while sy > 0:
           if sy & 1:
              py += dy

           dy += dy
           sy >>= 1

#---- Return Results

        if px > py:
           return thePresumedResult, 2

        if px < py:
           return -thePresumedResult, 2

        return 0, 2

#====================================================================
def CedronUnrolled( I1, Q1, I2, Q2 ):

#---- Ensure the Points are in the First Quadrant WLOG

        x1 = abs( I1 )
        y1 = abs( Q1 )

        x2 = abs( I2 )
        y2 = abs( Q2 )

#---- Ensure they are in the Lower Half (First Octant) WLOG

        if y1 > x1:
           x1, y1 = y1, x1

        if y2 > x2:
           x2, y2 = y2, x2

#---- Primary Determination with X Absolute Differences

        if x1 > x2:

           if x1 + y1 >= x2 + y2:
              return 2, 0

           thePresumedResult = 2
           dx = x1 - x2

        elif x1 < x2:

           if x1 + y1 <= x2 + y2:
              return -2, 0

           thePresumedResult = -2
           dx = x2 - x1

        else:

           if y1 > y2:
              return 2, 1
           elif y1 < y2:
              return -2, 1
           else:
              return 0, 1

#---- Estimate First Multiplied Magnitude

        if y1 < (x1>>1):
           if y1 < (x1>>2):
              m1 = (x1<<8) - (x1<<1) \
                 + (y1<<5) + (y1<<1)
           else:
              m1 = (x1<<8) - (x1<<4) \
                 + (y1<<6) + (y1<<5) - (y1<<2) - (y1<<1)
        else:
           if y1 < (x1>>1) + (x1>>2):
              m1 = (x1<<8) - (x1<<5) - (x1<<2) - (x1<<1) \
                 + (y1<<7) + (y1<<3) - y1
           else:
              m1 = (x1<<7) + (x1<<6) + (x1<<1) \
                 + (y1<<7) + (y1<<5) + (y1<<3)

#---- Estimate Second Multiplied Magnitude

        if y2 < (x2>>1):
           if y2 <   (x2>>2):
              m2 = ( (x2<<7) - x2 \
                   + (y2<<4) + y2 ) << 1
           else:
              m2 = ( (x2<<7) - (x2<<3) \
                   + (y2<<5) + (y2<<4) - (y2<<1) - y2 ) << 1
        else:
           if y2 <   (x2>>1) + (x2>>2):
              m2 = ( (x2<<8) - (x2<<5) - (x2<<2) - (x2<<1) \
                   + (y2<<7) + (y2<<3) - y2 )
           else:
              m2 = ( (x2<<6) + (x2<<5) + x2 \
                   + (y2<<6) + (y2<<4) + (y2<<2) ) << 1

#---- Early Out Return (the (m>>6) tolerance is a temp hack value!)

        if m1 > m2 + (m2>>6):
           return 2, 2

        if m2 > m1 + (m1>>6):
           return -2, 2

#---- Sums and Y Absolute Differences

        sx = x1 + x2
        sy = y1 + y2

        dy = abs( y1 - y2 )

#---- X Multiplication

        px = 0

        while dx > 0:
           if dx & 1:
              px += sx

           sx += sx
           dx >>= 1

#---- Y Multiplication

        py = 0

        while dy > 0:
           if dy & 1:
              py += sy

           sy += sy
           dy >>= 1

#---- Return Results

        if px > py:
           return thePresumedResult, 2

        if px < py:
           return -thePresumedResult, 2

        return 0, 2

#====================================================================
def CedronDeluxe( I1, Q1, I2, Q2 ):

#---- Ensure the Points are in the First Quadrant WLOG

        x1 = abs( I1 )
        y1 = abs( Q1 )

        x2 = abs( I2 )
        y2 = abs( Q2 )

#---- Ensure they are in the Lower Half (First Octant) WLOG

        if y1 > x1:
           x1, y1 = y1, x1

        if y2 > x2:
           x2, y2 = y2, x2

#---- Primary Determination with X Absolute Differences

        if x1 > x2:
           if x1 + y1 >= x2 + y2:
              return 2, 0
           dx = x1 - x2
        elif x1 < x2:
           if x1 + y1 <= x2 + y2:
              return -2, 0
           dx = x2 - x1
        else:
           if y1 > y2:
              return 2, 1
           elif y1 < y2:
              return -2, 1
           else:
              return 0, 1

#---- Employ a DanBeast

        L1 = DanBeast_2_8_Level( x1, y1 )
        L2 = DanBeast_2_8_Level( x2, y2 )

#---- Early Out Return

        if L1 > L2 + (L2>>6):
           return 2, 1

        if L2 > L1 + (L1>>6):
           return -2, 1

#---- Sums and Y Absolute Differences

        sx = x1 + x2
        sy = y1 + y2

        dy = abs( y1 - y2 )

#---- Do the Multiplications

        px = UnsignedBitMultiply( sx, dx )
        py = UnsignedBitMultiply( sy, dy )

#---- Account for Swap

        if x1 > x2:
           thePresumedResult = 2
        else:
           thePresumedResult = -2

#---- Return Results

        if px > py:
           return thePresumedResult, 2

        if px < py:
           return -thePresumedResult, 2

        return 0, 2

#====================================================================
def DanBeastFour( I1, Q1, I2, Q2 ):

#---- Ensure the Points are in the First Quadrant WLOG

        x1 = abs( I1 )
        y1 = abs( Q1 )

        x2 = abs( I2 )
        y2 = abs( Q2 )

#---- Ensure they are in the Lower Half (First Octant) WLOG

        if y1 > x1:
           x1, y1 = y1, x1

        if y2 > x2:
           x2, y2 = y2, x2

#---- Primary Determination with Quick Exit

        if x1 > x2:
           if x1 + y1 >= x2 + y2:
              return 2, 0
        elif x1 < x2:
           if x1 + y1 <= x2 + y2:
              return -2, 0
        else:
           if y1 > y2:
              return 2, 0
           elif y1 < y2:
              return -2, 0
           else:
              return 0, 0

#---- Estimate First Multiplied Magnitude

        if y1 < (x1>>1):
           if y1 < (x1>>2):
              m1 = (x1<<8) - (x1<<1) \
                 + (y1<<5) + (y1<<1)
           else:
              m1 = (x1<<8) - (x1<<4) \
                 + (y1<<6) + (y1<<5) - (y1<<2) - (y1<<1)
        else:
           if y1 < (x1>>1) + (x1>>2):
              m1 = (x1<<8) - (x1<<5) - (x1<<2) - (x1<<1) \
                 + (y1<<7) + (y1<<3) - y1
           else:
              m1 = (x1<<7) + (x1<<6) + (x1<<1) \
                 + (y1<<7) + (y1<<5) + (y1<<3)

#---- Estimate Second Multiplied Magnitude

        if y2 < (x2>>1):
           if y2 < (x2>>2):
              m2 = ( (x2<<7) - x2 \
                   + (y2<<4) + y2 ) << 1
           else:
              m2 = ( (x2<<7) - (x2<<3) \
                   + (y2<<5) + (y2<<4) - (y2<<1) - y2 ) << 1
        else:
           if y2 < (x2>>1) + (x2>>2):
              m2 = ( (x2<<8) - (x2<<5) - (x2<<2) - (x2<<1) \
                   + (y2<<7) + (y2<<3) - y2 )
           else:
              m2 = ( (x2<<6) + (x2<<5) + x2 \
                   + (y2<<6) + (y2<<4) + (y2<<2) ) << 1

#---- Return Results

        if m1 < m2:
           return -1, 2

        return 1, 2

#====================================================================
def CedronMultiply( I1, Q1, I2, Q2 ):

#---- Ensure the Points are in the First Quadrant WLOG

        x1 = abs( I1 )
        y1 = abs( Q1 )

        x2 = abs( I2 )
        y2 = abs( Q2 )

#---- Ensure they are in the Lower Half (First Octant) WLOG

        if y1 > x1:
           x1, y1 = y1, x1

        if y2 > x2:
           x2, y2 = y2, x2

#---- Primary Determination with X Absolute Differences

        if x1 > x2:

           if x1 + y1 >= x2 + y2:
              return 2, 0

           thePresumedResult = 2
           dx = x1 - x2

        elif x1 < x2:

           if x1 + y1 <= x2 + y2:
              return -2, 0

           thePresumedResult = -2
           dx = x2 - x1

        else:

           if y1 > y2:
              return 2, 1
           elif y1 < y2:
              return -2, 1
           else:
              return 0, 1

#---- Sums and Y Absolute Differences

        sx = x1 + x2
        sy = y1 + y2

        dy = abs( y1 - y2 )

#---- X Multiplication

        px = 0

        while dx > 0:
          if dx & 1:
             px += sx

          sx += sx
          dx >>= 1

#---- Y Multiplication

        py = 0

        while dy > 0:
          if dy & 1:
             py += sy

          sy += sy
          dy >>= 1

#---- Return Results

        if px > py:
           return thePresumedResult, 2

        if px < py:
           return -thePresumedResult, 2

        return 0, 2

#====================================================================
#====================================================================
#    O L L I    L I K E
#====================================================================
#====================================================================
def MyVersionOfOllis( I1, Q1, I2, Q2 ):

# Returns  ( c, t )
#
#    c   Comparison
#
#   -1,-2   | (I1,Q1) |  <  | (I2,Q2) |
#    1, 2   | (I1,Q1) |  >  | (I2,Q2) |
#
#    t   Exit Locus
#
#    1       (Partial) Primary Determination
#    1+n     CORDIC Loop, iteration n
#    7       Terminating Guess

#---- Set Extent Parameter

        maxIterations = 4

#---- Ensure the Points are in the First Quadrant WLOG

        I1 = abs( I1 )
        Q1 = abs( Q1 )

        I2 = abs( I2 )
        Q2 = abs( Q2 )

#---- Ensure they are in the Lower Half (First Octant) WLOG

        if Q1 > I1:
           I1, Q1 = Q1, I1

        if Q2 > I2:
           I2, Q2 = Q2, I2

#---- (Partial) Primary Determination

        if I1 < I2 and  I1 + Q1 <= I2 + Q2:
           return -2, 1

        if I1 > I2 and  I1 + Q1 >= I2 + Q2:
           return 2, 1

#---- CORDIC Loop

        for n in range ( 1, maxIterations+1 ):
            newI1 = I1 + ( Q1 >> n )
            newQ1 = Q1 - ( I1 >> n )
            newI2 = I2 + ( Q2 >> n )
            newQ2 = Q2 - ( I2 >> n )

            I1 = newI1
            Q1 = abs( newQ1 )
            I2 = newI2
            Q2 = abs( newQ2 )

            s = n + n + 1

            if I1 <= I2 - ( I2 >> s ):
               return -1, 1 + n

            if I2 <= I1 - ( I1 >> s ):
               return 1, 1 + n

#---- Terminating Guess

        if I1 < I2:
           return -1, 7

        return 1, 7

#====================================================================
def MyRevisionOfOllis( I1, Q1, I2, Q2 ):

# Returns  ( rc, l )
#
#    rc  Comparison
#
#   -1,-2   | (I1,Q1) |  <  | (I2,Q2) |
#    1, 2   | (I1,Q1) |  >  | (I2,Q2) |
#
#    l   Exit Locus
#
#    1       (Partial) Primary Determination
#    1+n     CORDIC Loop, iteration n
#    7       Terminating Guess

#---- Ensure the Points are in the First Quadrant WLOG

        I1 = abs( I1 )
        Q1 = abs( Q1 )

        I2 = abs( I2 )
        Q2 = abs( Q2 )

#---- Ensure they are in the Lower Half (First Octant) WLOG

        if Q1 > I1:
           I1, Q1 = Q1, I1

        if Q2 > I2:
           I2, Q2 = Q2, I2

#---- (Partial) Primary Determination

        if I1 < I2 and  I1 + Q1 <= I2 + Q2:
           return -2, 1

        if I1 > I2 and  I1 + Q1 >= I2 + Q2:
           return 2, 1

#---- CORDIC Loop Head

        s = 3

        for n in range ( 1, 5 ):

#---- Apply the Rotation

            newI1 = I1 + ( Q1 >> n )
            newQ1 = Q1 - ( I1 >> n )

            newI2 = I2 + ( Q2 >> n )
            newQ2 = Q2 - ( I2 >> n )

#---- Attempt Comparison

            if newI1 <= newI2 - ( newI2 >> s ):
               return -1, 1 + n

            if newI2 <= newI1 - ( newI1 >> s ):
               return 1, 1 + n

            s += 2

#---- Advance the Values

            I1 = newI1
            I2 = newI2

            Q1 = abs( newQ1 )
            Q2 = abs( newQ2 )

#---- Terminating Guess

        if I1 < I2:
           return -1, 7

        return 1, 7

#====================================================================
#====================================================================
#    M A T T   L    L I K E
#====================================================================
#====================================================================
def MattMultiply( I1, Q1, I2, Q2 ):

#---- Ensure the Points are in the First Quadrant WLOG

        I1 = abs( I1 )
        Q1 = abs( Q1 )

        I2 = abs( I2 )
        Q2 = abs( Q2 )

#---- Ensure they are in the Lower Half (First Octant) WLOG

        if Q1 > I1:
           I1, Q1 = Q1, I1

        if Q2 > I2:
           I2, Q2 = Q2, I2

#---- Ensure the first value is rightmost

        swap = 0

        if I2 > I1:
           swap = 4
           I1, I2 = I2, I1
           Q1, Q2 = Q2, Q1

#---- Primary determination

        if I1 + Q1 > I2 + Q2:
           return 2 - swap, 2
        else:
           DI = I1 - I2
           if DI < 0:
              tmp1 = -UnsignedBitMultiply( I1 + I2, -DI )
           else:
              tmp1 =  UnsignedBitMultiply( I1 + I2,  DI  )

           DQ = Q2 - Q1
           if DQ < 0:
              tmp2 = -UnsignedBitMultiply( Q1 + Q2, -DQ )
           else:
              tmp2 =  UnsignedBitMultiply( Q1 + Q2,  DQ  )

           if tmp1 == tmp2:
              return  0       , 2
           elif tmp1 > tmp2:
              return  2 - swap, 2
           else:
              return -2 + swap, 2

#====================================================================
#====================================================================
#    U T I L I T I E S
#====================================================================
#====================================================================
def UnsignedBitMultiply( a, b ):  # Smaller value second is faster.
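        # Classic shift-and-add multiplication: for each set bit of b,
        # accumulate the correspondingly shifted a. Only adds, shifts,
        # and compares are used, per the problem constraints.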

        p = 0

        while b > 0:
           if b & 1:
              p += a

           a  += a
           b >>= 1

        return p

#====================================================================
def DanBeast_2_8_Level( x, y ):

        if         y+y   <  x:               # 2 y < x
           if     (y<<2) <  x:               # 4 y < x
              L = (x<<8)   -x-x \
                + (y<<5)   +y+y                  # y/x = 0.00 to 0.25
           else:
              L = (x<<8) - (x<<4) \
                + (y<<6) + (y<<5) - (y<<2) -y-y  # y/x = 0.25 to 0.50
        else:
            if    (y<<2) <  x+x+x:           # 4 y < 3 x
              L = (x<<8) - (x<<5) - (x<<2) -x-x \
                + (y<<7) + (y<<3) -  y           # y/x = 0.50 to 0.75
            else:
              L = (x<<7) + (x<<6)   +x+x \
                + (y<<7) + (y<<5) + (y<<3)       # y/x = 0.75 to 1.00

        return L
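
#====================================================================
# (Added sketch, not part of the original harness.)  DanBeast_2_8_Level
# computes a shift-and-add piecewise-linear estimate of the scaled
# magnitude 256*sqrt( x*x + y*y ), with a different line fit for each
# of four slices of the first octant. This helper measures its
# worst-case relative error over that octant.
def DanBeastWorstError( ArgLimit=256 ):

        theWorst = 0.0

        for x in range( 1, ArgLimit ):
           for y in range( 0, x + 1 ):
               theTrue  = 256.0 * np.hypot( x, y )
               theError = abs( DanBeast_2_8_Level( x, y ) - theTrue ) / theTrue
               if theError > theWorst:
                  theWorst = theError

        return theWorst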

#====================================================================
#====================================================================
#    T E S T I N G    H A R N E S S
#====================================================================
#====================================================================
def Test( ArgLimit, ArgThreshold, ArgLane, ArgTestName ):

#---- Set the Parameters

        t = ArgThreshold

#---- Initialize the Counters

        theCount           = 0
        theWrongCount      = 0

        theEggs            = 0
        theScore           = 0
        thePenalties       = 0

#---- Start Timing

        theStartTime = timeit.default_timer()

#---- Test on a Swept Area

        for i1 in range( -ArgLimit, ArgLimit, 10 ):
          ii1 = i1 * i1
          for q1 in range( -ArgLimit, ArgLimit, 7 ):
            d1 = np.sqrt( ii1 + q1 * q1 )
            for i2 in range( -ArgLimit, ArgLimit, 11 ):
              ii2 = i2 * i2
              for q2 in range( -ArgLimit, ArgLimit, 5 ):
                d2 = np.sqrt( ii2 + q2 * q2 )

                D = d1 - d2    #  = |(I1,Q1)| - |(I2,Q2)|

                theCount += 1

#---- The Fast Side Bench Mark Lanes

                if ArgLane == 0:
                   rc, l = EmptyEconomy( i1, q1, i2, q2 )

                if ArgLane == 1:
                   rc, l = EmptyDeluxe( i1, q1, i2, q2 )

                if ArgLane == 2:
                   rc, l = StarterEconomy( i1, q1, i2, q2 )

                if ArgLane == 3:
                   rc, l = StarterDeluxe( i1, q1, i2, q2 )

#---- The Slower Pace Horses

#                if ArgLane == 8:
#                   rc, l = TwoMultiply( i1, q1, i2, q2 )    # (not in this listing)

#                if ArgLane == 9:
#                   rc, l = FourMultiply( i1, q1, i2, q2 )   # (not in this listing)

#---- Walk Ons

                if ArgLane == 11:
                   rc, l = WalkOnOne( i1, q1, i2, q2 )

                if ArgLane == 12:
                   rc, l = WalkOnTwo( i1, q1, i2, q2 )

                if ArgLane == 13:
                   rc, l = WalkOnThree( i1, q1, i2, q2 )

#---- Cedron D.'s Lanes

                if ArgLane == 20:
                   rc, l = CedronRevised( i1, q1, i2, q2 )

                if ArgLane == 21:
                   rc, l = CedronDeluxe( i1, q1, i2, q2 )

                if ArgLane == 22:
                   rc, l = CedronUnrolled( i1, q1, i2, q2 )

                if ArgLane == 23:
                   rc, l = DanBeastFour( i1, q1, i2, q2 )

                if ArgLane == 24:
                   rc, l = CedronMultiply( i1, q1, i2, q2 )

#---- Olli N.'s Lanes

                if ArgLane == 30:
                   rc, l = MyVersionOfOllis( i1, q1, i2, q2 )

                if ArgLane == 31:
                   rc, l = MyRevisionOfOllis( i1, q1, i2, q2 )

#---- Dan B.'s Lanes

#                if ArgLane == 41:
#                   rc, l = Dan1( i1, q1, i2, q2 )

#---- Matt L.'s Lanes

                if ArgLane == 50:
                   rc, l = MattMultiply( i1, q1, i2, q2 )

#---- Assess Scores, Penalties, and Eggs

                if rc == -2:
                   if D < -t:
                      theScore      += 4
                   elif D < 0:
                      theScore      += 2
                      thePenalties  += 10
                   else:
                      theWrongCount += 1
                      thePenalties  += 1000

                elif rc == 2:
                   if D > t:
                      theScore      += 4
                   elif D > 0:
                      theScore      += 2
                      thePenalties  += 10
                   else:
                      theWrongCount += 1
                      thePenalties  += 1000

                elif rc == -1:
                   if D < 0:
                      theScore      += 2
                   elif D <= t:
                      theScore      += 1
                   else:
                      theWrongCount += 1
                      thePenalties  += 10

                elif rc == 1:
                   if D > 0:
                      theScore      += 2
                   elif D >= -t:
                      theScore      += 1
                   else:
                      theWrongCount += 1
                      thePenalties  += 10

                elif rc == 0:
                   if abs( D ) <= t:
                      theScore      += 8
                      if D == 0:
                         theEggs    += 1
                   else:
                      theWrongCount += 1
                      thePenalties  += 1000


                else:
                   print "Disqualification -- Invalid c value:", c, "Lane", ArgLane
                   return

#---- Finish Timing

        theDuration = timeit.default_timer() - theStartTime

#---- Calculate the Results

        theCorrectCount = theCount - theWrongCount

        theCorrectPct   = 100.0 * float( theCorrectCount ) \
                                / float( theCount )

#---- Return Results

        return "%15s  %7.2f %10.4f %10d %10d %4d" % \
               ( ArgTestName, theCorrectPct, theDuration,\
                 theScore,    thePenalties,  theEggs )

#====================================================================
def Main():

#---- Set Run Time Parameters

        L = 101   # The Limit
        T = 0     # Threshold

#---- Print Headers

        print "Algorithm         Correct    Time      Score    Penalties  Eggs"
        print "---------------   -------    ------    -------  ---------  ----"

#---- The Calibrators

        print( Test( L, T, 0, "Empty Economy" ) )
        print( Test( L, T, 1, "Empty Deluxe" ) )
        print( Test( L, T, 2, "Starter Economy" ) )
        print( Test( L, T, 3, "Starter Deluxe" ) )

#---- The Walk Ons

#        print()
#        print( Test( L, T, 11, "Walk On One" ) )

#---- The Contenders

        print()
        print( Test( L, T, 23, "Dan Beast 4" ) )
        print( Test( L, T, 22, "Cedron Unrolled" ) )
        print( Test( L, T, 21, "Cedron Deluxe" ) )
        print( Test( L, T, 20, "Cedron Revised" ) )
        print( Test( L, T, 31, "Olli Revised" ) )
        print( Test( L, T, 30, "Olli Original" ) )

#---- The Pace Setters

        print()
        print( Test( L, T, 24, "Cedron Multiply" ) )
        print( Test( L, T, 50, "Matt Multiply" ) )


#====================================================================
Main()


Earlier, I pledged a 50 point bounty to the best horse (fastest time 99%+ correct) that wasn't one of my entries. I'm sticking with that, and right now Olli is leading. (My optimized version is DQ'd)

Cedron Dawg
  • @DanBoschen The operations should really reflect what your final target destination does. I'm clueless about custom IC timings. I'll accept whatever scoring you want for the aesthetic metrics. I shouldn't have said "I don't agree", instead "a little puzzled by" is more accurate. – Cedron Dawg Jan 02 '20 at 04:54
  • @DanBoschen No problem for me. You'll be getting 100% correct results. With the small sizes my binary search is probably less efficient than brute force anyway. I was thinking of having to do very fine resolving, as if they were floats. Olli's solution is very much in that camp. FYI, I got the unappreciated move to chat prompt under your answer. There should be "No, don't bother me about it again option." I won't quit DSP.SE over it, but it's damn annoying. – Cedron Dawg Jan 02 '20 at 06:01
  • Maybe shift the inputs to the left to increase precision in your implementation of my CORDIC algo. I'll write my own implementation but that's what I'd try first. 5 % wrong results sounds way too much. – Olli Niemitalo Jan 02 '20 at 15:14
  • @OlliNiemitalo I'll be posting my own version soon. That is what I am doing. I was just trying to be true to your code. – Cedron Dawg Jan 02 '20 at 15:17
  • @DanBoschen It is my translation of Olli's C code. I am assuming he will improve on the Python version. I have tried (just posted) and got slightly better correctness at a little bit of a speed price. I'm curious about any and all performance and aesthetic metrics. – Cedron Dawg Jan 02 '20 at 15:44
  • @OlliNiemitalo I am quite surprised at the results I got. Perhaps you can do better. – Cedron Dawg Jan 02 '20 at 15:44
  • @DanBoschen By "coverage" you meant "correctness"? Unless you can find a flaw in my implementation, it would seem so. My implementation of a pure CORDIC does about the same (see the code, second routine from the top). This might put a fork in a pure CORDIC approach. – Cedron Dawg Jan 02 '20 at 15:51
  • @OlliNiemitalo I have improved and updated my pure Cordic routine. The big difference is starting with a full length rotation arm rather than a half length one. Still not 100% though. – Cedron Dawg Jan 02 '20 at 16:06
  • @OlliNiemitalo My implementation is faulty. I am fixing it, stay tuned. – Cedron Dawg Jan 02 '20 at 17:04
  • Point-by-point in Python? Timing this style of code seems irrelevant for any serious number crunching code. Most of the time will be interpreter overhead. How well does it do with array-based numpy code, or in C? – user253751 Jan 02 '20 at 17:13
  • @user253751 Writing speedy code in Python does sound like a self contradiction. What we are trying to measure is the relative efficiency of algorithms. If one algorithm is faster than another in Python it is also likely to be faster on any platform. We are not trying to achieve the fastest Python code, so using any kind of Python specific tweaks is irrelevant. The empty case is provided to give a measure of the testing overhead. This is why I am not advocating this over Dan's scoring system. – Cedron Dawg Jan 02 '20 at 17:21
  • @CedronDawg The relative speed of various operations will be different on various platforms. In Python, I would guess that you get substantially more benefits from reducing the number of lines of code executed, than in a compiled language. – user253751 Jan 02 '20 at 17:35
  • @user253751 That's why this kind of testing is only indicatory. Python is a nice common platform with very readable code. I consider it a dialect of BASIC. Ten adds will take longer than 5 adds on any platform. We are testing various algorithms which execute various operations different numbers of times. Calling a C routine from Python to do heavy lifting would be entirely meaningless here. In an interpreter, combining operations into one statement can maybe make things faster, but it won't make much of an overall difference for the entire routine. – Cedron Dawg Jan 02 '20 at 17:41
  • @DanBoschen I've modified the code to be Quant tolerant for all three cases. This should be helpful for folks who have your type of application in mind. Maybe Olli wants his terminating guess restored. IDK. I'm still coming in at 100% and fastest, so "What me, worry?" – Cedron Dawg Jan 02 '20 at 18:46
  • @CedronDawg very nice! I'll see if I can get a version of the Sigma Delta Argument Test in Python later this week, may not be until next weekend depending on how the work week goes. – Dan Boschen Jan 06 '20 at 05:33

Importance sampling

This answer talks about how ranking of algorithms by their average run times can be improved by using importance sampling that emphasizes inputs that will likely result in long run times.

Figure 1. Number of iterations needed to find which of two 8-bit two's complement complex numbers, one with $(|I_1|, |Q_1|) = (95, 45)$ (red) and the other $(I_2, Q_2)$, has the larger magnitude, using a fixed-point CORDIC algorithm. The $(I_2, Q_2)$ that require many iterations have approximately the same magnitude as $(I_1, Q_1)$. Light gray: no iterations, darker: more iterations.

Let $b$ be the number of bits in each of the two's complement integer inputs $I_1, Q_1, I_2, Q_2$. Let those four input variables be independent random variables with full-range $[-2^{b-1}, 2^{b-1}-1]$ discrete uniform probability distributions. For any given threshold $t$, the probability $p\left(\left|\sqrt{I_1^2+Q_1^2}-\sqrt{I_2^2+Q_2^2}\right|<t\right)$ of encountering a pair of complex numbers with an absolute magnitude difference less than $t$ tends to zero as $b\to\infty$. For given $I_1, Q_1$, in the event that $\left|\sqrt{I_1^2+Q_1^2}-\sqrt{I_2^2+Q_2^2}\right|<t$, the smaller the given threshold $t$, the longer a typical iterative algorithm would take on average to arrive at a result when averaging over the applicable $I_2, Q_2$. This means that for large $b$ the longest run times are rarely encountered. Fig. 1 illustrates what is explained in this paragraph.

Let's bunch the inputs into a single random variable $X = (I_1, Q_1, I_2, Q_2)$ for notational convenience. Let's call run time or a related approximate complexity measure cost, $f(X)$. The mean cost $\mu$ of an algorithm is the expected value of cost, $\mu = \mathbb{E}[f(X)]$. It can be estimated by the mean cost $\hat\mu$ over a size $N$ statistical sample from the input distribution:

$$\hat\mu = \frac{1}{N}\sum_{i=0}^{N-1}f(X_i),\quad X_i\sim p.\tag{1}$$

Each sample point $X_i$ is drawn from the input distribution, as denoted by $X_i\sim p$. As stated earlier, sampling directly from the probability distribution of $X$ samples mostly those runs of the algorithm that have low cost, and only rarely is a high cost encountered. Most of the variance in the estimate $\hat\mu$ may be due to the sporadicity of the high-cost runs, requiring a very large statistical sample and making it difficult to see average cost differences between algorithms.

In this case a better sampling strategy is importance sampling. It is a technique that can give a lower-variance estimate of $\mathbb{E}[f(X)]$, by sampling according to a modified probability $q(X)$ in which important but rare events such as $\left|\sqrt{I_1^2+Q_1^2}-\sqrt{I_2^2+Q_2^2}\right|<t$ with a small $t$ have a higher probability than in the true probability distribution of $X$. In importance sampling, the expected cost $\mu = \mathbb{E}[f(X)]$ is estimated by a weighted mean with a weighting that compensates for the differences between the distributions. The weight is simply the ratio between the probability $p(X_i)$ of the sample point $X_i$ in the true distribution and the probability $q(X_i)$ of the sample point in the modified distribution. The importance sampling estimate $\hat\mu_q$ of the expected cost $\mu = \mathbb{E}[f(X)]$ is then:

$$\hat\mu_q = \frac{1}{N}\sum_{i=0}^{N-1}\frac{f(X_i)p(X_i)}{q(X_i)},\quad X_i\sim q,\tag{2}$$

where each $X_i$ is sampled from the modified distribution with probabilities given by $q$.
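
To make Eq. 2 concrete, here is a toy Python sketch (my own illustration, simplified to a scalar $X$; all names are mine): the expensive event is rare under $p$, so we oversample it under $q$ and compensate with the weights $p(X_i)/q(X_i)$.

import numpy as np

rng = np.random.default_rng(0)
M = 1 << 16
R = 16                                       # the rare set is {0, ..., R-1}

def f(x):                                    # stand-in for run-time cost
    return np.where(x < R, 1000.0, 1.0)

N = 5000
pick_rare = rng.random(N) < 0.5              # q: 50 % rare set, 50 % full range
X = np.where(pick_rare, rng.integers(0, R, N), rng.integers(0, M, N))

p = np.full(N, 1.0 / M)                      # true probability of each X_i
q = 0.5 / R * (X < R) + 0.5 / M              # sampling probability of each X_i

mu_hat = np.mean(f(X) * p / q)               # Eq. 2
print(mu_hat, "vs exact", (R * 1000.0 + (M - R)) / M)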

The optimal $q$ which minimizes the variance of $\hat\mu_q$ for a given $N$ depends on the algorithm, but we can make an educated guess and hope that the variance is at least significantly reduced. One approach would be to first devise a simple-to-implement sampling strategy that emphasizes the important but rare events. An approach that I've been investigating is to first choose $I_1, Q_1$ from their true distributions, to then choose the threshold $t$ randomly from a discrete distribution $\{2^0, 2^1, 2^2, \ldots, 2^b\}$ with equal probabilities, and to finally choose $(I_2, Q_2)$ from a uniform discrete conditional distribution with condition $\left|\sqrt{I_1^2+Q_1^2}-\sqrt{I_2^2+Q_2^2}\right|<t$. Given $I_1, Q_1$, $p(X_i|I_1,Q_1)/q(X_i|I_1,Q_1)$ could be calculated by:

$$\frac{p(X_i|I_1,Q_1)}{q(X_i|I_1,Q_1)} = \frac{\displaystyle\frac{1}{2^{2b}}}{\displaystyle\frac{1}{b + 1}\displaystyle\sum_{k=0}^b q(X_i|I_1,Q_1,k)},\tag{3}$$

with conditional probability conditional to $(I_1,Q_1,k)$:

$$q(X_i|I_1,Q_1,k)=\frac{\begin{cases}1&\text{if }\left|\sqrt{I_1^2+Q_1^2}-\sqrt{I_2^2+Q_2^2}\right|<2^k.\\ 0&\text{otherwise.}\end{cases}}{\displaystyle\sum_{I_2}\sum_{Q_2}\begin{cases}1&\text{if }\left|\sqrt{I_1^2+Q_1^2}-\sqrt{I_2^2+Q_2^2}\right|<2^k\\ 0&\text{otherwise.}\end{cases}}\tag{4}$$

Normally each sum in Eq. 4 would be from $-2^{b-1}$ to $2^{b-1}-1$. In a program implementation, sampling the conditional distribution can be done by rejection sampling from a somewhat larger distribution. The samples that do not meet the condition of the conditional distribution are rejected and picked again until they meet the condition. This approach was implemented to generate Fig. 2:

Figure 2. An importance sampling sample of $(I_2, Q_2, k)$ of size $N = 2000$. In normal use, also $(I_1, Q_1)$ would be picked randomly for each sample point, but it is fixed here for illustration purposes.
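
The rejection step itself is simple. Here is a crude sketch of the idea (the name SampleConditional is mine) that draws candidates from the full square for clarity; the p5.js listing below instead draws from a near-annulus, which wastes far fewer candidates:

import math, random

def SampleConditional(I1, Q1, t, b):
    # Draw (I2, Q2) uniformly over the square and reject until the
    # magnitude condition holds; slow for small t and large b.
    lo, hi = -(1 << (b - 1)), (1 << (b - 1)) - 1
    m1 = math.hypot(I1, Q1)
    while True:
        I2 = random.randint(lo, hi)
        Q2 = random.randint(lo, hi)
        if abs(math.hypot(I2, Q2) - m1) < t:
            return I2, Q2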

A problem with this approach is that for large $b$, it is too much work to count the total relative probability in the denominator of Eq. 4.

Instead of rejection sampling, we could try to merely approximate the desired conditional distribution by a similar distribution for which it is easy to evaluate the sums in Eq. 4. This can be made easier by including in the approximate distribution also some $X_i$ that have $p(X_i) = 0$ and therefore zero weight. Knowing that the weight is zero, it is not necessary to evaluate the denominator $q(X_i|I_1,Q_1,k)$ of the weight. We choose the following approach (also see Fig. 3):

  • Real-component complex numbers that are bounded by a bounding square with opposite corners $(-2^{b-1}-1/2, -2^{b-1}-1/2)$ and $(2^{b-1}+1/2, 2^{b-1}+1/2)$ round to integer-component complex numbers with each component in range $[-2^{b-1}, 2^{b-1}]$.
  • For a given $k$, construct two circles centered at $(0, 0)$: an inner circle with radius $\sqrt{I_1^2+Q_1^2}-2^k$ and an outer circle with radius $\sqrt{I_1^2+Q_1^2}+2^k$.
  • Define a set $A_k$ as the set of each complex number that is between the two circles and that has an angle at which the inner circle is not outside the bounding square.
  • Let $q(X_i|I_1,Q_1,k)$ be equal to the ratio of two areas: the area of the subset of complex numbers from $A_k$ that round to $(I_1,Q_1)$, and the area of $A_k$.
  • Given $(I_1,Q_1,k)$, pick $(I_2, Q_2)$ according to probability $q(X_i|I_1,Q_1,k)$ by choosing a random real-component complex number from a uniform distribution conditional to the number being in $A_k$, and round the number. This is not too difficult to do (see the sketch after this list).
  • If $(I_2, Q_2)$ satisfies $-2^{b-1}\le I_2\le2^{b-1}-1$ and $-2^{b-1}\le Q_2\le2^{b-1}-1$, then calculate $q(X_i|I_1,Q_1,k)$, which is also not too difficult to do. Otherwise $p(X_i) = 0$ and $q(X_i|I_1,Q_1,k)$ need not be calculated.

Figure 3. Illustration of the scheme by which $q(X_i|I_1,Q_1,k)$ is defined, sampled from, and calculated. The magnitude $10$ (in general not an integer) of the example $(I_1, Q_1) = (-8, 6)$ (red), together with the example $k = 0$, defines the radii of the inner and outer circles, $9$ and $11$. In the example case the inner circle intersects the bounding square (two corners marked with crosses) at eight points. The area between the circles is divided into four subsets bounded by radial lines through the intersection points. This avoids sampling too many numbers between the circles that are outside the real numbers that round to the set of possible $(I_2, Q_2)$ (gray). The union of the four subsets comprises the set $A_k$ from which a real-component complex number is picked. In this example the number happens to land in the blue area, which rounds to the $(I_2, Q_2)$ (black) shown. The probability $q(X_i|I_1,Q_1,k)$ equals the ratio of the blue area to the total area of $A_k$.

As can be seen from the example in Fig. 3, this definition of $q(X_i|I_1,Q_1,k)$ is not exactly the same as that in Eq. 4 which had only two possible probabilities for each $(I_2, Q_2)$.
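For reference, the uniform pick mentioned in the list above can be sketched as follows (a minimal illustration of the inverse-CDF radius trick only, ignoring the radial-line subsetting of Fig. 3 and the bounding-square handling; the full version is random_I2_Q2 in the listing below):

// Draw one point uniformly by area from the annulus between radii
// rInner and rOuter, then round to integer components.
function sampleAnnulusRounded(rInner, rOuter) {
  // The area inside radius r grows like r^2, so drawing r^2 uniformly
  // between rInner^2 and rOuter^2 gives a uniform-by-area radius.
  const u = Math.random();
  const r = Math.sqrt(rInner*rInner + u*(rOuter*rOuter - rInner*rInner));
  const angle = 2*Math.PI*Math.random();
  return [Math.round(r*Math.cos(angle)), Math.round(r*Math.sin(angle))];
}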

The area in $A_k$ that rounds to a given $(I_2, Q_2)$ can take a number of different shapes, each requiring its own area-calculation method:

Figure 4. Given $(I_1,Q_1,k)$, the subset of $A_k$ that rounds to an $(I_2, Q_2)$ in the first octant can take these possible shapes (blue).

To be continued...

p5.js listing for Figs. 1 & 2

This p5.js program plots Fig. 1 or 2 depending on which parts of it are un/commented. The program can be run at editor.p5js.org.

function random_I2_Q2(I1, Q1, b) {
  // Pick the threshold exponent k uniformly from 0..b, giving t = 2^k.
  let k = Math.floor(Math.random()*(b + 1));
  let t = Math.pow(2, k);
  let maximum = Math.pow(2, b-1)-1;
  let minimum = -Math.pow(2, b-1);
  let maxAbs = Math.pow(2, b-1);
  let I2;
  let Q2;
  do {
    // Annulus radii with a half-diagonal rounding margin (plus a small
    // 1/16 guard), clamped to the achievable magnitude range.
    let magnitudeLower = Math.sqrt(I1*I1 + Q1*Q1)-t-0.5*Math.sqrt(2)+1/16;
    magnitudeLower = Math.max(magnitudeLower, 0);
    let magnitudeUpper = Math.sqrt(I1*I1 + Q1*Q1)+t+0.5*Math.sqrt(2)+1/16;
    magnitudeUpper = Math.min(magnitudeUpper, Math.sqrt((maxAbs + 0.5)*(maxAbs + 0.5)*2) + 1/16);
    // Uniform-in-area magnitude within the annulus (inverse CDF of r^2).
    let magnitude = Math.sqrt(magnitudeLower*magnitudeLower + Math.random()*(magnitudeUpper*magnitudeUpper - magnitudeLower*magnitudeLower));
    let angle;
    if (magnitudeLower >= maxAbs) {
      // The inner circle extends beyond the bounding square: restrict
      // the angle to the four corner arcs that remain inside it.
      let minAngle = Math.atan2(Math.sqrt(magnitudeLower*magnitudeLower - maxAbs*maxAbs), maxAbs);
      let maxAngle = Math.PI/2 - minAngle;
      angle = Math.PI/2*Math.floor(Math.random()*4) + minAngle + Math.random()*(maxAngle - minAngle);
    } else {
      angle = 2*Math.PI*Math.random();
    }
    I2 = Math.round(magnitude*Math.cos(angle));
    Q2 = Math.round(magnitude*Math.sin(angle));
    // Reject and retry until (I2, Q2) is in range and within threshold t.
  } while (I2 < minimum || I2 > maximum || Q2 < minimum || Q2 > maximum || Math.abs(Math.sqrt(I2*I2 + Q2*Q2) - Math.sqrt(I1*I1 + Q1*Q1)) >= t);
  return [I2, Q2];
}

// Return the number of iterations needed
function iterations_cordic_olli(I1, Q1, I2, Q2, maxIterations) {
  // Fold both vectors into the first octant, with 8 bits of extra precision.
  I1 = Math.abs(I1) << 8;
  Q1 = Math.abs(Q1) << 8;
  I2 = Math.abs(I2) << 8;
  Q2 = Math.abs(Q2) << 8;
  if (Q1 > I1) {
    let temp = I1;
    I1 = Q1;
    Q1 = temp;
  }
  if (Q2 > I2) {
    let temp = I2;
    I2 = Q2;
    Q2 = temp;
  }
  // Early exits when the magnitudes can already be ordered.
  if (I1 < I2 && I1 + Q1 < I2 + Q2) { // Set 2 / @CedronDawg
    return 0;
  }
  if (I1 > I2 && I1 + Q1 > I2 + Q2) { // Set 2 / @CedronDawg
    return 0;
  }
  for (let m = 1; m < maxIterations; m++) {
    // For each vector, choose the largest shift n such that a rotation by
    // atan(2^-n) does not overshoot, estimating log2(I/Q) with
    // count-leading-zeros.
    let n1;
    let n2;
    if (Q1 > 0) {
      let diff = Math.clz32(Q1) - Math.clz32(I1);
      n1 = diff;
      if (I1 >= Q1 << diff) n1++;
      if (I1 >= Q1 << (diff + 1)) n1++;
    } else {
      return m; // vector 1 already lies on the real axis
    }
    if (Q2 > 0) {
      let diff = Math.clz32(Q2) - Math.clz32(I2);
      n2 = diff;
      if (I2 >= Q2 << diff) n2++;
      if (I2 >= Q2 << (diff + 1)) n2++;
    } else {
      return m; // vector 2 already lies on the real axis
    }
    let n = Math.min(n1, n2);

    // Pseudo-rotate both vectors toward the real axis by atan(2^-n).
    let newI1 = I1 + (Q1>>n);
    let newQ1 = Q1 - (I1>>n);
    let newI2 = I2 + (Q2>>n);
    let newQ2 = Q2 - (I2>>n);
    I1 = newI1;
    Q1 = Math.abs(newQ1);
    I2 = newI2;
    Q2 = Math.abs(newQ2);
    if (I1 < I2 && I1 + (Q1>>n) < I2 + (Q2>>n)) { // Set 2
      return m;
    }
    if (I2 < I1 && I2 + (Q2>>n) < I1 + (Q1>>n)) { // Set 2
      return m;
    }
  }
  return maxIterations;
}

function setup() {
  let count = 0; // retained from the original sketch (unused in this listing)
  let b = 8;     // bit width of I and Q
  let I1 = 95;   // fixed example (I1, Q1), drawn in red
  let Q1 = 45;
  let stride = 4;
  let labelStride = 8;
  let leftMargin = 30;
  let rightMargin = 20;
  let bottomMargin = 20;
  let topMargin = 30;
  let maxInt = Math.pow(2, b-1);
  let canvasWidth = leftMargin+maxInt*stride+rightMargin;
  let canvasHeight = topMargin+maxInt*stride+bottomMargin;
  createCanvas(canvasWidth, canvasHeight);
  background(255);
  // Draw grid lines and axis labels.
  textAlign(RIGHT, CENTER);
  for (let Q = 0; Q <= maxInt; Q += labelStride) {
    text(str(Q), leftMargin-stride, canvasHeight-bottomMargin-Q*stride);
    line(leftMargin, canvasHeight-bottomMargin-Q*stride, canvasWidth-rightMargin, canvasHeight-bottomMargin-Q*stride);
  }
  textAlign(CENTER, TOP);
  for (let I = 0; I <= maxInt; I += labelStride) {
    text(str(I), leftMargin + I*stride, canvasHeight-bottomMargin+stride);
    line(leftMargin+I*stride, topMargin, leftMargin+I*stride, canvasHeight-bottomMargin);
  }

  /* // Fig. 1
  for (let Q = 0; Q <= maxInt; Q++) {
    for (let I = 0; I <= maxInt; I++) {
      strokeWeight(stride-1);
      stroke(255-32*(1+iterations_cordic_olli(I1, Q1, I, Q, 15)));
      point(leftMargin + I*stride, canvasHeight-bottomMargin-Q*stride);
    }
  }  */

  // Fig. 2: scatter of N importance-sampled (I2, Q2) points, folded into
  // the first quadrant.
  let N = 2000;
  for (let i = 0; i < N; i++) {
    let I2;
    let Q2;
    [I2, Q2] = random_I2_Q2(I1, Q1, b);
    strokeWeight(stride-1);
    I2 = Math.abs(I2);
    Q2 = Math.abs(Q2);
    point(leftMargin + I2*stride, canvasHeight-bottomMargin-Q2*stride);
  }

  // Mark (|I1|, |Q1|) with a larger red point.
  strokeWeight(stride+1);
  stroke(255,0,0);
  point(leftMargin + I1*stride, canvasHeight-bottomMargin-Q1*stride);

  strokeWeight(0);
  textSize(16);
  textAlign(RIGHT, CENTER);
  text('|Q₂|', leftMargin-stride, topMargin+labelStride*stride/2)
  textAlign(CENTER, CENTER);
  text('|I₂|', canvasWidth-rightMargin/2, canvasHeight-bottomMargin-labelStride*stride/2);
  textAlign(LEFT, CENTER);
  strokeWeight(5);
  stroke(255);
  text('(|I₁|, |Q₁|)', leftMargin + I1*stride + stride, canvasHeight-bottomMargin-Q1*stride)
}
Olli Niemitalo

Suggested Scoring

The respondents need not rewrite their algorithms to match the implementation below; the equivalent implementation that would result in the best score will be inferred from the approach as given.

Profile Test (25 points to whoever gets the fastest profile): Each algorithm will be implemented in IPython using only the equivalent of standard Boolean operations, binary shifts, branches, and compares on bounded binary signed integers, and profiled using %%timeit with a large set of uniformly randomly selected point pairs across different precision sizes b.

Operational Score (a composite score considering the following aspects):

Total processing steps, Average Software (25 points for the lowest average per-byte (B) cost metric): each item below counts as one real operation, and the total processing steps is the average over a uniform probability distribution of possible inputs. "Software" here means an implementation appropriate for a low-cost microcontroller with no dedicated multipliers available. $B$ represents the number of bytes in the operation; for example, adding two 16-bit words would have cost = 8. (A small tallying sketch follows the list below.)

(It is understood that this is very platform dependent; the attempt is to be representative of the average cost for a software-centric implementation.)

  • Additions, shifts, xor, count-leading-zeros (cost: $2B$)
  • Complex rotation (swap I and Q, change sign of Q) (cost: $1B$)
  • Branches (cost: $1B$) (example: an 'if' is a compare plus a branch when true)
  • And, or, etc., comparisons <, >, =, increment and decrement by 1 (cost: $0.5B$)
  • Multipliers for baseline (cost: $(8B)^2$)
  • Buffers: integer variable assignments (cost: $3B$)
  • Buffers: Boolean variable assignments (cost: $3 \cdot 0.125B$)
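As an illustration of how this table would be applied (a minimal sketch; the operation names and the example counts are mine, not a prescribed interface):

// Tally a software cost from operation counts and the per-byte costs above.
function softwareCost(counts, B) {
  const costPerOp = {
    add: 2*B, shift: 2*B, xor: 2*B, clz: 2*B, // adds, shifts, xor, clz
    rotate: 1*B,                              // complex rotation
    branch: 1*B,
    logic: 0.5*B, compare: 0.5*B, incdec: 0.5*B,
    multiply: Math.pow(8*B, 2),               // baseline multipliers only
    intAssign: 3*B,
    boolAssign: 3*0.125*B
  };
  let total = 0;
  for (const op in counts) total += counts[op]*costPerOp[op];
  return total;
}

// Example: 3 additions, 2 compares and 1 branch on 16-bit (B = 2) words:
// 3*4 + 2*1 + 1*2 = 16.
console.log(softwareCost({add: 3, compare: 2, branch: 1}, 2));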

Total processing steps, Average Firmware (25 points for the lowest average per-bit (b) cost metric): each item below counts as one real operation, and the total processing steps is the average over a uniform probability distribution of input samples. "Firmware" here means an implementation appropriate for a low-cost FPGA with no dedicated multipliers available. (A worked baseline tally follows the list below.)

  • Additions (cost: $2b$)
  • Complex rotation (swap I and Q, change sign of Q) (cost: $1b$)
  • Shifts, and, or, xor, etc., comparisons <, >, = (cost: $0.5b$)
  • Count-leading-zeros (cost: $1b$)
  • Multipliers for baseline (cost: $3b^2$)
  • Buffers, assignments (cost: $3b$)
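As a sanity check on the multiplier baseline under this table (an illustrative tally only, taking $b = 8$ throughout and neglecting bit growth in intermediate results): comparing $I_1^2+Q_1^2$ against $I_2^2+Q_2^2$ takes four multiplies ($4 \cdot 3b^2 = 768$), two additions ($2 \cdot 2b = 32$) and one comparison ($0.5b = 4$), roughly $804$ in total, which is the figure a qualifying algorithm needs to beat on average.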

Total processing steps, peak (5 points to the lowest processing steps under the worst-case condition for that algorithm, in a fixed 8-bit precision case)

Loose Equivalence Resolution: (+5 points)

Tight Equivalence Resolution (+5 points), either binary or ternary

Peak buffer size required while solving (10 points for the lowest buffer size, and a 10-point penalty for every $2^n$ increase in buffer size over the closest competitive solution or $8b$, whichever is larger). "Buffer" refers to the data storage required to hold operands and small look-up tables if they exist. The penalty is meant to rule out a trivial look-up-table solution with $2^{4b}$ addresses (already $2^{32}$ entries for $b=8$), each containing a <, > or = result for that combination. [Buffers have been incorporated into the cost scores to ensure that any solution is more efficient than this trivial solution; so maybe delete this?]

Dan Boschen
  • Does code size count as part of buffer size? – Cedron Dawg Jan 02 '20 at 04:55
  • No, this is more to cover iterative approaches where copy-in-place buffers may be needed (like the FFT) and to discourage over-use of look-up tables. Nothing to do with the code describing the operations, but placeholders for data that may be needed while determining the solution, or the use of look-up tables to provide arithmetic solutions. (I really don't want look-up tables but didn't want to totally eliminate them outright.) – Dan Boschen Jan 02 '20 at 05:02
  • So the biggest contributors/factors consistent with what I was looking for are how it would profile in Python (where I would code it in the simplest form the algorithm allows, so no need for you to re-write it) and, even more, the total processing steps required (hence the biggest score component). I updated the buffer size penalty so that it essentially allows a buffer up to nearly $16b$ without penalty, with 10 points to whoever uses less. – Dan Boschen Jan 02 '20 at 05:08
  • 1
    Yeah $2^n$ didn't seem right. Very good, I'll still change my X += X to X <<= 1. A C++ optimizer would pick the best. I don't think Python does that. I like to write code that an optimizer can't improve, but pipelines and stuff like that are beyond my consideration these days. – Cedron Dawg Jan 02 '20 at 05:11
  • x+=x: 76.2 ns ± 2.19 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) – Dan Boschen Jan 02 '20 at 05:17
  • x<<=1: 148 ns ± 5.45 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) – Dan Boschen Jan 02 '20 at 05:21
  • Thanks, so I guess I'm keeping my x+=x and you are tweaking your scoring? – Cedron Dawg Jan 02 '20 at 05:24
  • Yes, it will keep tweaking until it converges on the final scoring approach I will use (hence the purpose of the post). – Dan Boschen Jan 02 '20 at 05:45
  • 1
    @OlliNiemitalo I would agree with that. I will add that to the list as well as decrement – Dan Boschen Jan 13 '20 at 18:15
  • @OlliNiemitalo what about a rotation (swap I, Q and change sign)? 1 point? Cheaper than an add but more expensive than a decrement? – Dan Boschen Jan 14 '20 at 01:58
  • And see my recent edit for buffers, making them slightly more expensive than adding, promoting algorithmic solutions versus storing. – Dan Boschen Jan 14 '20 at 14:15