Note: this answer is a "Work In Progress"; I intend to address some limitations, including "time aliasing". Interested visitors may wish to "Follow" the answer.
Motivating the metric
We build a metric by considering the following:
- Energy measures information. Aliasing can be lossy or not; either way there's a change in information, and an excellent quantification of information is frequency-domain energy. By the Parseval-Plancherel theorem, a change $\Delta E_\omega$ in frequency-domain energy corresponds to the same change $\Delta E_t$ in time-domain energy. One can verify experimentally on images that this faithfully quantifies aliasing losses.
- Subsampling in time <=> Folding in Fourier. This predicts the effects of subsampling on the spectrum (verified numerically in the sketch after this list).
- Generality: we make an unbiased quantifier by assuming a uniform input spectrum, so as not to favor some frequencies over others, and to enable fair comparison of different filters.
- Best vs worst: the best case is zero aliasing, the worst is maximum aliasing. Ideally, a signal can be subsampled losslessly without filtering, meaning the original sequence can be recovered perfectly with DFT upsampling; the "best" reference hence shall be a signal whose spectrum is, after subsampling, full-band and unchanged by subsampling (except for constant scaling). Likewise, the "worst" shall be full-band before subsampling.
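A quick numerical check of the folding relation (a minimal sketch; all names are illustrative):

import numpy as np

N, M = 256, 8
x = np.random.default_rng(0).standard_normal(N)
lhs = np.fft.fft(x[::M])                         # spectrum of the subsampled signal
rhs = np.fft.fft(x).reshape(M, -1).mean(axis=0)  # spectrum of the original, folded
assert np.allclose(lhs, rhs)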
Let signal length be $N=256$, and subsampling factor $M=8$. We have:
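The two references can be sketched directly in the frequency domain (continuing the snippet above, and assuming, per the construction described, that $x_\text{ref}$'s spectrum is flat over $N/M$ bins and $x_\text{full}$'s over all $N$):

def energy(v):
    return np.sum(np.abs(v)**2)

sub = lambda xf: xf.reshape(M, -1).mean(axis=0)  # frequency-domain subsampling (fold)

xf_full = np.ones(N)                             # worst case: full-band
xf_ref = np.zeros(N)
xf_ref[:N//M] = 1                                # best case: alias-free under ::M

print(energy(xf_full) / energy(xf_ref))             # 8.0
print(energy(sub(xf_full)) / energy(sub(xf_ref)))   # 64.0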
Before subsampling, x_full has 8x as much energy as x_ref; after, it's 64x. So the gain due to aliasing is 64/8 = 8, or 700%.
To show the effects of aliasing on something in-between, we add a sine to x_ref:
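For instance, with a single out-of-band bin (a complex sinusoid; the bin and amplitude below are arbitrary):

xf_mid = xf_ref.copy()
xf_mid[N//2] = 4                                      # out-of-band sine

r_before = energy(xf_mid) / energy(xf_ref)            # 1.5
r_after  = energy(sub(xf_mid)) / energy(sub(xf_ref))  # 1.75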
Building the metric
So far we have, without percent conversion, and in terms of $x$'s energies (computable in either domain, per Parseval-Plancherel):
$$
\texttt{alias}\{x, M\} = r_\text{after} / r_\text{before} \tag{1}
$$
where, with $::M$ denoting subsampling by $M$,
$$
\begin{align}
& r_\text{before} = \|x\|^2 / \|x_\text{ref}\|^2 = \|x\|^2 / (N/M) \tag{2} \\
& r_\text{after} = \|x[::M]\|^2 / \|x_\text{ref}[::M]\|^2 = \|x[::M]\|^2 / (N / M^3)\tag{3}
\end{align}
$$
A problem, which we can see by considering $x=x_\text{full}$, is that for the same $(N/M)$, this scales with $N$: there, $r_\text{before} = M$ and $r_\text{after} = M^2$, so (1) equals $M$, which grows with the sampling rate. We prefer something between 0% and 100% in all cases. The scaling is proportional, so normalizing is easy; that, together with percent conversion, yields
$$
\texttt{alias}\{x, M\} = 100 \cdot \frac{r_\text{after} / r_\text{before} - 1}{M - 1} \tag{4}
$$
Another problem, for our context, is filter-input phase alignment: the worst case is obtained with bin-by-bin alignment of the input with the filter, so we apply frequency-domain subsampling to absolute values of the real and imaginary parts. The complete closed form is shown in the "Full math version" section below, along with minimal code. The measure has the following properties:
$$
\begin{align}
\texttt{alias}\{x_\text{full}\} &= 100\text{%} \tag{5} \\
\texttt{alias}\{x_\text{ref}\} &= 0\text{%} \tag{6} \\
\end{align}
$$
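These can be verified with the earlier sketch (the worst-case-phase step is skipped here; it's a no-op for these real, non-negative spectra):

def alias_pct(xf, M):
    r_before = energy(xf) / energy(xf_ref)
    r_after = energy(sub(xf)) / energy(sub(xf_ref))
    return 100 * (r_after / r_before - 1) / (M - 1)

print(alias_pct(xf_full, M))  # 100.0
print(alias_pct(xf_ref, M))   # 0.0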
The metric isn't flawless and won't handle certain edge-case inputs, but it works well with practical or otherwise full-band/"normal" signals.
Applying the metric
We compare a certain kind of moving average against the Hamming-windowed sinc used by scipy.signal.decimate(ftype='fir'); the moving average is, where $n = 0, 1, \ldots, N-1$ and negative indices are taken mod $N$:
$$
h_M[n] =
\begin{cases}
1/M, & -M/2 < n \leq M/2 \\
0, & \text{otherwise}
\end{cases}
$$
or simply, $M$ non-zero samples, as DFT-centered as possible. A motivation is the case where the filtering stride equals the filter length, i.e. zero overlap between averaging windows.
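A minimal construction of this filter (one way to realize the "DFT-centered" layout; zero-padded to length $N$ for DFT filtering):

h = np.zeros(N)
h[:M] = 1/M
h = np.roll(h, -(M//2 - 1))   # nonzero at n = -M/2+1, ..., M/2 (mod N)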
Results, with the red horizontal line being the "ideal" reference:
As expected, non-overlapping arithmetic means perform awfully.
To get an idea of what the numbers mean, we compare recovery for each filter. Do this by filtering with each - that's xfilt; then subsample xfilt and DFT-upsample it, to get xrecovered. Then plot xfilt vs xrecovered for each, which shows how faithfully the intended portion of the spectrum is captured after decimating. On white Gaussian noise:
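A sketch of this check (assuming circular filtering; scipy.signal.resample performs the DFT upsampling, and rel_dist is one reading of the relative Euclidean distance mentioned below):

from scipy.signal import resample

x = np.random.default_rng(0).standard_normal(N)           # white Gaussian noise
xfilt = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real   # filter circularly
xrecovered = resample(xfilt[::M], N)                       # subsample, DFT-upsample
rel_dist = np.linalg.norm(xfilt - xrecovered) / np.linalg.norm(xfilt)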
The example isn't cherry-picked, nor is it the worst case. In fact, it faithfully reflects the average performance on WGN, per 1,000,000 realizations with $N, M = 256, 8$, according to relative Euclidean distance (see code). The worst case can be found by gradient descent, as I've done in designing my own lowpass filters; I'm not a fan of scipy's decimate in every context.
Full math version + minimal code
$\|\cdot\|^2 = \sum |\cdot|^2$, energy or squared L2 norm. Substituting (2) and (3) into (4), we have
$$
\texttt{alias}\{x, M\} =
100 \cdot
\frac{M^2 \, \|x[::M]\|^2 / \|x\|^2 - 1}{M - 1} \tag{7}
$$
Using the aforementioned and referenced (see $X_\text{sub}$) relations, we have, with $X = \texttt{DFT}\{x\}$:
$$
\texttt{alias}\{x, M\}
= \frac{100}{M - 1}
\left( \frac{
\sum_{k=0}^{N/M - 1} \left|
\sum_{i=0}^{M-1}X\left[k + \frac{N}{M}i\right] \right|^2
}{\sum_{k=0}^{N - 1}|X[k]|^2
} - 1\right) \tag{8}
$$
However, to account for worst case phase alignment, we instead have
$$
\texttt{alias}\{x, M\} =
\frac{100}{M - 1}
\left( \frac{
\sum_{k=0}^{N/M - 1} \left|
\sum_{i=0}^{M-1}\big(|\Re e\{X\}| + j\,|\Im m\{X\}|\big)\left[k + \frac{N}{M}i\right] \right|^2
}{\sum_{k=0}^{N - 1}|X[k]|^2
} - 1\right) \tag{9}
$$
It's far simpler in code; with energy(x) = sum(abs(x)**2), it's just
xf_sub = (abs(xf.real) + 1j*abs(xf.imag)).reshape(M, -1).mean(axis=0)  # worst-case phase, folded
xf_ref_sub = xf_ref.reshape(M, -1).mean(axis=0)
r_before = energy(xf) / energy(xf_ref)
r_after = energy(xf_sub) / energy(xf_ref_sub)
alias = 100 * (r_after / r_before - 1) / (M - 1)
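where the inputs can be set up as in the earlier sketch (names illustrative; x is whatever signal is under test):

import numpy as np

energy = lambda v: np.sum(np.abs(v)**2)
N, M = 256, 8
x = np.random.default_rng(0).standard_normal(N)   # or any signal under test
xf = np.fft.fft(x)
xf_ref = np.zeros(N)
xf_ref[:N//M] = 1                                 # the alias-free reference's spectrum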
Full code
Available on GitHub.
$$ \frac{\int\limits_{f_\mathrm{s}/2}^{\infty} \big|X(f)\big|^2 \, \mathrm{d}f}{\int\limits_{0}^{\infty} \big|X(f)\big|^2 \, \mathrm{d}f} $$
Here we assume that $|X(-f)|=|X(f)| \quad \forall f \in \mathbb{R}$.
– robert bristow-johnson May 02 '23 at 17:16