Strategy / Method for Implementation of the Fastest 1D Linear Convolution / Correlation

Question

I am looking for the fastest method, and its implementation, for the convolution of 1D signals:

$$ y \left[ n \right] = \left( x \ast h \right) \left[ n \right] $$

Where $ \ast $ is the linear convolution operator.
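
Written out, for a signal $ x $ of length $ N $ and a kernel $ h $ of length $ M $ (the length symbols are an assumption added for illustration, with both sequences taken as zero outside their support), the operation is:

$$ y \left[ n \right] = \sum_{k = 0}^{M - 1} h \left[ k \right] x \left[ n - k \right], \quad n = 0, 1, \ldots, N + M - 2 $$

so the output has $ N + M - 1 $ samples.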

Let's assume that we have infinite storage or memory but we have finite time. What strategy or algorithm can we use for a faster discrete time convolution operation?

Could someone compare the known methods for implementing convolution: Direct Convolution, Frequency Domain Convolution / FFT Convolution and Overlap and Save / Overlap and Add methods? Is there something even faster?

If I had an algorithm for faster convolution, I'd publish it in a journal (before posting it here). This is a problem that has been worked on for decades; I'd wager that you won't get many answers. Let us know if you make progress, though :) — MBaz, Oct 21 '18 at 23:14
Please also consider in your comparison of optimized approaches the well-understood relationship using FFTs: $conv = ifft(fft(a)fft(b))$. Also note that cross-correlation is solved with a similar FFT relationship: $xcorr = ifft(fft(a)conj(fft(b)))$, where "conj" is the complex conjugate. — Dan Boschen, Oct 22 '18 at 00:23
@DanBoschen Indeed, AFAIK for non-short sequences that approach is the fastest known. Since we don't know that current FFT algorithms are the fastest possible, the implication is that we don't know if there are faster algorithms for convolution either :) — MBaz, Oct 22 '18 at 00:31
@MBaz Yes indeed. That's why I am rooting for Shkodrani and look forward to his FCATW implementation! — Dan Boschen, Oct 22 '18 at 00:42
look at http://people.ece.umn.edu/users/parhi/SLIDES/chap8.pdf — , Oct 22 '18 at 01:49

Royi · Accepted Answer · 2022-03-19T20:06:21.110

I compared 3 implementations for Linear Convolution of 1D signals:

Direct - Using MATLAB's conv() function.
Overlap and Save - Implemented in MATLAB with a tuned loop that avoids allocations and an optimal choice of the DFT window length.
Frequency Domain - Using MATLAB's fft() with proper zero padding to implement Linear Convolution using Circular Convolution.
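
To make the Direct and Frequency Domain variants concrete, here is a minimal sketch in plain Python (not the MATLAB code that was benchmarked). The function names `direct_convolve`, `fft` and `fft_convolve` are hypothetical, and the small radix-2 FFT is only a stand-in for a real FFT library. It shows the padding step: both inputs are zero-padded to at least `len(x) + len(h) - 1` samples so that the circular convolution equals the linear one.

```python
import cmath


def direct_convolve(x, h):
    """Linear convolution by the defining sum: len(x) * len(h) multiplies."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def fft(a, inverse=False):
    """Unscaled recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(x, h):
    """Linear convolution via zero-padded circular convolution."""
    out_len = len(x) + len(h) - 1
    n = 1
    while n < out_len:  # next power of two that fits the full output
        n *= 2
    X = fft(list(x) + [0.0] * (n - len(x)))
    H = fft(list(h) + [0.0] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    # Scale the inverse transform and drop the padding tail.
    return [v.real / n for v in y[:out_len]]
```

Both functions return the same `len(x) + len(h) - 1` samples up to floating-point rounding; only the amount of work differs.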

For various lengths of the input signal and the kernel I got this result:

I only compared cases where the signal is not shorter than the kernel, hence the upper triangle (Dark Blue) is invalid. For the lower triangle, I color-coded the fastest method as follows:

Bright Blue - Direct.
Green - Overlap and Save.
Yellow - Frequency Method.

So, in this test there is no practical reason to use Overlap and Save.
For kernels with a length of up to ~400 samples it is better to use the Direct method.
For longer kernels it is better to use Frequency Domain.
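
A rough operation count suggests why such a threshold exists. This is only a sketch: $ N $ and $ M $ denote the signal and kernel lengths (symbols added here for illustration), and constant factors and memory effects are ignored:

$$ C_{\text{direct}} \sim N M, \qquad C_{\text{FFT}} \sim \left( N + M \right) \log_{2} \left( N + M \right) $$

For a short kernel the product $ N M $ stays small and the Direct method avoids the padding and transform overhead. As $ M $ grows, the logarithmic factor wins.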

Now, the actual kernel length at the border between Frequency Domain and Direct will differ between CPUs and memory bandwidths (you can run the script on your own computer to see).
Yet as a guideline I'd say:

If you can hold all the data in memory, don't use Overlap and Save.
Unless your kernel is longer than a few hundred samples, use Direct.
For kernels longer than a few hundred samples, use Frequency Domain.
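
For the case where the data cannot all be held in memory, block processing is the alternative. The following is a minimal plain-Python sketch of Overlap and Save, not the tuned MATLAB loop that was benchmarked. The names `fft` and `overlap_save` are hypothetical, and the default block size (the first power of two at or above four kernel lengths) is an illustrative assumption, not the optimal window choice mentioned above.

```python
import cmath


def fft(a, inverse=False):
    """Unscaled recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def overlap_save(x, h, n_fft=None):
    """Linear convolution of x with h, computed block by block."""
    m = len(h)
    if n_fft is None:
        n_fft = 1
        while n_fft < 4 * m:  # illustrative heuristic, not a tuned choice
            n_fft *= 2
    if n_fft < m or n_fft & (n_fft - 1):
        raise ValueError("n_fft must be a power of two and >= len(h)")
    step = n_fft - (m - 1)  # new output samples produced per block
    out_len = len(x) + m - 1
    # The kernel spectrum is computed once and reused for every block.
    H = fft(list(h) + [0.0] * (n_fft - m))
    # m - 1 leading zeros start the overlap; trailing zeros fill the last block.
    padded = [0.0] * (m - 1) + list(x) + [0.0] * n_fft
    y = []
    pos = 0
    while len(y) < out_len:
        B = fft(padded[pos:pos + n_fft])
        c = fft([a * b for a, b in zip(B, H)], inverse=True)
        # The first m - 1 samples are corrupted by circular wrap-around.
        y.extend(v.real / n_fft for v in c[m - 1:])
        pos += step
    return y[:out_len]
```

Each block reuses the last `len(h) - 1` input samples of the previous one, so only `n_fft` samples need to be resident at a time.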

The full MATLAB code is available on my StackExchange Signal Processing Q52760 GitHub Repository (Look at the SignalProcessing\Q52760 folder).

The MATLAB code allocates a lot of memory so it might be faster even as MEX with direct call to FFTW. — David, Sep 11 '20 at 06:02
A few additional points: 1. Use partitioned OLS rather than full-length OLS; it not only saves memory, but also has a small latency (while the direct convolution has a minimum of one-sample delay). 2. The FFT of the (partitioned) kernel can be pre-calculated if the kernel is time-invariant. 3. Non-uniform partitioned OLS can be much faster and have less latency than uniform partitioned OLS. 4. The symmetry property of the FFT for real sequences should be used to further speed up the calculation. — ZR Han, Mar 21 '22 at 02:08
