We observe that "2D along 1D" is equivalent to: first do 1D horizontally (each row independently), then sum vertically. The complete operation for an output point, up to a shift, is `sum(a * b)` - a 2D product and 2D sum.
1. 1D convolution for row `m` does `sum(a[m, :] * b[m, :])` for every shift of `b` by `i`, denoted `b_i`.
2. Summing vertically for a given `i` is hence summing `sum(a[m, :] * b_i[m, :])` over all `m`.
3. (2) is the same as `sum(a * b_i)`, i.e. `sum(a[:, :] * b_i[:, :])`.
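As a quick numeric sanity check of steps (1)-(3) - a minimal sketch of my own, not from the original derivation:

```python
import numpy as np

# check: per-row products summed over rows == full 2D sum, for a fixed shift `i`
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 16))
b = rng.standard_normal((4, 16))

i = 3
b_i = np.roll(b, i, axis=1)  # `b` circularly shifted by `i` along rows

per_row_then_vertical = sum(np.sum(a[m] * b_i[m]) for m in range(a.shape[0]))
full_2d = np.sum(a * b_i)
assert np.allclose(per_row_then_vertical, full_2d)
```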
So, if we let `hf = np.conj(fft(ifftshift(h, axes=1), axis=1))` and `prod = fft(x, axis=1) * hf`, then the result is just `sum(ifft(prod, axis=1), axis=0)`. But by linearity, we can move the sum inside the `ifft` for a great speedup. All together,
$$
\texttt{CC}_{2d1d}(x, h) =
\texttt{iFFT}_{1d}
\left(
\sum_{m=0}^{M - 1}
\left(
\texttt{FFT}_{1d}\big(x\big) \cdot
\overline{\texttt{FFT}_{1d}\big(\texttt{iFFTSHIFT}_{1d}(h)\big)}
\right)[m]
\right)
$$
where 2D indexing is $x[m, n]$, and $\texttt{op}_{1d}$ denotes a 1D operation along the $n$ axis.
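To see the linearity step concretely - $M$ row-wise iFFTs followed by a vertical sum equal one iFFT of the vertically summed spectrum - here's a minimal check (my sketch, not part of the post's code):

```python
import numpy as np
from numpy.fft import fft, ifft, ifftshift

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 256))
h = rng.standard_normal((64, 256))

prod = fft(x) * np.conj(fft(ifftshift(h, axes=1)))
slow = ifft(prod).sum(axis=0)   # M iFFTs, then vertical sum
fast = ifft(prod.sum(axis=0))   # one iFFT of the summed spectrum
assert np.allclose(slow, fast)
```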
Thanks to @CrisLuengo and @Royi for pointers.
### Example in question

Applying in code (extending the code at the bottom):
```python
import matplotlib.pyplot as plt
from PIL import Image

# load images as greyscale
x = np.array(Image.open("cim0.png").convert("L")) / 255.
h = np.array(Image.open("cim1.png").convert("L")) / 255.
# blank regions default to `1`, undo that
x[x==1] = 0
h[h==1] = 0

# compute
out = cc2d1d(x, h).real
# plot
plt.plot(out); plt.show()
```
The peak is near the center, as expected:
### Applications
I used it to identify abrupt changes in audio, by cross-correlating the CWT's impulse response with a non-linearly filtered version of the SSQ_CWT. So one major use is 2D template matching on underlying 1D structures. Surely there are plenty of others.
(Note for those curious about the linked post.) But! I by no means did this with images as in this post. An "image" involves up to three major modifications - compression, color-mapping, and clipping (the `vmin`, `vmax` args of `plt.imshow`) - which change the numeric representation once the image is loaded into an array. Instead I operated on the original arrays; the worse results in this post make the difference clear.
### Convolution? Vertical?
- Convolution: remove `np.conj`.
- Vertical: `ifftshift(, axes=1)` -> `ifftshift(, axes=0)`, `sum(axis=0)` -> `sum(axis=1)`, and the FFTs run along `axis=0` instead of the default last axis. (A sketch of both variants follows this list.)
- Boundary effects / "time aliasing": pad and unpad exactly the same as with 1D convolutions. But note, if $h$ isn't reusable, it's faster to adjust the unpad indices instead of doing `ifftshift`, as shown in Royi's answer on `conv2` (ignore the vertical unpad).
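A sketch of the two variants, assuming the `cc2d1d` from the Code section below (my code, illustrative only):

```python
import numpy as np
from numpy.fft import fft, ifft, ifftshift

def conv2d1d(x, h):
    # convolution variant: `cc2d1d` without `np.conj`
    prod = fft(x) * fft(ifftshift(h, axes=1))
    return ifft(prod.sum(axis=0))

def cc2d1d_vertical(x, h):
    # vertical variant: 1D transforms along axis 0; shift and sum swap axes
    prod = fft(x, axis=0) * np.conj(fft(ifftshift(h, axes=0), axis=0))
    return ifft(prod.sum(axis=1))
```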
### Benchmarks (CPU)
For reusable $h$:
```python
def cc2d1d_hf(x, hf):
    return ifft((fft(x) * hf).sum(axis=0))

shapes = [(8192, 8192), (256, 262144), (262144, 256)]
for shape in shapes:
    x = np.random.randn(*shape)
    hf = np.conj(fft(ifftshift(np.random.randn(*shape), axes=1)))
    %timeit cc2d1d_hf(x, hf)
```
```
3.01 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.21 s ± 138 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.3 s ± 78.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
### Code
```python
import numpy as np
from numpy.fft import fft, ifft, ifftshift

def cc2d1d(x, h):
    prod = fft(x) * np.conj(fft(ifftshift(h, axes=1)))
    return ifft(prod.sum(axis=0))

def cc2d1d_brute(x, h):
    out = np.zeros(x.shape[-1], dtype=x.dtype)
    h = ifftshift(np.conj(h), axes=1)
    for i in range(len(out)):
        out[i] = np.sum(x * np.roll(h, i, axis=1))
    return out

# test against brute force, for even and odd dimensions
for M in (128, 129):
    for N in (128, 129):
        x = np.random.randn(M, N) + 1j*np.random.randn(M, N)
        h = np.random.randn(M, N) + 1j*np.random.randn(M, N)
        out0 = cc2d1d(x, h)
        out1 = cc2d1d_brute(x, h)
        assert np.allclose(out0, out1)
```
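And a hypothetical usage sketch (my addition, not from the original post): recover a horizontal shift between two arrays whose rows share the same 1D structure.

```python
# rows of `h` are rows of `x` circularly shifted left by `shift` samples
M, N = 64, 512
x = np.random.randn(M, N)
shift = 37
h = np.roll(x, -shift, axis=1)

out = cc2d1d(x, h).real
peak = int(np.argmax(out))
# zero lag sits at N//2 due to the `ifftshift` centering (even N)
assert peak - N // 2 == shift
```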