4

I need to calculate cross correlation between 2 images which I read in 2 vectors, both of them uni-dimensional.

As per my understanding I need to do the following operations:

  1. FFT(1D) on vector 1 and vector 2
  2. Multiplication between the outputs of the FFT applied on each vector
  3. IFFT(1D) of the multiplication

If I'd wanted to use 2D FFT in the first step following this method then what would the second step be?

OverLordGoldDragon
  • 8,912
  • 5
  • 23
  • 74

3 Answers3

6

Your 2nd step is wrong, it's doing circular convolution. For circular cross-correlation, it should be:

  1. Multiplication between the output of the FFT applied on the first vector and the conjugate of the output of the FFT applied on the second vector.

Aside from that, the steps are the same whether in 1D or 2D:

$$\texttt{iFFT}_{2d}\bigg(\texttt{FFT}_{2d}\big(V_1\big) \cdot \overline{\texttt{FFT}_{2d}\big(V_2\big)} \bigg)$$

You can zero-pad the inputs $V_1$ and $V_2$ to calculate non-periodic cross-correlation using circular cross-correlation.

Olli Niemitalo
  • 13,491
  • 1
  • 33
  • 61
Jdip
  • 5,980
  • 3
  • 7
  • 29
  • This is true for continuous signals, but for digital signals, conjugating after taking the FFT does not flip the signal in time. It does a modulo N flip. So I think you'd need to shift by one sample (via a linear phase in the frequency domain) first. – Gillespie Dec 29 '22 at 02:21
  • 1
    Nice \text{} alt, I steal – OverLordGoldDragon Dec 29 '22 at 12:22
  • This should be disclaimed for incompleteness, the centering is often more important than conjugation. That or added discussion of padding & unpadding. – OverLordGoldDragon May 04 '23 at 16:45
  • @OverLordGoldDragon feel free to edit :) – Jdip May 04 '23 at 16:51
4

As Jdip correctly pointed out, your implementation is doing circular convolution, not cross-correlation. But I want to add a nuance of discrete circular cross-correlation that makes his answer slightly incorrect.

Frequency Conjugation for Discrete vs. Continuous Signals

There are two key differences between cross-correlation and convolution:

  • In cross-correlation, one of the vectors is conjugated (in the time domain)
  • In convolution, one of the vectors is reversed/flipped

Thus, to perform cross-correlation via FFT-implemented circular convolution, we must pre-flip and conjugate one of the vectors: cross_correlation(x,y) = convolution(flip(conjugate(x)), y).

The conjugation property of the continuous time Fourier transform says that conjugating in the frequency domain conjugates and flips in the time domain: $\mathcal{F}^{-1} (\overline{X(\omega)}) = \overline{x(-t)}$ . So you should be able to perform cross-correlation via FFT by just conjugating one of the FFTed vectors, right? Wrong!

The reason is that for discrete signals such as images, conjugation in the Fourier domain does not equate to time reversal. Rather, it performs modulo N time reversal: $ [x_1, x_2, x_3,... x_N] => [x_1, x_N, x_{N-1}, x_{N-2},... x_2] $.

Solution

Because of this, I recommend doing the conjugation and flipping in the time domain.1 Here are the steps:

  1. Flip and conjugate $V_2$: $W_2 = \overline{\texttt{flip}(V_2)}$
  2. Take the FFT of $V_1$ and $W_2$
  3. Multiply FFTed vectors
  4. Take the IFFT of the product

Thus:
$$ \texttt{IFFT}_{2D}(\texttt{FFT}_{2D}(V_1) \cdot \texttt{FFT}_{2D}(W_2)) $$

Or:

$$ \texttt{IFFT}_{2D}(\texttt{FFT}_{2D}(V_1) \cdot \texttt{FFT}_{2D}(\overline{\texttt{flip}(V_2)})) $$

Edit

OverLordGoldDragon has argued in the comments that we shouldn't reverse/flip $V_2$ to perform cross-correlation via convolution. I offer a simple proof in MATLAB that we should:

%Create vectors and cross-correlate them
N = 8; M = 3; 
v1 = randi(9, N, 1);        %Gives a vector of random integers less than 10
v2 = v1(3:3+M-1);           %Take a small subset of v1 as our second vector
P = N + M - 1;              %Length of full convolution

y = xcorr(v1, v2); %Cross-correlation y = y(N-M+1:end); %Get rid of extra xcorr lags (zeros)

%Now, do this with convolution %NOTE: No time domain conjugation because v2 is real: v2 == conj(v2) y2 = conv(flip(v2), v1); %y2 and y are identical

%Try conjugating in the frequency domain instead of flipping %y3 and y are not the same! y3 = conv(ifft(conj(fft(v2))), v1);

Footnote

1More efficient alternative: Use a linear phase multiply in the frequency domain to circularly shift the time domain vector by 1 sample before (or after, depending on the direction of the shift) conjugating in the frequency domain.

Gillespie
  • 1,767
  • 4
  • 26
  • 1
    subtle nuance! thanks! – Jdip Dec 29 '22 at 09:24
  • Multiplication is much less compute-efficient than shifting (reallocation). – OverLordGoldDragon Dec 29 '22 at 10:39
  • 2
    While this answer is onto the right idea, I don't know that it's correct. Firstly, "modulo N reversal" is time reversal, $x_1 = x[n=0]$ so $x[n] = x[-n]$ holds. More importantly, to cross-correlate or convolve, the operating kernel (so one of the images) must be DFT-centered. flip as in x[::-1] does not achieve this, in fact it makes more of a mess. The correct steps involve ifftshift. – OverLordGoldDragon Dec 29 '22 at 10:53
  • @OverLordGoldDragon, in real-time algorithms implemented in hardware, people generally like to trade frequency phase multiplies for time domain shifts or reversals (assuming they're already in the frequency domain for other reasons). Perhaps the issue is a question of memory efficiency vs. computation. – Gillespie Dec 29 '22 at 13:47
  • @OverLordGoldDragon, also, modulo N time reversal is not time reversal. In MATLAB for example, try ifft(conj(fft(x)) and you will not get flip(x). – Gillespie Dec 29 '22 at 13:51
  • 1
    flip(x) isn't time-reversal for DFT, which is the point you're making anyway, but important to distinguish "reversal we want" vs math, as it suggests $x[n] \rightarrow x[-n]$ isn't achieved. I made your point a while back, got warm reception. – OverLordGoldDragon Dec 29 '22 at 13:59
  • I was actually just about to come comment that you're right in the sense that flip(x) is not necessarily identical with true time reversal. As you say, if the first sample in the vector corresponds to time t=0, modulo N flipping does give time reversal x[-n]. But what we want is flip(x), not x[-n]. – Gillespie Dec 29 '22 at 14:21
  • Why do we want flip(x)? If inputs are normal images, if you try to code it, you'll see it gives wrong results. -- also @-less notifications don't work if >2 ppl comment it appears. – OverLordGoldDragon Dec 30 '22 at 13:05
  • @OverLordGoldDragon because of the difference in definition between cross-correlation and convolution – Gillespie Jan 02 '23 at 14:57
  • I'm afraid you're mistaken. I invite you to demonstrate flip with code, can be a small modification of mine. – OverLordGoldDragon Jan 03 '23 at 10:52
  • @OverLordGoldDragon I've added a short MATLAB demo. – Gillespie Jan 07 '23 at 02:47
  • conv(a, b) != ifft(fft(a) .* fft(b)) – OverLordGoldDragon Jan 07 '23 at 11:39
  • Never said it did – Gillespie Jan 07 '23 at 15:02
  • What I'm saying is, you've shown it for direct convolution, not DFT convolution, which is the subject of your post. Also @-less notifications don't work with >1 responder, but I did see your reply earlier. – OverLordGoldDragon Apr 15 '23 at 13:08
  • @OverLordGoldDragon if you believe "direct" convolution can be performed with FFTs (as it can, under the right conditions), the principle applies. – Gillespie Apr 16 '23 at 00:47
  • @Gillespie you made good points in this post but I am not yet convinced that it is a problem to simply conjugate the DFT when computing the correlation? I am less familiar with image processing but I do this all the time for signal processing: I do view $x[N-1]$ as being equal to $x[-1]$ due to the circular nature of the DFT in time and frequency so consider it correctly described as “time reversal”. Are you saying that one can’t actually compute “fast correlation” as the inverse FFT of the product of the FFT and conjugate FFT? – Dan Boschen May 08 '23 at 12:07
  • 1
    For circular cross-correlation (which the beginning of the answer says you write about) it suffices to do what you call modulo N time reversal. Your approach of using flip effectively just shifts the output of circular cross-correlation to the left by one step. – Olli Niemitalo May 09 '23 at 08:55
  • @OlliNiemitalo So incorrect, yes? – OverLordGoldDragon May 09 '23 at 17:01
  • 2
    @Dan Boschen & Olli, I'm busy at work lately but I'll come back to this eventually and clarify. – Gillespie May 10 '23 at 03:09
  • I had reasons at the time but I'm no longer a fan of my second-last reply, removed. This answer is salvageable, but if nothing changes, it stands as problematic. – OverLordGoldDragon May 15 '23 at 12:11
0

Take an image:

Cross-correlating it with impulse should yield itself, and cross-correlating with itself should peak at center. Key points:

  • The operating kernel must be centered about $t=0$. For a discrete sequence $h$ of length $N$, under the FFT, this means $h[0]$, and the second-half of samples are of negative time: $h[n > N/2]$
  • Assuming inputs are normal images, this means time-centering the "template" image: note $x \star h \neq h \star x$.
  • fft2(x) == fft(fft(x, axis=0), axis=1)

cross correlation between 2 images which I read in 2 vectors, both of them uni-dimensional.

That's a problem since the time-centering step isn't straightforward to replicate on a flattened image, I'm unsure how it'd be done. This answer will assume a 2D array, so you can just reshape it into 2D and then back. Finally, you might want to look into padding and boundary effects.

Putting it together, here's cross-correlation of COVID with image-centered unit impulse, and with itself:

additionally, we move it and its flipped copy, and check that the more intense dot spots the unflipped version; note boundary effects:

$$ \texttt{iFFT}_{2d}\bigg( \texttt{FFT}_{2d}\big(x\big) \cdot \overline{\texttt{FFT}_{2d}\big(\texttt{iFFTSHIFT}_{2d}(h)\big)} \bigg) $$

Centering / performance note

Answer assumes we're after circular cross-correlation with no padding, and $x$ and $h$ are "normal" (non-DFT centered) images. With padding/unpadding, the most compute efficient route may be different.

Complete cross-correlation, with padding, is implemented here.

Meta note

This answer was subject of controversy. If readers aren't sure which answer to use, I encourage reading responses to the followup, where I've shown further demos and offered 1000 bounty to invalidate this answer, instead of trusting the votes.

Full code

import numpy as np
import matplotlib.pyplot as plt
from numpy.fft import fft, ifft, fft2, ifft2, ifftshift
from PIL import Image

def cross_correlate_2d(x, h): h = ifftshift(ifftshift(h, axes=0), axes=1) return ifft2(fft2(x) * np.conj(fft2(h))).real

load image as greyscale

x = np.array(Image.open("covid.png").convert("L")) / 255.

make kernels

h0 = np.zeros(x.shape, dtype=x.dtype) h0[h0.shape[0]//2, h0.shape[1]//2] = 1 h1 = x.copy()

compute

out0 = cross_correlate_2d(x, h0) out1 = cross_correlate_2d(x, h1)

plot

plt.imshow(out0); plt.xticks([]); plt.yticks([]); plt.show() plt.imshow(out1); plt.xticks([]); plt.yticks([]); plt.show()

second example, same imports:

x = np.array(Image.open("covid_target.png"  ).convert("L")) / 255.
h = np.array(Image.open("covid_template.png").convert("L")) / 255.

blank regions default to 1, undo that

x[x==1] = 0 h[h==1] = 0

out = cross_correlate_2d(x, h) plt.imshow(out, cmap='turbo'); plt.xticks([]); plt.yticks([]); plt.show()

OverLordGoldDragon
  • 8,912
  • 5
  • 23
  • 74
  • i do have a question, how do i know if the multiplication (fft2(x) * np.conj(fft2(h)) is done between matrices that can be multipliable (nr of rows first = nr of cols 2nd) ? – user3253067 Jan 02 '23 at 17:34
  • that is, what if my template image is smaller than the original one? – user3253067 Jan 02 '23 at 17:36
  • @user3253067 That extends the scope of the question, StackExchange isn't for extended back & forths. If you have a new question, you should post separately. – OverLordGoldDragon Jan 03 '23 at 10:34
  • will do, sorry about that – user3253067 Jan 03 '23 at 13:13
  • 1
    I'm not sure this is correct. For some values of N (even or odd, not sure which yet) conj(fft(ifftshift(h))) may work, but I don't think it does in all cases. I may try simple cases of this with 1D signals when I get a chance. – Gillespie Jan 07 '23 at 02:54