Phase Difference Calculation in Frequency Domain

Question

I am currently working on an application in which I have a Linear acoustic array, the audio of which is analyzed using FFT. Now I also want to perform phase difference calculation and have also learnt that it is possible to do so in the frequency domain as well by simply calculating the phase using atan function on the peak bin and comparing that with the phase of peak bin in the adjacent channel.

Now the issue is that due to noise some times the peak bin on adjacent channels differ by one or two bins, what correction if any needs to applied when comparing phases of different bins in the adjacent channels and is there any issue with this approach?

Here is an example code and image snapshot of issue I am facing.

clear all;
Nfft=256;
Fs  = 300e3;%KHZ
L = Nfft/Fs;
t = 0:1/Fs :L-1/Fs;
snr=1;
frequency=100.2e3;% input tone frequency 100.2 KHZ
phase= 45;% input phase
phase=deg2rad(phase);
s_1 = cos(2pifrequencyt);% Microphone 1
s_2 = cos(2pifrequencyt+phase);% Microphone 2
s1=awgn(s_1,snr);
s2=awgn(s_2,snr);
sfft_1 = fft(s1,Nfft);
sfft_2 = fft(s2,Nfft);
sfft_1 = sfft_1(1:Nfft/2);
sfft_2 = sfft_2(1:Nfft/2);
peak_ind_1 = find(abs(sfft_1) == max(abs(sfft_1)));
peak_ind_2 = find(abs(sfft_2) == max(abs(sfft_2)));
phase1 =atan2(imag(sfft_1(peak_ind_1)),real(sfft_1(peak_ind_1)));
phase2 =atan2(imag(sfft_2(peak_ind_2)),real(sfft_2(peak_ind_2)));
phase1=rad2deg(phase1);
phase2=rad2deg(phase2);
phase_difference = wrapTo180(phase2-phase1)

Now sometime the peak locations differ by 1 or 2 like:

And I get extremely wrong values of phase difference, whereas when both indices are same I get near the actual value results. How to compensate thanks.

Matlab code modified as per Dan Boschen's answer

NOTE THIS CODE IS TO A PREVIOUS AND INCORRECT ANSWER WHICH WAS CORRECTED -- THIS IS NOT YET UPDATED PER DAN'S ANSWER AS NOW POSTED

clear all;
Nfft=256;
Fs  = 300e3;%KHZ
L = Nfft/Fs;
t = 0:1/Fs :L-1/Fs;
snr=1;
frequency=100.2e3;% input tone frequency 100.2 KHZ
phase= 45;% input phase
phase=deg2rad(phase);
s_1 = cos(2pifrequencyt);% Microphone 1
s_2 = cos(2pifrequencyt+phase);% Microphone 2
s1=awgn(s_1,snr);
s2=awgn(s_2,snr);
sfft_1 = fft(s1,Nfft);
sfft_2 = fft(s2,Nfft);
sfft_1 = sfft_1(1:Nfft/2);
sfft_2 = sfft_2(1:Nfft/2);
peak_ind_1 = find(abs(sfft_1) == max(abs(sfft_1)));
peak_ind_2 = find(abs(sfft_2) == max(abs(sfft_2)));
phase1 =atan2(imag(sfft_1(peak_ind_1)),real(sfft_1(peak_ind_1)));
phase2 =atan2(imag(sfft_2(peak_ind_2)),real(sfft_2(peak_ind_2)));
% ϕΔ=1.8*cos^-1(pact/pmax) here modelled as ratio of highest two peaks
a1=maxk(abs(sfft_1),2);
a2=maxk(abs(sfft_2),2);
ph_adj_1=1.8acos(a1(2)/a1(1));
ph_adj_2=1.8acos(a2(2)/a2(1));
phase1=wrapTo180( rad2deg(phase1+ph_adj_1));
phase2=wrapTo180( rad2deg(phase2+ph_adj_2));
phase_difference_adjusted=wrapTo180(phase2-phase1)

However when I run this code multiple times sometimes I get near the actual value answer but sometimes totally random answers as well. I think I have made a mistake in understanding the answer.

A little more detail would help here: I assume this is an array for microphones (not speakers)? Are you trying to locate a sound source or do something else ? If yes, what do you know about the sound source and it's signal ? (Known signal, concurrent sources, spectral or temporal properties, etc.). What's the source and properties of the noise ? — Hilmar, Jan 20 '23 at 08:49
@Hilmar Yes it is an array of microphones and the idea is to locate the direction of sound using interferometery, for now just considering a single source which will provide a same frequency content. Noise is modelled as Additive White Gaussian Noise (AWGN). — Emerald_Waves, Jan 20 '23 at 09:43
@MarcusMüller, I'm sorry I don't understand your question, the noise does not have a peak bin, the peak bin is that which is caused from the sound source, the noise I assume causes the difference of signal peak locations in the different mircrophones — Emerald_Waves, Jan 20 '23 at 09:57
@Emerald_Waves ah! Ok, yeah. So, you're back to the broadband detection problem :) — Marcus Müller, Jan 20 '23 at 10:00
@MarcusMüller I guess so though I don't really know what you mean :) ,broadband as in the peak frequency can be any frequency in the nyquist band then yes that is what I am assuming — Emerald_Waves, Jan 20 '23 at 10:16
I am assuming you have two (or more) microphones spatially apart and with that you would like to determine the time delay of the same signal received at each microphone, is that correct? — Dan Boschen, Jan 20 '23 at 12:49
To estimate the direction of the source only the relative phase between the two microphones matter. If you have a tone as the source, then the frequency domain is better to estimate the phase. But if your source is a broad-band signal, you might need to do time domain correlation and work on its peak. — learner, Jan 20 '23 at 13:35
@learner correlation can be performed in the frequency domain and then apply some of the well known frequency weighting filters such as SCoT or PhaT (as well as variations of those). In many cases, these provide better results than the cross correlation-function. In fact PhaT (Phase Transform) does exactly what the OP asks for. Uses only the phase information to calculate the cross-correlation function and from that localise the peak and calculate the inter-element delay (you have to know the array characteristics of course). Some perform extra steps such as parabolic interpolation (cont.) — ZaellixA, Jan 20 '23 at 15:56
(cont.ed) to get a "better" estimate of the "true" peak and "refine" the delay value. — ZaellixA, Jan 20 '23 at 15:57
@DanBoschen Yes they are spatially apart and I want to measure the phase difference as I have gathered that its more accurate, the issue is not so much that the peaks are broad but that some times when calculating the phase difference I notice that the peak bins differ by one or two indices so I just wanted to know how to compensate or adjust for that — Emerald_Waves, Jan 20 '23 at 16:13
@ZaellixA Yes I believe you have described the application as I intend to use but are these transforms necessary, I had thought that simple phase comparison would have been sufficient? — Emerald_Waves, Jan 20 '23 at 16:15
I think that as @learner pointed out, you should clarify if we can assume the source as a tone or not? — malik12, Jan 20 '23 at 16:41
Using two points on an FFT would have poor noise rejection, I would recommend a least-squares solution such as this: https://dsp.stackexchange.com/questions/63141/how-determine-the-delay-in-my-signal-practically/63221#63221 Instead of using it to measure the delay in a "filter" use the two signals against each other to measure their relative delay. — Dan Boschen, Jan 20 '23 at 21:44
@DanBoschen Thanks for sharing the link, looks interesting. However for now, just so that I can close this aspect, is there any solution at all for phase difference compensation in the way that I am doing it currently so that I can accurately calculate phase difference, just out of curiosity.. — Emerald_Waves, Jan 21 '23 at 16:33
Can you update your answer to include more details about the sounding waveform you are using (what is it specifically especially with regards to frequency content), the range of delay you are expecting, and clarify if the two digitizers for each microphone are synchronized or not? Knowing this will help to simplify the answer. — Dan Boschen, Jan 21 '23 at 22:55
@DanBoschen Well for now I am just simulating the audio as pure tones and then add noise to degrade the SNR when I ran into this issue, and yes the digitizers will be synchronized. — Emerald_Waves, Jan 22 '23 at 14:22
@DanBoschen I have added example code and snaphot, hope now its clear — Emerald_Waves, Jan 24 '23 at 14:20

Dan Boschen · Answer 1 · 2023-02-21T16:28:30.307

SIMPLE 2 BIN SOLUTION FOR FREQUENCY AND PHASE - BOTTOM LINE

The following is the summary for the 2 bin solution to determine the exact frequency and phase given the largest DFT bin and the next largest adjacent bin for the case of a single tone. Details and resulting accuracy are further outlined below:

Step 1: Select the peak bin and next largest adjacent bin. Compute ratio of the magnitude of the two bins, the smaller bin in the numerator:

$|Y[k_{max}]|$ is the magnitude of the peak bin

$Y|[k_{max}+1]|$ is the magnitude of the next highest adjacent bin. here as abbreviated, the assumption is the bin to the right, but could also be $Y|[k_{max}-1]|$

$$r = \frac{Y[k_{max}+1]|}{|Y[k_{max}]|}$$

Step 2: Use the result $r$ with equation \ref{4} (further in the post and copied below) to compute the bin offset $k_\Delta$, which is the fractional frequency offset from bin center for the tone

$$k_\Delta \approx 0.63883277 - 0.6368504 e^{-1.475173r} $$

$k_\Delta$ will be a value between $0$ and $0.5$ corresponding to the maximum possible error between bins. If the next highest bin was to the right of the peak, $k_\Delta$ is to be interpreted as a positive quantity. However if the next highest bin was to the left of the peak, $k_\Delta$ as computed above should be multiplied by $-1$ to make it a negative quantity.

Step 3: Use the result $k_\Delta$ with equation \ref{2} (further in the post and copied below) to compute the phase correction due to the fractional frequency offset. The resulting $\phi$ is the correction to subtract from the phase for the peak bin to get the phase of the actual tone.

$$\phi = \pi k_\Delta \frac{N-1}{N}$$

DETAILS

Another solution is to simply zero pad the waveforms prior to taking the FFT. This will serve to interpolate more samples in the frequency domain from which the phase of the peak (or improved under noise conditions the average of the phase of the $M$ samples within the peak, where $M$ is a trade based on signal to noise of each of the DFT bins contributing to the answer). If we double the number of samples in the FFT by zero padding (using fft(x, M) where $M=2N$ and $N$ is the original length of the FFT), we will get one more sample in between each of the original samples. If we zero pad out to 5 times the original length, we will get 4 more samples in between each of the original samples, etc.

This isn't an ideal approach to compare the delay of the two signals. Alternatively consider a least squares based approach such as detailed here, which uses every sample optimally toward an estimate of the delay.

If it known that a single tone or narrow band signals with sufficient SNR will be used, delay can be estimated based on carrier phase and for that a simple phase detector approach can be used (multiply the two signals in the time domain and low pass filter). Further below I detail the development of an approach to determine the phase correction based on the DFT bin values.

Prediction of Phase Versus DFT Bin Offset For a Single Tone

The OP has asked for a mathematical relationship of the derived phase versus frequency offsets from bin center as is occurring. Assuming the SNR is high enough, the relationship between magnitude and phase for the samples in vicinity of a DFT bin for the DFT using a simple Rectangular window (no additional windowing) would be dominated by the response given by the Dirichlet Kernel with frequency in units of DFT index as:

$$D(k) = e^{-j\pi (k-k_o)(N-1)/N}\frac{\sin(\pi (k-k_o))}{sin(\pi (k-k_o)/N)} \tag{1} \label{1}$$

Given an $N$ sample DFT, with $k$ as a continuous frequency variable ranging with $k \in [0, N)$ and $k_o$ as the discrete frequency index for a specific bin with $k_o \in 0 \ldots N-1$.

From this, and given a phase offset as the fractional portion of $k_o$, given as $k_\Delta = mod(k_0,1)$ we can get the relationship between bin offset and phase as:

$$\phi = \pi k_\Delta \frac{N-1}{N} \tag{2} \label{2}$$

The above provides the frequency response both in magnitude and phase for any DFT bin $k_o$ given an actual input frequency anywhere between $[0, N)$ on the continuous frequency domain. (With $N$ corresponds to the frequency of the sampling clock). Any additional windowing used would modify this response including spreading out the magnitude of the mainlobe and reducing the sidelobes.

For example the magnitude frequency response for the first bin ($k_o=0$ and sixth bin ($k_o=5$) for a 12 point DFT is plotted below. This shows the predicted magnitude for each of these bins for any single tone (a single tone is of the form $e^{j\omega t}$, not $\cos(\omega t)$!) at any frequency within the given range. Keep in mind that a sinusoid consists of two such tones ($2\cos(\omega t) = e^{j\omega t}+ e^{-j\omega t}$), so the cross interference of the response from each tone in the case of a sinusoid can have a significant effect on the computation for the case of lower and higher frequencies, or low number of total DFT bins.

The magnitude only is shown in the plot above, but significantly there is a considerable phase change as well versus variable $k$ as we move off of bin center toward the adjacent bins (where the magnitude goes to a null). Below is a plot for the result of bin 0 alone on a complex plane, where we can see the variation of both magnitude and phase concurrently:

Note in the unwindowed DFT without zero padding, the magnitude shown above for a single tone is going to the origin within the spacing of one DFT bin. The plot below provides further insight into the linear phase trajectory of the Dirichlet Kernel between bins and validates the formula given above for $D(k)$: The blue stem plot in magnitude and blue dots for phase are the actual results for a 12 point DFT with a single tone at $k=5.1$ as $y = e^{j2\pi(5.1)n/N}$ for $n = 0 \ldots 11$. Notably, the phase from bin center to the next bin is on a slope of $\pi N/(N-1)$, so goes nearly, but not quite $\pi$ radians in the spacing of one bin. With that, if we are able to predict (or estimate) the offset from bin center, we can then determine the phase correction.

Estimating $k_\Delta$

Under reasonable SNR conditions and for a single complex tone the offset in terms of a fraction of a bin can be estimated from the ratio of the two largest adjacent peaks, and then from that the phase correction in the linear range extending approximately from $-\pi/2$ to $+\pi/2$ can be determined. (A real tone can also be used with an insignificant error is not near DC or Nyquist. The real tone consists of two complex tones whose responses will overlap and result in an error more dominant for real tones close to DC or Nyquist. I haven't tested this case but suspect the result would approach that for $N=2$, which shown below also has a very low error.)

The ratio is chosen to be the smaller of the two peaks over the larger. Consider when we are on exact bin center; either adjacent sample will be at zero, so the ratio metric is better as the smaller value divided by the larger to avoid the divide by zero, with a ratio then going from 0 to 1 and mapping to $k_\Delta$. For example in the plot above, the next highest peak is at bin 6, and this magnitude would be divided by the magnitude of the peak at bin 5, and from that ratio (as detailed below) we can determine $k_\Delta$.

Below shows an example of such an approximator based on a simple curve fit that has been shown to provide reasonable accuracy for most applications. The vertical axis shows the actual fractional bin offset $k_\Delta$ from 0 to 0.5, and the horizontal axis is the inverse ratio of the magnitude of the peak bin to the magnitude of the next highest adjacent bin.

The exact solution is derived from equation \ref{1} as:

$$\frac{|Y[k_{max}+1]|}{|Y[k_{max}]|} = |D(k_\Delta-1)|/|D(k_\Delta)| \label{3} \tag{3}$$

Where

$|Y[k_{max}]|$ is the magnitude of the peak bin

$Y|[k_{max}+1]|$ is the magnitude of the next highest adjacent bin (here assuming the bin to the right resulting in a positive $k_\Delta$, or if the bin to the left as $Y[k_{max}-1]$ resulting in a negative $k_\Delta$.

Solving \ref{3} for $k_\Delta$ would be quite complicated, requiring an inverse Dirichlet Kernel. I curve fit the graph shown, which was specific to $N=12$, directly resulting in the following exponential fit, which was also included in the above plot:

$$k_\Delta \approx 0.63883277 - 0.6368504 e^{-1.475173r} \tag{4} \label{4}$$

Where

$r = \frac{|Y[k_{max}+1]|}{|Y[k_{max}]|}$

Thus the two bin solution is to determine $k_\Delta$ using equation \ref{4} and with that determine the phase correction using \ref{2}. To note, $\ref{4}$ is a fit specific to $N=12$, where $N$ is the size of the FFT. A new and more accurate fit could be determined for each $N$, however I tabulated the errors for all $N$ values up to $N=10,000$ using fit given in \ref{4} showing the error quickly going be $1E-5$ and converging to $3E-7$ for $N>6000$.

Zooming in on the range of $N<100$ shows a maximum error for $N=2$ to be $4E-2$.

Verification

I tested the algorithm using the waveform example plotted above, which was generated with a single tone with zero phase at $k=5.1$. The largest bin is at $k=5$ and the next largest is at $k=6$. The magnitude of each were as follows:

$|Y[5]| = 0.98374$ $|Y[6]| = 0.11031$

The angle of $Y[5]$ which is the error from true 0 phase is $0.28798$ radians.

The ratio $|Y[6]|/|Y[5]| = 0.11213$

The predicted $k_\Delta$ using \ref{4} is within 1.2% of the actual value of 0.1:

$k_\Delta =0.0988$

Using this estimate for $k_\Delta$ in equation {2} results in the predicted phase correction which is also within 1.2% of the true phase offset of $0.28798$ radians:

$\phi = \pi k_\Delta(N-1)/N = 0.28446$ radians.

The correction therefore is to subtract $0.28446$ radians from the angle for bin 5, resulting in a phase of:

$0.28798 - 0.28446 = 0.0035$ radians

Note that an rms angle of $0.0035$ is consistent with an SNR of $20Log_{10}(0.0035) = 49$ dB, suggesting the estimator as derived here to usable for a wide range of applications within that SNR limit. As noted earlier, the exponential curve fit given in equation \ref{4} was fitted for $N=12$. Other fits would be similar with simply a modification of the coefficients for equation $\ref{4}$ approaching an asymptotic limit as $N$ grows larger, but I haven't detailed that further.

Even if we do get more samples by zero padding, wouldnt there be a chance of the same issue occurring although I guess its probability would be much lower. Also what you say about averaging. Is it possible I take the phase difference if two to three samples at matching indices where the peak lies and average the results would that help? also is there no mathematical relation which can be used to adjust phase when comparing in this case of index mismatch given if I do not want to increase FFT size — Emerald_Waves, Jan 24 '23 at 16:50
@Emerald_Waves I see your point with your comment about having good results when the bins match and terrible otherwise, which reduced my concern that the errors were lost in noise. I pursued a correction but haven't tested it yet; if you do test this with your code, please let me know. — Dan Boschen, Jan 25 '23 at 05:22
thank you for editing your answer accordingly. I will test and let you know but let me absorb it first :) — Emerald_Waves, Jan 25 '23 at 06:47
Thanks, I think the process is accurate but not confident yet that I didn't make a sign or other math error so want to check it. This other post that explains how the DFT is a bank of filters (with the frequency response I show above) may help provide further intuition if that wasn't already clear: https://dsp.stackexchange.com/questions/82998/can-anyone-explain-how-dft-works-as-a-filter-bank/83002#83002 — Dan Boschen, Jan 25 '23 at 13:03
I have added the code of what I understood from the answer in the question and as I stated there sometimes result is correct but sometimes it is wrong, I think I have made an interpretation error, kindly see if you get the time. Thanks — Emerald_Waves, Jan 25 '23 at 14:22
@Emerald_Waves Yes, I will need to step through it in detail to validate each step...it might be a couple days before I get to that. What you can possibly do in the meantime is confirm the theory using a noise free test tone at a fractional bin offset. There may be an additional linear phase that I didn't capture that also needs to be created (such as a $z^1$ or $z^{-1}$). — Dan Boschen, Jan 25 '23 at 14:35
I did test the above modified code in noise free condition by simply putting the "SNR" to a very high value (~100) and observed that the phase difference measurement was now offset by a nearly fixed amount of about 10 degrees from the actual value at the above settings, the offset also changed slightly as the input phase difference was varied as well. Your thoughts? — Emerald_Waves, Jan 28 '23 at 06:43
Kindly let me know if you got the time to review my code as per your solution or if you can point out what other test I should do to verify this? — Emerald_Waves, Feb 01 '23 at 09:19
@Emerald_Waves I am interested but won’t have time until this weekend — Dan Boschen, Feb 01 '23 at 15:22
Ok, thank you, If I find anything worth mentioning in the meanwhile, I'll update it in the question.. — Emerald_Waves, Feb 01 '23 at 16:08
Hey so I have been waiting to close this post till you have finished making any further additions to the answer, may I close this question by marking your answer as correct if there is nothing further to add although I have not gotten the results but maybe the issue is on my end and not you explanation? — Emerald_Waves, Feb 18 '23 at 06:24
@Emerald_Waves I've validated my post (and corrected it in the process). There was an error in my phase for D(x) (now shown as eq 1) and the resulting phase estimate. I also cleaned up the solution for determining the phase from the two highest bins (peak bin and next adjacent) — Dan Boschen, Feb 20 '23 at 04:57
@Emerald_Waves Let me know if this now works for you. I bottom lined the steps at the top of my post and included a validation examples at the bottom. — Dan Boschen, Feb 21 '23 at 14:30
Thank you for the detailed edits to the answer (and question as well). Please give me some time, i'll digest this, implement and update the code in the question as well — Emerald_Waves, Feb 21 '23 at 15:54

Phase Difference Calculation in Frequency Domain

1 Answers1