
I currently have a real-time noise suppression algorithm based on two inputs from different microphones. It has passed most tests I have thrown at it; however, when the signal I am trying to retrieve arrives at one mic with a slight delay relative to the other, it performs poorly.

For testing, I estimated the delay in a preprocessing step via cross-correlation over the whole input signal, and that works. In the final algorithm, however, I can't preprocess the signal, since it is supposed to run in real time.

I am taking sub-windows of the signals and applying my algorithm to each one. What I tried was to apply the cross-correlation to each sub-window: take the FFT of the sub-windows, do the calculation, and then the IFFT, just to get the delay each time. I then have to take the FFT of the sub-window again, with a different windowing function, for my filtering algorithm. This much extra processing will drastically reduce the sub-window size my algorithm can keep up with in real time.
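For reference, the per-window estimation looks roughly like this (a minimal NumPy sketch, not my actual code; the function and variable names are just placeholders):

    import numpy as np

    def estimate_delay(ref_window, other_window):
        # Cross-correlation via FFT. Zero-pad to the combined length so the
        # circular correlation does not wrap into the lags we care about.
        n = len(ref_window) + len(other_window)
        spectrum = np.conj(np.fft.rfft(ref_window, n)) * np.fft.rfft(other_window, n)
        xcorr = np.fft.irfft(spectrum, n)
        # The peak index is the lag; indices past n//2 correspond to negative lags.
        lag = int(np.argmax(xcorr))
        return lag - n if lag > n // 2 else lag

In the real-time loop this runs once per sub-window, on top of the separate windowed FFT used for filtering, which is exactly the duplicated work I would like to avoid.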

The question is: is there a faster way to do this delay estimation? Maybe something that works on each input sample to refine the estimate over time, like a control loop?

Thanks in advance!

  • if the relative delay between the two signals is not rapidly varying, you can still use cross-correlation to estimate it. the result of this operation need not be updated every audio sample. – robert bristow-johnson Mar 06 '16 at 06:38
  • @robert That could be one option; however, the cross-correlation calculated from just the sub-window samples seems a little unstable due to the noise in the input signals, which could lead to an extended period without correct delay correction. If another algorithm is possible, that would probably be the path to take, but this could be a good way around the problem if no other solution turns up. Thanks mate! – João Victor Manke Mar 06 '16 at 19:53
  • @robertbristow-johnson actually I just realized that I am doing most of the calculations for cross-correlation anyway in my algorithm. Do you know if there is a way to retrieve the delay directly from the frequency domain rather than having to transform back to the time domain? – João Victor Manke Mar 08 '16 at 13:48
  • i don't quite get it. the data originally comes to you in the time domain. you say you're already doing most of the cross-correlation anyway. to do cross-correlation in the frequency domain you multiply the spectrum of one times the complex conjugate of the spectrum of the other. but what you look for are maxima in the time-domain result. my suggestion is to fix the buffer of one signal (the reference) to a specific time and allow the other to slide one sample per sample. so you are correlating the most recent N samples of the "other" to a fixed segment of the reference. – robert bristow-johnson Mar 08 '16 at 19:15
  • do that once per sample, so the non-reference signal window slides one sample for each sample of the input, and you get one point of the cross-correlation per sample. after N samples you can judge which is the maximum correlation and reset the frame of the reference signal. the cost is one N-point FIR per sample. (a sketch of this appears after these comments.) – robert bristow-johnson Mar 08 '16 at 19:16
  • @robertbristow-johnson I think I didn't explain very well: all my processing is done in the frequency domain, so I already have the frames transformed. It's just a matter of taking the conjugate of one of them, multiplying, transforming back to the time domain, and finding the maximum. I think that would be much faster than a time-domain correlation, which is O(n^2), but I wanted to cut as many steps as possible, so if I could get the delay directly from the frequency domain, rather than having to transform back, it could be faster. I think, however, that that is not possible. – João Victor Manke Mar 09 '16 at 01:52
  • Another update: I think cross-correlation is not the way to go; I got poor results from testing with it, without even caring about making it faster. That is a bad sign. I ended up finding another algorithm that may help, called Dynamic Time Warping; I'll have to test whether it gives better results. – João Victor Manke Mar 09 '16 at 01:58
  • well, i dunno how to respond. i have done this multiple times with a two-microphone (spaced about 3 meters apart) stereo input recording a noisy truck driving by. applied an identical DC-blocking filter and a gentle high-cut (LPF) to both channels and repeatedly cross-correlated one against the other in the time domain. picked the lag with maximum cross-correlation. came up with very good results. when the truck was equidistant from the two microphones, the relative delay offset came out to be zero. – robert bristow-johnson Mar 09 '16 at 04:10
  • @robertbristow-johnson I must be doing something wrong; I'll redo my code and see if I get better results. Thanks anyway, you helped a lot! – João Victor Manke Mar 09 '16 at 16:50
  • It ended up working! There were some weird exceptions that I wasn't handling correctly in my algorithm; they would corrupt the transforms, and the delay calculation would then output nonsense values. Now it is fixed and working properly! Thanks man! – João Victor Manke Mar 09 '16 at 17:41
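For reference, the sliding correlator suggested in the comments above could look roughly like this; a minimal NumPy sketch under stated assumptions (block processing instead of true per-sample streaming, non-negative lags only; the function name and parameters are placeholders, not from the discussion):

    import numpy as np

    def sliding_delay_estimate(reference, other, n):
        # Hold an n-sample reference frame fixed and slide the other channel
        # past it one sample at a time: each correlation point costs one
        # n-point dot product (the "one N-point FIR per sample").
        # Assumes len(reference) >= n and len(other) >= 2*n - 1.
        ref_frame = reference[:n]
        scores = np.empty(n)
        for lag in range(n):  # one iteration per incoming input sample
            scores[lag] = np.dot(ref_frame, other[lag:lag + n])
        # After n samples, the lag with maximum correlation is the estimate;
        # the reference frame would then be reset to a fresh segment.
        return int(np.argmax(scores))

Centring the reference frame within the sliding range would let the same scheme report negative lags as well.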

1 Answer


I can't comment because I'm a new user... have a look at these questions:

Estimating the time offset (delay) between two audio signals in real-time

Generalized Cross Correlation

Time delay estimation of oscilloscope signals using cross correlation