15

I have recorded 2 signals from an oscope. They look like this: enter image description here

I want to measure the time delay between them in Matlab. Each signal has 2000 samples with a sampling frequency of 2001000.5.

The data is in a csv file. This is what I have so far.
I erased the time data out of the csv file so that only the voltage levels are in the csv file.

x1 = csvread('C://scope1.csv');
x2 = csvread('C://scope2.csv');  
cc = xcorr(x1,x2);
plot(cc);  

This gives this result: enter image description here

From what I've read I need to take the cross correlation of these signals and this should give me a peak relating to the time delay. However when I take the cross correlation of these signals I get a peak at 2000 which I know is not correct. What should I do to these signals before I cross correlate them? Just looking for some direction.

EDIT: after removing the DC offset this is the result I am now getting:
enter image description here

Is there a way to clean this up to get a more defined time delay?

EDIT 2: Here are the files:
http://dl.dropbox.com/u/10147354/scope1col.csv
http://dl.dropbox.com/u/10147354/scope2col.csv

Nick Sinas
  • 253
  • 1
  • 2
  • 8
  • How, exactly, are you doing the cross-correlation? In answer to your direct question, you shouldn't need to do anything to your signals before cross-correlation, though in some instances filtering first helps to get rid of noise that can distort the results. – Jim Clay Apr 06 '12 at 18:28
  • 1
    Please post the code you have used and more importantly a plot of the cross-correlation signal. Some tools/libraries put the (lag = 0) score in the middle of the graph ; I don't remember if Matlab does that. – pichenettes Apr 06 '12 at 18:48
  • @pichenettes: updated post – Nick Sinas Apr 06 '12 at 18:56
  • @JimClay: updated post – Nick Sinas Apr 06 '12 at 18:56
  • @NickS. If your signals are perfectly aligned, you will get a peak in the middle of your cc plot. So peak at 2000 means no delay. Now let us say you have a delay of 10 samples, which means signal2 is 10 samples off from signal1. This will just move your peak in the cc from 2000 to 2010 (or 1990). So your time delay corresponds to your actual peak location, MINUS 2000. – Spacey Apr 06 '12 at 20:04
  • @NickS. That is as good as it gets. For what it's worth, your last picture is what I would expect from your signal plots. The problem is that you are oversampling a digital signal, so even though you have quite a few samples, you really only have about 16 data points in your first signal and about 12 in your second signal. You really need more of those digital points. The cross-correlation indicates that the signals look periodic and, looking at them, they do look periodic. – Jim Clay Apr 06 '12 at 20:23
  • Could you explain what you mean by "you really only have about 16 data points in your first signal and about 12 in your second signal" – Nick Sinas Apr 06 '12 at 20:42
  • Sure. All the "zero" samples (or 0.8 samples, depending on how you look at it) at the beginning of each signal are non-entities. They don't really affect the cross-correlation results at all. The rest of the signal looks very digital, with some noise. You can see that the signals tend to be either at 1.2 or 0.6, with transitions in between those two states. The digital states appear to be a little under 100 samples wide- call it around 80 samples. Thus, you are capturing about (very approximate) 16 digital states in the first and 12 in the second. – Jim Clay Apr 06 '12 at 21:02
  • Like I said, you've got about 80 samples per digital state. Since it isn't changing much just one sample per digital state would capture most of the information. The other 79 aren't doing much for you. If you think the signals are correlated the way to find it is to capture a lot more of the digital states. – Jim Clay Apr 06 '12 at 21:04
  • @NickS. I have created an answer that you might find usefull. Please see. – Spacey Apr 12 '12 at 15:45
  • @Mohammad, thank you. I have responded. This looks like my best option so far. – Nick Sinas Apr 14 '12 at 16:25
  • @NickS and @ Mohammad I'm new to this topic but my work seems to be very similar what you guys have done. so could you please send me the Matlab code. It will be very helpful to me. Thanking you in advance Naveen – navi May 03 '12 at 08:31
  • @naveenbsrinivasa If you make a question about your work here, that would be good as well. – Spacey May 03 '12 at 16:17

5 Answers5

14

@NickS

Since it is far from certain that the second signal in the plots is in fact a solely delayed version of the first, other methods besides the classical cross-correlation have to be attempted. This is because the cross-correlation (CC) is merely a maximum likelihood estimator if your signal(s) are delayed versions of each other. In this case, they are clearly not, to say nothing about the non-stationarity of them either.

In this case, I believe what may work is a time-estimation of the significant energy of the signals. Granted, 'significant' can or cant be somewhat subjective, but I believe that by looking at your signals from a statistical point of view, we will be able to quantify 'significant' and go from there.

To this end, I did the following:

STEP 1: Compute the signal envelopes:

This step is simple, as the absolute value of output of the Hilbert-Transform of each of your signals is computed. There are other methods to compute envelopes, but this is pretty straight forward. This method essentially computes the analytical form of your signal, in other words, the phasor representation. When you take the absolute value, you are destroying phase and only after energy.

Furthermore since we are pursuing a time-delay-estimate of the energy of your signals, this approach is warranted.

enter image description here

STEP 2: De-noise with edge-preserving non-linear Medial Filters:

This is an important step. The objective here is to smooth out your energy envelopes, but without destruction or smoothing out your edges and fast rise times. There is actually an entire field devoted to this, but for our purposes here, we can simply use an easy to implement non-linear Medial filter. (Median Filtering). This is a powerful technique because unlike mean filtering, medial filtering will not null out your edges, but at the same time 'smooth' out your signal without significant degradation of the important edges, since at no time is any arithmetic being performed on your signal (provided the window length is odd). For our case here, I selected a medial filter of window size 25 samples:

enter image description here

STEP 3: Remove Time: Construct Gaussian Kernel Density Estimation Functions:

What would occur if you looked at the above plot sideways instead of the normal way? Mathematically speaking, that means, what would you get if you projected every sample of our denoised signals onto the y-amplitude-axis? In doing this we will manage to remove time so to speak, and be able to study the signal statistics solely.

Intuitively what pops out of the figure above? While noise energy is low, it has the advantage that it is more 'popular'. In contrast, while the signal envelope that has energy is more energetic than the noise, it is fragmented across thresholds. What if we considered the 'popularity' as a measure of energy? This is what we will do with (my crude) implementation of a Kernel Density Function, (KDE), with a Gaussian Kernel.

To do this, every sample is taken and a gaussian function constructed using its value as a mean, and a pre-set bandwidth (variance) selected a-priori. Setting the variance of your gaussian is an important parameter, but you can set it based on noise statistics based on your application and typical signals. (I only have your 2 files to go off on). If we then construct the KDE Estimation, we get the following plot:

enter image description here

You can think of the KDE as a continuous form of a histogram so to speak, and the variance as your bin-width. However it has the advantage of guaranteeing a smooth PDF that we can then perform first and second derivitave calculus on. Now that we have the Gaussian KDEs, we can see where the noise samples peak in popularity. Remember that the x-axis here represents the projections of our data onto the amplitude space. Thus, we can see which thresholds the noise is the most 'energetic' in, and those tell us which thresholds to avoid.

In the second plot, the first-derivative of the Gaussian KDEs is taken, and we pick the abscissa of the first sample after first-derivative after the peak of the mixture of Gaussians to attain a certain value close to zero. (Or first zero-crossing). We can use this method and be 'safe' because our KDE was constructed of smooth Gaussians of reasonable bandwidth, and the first derivative of this smooth and noise-less function was taken. (Typically first-derivatives can be problematic in anything but high SNR signals since they magnify noise).

The black lines show then at what thresholds we would be wise to 'segment' the image at, such that we avoid the entire noise floor. If we then apply to our original signals, we attain the following plots, with the black lines indicating the start of energy of our signals:

enter image description here

This thus yields a $\delta{t} = 241$ samples.

I hope this helped.

Spacey
  • 9,817
  • 8
  • 43
  • 79
  • wow. Thank you so much. These are all new techniques for me that I will start researching. Is there any way I could take a look at the matlab code you used? – Nick Sinas Apr 14 '12 at 03:12
  • So I have steps #1 and #2 done in Matlab, and my results match yours, but I am having issues with step #3. What functions did you use? – Nick Sinas Apr 14 '12 at 15:40
  • @NickS. Ask,.. and you shall receive, shoot me an email and I can send you the code I used. – Spacey Apr 14 '12 at 17:48
  • @Mohammed Could you please post your code to estimate the time delay. I have sent you an email in regard of this matter so please help –  Jun 20 '13 at 09:15
6

There are a few problems doing this with autocorrelation

  1. Huge DC offset (fixed already)
  2. Time window: Matlab's xcorr() has the annoying convention to essentially "zero pad" the signal at both ends as you slide the time lag. I.e. the data window is a function of time lag. This will create a triangular shape for any stationary signal (including sine waves). Better choices are to choose a correlation window so that window size plus max time lag fit into your total data window, or to normalize the cross correlation by the number of non-padded samples.
  3. The two signals do not look particularly correlated to me. The shape is somewhat similar but the specific spacing of peaks and dips is quite different, so I doubt that even a proper auto-correlation would yield a lot of insight here.

A much simpler approach would be to use a threshold detector to find the starting points and simply use the difference between these points as the delay.

Hilmar
  • 9,472
  • 28
  • 35
4

As pichenettes indicated, in this case a peak at the middle of the output indicates 0 lag. The peak's offset from the middle point is your time lag.

EDIT: It concerns me that the correlation is so nearly a perfect triangle. That indicates to me that the cross-correlation is doing no power normalization. That gives an unfair bias to smaller lags over larger lags. I would modify your xcorr call to "cc = xcorr(x1,x2,'unbiased');".

That is not, mind you, a perfect solution because the large-lag results are now more unstable than the low-lag results because they are based on less data. A large peak at the extremities can be bogus for the same reason that you can get 100% heads and no tails on just a few coin tosses, while it is extremely unlikely to happen on many tosses.

Jim Clay
  • 12,101
  • 31
  • 55
  • Meaning that the signals are not delayed? – Nick Sinas Apr 06 '12 at 19:12
  • I'm not sure- where is the peak at? I can see that it is near the middle, but it is not clear that it is actually at the middle. Also, there is a power normalization issue that I will address in an edit of my answer. – Jim Clay Apr 06 '12 at 19:15
  • 'unbiased' parameter definitely makes it look better. more of what I would expect. I will keep looking into this. Thanks. – Nick Sinas Apr 06 '12 at 19:50
  • @JimClay Perhaps Nick S, is correlating the envelopes of his signals and not the actual signals, (Nick is this true?). This would yield (roughly) this triangular shape I imagine. – Spacey Apr 06 '12 at 19:56
  • 2
    @NickS. Mohammad's comment made me look and realize that you have a huge DC offset that is messing up your results. Subtract the mean from both of your signals and then run xcorr on them. I would try it without the "unbiased" option first. – Jim Clay Apr 06 '12 at 20:01
  • @Mohammad Not the envelope, but you did have the right idea that there was something else going on. – Jim Clay Apr 06 '12 at 20:05
  • @Jim Clay Ah yes. Correlating the signals with their means intact would give a triangular cross-corr as we see, as would I imagine demeaning but correlating the envelopes. – Spacey Apr 06 '12 at 20:09
  • @JimClay I have updated the post – Nick Sinas Apr 06 '12 at 20:17
  • @NickS. Can you save those two files off to a dropbox so that we can access them? – Spacey Apr 06 '12 at 20:21
  • @Mohammad file links posted – Nick Sinas Apr 06 '12 at 20:42
  • @NickS. Do you believe that x2(928) is a delayed version of x1(692)? I do not know what you are measuring, but if this is the case then we can frame the problem in a different way, but I need to know that first. – Spacey Apr 06 '12 at 21:42
  • @Mohammad These are 2 microphones hearing the same sound but delayed, so the signal is the same but the data is different. – Nick Sinas Apr 07 '12 at 01:49
  • @NickS. Ok, and what is the channel here that you are dealing with? Air? Water? I am asking to determine the physics of what is hapening. – Spacey Apr 07 '12 at 03:19
4

As the others have pointed out, and it seems you have realized based on your last edit to the question, it doesn't appear that cross-correlation is going to give you a good estimate of time delay for the datasets shown. Correlation measures similarity in shape between two time series by sliding one across the other for a range of time lags and computing an inner product between the two series at each lag. The result will have a large magnitude when the two series are qualitatively similar, or they are "correlated" with one another. This is similar to how an inner product of two vectors is largest when the two vectors are pointed in the same direction.

The problem with the data you showed is that (at least for the snippets we can see) there does not seem to be much similarity in shape. There is no delay that you can apply to one of the signals to make it look like the other, which is exactly what you're doing by calculating their cross-correlation.

There are instances where cross-correlation is useful, however. Say that your second signal was really a time-shifted version of the original, even with some additional noise added:

a = csvread('scope1col.csv');
a = a - mean(a);               % to remove DC offset
b = a(200:end) + sqrt(0.05)*randn(1801,1);
figure; subplot(211); plot(a); subplot(212); plot(b)

enter image description here

It's now not immediately clear that the two signals are related by a time delay. However, if we take the cross-correlation, we get:

[c,lags] = xcorr(a,b);
igure; plot(lags,c); grid on; xlabel('Lag'); ylabel('Cross-correlation');

enter image description here

which shows a peak at the correct lag of 200 samples. Correlation can be a useful tool for determining time delay, when applied to datasets that contain the right type of similarity.

Jason R
  • 24,595
  • 2
  • 67
  • 74
  • Any ideas for what else I could do? Maybe another technique other than cross correlation or maybe some type of filter? Thanks. – Nick Sinas Apr 08 '12 at 06:16
  • @NickS. I have also looked at it and they are not delayed copies of each other. That being said, do you want to estimate the delay of energy? I think that would make more sense in this case, VS delay of the signals. If you tell us more about the underlying channel/experiment you are running we can tell you more about possible paths. – Spacey Apr 09 '12 at 16:02
  • @Mohammad Thanks. The underlying channel is steel. Any idea of how to estimate the delay of energy? – Nick Sinas Apr 09 '12 at 23:42
  • @Mohammad do you think the distortion of the signals could be some type of reverberation that could be cleaned up with filtering? – Nick Sinas Apr 09 '12 at 23:44
  • @NickS. There might be some reverb-cleaning tricks out there (I am not aware of how those would be accomplished), but I have cobbled together something simple that will be an energy estimator if you want to take a look. – Spacey Apr 12 '12 at 16:25
0

Based on Muhammad's suggestion, I tried to make a Matlab script. However, I am not able to deduce if he constructs a Gaussian distribution based on the variances and then takes a KDE estimation or he performs a KDE estimation with Gaussian assumption.

Also it is difficult to infer how he translates the KDE offset time onto the time domain. Here is my attempt to it. Any user who is interested in using the script is free to and update the improved version if possible.

%% Initialising data

Ws1 = data1;
Ws2 = data2;
mWs1 = nanmean(Ws1);
mWs2 = nanmean(Ws2);
sdWs1 = nanstd(Ws1);
sdWs2 = nanstd(Ws2);

%% Computing the signal envelopes
Ws1d = Ws1 - mWs1;
Ws2d = Ws2 - mWs2;
h1 = abs(hilbert(Ws1d));
h2 = abs(hilbert(Ws2d));
figure();
subplot(211)
plot([Ws1d, h1])
subplot(212)
plot([Ws2d, h2])

%% Denoise the signal with edge preserving nonlinear medial filtering
w = 25;
mf1 = medfilt1(h1, w);
mf2 = medfilt1(h2, w);
figure();
subplot(211)
plot(mf1)
subplot(212)
plot(mf2)

<%% Remove time: construct the gaussian kernel density estimation functions>
% Using the kde from Matlab central directly on the filtered data
data1 = mf1;
[bw1, den1, xmesh1, cdf1] = kde(data1, 2^14);
der1 = diff(den1);
data2 = mf2;
[bw2, den2, xmesh2, cdf2] = kde(data2, 2^14);
der2 = diff(den2);
figure();
plot([der1, der2]);
legend('Sig1', 'Sig2')

% the other method as explained in Muhammad's post
for i = 1:length(mf1)
gf1(:,i) = mf1(i) + sdWs1*randn(1000,1);
gf2(:,i) = mf2(i) + sdWs2*randn(1000,1);
end
[bwM1, denM1, xmeshM1, cdfM1] = kde(gf1(:,1), 2.^11);
dd1 = diff(denM1);
[bwM2, denM2, xmeshM2, cdfM2] = kde(gf2(:,1), 2.^11);
dd2 = diff(denM2);
figure();
plot([dd1, dd2]);
legend('Sig1', 'Sig2')
AshG
  • 1
  • 1