I have been experimenting with the cross-correlation function to verify the presence of speech in a recorded file with respect to a source file. I tried the following in MATLAB:

source = '/home/skrowten-hermit/Programs/male_8k.wav';
silenced = '/home/skrowten-hermit/Programs/male_8k_silence.wav';
halved = '/home/skrowten-hermit/Programs/male_8k_half.wav';
attenuated = '/home/skrowten-hermit/Programs/male_8k_attenuated.wav';
[y1, fs1] = audioread(source);       % source (reference) signal
[y2, fs2] = audioread(halved);
[y3, fs3] = audioread(silenced);
[y4, fs4] = audioread(attenuated);
% one figure per waveform so the plots do not overwrite each other
figure(1); plot(y1)
figure(2); plot(y2)
figure(3); plot(y3)
figure(4); plot(y4)

%% autocorrelation function wrt source

% calculate autocorrelation
[Rx1, lags1] = xcorr(y1, 'coeff');
tau1 = lags1/fs1;

% plot the signal autocorrelation function
figure(5)
plot(tau1, Rx1, 'r')

%% crosscorrelation function wrt source

% calculate correlation and time axis
[Rx2, lags2] = xcorr(y1, y2, 'none');
tau2 = lags2/fs2;
[Rx3, lags3] = xcorr(y1, y3, 'none');
tau3 = lags3/fs3;
[Rx4, lags4] = xcorr(y1, y4, 'none');
tau4 = lags4/fs4;

% plot the signal correlation function (one figure per target)
figure(6)
plot(tau2, Rx2, 'r')
figure(7)
plot(tau3, Rx3, 'r')
figure(8)
plot(tau4, Rx4, 'r')

% peak value and its index in each correlation sequence
[pRx1, idx1] = max(Rx1);
[pRx2, idx2] = max(Rx2);
[pRx3, idx3] = max(Rx3);
[pRx4, idx4] = max(Rx4);

% lagsN(idxN) is the lag, in samples, at which the peak occurs
fprintf('Peak value of Rx1 is %f at %f\n', pRx1, lags1(idx1))
fprintf('Peak value of Rx2 is %f at %f\n', pRx2, lags2(idx2))
fprintf('Peak value of Rx3 is %f at %f\n', pRx3, lags3(idx3))
fprintf('Peak value of Rx4 is %f at %f\n', pRx4, lags4(idx4))

The following are the waveforms of my input files as generated by Matlab (male_8k.wav is my source from which others are generated or recorded):

[Images: waveform plots of male_8k.wav, male_8k_half.wav, male_8k_silence.wav, and male_8k_attenuated.wav]

The auto-correlation generates the following output:

[Image: ACF of male_8k.wav]

The cross-correlation of the source with the other targets generates the following outputs:

[Images: CCF of male_8k.wav with male_8k_half.wav; CCF of male_8k.wav with male_8k_silence.wav; CCF of male_8k.wav with male_8k_attenuated.wav]

The peak values and the lags (in samples) at which the peaks occur are as follows:

Peak value of Rx1 is 1.000000 at 0.000000
Peak value of Rx2 is 10.634055 at 0.000000
Peak value of Rx3 is 0.000905 at -21325.000000
Peak value of Rx4 is 48.637631 at 7516.000000
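
The lags printed above are in samples, so converting one to milliseconds only needs the sampling rate. As a worked sketch, assuming the files are sampled at 8 kHz (suggested by the file names but not confirmed here; the value to use is the fs returned by audioread), the peak of Rx4 at lag 7516 would correspond to:

$$
d_{\text{ms}} = 1000 \cdot \frac{\ell}{f_s} = 1000 \cdot \frac{7516}{8000} = 939.5\ \text{ms}
$$

Here $\ell$ is the lag in samples and $f_s$ is the sampling rate in Hz.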

I intend to use this on files recorded over a network: the source file (male_8k.wav) is played at the transmitter (Tx) end and recorded at the receiver (Rx), and I want to verify that some speech is detected at Rx and calculate the delay (in ms). I would therefore like to classify each result as a success or a failure and convert the peak location (i.e., the lag in samples) into a value in ms. I understand that the peak value will not be 1 as it is for the ACF, but is it possible to fix a threshold for the peak and convert the sample index in such a way that:

  1. I could distinguish between silence and some speech data (attenuated is fine - just need to check if data samples exist at Rx).
  2. I could determine there is a delay of d ms at Rx.

The peak values of 10.634055 for the file with half the speech samples and 48.637631 for the attenuated file left me a bit confused. How can I do this effectively and efficiently?
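
One way to make the peak comparable across files is to normalize the cross-correlation by the norms of both signals, so that its magnitude lies in [0, 1] regardless of attenuation. The following is a minimal sketch under stated assumptions, not a tested solution for these recordings: the signals are synthetic stand-ins (white noise instead of speech), and the 8 kHz rate, the 400-sample delay, the 0.1 gain and the 0.5 threshold are all hypothetical values chosen for illustration.

```matlab
% Sketch only: synthetic stand-ins for the Tx/Rx recordings.
fs = 8000;                               % assumed sampling rate in Hz
tx = randn(fs, 1);                       % 1 s of noise standing in for the source
d  = 400;                                % hypothetical delay: 400 samples = 50 ms
rxSpeech  = 0.1 * [zeros(d, 1); tx(1:end-d)];  % delayed, attenuated copy of tx
rxSilence = 1e-4 * randn(fs, 1);         % low-level noise unrelated to tx

threshold  = 0.5;                        % hypothetical decision threshold
candidates = {rxSpeech, rxSilence};
peakVal  = zeros(1, 2);
delayMs  = zeros(1, 2);
detected = false(1, 2);

for k = 1:numel(candidates)
    rx = candidates{k};
    % rx is the first argument, so a positive lag means rx lags tx
    [r, lags] = xcorr(rx, tx);
    % scale by both signal norms; eps guards against an all-zero input
    r = r / max(norm(rx) * norm(tx), eps);
    [peakVal(k), idx] = max(abs(r));
    delayMs(k)  = 1000 * lags(idx) / fs; % lag in samples -> milliseconds
    detected(k) = peakVal(k) > threshold;
    fprintf('peak = %.3f, delay = %.1f ms, detected = %d\n', ...
            peakVal(k), delayMs(k), detected(k));
end
```

Because the normalized peak does not depend on gain, an attenuated copy scores close to 1 while uncorrelated noise scores near 0. The delay value is only meaningful for a candidate that passes the threshold.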

skrowten_hermit
  • Consider using the Wiener-Hopf equations to measure the delay which will also reveal if you are dealing with echo conditions. I detail that approach at this post: https://dsp.stackexchange.com/questions/63141/how-determine-the-delay-in-my-signal-practically – Dan Boschen Sep 18 '20 at 15:12
  • @DanBoschen This is definitely interesting, but the computed equalizer that specifies the relationship between the source and the sink is to be used to convert the source sequence to a desired sequence right? How can it measure similarity or do a simple verification? Please help me understand. – skrowten_hermit Sep 21 '20 at 08:56
  • I show that specifically here where the delay is computed from the determined channel coefficients (solving for the channel instead of the equalizer by swapping input and output): https://dsp.stackexchange.com/questions/63141/how-determine-the-delay-in-my-signal-practically – Dan Boschen Sep 21 '20 at 14:03
  • Okay. Got it. My tx is going to be y1 and rx would be y2, y3 or y4. How do I choose ntaps, given that I have let's say a sample size of 72000 in tx and rx? And once I get the coefficients, how do I get the delay? – skrowten_hermit Oct 06 '20 at 09:36
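
On the question of choosing ntaps and reading the delay from the coefficients, a small least-squares sketch may help. It is an assumption about how such a channel estimate could be set up, not a reproduction of the linked post. The signals are synthetic, and fs, N, ntaps, the 20-sample delay and the 0.5 gain are hypothetical values. The idea is that ntaps has to exceed the largest delay expected (in samples), and that the delay is the index of the largest-magnitude coefficient minus one.

```matlab
% Sketch only: least-squares channel estimate on synthetic data.
fs    = 8000;                 % assumed sampling rate in Hz
N     = 2000;                 % number of samples (kept small for the sketch)
ntaps = 64;                   % must exceed the largest delay expected, in samples
d     = 20;                   % hypothetical true delay in samples
tx = randn(N, 1);                                            % stand-in for the source
rx = 0.5 * [zeros(d, 1); tx(1:end-d)] + 0.01 * randn(N, 1);  % delayed copy plus noise

% Convolution matrix: column j holds tx delayed by j-1 samples
A = toeplitz(tx, [tx(1); zeros(ntaps-1, 1)]);
h = A \ rx;                   % least-squares solution of A*h ~ rx

[~, k] = max(abs(h));         % dominant tap of the estimated channel
delaySamples = k - 1;         % tap 1 corresponds to zero delay
delayMs = 1000 * delaySamples / fs;
fprintf('delay = %d samples (%.2f ms), gain = %.3f\n', delaySamples, delayMs, h(k));
```

Note that A has N rows and ntaps columns, so for 72000 samples and a delay range of several thousand samples the matrix becomes large; in that case a coarse delay from the cross-correlation could be removed first so that a short filter suffices.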
