
I have an audio signal $g(t)$ composed by the sum of my original signal $f(t)$ and a delayed copy of itself:

$$g(t)=f(t)+f(t+\varepsilon)$$

My goal is to recover the original signal $f(t)$ knowing:

  • $g(t)$ from a .wav file
  • $\varepsilon$ from a spectral view in Audacity

[spectral view screenshot]

From the spectral view we know $\varepsilon$ is roughly equal to $55\ \mathrm{ms}$.

I am using this method to get $f(t)$. Here is my implementation of that algorithm in Octave:

[g_t, sample_rate] = audioread('raw_audio.wav');
epsilon = 55 / 1000; % Delay in seconds

n = (0:length(g_t)-1);
denominator = 1 + exp(-2i*pi*n/length(g_t)*epsilon);

G_fft = fft(g_t);

% Perform element-wise division
F_fft = G_fft ./ denominator.';

f_t = ifft(F_fft);
audiowrite('fixed_audio.wav', real(f_t), sample_rate);

But the resulting audio sounds very similar to the input and keeps the same delay. What am I doing wrong?


Edit: fixed the code according to @Cris Luengo's suggestions in the comment section.

nickh
  • Is sample_rate equal to 1000??? And what is epsilon? Note also that ' is the complex conjugate transpose, you want to use .' instead. – Cris Luengo Jul 23 '23 at 14:05
    Anyway, assuming you get those details right, consider a proper solution to the inverse problem rather than just division. Look up for example Wiener deconvolution. – Cris Luengo Jul 23 '23 at 14:08
  • @CrisLuengo epsilon is the time duration of the delay, as seen in $g(t)=f(t)+f(t+\varepsilon)$. I divide it by 1000 to transform the 55 milliseconds into 0.055 seconds. Thank you for suggesting the Wiener deconvolution, I will check it out. – nickh Jul 23 '23 at 14:10
  • I also assume that, the way your impulse response is defined, the computed delay must be really precise. You might need to include the delay as a parameter to estimate. – Cris Luengo Jul 23 '23 at 14:25
  • Ok, after this edit, I see you’re not using sample_rate at all. n is in units of samples, so epsilon must be in units of samples as well, not in seconds. The FFT has no notion of seconds. – Cris Luengo Jul 23 '23 at 14:28
  • Thank you, I just fixed that typo. I was calculating epsilon in two lines originally but condensed it into one to post it here. About your second comment: I could compute the IFFT for many values of epsilon and keep the one with the least power, meaning it was the best at cancelling the other signal. – nickh Jul 23 '23 at 14:29
  • I also tried epsilon = sample_rate * 55 / 1000 but there was no audible difference in the result; I will fix that now. About the Wiener deconvolution: since SNR=1 and I have no sizable reference signal, I believe that this won't be a viable solution here. – nickh Jul 23 '23 at 14:31
  • Power could work, I’m not sure. I don’t know what the best measure is. Maybe look at the number of negative values after the inverse transform? – Cris Luengo Jul 23 '23 at 14:32
  • The delay property of the DFT is exp(-2i*pi*n/length(g_t)*epsilon), with epsilon is samples. See https://dsp.stackexchange.com/questions/509/what-effect-does-a-delay-in-the-time-domain-have-in-the-frequency-domain (you were missing the imaginary number in there!) – Cris Luengo Jul 23 '23 at 14:34
  • How can you have SNR=1? That’s a lot of noise! – Cris Luengo Jul 23 '23 at 14:35
  • Thank you again @CrisLuengo, I will adjust it! I was getting it from https://math.stackexchange.com/a/4727253/422531 Also, since my "noise" is a delayed copy of the original sound ($g(t)=f(t)+f(t+\varepsilon)$) then we are almost comparing the signal with itself, hence the SNR=1. – nickh Jul 23 '23 at 15:02
  • Using the proper exp(-2i*pi*n/length(g_t)*epsilon) I get a very loud noise: https://imgur.com/a/tMKZu2Y – nickh Jul 23 '23 at 15:09
  • Exactly. This is why you need to use Wiener or some other form of regularization. The Wiener deconvolution filter has a form where you guess a scalar for the parameter K, instead of inputting the expected power spectrum. See here for example: https://diplib.org/diplib-docs/deconvolution.html#dip-WienerDeconvolution-dip-Image-CL-dip-Image-CL-dip-Image-CL-dip-Image-CL-dip-Image-L-dip-StringSet-CL – Cris Luengo Jul 23 '23 at 15:36
  • See this example for implementation of the Wiener filter. https://dsp.stackexchange.com/questions/31318/compensating-loudspeaker-frequency-response-in-an-audio-signal This works when the original signal is known (a "sounding signal" which is used to correct the channel) and that signal occupies the spectrum that is then used generally for arbitrary signals. The more white the sounding signal is (flat spectrum) the better as it will optimize the SNR through an even energy distribution across all frequencies. – Dan Boschen Jul 25 '23 at 02:47
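The DFT delay property that Cris Luengo cites (multiplying bin $k$ by $e^{-2\pi i k M/N}$ delays by $M$ samples, with $M$ in samples, not seconds) can be checked numerically. A minimal pure-Python sketch, with function names of my own choosing:

```python
import cmath

def dft(x):
    # naive Discrete Fourier Transform (O(N^2), fine for a demo)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    # inverse DFT, returning the real part of each sample
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]

x = [1.0, 2.0, 3.0, 4.0]
M = 1                      # delay in SAMPLES, as the comment stresses
N = len(x)
X = dft(x)
Xd = [X[k] * cmath.exp(-2j * cmath.pi * k * M / N) for k in range(N)]
xd = idft(Xd)
print(xd)  # approximately [4.0, 1.0, 2.0, 3.0]
```

Note the shift is circular: the last sample wraps around to the front, which matters later in the answer below.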

1 Answer


denominator = 1 + exp(-2*pi*epsilon*n);

Firstly, you are missing the sample rate here, since n is in samples and not in time. Secondly, if the argument of the exp() becomes $\pi$, the exponential equals $-1$, the whole denominator becomes zero, and you can't divide by it any more. Finally, you are using an FFT, which is an algorithm for the Discrete Fourier Transform (DFT), not the Continuous Fourier Transform.

The DFT is periodic in nature, i.e., you are implementing a circular delay and circular de-convolution, not linear ones.
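The circular/linear distinction is easy to see on a short sequence. An illustrative pure-Python sketch (helper names are mine, not part of the answer's method):

```python
def linear_delay(x, M):
    # linear (zero-padded) delay: samples falling off the end are lost
    return [0] * M + x[:len(x) - M]

def circular_delay(x, M):
    # circular delay, which is what the DFT delay property implements:
    # the tail wraps around to the front
    return x[-M:] + x[:-M]

x = [1, 2, 3, 4, 5, 6]
print(linear_delay(x, 2))    # [0, 0, 1, 2, 3, 4]
print(circular_delay(x, 2))  # [5, 6, 1, 2, 3, 4]
```

Dividing by the DFT of the echo kernel undoes the circular version, not the linear one, so the wrapped samples corrupt the start of the recovered signal.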

Let's look at this analytically. Your discrete sequence can be written as

$$g[n] = f[n] + k\cdot f[n-M]$$

where $M$ is the delay in samples and $k$ the gain of the delayed copy. At this point we assume that the continuous delay can be rounded to the nearest sample without too much error. If that's not the case, things get a lot more complicated.

We can look at the transfer function and its inverse

$$H[z] = 1 + k\cdot z^{-M} \\ H^{-1}[z] = \frac{1}{1 + k\cdot z^{-M} } $$

and can directly write the difference equation of the inverse operation as

$$ f[n] = g[n]-k \cdot f[n-M]$$

which is a recursive filter. As we can see from the transfer function, the filter is unstable for $k \geq 1$: you cannot completely remove an added copy if the gain of the copy is equal to or larger than that of the original signal.

So your best shot is to do this directly in the time domain using the best possible estimates for $M$ and $k$ you can get. Eyeballing this in Audacity is not going to work; you will need timing accuracy down to the sample. The exact gain can probably be determined with a least-squares optimizer.
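The recursive difference equation $f[n] = g[n] - k \cdot f[n-M]$ can be sketched directly. A minimal pure-Python illustration (remove_echo is a hypothetical name, and $k < 1$ is assumed so the inverse is stable):

```python
def remove_echo(g, k, M):
    """Invert g[n] = f[n] + k*f[n-M] via the recursion f[n] = g[n] - k*f[n-M]."""
    f = list(g)                # f[n] = g[n] for n < M
    for n in range(M, len(g)):
        f[n] = g[n] - k * f[n - M]
    return f

# forward model: add a delayed, attenuated copy (linear delay, zero-padded)
f = [1.0, -2.0, 3.0, -1.0, 2.0, 0.0, 1.0, -1.0]
M, k = 3, 0.5
g = [f[n] + (k * f[n - M] if n >= M else 0.0) for n in range(len(f))]

print(remove_echo(g, k, M))   # recovers f exactly
```

With exact $M$ and $k$ the echo cancels perfectly here; with $k = 1$, as in the question, any estimation error in $M$ or $k$ is amplified by the unstable recursion, which is why the answer recommends estimating both as precisely as possible.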

Hilmar
  • Thank you @Hilmar. I guess since I have a signal with $k=1$, I will generate several examples varying $M$ and choose the resulting signal with the least power. Also, I think you missed an $i$ inside the exp; I made the same mistake. – nickh Jul 24 '23 at 19:14