4

I am measuring a room's acoustic impulse response by playing a log sine-sweep through a speaker from 20hz to 24 khz, and then recording it using a microphone.

The sweep is 10 seconds long, followed by 4 seconds of silence. There is some fading at the end of the sweep.

In order to compute the room's impulse response, I am taking

IFFT(FFT(Recording)/FFT(Sweep))

And I am taking the real part of this.

I am getting these strange peaks at around sample 625000 (13.02 seconds in). They always happen at around the same time:

enter image description here

Here it is zoomed in: enter image description here

I am wondering if anyone can help me explain why I am getting these for all recordings at around the same time.

Mason Wang
  • 93
  • 4
  • 2
    Looks like time domain aliasing at first glance, which would occur when using the FFT to determine the transfer function. – Dan Boschen May 23 '23 at 00:21
  • Plot separately: Recording, Sweep, FFT(Recording), FFT(Sweep), IFFT(1 / FFT(Sweep)). Either "Recording" should have a spike on the same time, or FFT(Sweep) should have values dangerously close to zero. That last IFFT is what you're filtering with, also worth a look. Use RFFT for more compact plots and faster compute if all inputs are real-valued. Also helps to share the data (Google Drive, ufile.io, Dropbox...) – OverLordGoldDragon May 23 '23 at 01:15
  • 1
    What I would do, with your sweep, is taper the amplitude a little on both ends. It wouldn't have to be a traditional window, but just some simple tapering so that, at the beginning and at the end, where your sweep is appended to silence, you don't have some nasty edge effect. Then when you send this to the FFT, make the FFT twice as big and zero pad both the driving function and the output with silence. – robert bristow-johnson May 23 '23 at 04:00
  • 1
    Another thing, as long as your driving function is full of frequency content (no dead zones in the spectrum), it doesn't have to be a linear or log sweep. It could be a Shepard tone. – robert bristow-johnson May 23 '23 at 04:02
  • 2
    Repeated sweeps at odd location are often signs of some moderate non-linear distortion and/or some sort of a time variance. Have you run some LTI tests on your system ? – Hilmar May 23 '23 at 08:04
  • Are you padding appropriately before taking the FFT of your signals? This is a prerequisite for the correct calculation of the Impulse Response. Otherwise you'll get some aliasing (circular convolution) in the time domain and this could possibly be the artefact you see there (like Dan Boschen stated in their comment). Furthermore, it would be beneficial to share data and code so that people can reproduce your work and search for the issues (like OverLordGoldDragon stated in their comment). – ZaellixA May 23 '23 at 08:37
  • 1
    What the hell, I'm certain I edited my comment to say "zero pad input and filter"... so: try ifft(fft(pad(recording)) * fft(pad(ifft(1 / fft(sweep))))), if no improvement then no boundary effects / "time aliasing" – OverLordGoldDragon May 23 '23 at 16:05
  • 1
    @OverLordGoldDragon please correct me if I am wrong but I believe the following is equivalent to yours: ifft(fft(pad(recording)) * (1/fft(pad(sweep))). What I means is that I believe you can pad the initial sweep before inversion and get the same results (i.e. pad then invert in the frequency domain). Could you provide some insight on that? – ZaellixA May 23 '23 at 16:24
  • @ZaellixA Careful with assuming commutativity from symbols. Maybe padding sweep has other uses, but not in preventing boundary effects. – OverLordGoldDragon May 23 '23 at 18:38
  • @OverLordGoldDragon I have been using this method for quite some time now and I've always been doing it like that (I mean I padded the sweep, taken its FFT, and then inverting it). Do you think this is not correct? I'm gonna test it for sure but do you have an intuitive explanation why the padding should be done after the inversion? Isn't the padded version supposed to be the same (information-wise) to the unpadded (which means that the inversion could be done to the padded sweep)? – ZaellixA May 23 '23 at 19:26
  • 1
    @DanBoschen Time aliasing cannot cause peaks. Convolution is a locally stable operator. That was my intuition, and I spent some time and proved it. If you or anyone's interested, I'm up for answering a question about it. (If by "when using FFT" you refer to numeric instability, that's fair but should be stated) – OverLordGoldDragon May 23 '23 at 20:56
  • @ZaellixA It's in fact very different. But I like when people talk information. I'm up for answering a question if you wish to open it, otherwise let's just say it's a sort of nonlinearity that we cannot commute through. Also see my comment on peaks. – OverLordGoldDragon May 23 '23 at 20:57
  • @ZaellixA So, I don't do deconvolution so I didn't think this through. What I said is if we treat ifft(1/fft(sweep)) as a filter in convolution - but that's not what's happening. We seek to undo a convolution by sweep. Then, it's all about the forward transform - was it "full", "same", "valid"? Certainly not circular. Whatever it was, we first replicate it with FFT convolution, and then it becomes a simple division. I think for most cases, 1 / fft(pad(sweep)) is correct, and my version isn't. Sorry for the confusion – OverLordGoldDragon May 28 '23 at 08:42
  • 1
    @OverLordGoldDragon that's fine but I did get confused for a while. To be honest, I didn't have time to test it but since this is clarified I won't have to :D. Thanks for the update though. – ZaellixA May 28 '23 at 17:54

1 Answers1

6

I would agree on Hilmar's comment, it's probably harmonics generated by nonlinearities of the system in combination with how the logarithmic frequency sweep response is deconvolved into an impulse response.

Your sweep has instantaneous frequency $f$ at time $t$, following:

$$f(t) = \exp(0.7090076835\,t/\text{s})\times20\text{ Hz}.$$

This is the logarithmic sweep (a linear sweep in a logarithmic frequency scale) that satisfies $f(0\text{ s}) = 20\text{ Hz}$ and $f(10\text{ s}) = 24\text{ kHz}$.

We can solve from this the time $t$ at which frequency $f$ appears:

$$\Rightarrow\quad t(f) = 1.41042195\ln(f/\text{Hz})\text{ s} - 4.225246556\text{ s}.$$

The frequency domain division in your deconvolution formula "IFFT(FFT(Recording)/FFT(Sweep))" shifts in time any frequency $f$ appearing at time $t(f)$ in the sweep, by $-t(f)$, back to time 0 in the impulse. However an $n$th harmonic of any frequency $f$ will be shifted by $-t(nf)$ from time $t(f)$, to time:

$$t_n(f) = t(f) - t(nf).$$

Due to the frequency sweep being logarithmic, this turns out to be independent of $f$:

$$\Rightarrow\quad t_n = t_n(f) = -1.410421949\ln(n)\text{ s}.$$

We can tabulate some values:

$$\begin{align} t_1 &=& 0,\\ t_2 &=& -0.9776299980\text{ s},\\ t_3 &=& -1.549506886\text{ s},\\ t_4 &=& -1.955259996\text{ s},\\ t_5 &=& -2.269986558\text{ s}.\end{align}$$

Add to those any delay due to delays in the system and that's where you'd see in the deconvolution result the peak corresponding to each harmonic. Multiplication or division in the discrete frequency domain does circular convolution or deconvolution. I don't know for sure what size discrete Fourier transform (DFT) and inverse DFT you used, but for a vector spanning 14 seconds you'd see a 2nd harmonic peak at $14\text{ s} + (-0.9776299980\text{ s}) = 13.02237000\text{ s}$ or a bit later due to any delays in the system. That seems to agree with your finding of "13.02 seconds", although I would expect your number to be closer to 13.03 s if you have the mic at least 1 m away from the speaker. If you look at a spectrogram of your sweep response, the harmonics should be visible there above the fundamental sweep.

More generally, for a rising sweep going from frequency $f_0$ at time 0 to frequency $f_1$ at time $T$:

$$f(t) = \exp\left(\frac{\ln\left(f_1/f_0\right)}{T}\times t\right)\times f_0,$$ $$\Rightarrow\quad t(f) = \frac{T\ln(f/f_0)}{\ln(f_1/f_0)},$$

the impulse response of an $n$th harmonic will be time-shifted by deconvolution to: $$t_n = -\frac{T\ln(n)}{\ln(f_1/f_0)}.$$

Olli Niemitalo
  • 13,491
  • 1
  • 33
  • 61
  • 1
    Olli that's a great explanation. Assuming appropriate zero-padding took place, the harmonic distortion products would have to be present before the causal part of the impulse response though. So do you suggest that time aliasing brought those products to the end of the IR? – ZaellixA May 23 '23 at 15:08
  • 1
    Yeah apologies, this is what I meant. I am not sure circular (de)convolution and time aliasing are all that different. Circularly convolving a signal sequence/vector through the Frequency Domain can and will result in wrapping - i.e. aliasing - in the Time Domain. This is why I am asking for more info. To my mind and according to Farina's paper, the harmonic products should reside in the anti-causal part of the Impulse Response. – ZaellixA May 23 '23 at 17:56
  • 2
    @ZaellixA Yes, we mean the same thing, and your understanding sounds correct to me. – Olli Niemitalo May 23 '23 at 18:35
  • 1
    That this agrees with data within four significant figures is good. Suspiciously good. In fact to me it's self-invalidating until explained - does OP work in a silicon factory? – OverLordGoldDragon May 23 '23 at 21:31
  • Olli solid. Like brickwall filter. – robert bristow-johnson May 24 '23 at 04:48
  • yep, answer "saved" obviously – Jdip May 24 '23 at 05:57
  • Hmm, if it were 10 hours long, it'd become 8 significant figures, but I don't find that meaningful. Likewise, if we suppose tens of seconds can be taken for granted, we're left with 3 sig figs - much better. But let's say 3.5 - still reasonable, but still seems fairly clean to me, though I'm certainly not experienced with audio. Is such precision reasonable with a clean recording in this context? – OverLordGoldDragon May 26 '23 at 00:33
  • 2
    @OverLordGoldDragon If all digital sources of delay are correctly compensated for, the biggest source of variation is probably from room temperature which affects the speed of sound. A 1 m distance results in a delay difference of about 0.00006 s between 20 deg C and 30 deg C temperatures. Clock drift can increase the variation by a further 0.0000002 s or so, taking into account the type of test signal and that the playback and record sample clocks are typically synchronized. It's not meaningful to look at significant figures in this context. – Olli Niemitalo May 26 '23 at 17:41
  • I'll have to think a little but that's good to know, thanks. – OverLordGoldDragon May 28 '23 at 08:56