11

For the STFT, we impose a window of a certain size onto the original signal, then perform an FFT on each window. The uncertainty about frequency and time is determined by the width of the window; however, I can't understand the point of having overlapping windows...

If we have a signal, for instance, why can't we just divide it into 6 chunks (non-overlapping windows) and then perform an FFT on each of those chunks?

Let me make this clearer with my application. I will mostly be dealing with a 60 Hz power line wave, and occasionally we want to monitor a 180 Hz transient effect on the power line. Since the signal will be mostly periodic, should I use a window then?

OverLordGoldDragon
kuku
  • Naw, if it's for analysis only and not analysis-synthesis, there really isn't any convolution going on except in the Goertzel sense of the word. If you're doing fast convolution that's sorta related but not directly the same thing as the STFT although you *could* do overlap-add fast convolution using a complementary Hann window on the input. But it wouldn't be as efficient as regular-old fast convolution. – robert bristow-johnson Jun 03 '23 at 00:46
  • @robertbristow-johnson You need to stop talking smack behind people's backs. STFT is fully equivalent to bandpass convolutions (rather cross-correlations), and I proved it in code in the linked post, and it can be proven mathematically fairly easily. Not just equivalent but more accurate in time-frequency. If you're not after time-frequency and have alternate uses, that's fair, but STFT is by chief motivation time-frequency. Again, gatekeeping terminology like a politician - looks much like the beef with Jazz. The way you object more impartially is, "not necessarily convolution". – OverLordGoldDragon Jun 05 '23 at 09:57
  • //"You need to stop talking smack behind people's backs."// - - - what smack behind whose back? please explain. - - - //" STFT is fully equivalent to bandpass convolutions"// - - - as is the Goertzel algorithm. - - - //" (rather cross-correlations)"// difference between cross-correlation and convolution is time-reversal of one of them. – robert bristow-johnson Jun 05 '23 at 15:36
  • @robertbristow-johnson Arguing against others without notifying them. I also saw the periodicity jab with Hilmar, but that doesn't matter, it just adds up with this and two other @-less replies. -- Ok, I thought Goertzel is some sarcasm reference. It's sort of worse because it implies, even after seeing my linked post, you don't get the role of convolutions, or how time-frequency works. STFT can be evaluated at any frequency, FFT is just for speed and convenient inversion. -- If you think conv vs CC has just to do with compute here, you're strongly mistaken. – OverLordGoldDragon Jun 09 '23 at 11:50

3 Answers

12
  1. We almost always want to apply some kind of window function in order to reduce spectral leakage. The rectangular window (no tapering at all) is therefore almost never used in practice.

  2. Any tapering function used almost always decreases to zero at its boundaries.

Samples near the edges of each window are therefore strongly attenuated, so some data is effectively lost. To recover it, you usually process with 50% overlap, so that the samples attenuated at the edges of one window fall near the centre of the next.


  3. Another thing is that if you apply the inverse STFT, you should use complementary windows, that is, windows whose overlapped copies sum to 1, e.g. a Hann window with 50% overlap.
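The "summing to 1" property from the last point can be checked numerically. The following is only a minimal sketch: the window length, the number of frames, and the helper name `overlap_add_sum` are assumptions made for illustration, and the periodic form of the Hann window is used because that is the form whose 50%-overlapped copies sum exactly to 1.

```python
import numpy as np

def overlap_add_sum(window, hop, n_frames):
    """Sum n_frames copies of `window`, each shifted by `hop` samples."""
    n = len(window)
    total = np.zeros(hop * (n_frames - 1) + n)
    for k in range(n_frames):
        total[k * hop : k * hop + n] += window
    return total

N = 64  # assumed window length
# Periodic Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)

ola_50 = overlap_add_sum(hann, N // 2, 10)  # 50% overlap
ola_0 = overlap_add_sum(hann, N, 10)        # no overlap

# Skip the first and last half-window, where only one window contributes.
interior_50 = ola_50[N // 2 : -N // 2]
interior_0 = ola_0[N // 2 : -N // 2]

print("50% overlap: min/max of window sum =", interior_50.min(), interior_50.max())
print("no overlap:  min/max of window sum =", interior_0.min(), interior_0.max())
```

With 50% overlap the window sum is flat, so every sample is weighted equally overall. Without overlap the sum dips to zero at every frame boundary, which is where the data is lost.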

In summary: yes, you should pretty much always use windowing in your applications.
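To see the leakage argument in numbers, here is a small sketch using a 60 Hz tone like the one in the question. The sampling rate, the window length, and the 300 to 450 Hz band used to measure "far" leakage are all assumed values, picked so that 60 Hz does not land exactly on an FFT bin.

```python
import numpy as np

fs = 1000.0  # assumed sampling rate in Hz
N = 256      # assumed window length; 60 Hz is not bin-centred for this N
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 60.0 * t)

def far_leakage_db(frame, f_lo=300.0, f_hi=450.0):
    """Largest spectral magnitude in [f_lo, f_hi], in dB relative to the peak."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return 20 * np.log10(spectrum[band].max() / spectrum.max())

rect_db = far_leakage_db(x)                  # rectangular window (no tapering)
hann_db = far_leakage_db(x * np.hanning(N))  # Hann window

print(f"rectangular: {rect_db:.1f} dB, Hann: {hann_db:.1f} dB")
```

The tone contains no energy above 60 Hz, so anything measured in that band is leakage. The tapered frame should show far less of it, which matters when a weak component such as the 180 Hz transient has to be seen next to a strong 60 Hz fundamental.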

For more comprehensive information, please refer to this great white paper:

Heinzel G. - Spectrum and spectral density estimation by the DFT, including a comprehensive list of window functions and some new flat-top windows

jojeck
  • Thank you very much for your help! The MATLAB algorithm that I am looking at actually starts the 1st window with its centre at the beginning of the time axis, but in your drawing it is the detail of first window. Can I assume that the reason they do this is because they don't want to lose the information at the beginning of the time axis? – kuku Nov 25 '14 at 16:00
  • Picture is only exemplary. You can see that there are no values on time axis. Way you describe is perfectly OK. Anyway it's just an addition of one extra frame. – jojeck Nov 25 '14 at 16:59
  • Read the paper. Read the paper. Read the paper. Section 12. That is all. – Andy Piper Aug 20 '19 at 17:11
  • @AndyPiper: you mean 10, right? – jojeck Aug 20 '19 at 18:06
  • Section 12 is a cookbook for all of this stuff and the best place to start in my view. Section 10 is specifically about overlap. But it's all great :) – Andy Piper Aug 21 '19 at 08:12
  • This link to the white-paper doesn't work any more, but I guess this should be it. – bluenote10 May 29 '23 at 13:43
  • Yes! this is the very one! – jojeck Jun 02 '23 at 17:45
1

why overlapping the window?

Because otherwise you lose information, a ton of it. The STFT is equivalent to convolutions (rather, cross-correlations) with windowed complex sinusoids, i.e. bandpass filtering. For:

  • Spectrograms (modulus): the loss is greatest and in every sense. Otherwise, with maximum overlap, the STFT is invertible within a global phase shift, which is a strong inversion, unlike DFT/FFT modulus. This strong inversion, and STFT's robustness properties, are exclusively enabled by overlapping in both domains.
  • Extracting phase/amplitude/frequency vs time: there's tremendous aliasing, worst for phase.
  • Non-time-frequency uses (phase vocoding, analysis/synthesis): without modulus, the STFT is perfectly invertible as long as hop_size <= window_size (NOLA), which makes a lot of algorithms possible, but said algorithms may still require analytic information (phase, amplitude, etc), which is aliased or not mapped.
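As a rough illustration of the loss for a short transient, here is a sketch under assumed values (it is not the code from the linked post). A 16-sample 180 Hz burst is placed exactly on a frame boundary, and the largest STFT magnitude is compared with and without overlap. The sampling rate, window length, burst position, and the helper `peak_stft_magnitude` are all assumptions for illustration.

```python
import numpy as np

fs = 1000.0  # assumed sampling rate in Hz
N = 64       # assumed window length in samples
x = np.zeros(512)
c = 256      # burst centre, deliberately placed on a frame boundary
n_burst = np.arange(c - 8, c + 8)
x[n_burst] = np.sin(2 * np.pi * 180.0 * n_burst / fs)  # 16-sample 180 Hz burst

def peak_stft_magnitude(signal, window, hop):
    """Largest |STFT| value over all frames and frequency bins."""
    n = len(window)
    starts = range(0, len(signal) - n + 1, hop)
    frames = np.array([np.fft.rfft(window * signal[s:s + n]) for s in starts])
    return np.abs(frames).max()

w = np.hanning(N)
peak_no_overlap = peak_stft_magnitude(x, w, hop=N)         # hop = window size
peak_half_overlap = peak_stft_magnitude(x, w, hop=N // 2)  # 50% overlap

print("no overlap: ", peak_no_overlap)
print("50% overlap:", peak_half_overlap)
```

With `hop=N` the burst only ever sits under the tapered ends of two neighbouring windows, so it barely registers. With 50% overlap one frame is centred on it. In the filtering picture, the bandpass output envelope peaks at the burst, and a hop of one full window samples that envelope too coarsely to catch the peak.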

hop_size, i.e. window_size - noverlap (the step between successive windows), is the subsampling factor along time. n_fft, the size of the frequency dimension, plays the inverse role of hop_size along frequency: a larger n_fft means denser sampling. Simple example:

The spectrogram hence wrongly suggests a pure sine where we have strong F.M., despite hop_len=64 STFT being perfectly invertible.
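The invertibility of the complex STFT for hop_size <= window_size can be checked with a few lines of NumPy. This is a minimal sketch, not a library implementation: the `stft` and `istft` helpers are hypothetical, the signal is random noise, and a Hamming window is assumed because it stays strictly positive, so the sum of shifted windows never vanishes even at hop_size = window_size.

```python
import numpy as np

def stft(x, window, hop):
    """FFT of each windowed frame; frames start every `hop` samples."""
    n = len(window)
    starts = range(0, len(x) - n + 1, hop)
    return np.array([np.fft.fft(window * x[s:s + n]) for s in starts])

def istft(frames, window, hop):
    """Overlap-add the inverse FFTs, then divide by the summed windows."""
    n = len(window)
    length = hop * (len(frames) - 1) + n
    num = np.zeros(length)
    den = np.zeros(length)
    for k, frame in enumerate(frames):
        s = k * hop
        num[s:s + n] += np.fft.ifft(frame).real
        den[s:s + n] += window
    return num / den  # valid only if `den` is nonzero everywhere (NOLA)

rng = np.random.default_rng(0)
N = 64                       # assumed window length
x = rng.standard_normal(640)
window = np.hamming(N)       # strictly positive, assumed for this sketch

errors = {}
for hop in (16, 64):         # heavy overlap vs. no overlap at all
    x_rec = istft(stft(x, window, hop), window, hop)
    errors[hop] = np.max(np.abs(x - x_rec))

print(errors)
```

Both hops reconstruct the signal to within rounding error, which is the sense in which the hop_len=64 STFT is "perfectly invertible". That says nothing about how readable the corresponding spectrogram is, which is the point made above.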

Further reading

OverLordGoldDragon
0

You could think of an N-point windowed block DFT/FFT (STFT) as a set of N complex FIR filters (running convolution) where we keep only every Mth output. The question then becomes, not «why do we have overlap», but rather «why do we decimate filter outputs».

Because we can. Because we have sufficient information. And because even with the complexity reduction of using FFTs, running a new one for each input sample would often add too much compute cost.
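The equivalence between the two views can be sketched in NumPy as follows. The window length `N`, hop `M`, the Hann window, and the random test signal are assumed values for illustration. Each "filter" is a windowed complex sinusoid applied by cross-correlation, of which only every Mth output is kept.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 32, 8                  # window length and hop (assumed values)
x = rng.standard_normal(256)
w = np.hanning(N)
n = np.arange(N)

# Block view: FFT of each windowed frame, frames start every M samples.
starts = np.arange(0, len(x) - N + 1, M)
stft_blocks = np.array([np.fft.fft(w * x[s:s + N]) for s in starts])

# Filter-bank view: one FIR filter per bin, evaluated at every sample,
# then only every M-th output is kept.
stft_filters = np.empty_like(stft_blocks)
for k in range(N):
    h_k = w * np.exp(2j * np.pi * k * n / N)  # windowed complex sinusoid
    y_k = np.correlate(x, h_k, mode="valid")  # sum_n x[t+n] * conj(h_k[n])
    stft_filters[:, k] = y_k[::M]

max_diff = np.max(np.abs(stft_blocks - stft_filters))
print("max difference between the two views:", max_diff)
```

Setting `M = 1` gives the "maximum overlap" case (every filter output kept), while `M = N` is back-to-back block processing with no overlap.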

Knut Inge
  • "Because we have sufficient information" No we don't. Title and body ask why there's overlap at all, not why there's decimation at all. Being invertible doesn't mean being useful. – OverLordGoldDragon Jun 06 '23 at 16:35
  • Not sure that I understand where you are coming from there. Convolution is «maximum overlap». Back to back block processing is zero overlap. Partially overlapped block processing could be viewed as either decimated convolution, or overlapped block processing, both views are equally valid? – Knut Inge Jun 06 '23 at 20:46
  • If I didn't have a post on this very Q&A that directly refutes your answer and answers the questions you're asking me, I'd respond differently. What you're asking is for me to put extra time, repeating what I said in a compressed form - "free work". That's neither reasonable, nor does it change the fact that, me or who else, put out easily comprehensible information that's pointed to you that you're not consulting. And generally the way to ask such questions where expecting an explanation makes sense is by opening a new post. – OverLordGoldDragon Jun 12 '23 at 20:31
  • But ok, let's do this once: OP asks why overlap at all, meaning why not hop_size = window_size. You say "we have sufficient information". Amplitude, phase, frequency over time - gone. Spectrogram (not STFT) invertibility, gone. There's a ton that is lost, and there's always some loss with non-unity hop (and for spectrogram, even unity), and this loss can be measured. To imply it's just about compute cost is way off. Your answer goes from "incorrect" to "incomplete" by specifying you don't refer to max hop. It's also not "decimate filter outputs" but subsample/downsample; no extra filtering. – OverLordGoldDragon Jun 12 '23 at 20:31
  • The term «decimation» does not always imply filtering, but convoluting by a N-sample convolutive filter (bank) instead of using an FFT you would have to do decimation in order to obtain equivalence. I can sketch it in Matlab code if my words are unclear? – Knut Inge Jun 12 '23 at 20:54
  • Yes, decimation can be just subsampling, but that's unnecessary ambiguity / potential misunderstanding. The convolution outputs are only subsampled (keep every Mth output, as you correctly wrote). So, "why do we downsample filter outputs". I favor "subsample" since "downsample" is more general and can even mean nonlinearities, but it's fine in this context. – OverLordGoldDragon Jun 12 '23 at 21:13