scipy.signal.spectrogram() with noverlap=nperseg-1, what are the possible side-effects?

Question

I'm using a simulated data set which can be downloaded from the link in the first paragraph on this page (it's an Excel file):

https://www.analyticbridge.datasciencecentral.com/forum/topics/challenge-of-the-week-detecting-multiple-periodicity-in-time-seri

import pandas as pd
from matplotlib import pyplot as plt
from scipy import signal
import numpy as np
plt.rcParams["figure.figsize"] = (16, 9)

I'm looking to visualize all spectral components of the signal, in an exploratory way.

Loading the dataset:

df = pd.read_excel('stars.xlsx', sheet_name=1, usecols=[1, 2], index_col=0, names=['obs', 'mag'])
df.plot();

I am trying to figure out all spectral components, including possible frequencies with a very long period. With scipy.signal.spectrogram() I can increase nperseg to catch the long periods.

But if I increase it too much, I get nothing, the plot is empty. To work around that, I need to also increase noverlap to the highest value possible.

sig = np.array(df['mag'].tolist())
sf = 1
nseg = 2000
f, t, Sxx = signal.spectrogram(sig, sf, nperseg=nseg, noverlap=nseg-1)
flen = len(f)
fshort = f[1:flen//10]
Sxxshort = Sxx[1:flen//10]
per = np.rint(1 / fshort).astype(int)
plt.yticks(ticks=fshort[::2], labels=per[::2])
plt.ylabel('period')
plt.xlabel('obs')
plt.pcolormesh(t, fshort, np.log10(Sxxshort), shading='gouraud')
plt.show()

My question is: what are some possible side-effects from increasing noverlap a lot? In what ways could the spectrogram be "wrong"? (for lack of a better term)

Would it create artifacts? (information that doesn't exist in the original)

Would it lose any information from the original? (other than not being able to visualize the beginning and the end of the time interval, which is fine by me, because the spectrum doesn't change a whole lot in time)

OverLordGoldDragon · Accepted Answer · 2021-08-27T12:21:25.650

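noverlap = nperseg - 1 provides the maximum possible information; it is the 'ideal' configuration. A spectrogram is $|\text{STFT}|$ (scipy's default mode returns the squared magnitude, scaled as a power spectral density), and the $\text{STFT}$ is the input convolved with windowed complex sinusoids.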

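noverlap is a surrogate for hop_size: hop_size == nperseg - noverlap
hop_size is the stride of the convolution, so noverlap = nperseg - 1 means a stride of 1: the window is evaluated at every sample position where it fits.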
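To make the relation between noverlap, hop_size and the number of time steps concrete, here is a minimal sketch. It assumes a synthetic 2200-sample noise signal as a stand-in for the real data, and the helper n_frames is a hypothetical name, not part of scipy; it mirrors how signal.spectrogram counts segments when the input is not padded.

```python
import numpy as np
from scipy import signal


def n_frames(n_samples, nperseg, noverlap):
    """Number of time steps for an unpadded signal (hypothetical helper)."""
    hop_size = nperseg - noverlap
    return (n_samples - noverlap) // hop_size


# Synthetic stand-in for the ~2200-sample series (assumption, not the real data).
rng = np.random.default_rng(0)
x = rng.standard_normal(2200)
nperseg = 2000

shapes = {}
for noverlap in (nperseg // 8, nperseg // 2, nperseg - 1):
    f, t, Sxx = signal.spectrogram(x, fs=1, nperseg=nperseg, noverlap=noverlap)
    shapes[noverlap] = Sxx.shape
    print(
        f"noverlap={noverlap:4d}  hop_size={nperseg - noverlap:4d}  "
        f"Sxx.shape={Sxx.shape}  predicted frames={n_frames(len(x), nperseg, noverlap)}"
    )
```

Only the last setting (hop_size of 1) yields more than one column here, which is why the plot only fills in once noverlap is raised.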

But if I increase it too much, I get nothing, the plot is empty.

This should only happen if the window is too time localized (narrow), which isn't the case with scipy's default of ('tukey', .25).

Update: plt.pcolormesh is to blame. With noverlap left at its default, the output shape is (nfft//2 + 1, 1): there is only a single time step, and pcolormesh tries to plot what is effectively 1D data. That's because the default noverlap is nperseg//8, so hop_size == 2000 - 2000//8 == 1750, and the entire input is ~2200 samples long, so there can only be one hop. It can still be plotted with e.g. plt.imshow(..., aspect='auto').
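A minimal sketch of the single-column case and the imshow workaround, assuming a synthetic 2200-sample noise signal in place of the real data. The output file name is made up for illustration, and the Agg backend is selected only so the snippet also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal

# Synthetic stand-in for the ~2200-sample series (assumption, not the real data).
rng = np.random.default_rng(0)
x = rng.standard_normal(2200)
nperseg = 2000

# noverlap is left at its default (nperseg // 8), so only one segment fits.
f, t, Sxx = signal.spectrogram(x, fs=1, nperseg=nperseg)
print(Sxx.shape)  # (1001, 1): a single time step

# imshow stretches the single column across the axes instead of needing a grid.
fig, ax = plt.subplots()
ax.imshow(
    np.log10(Sxx[1:]),  # skip the DC bin, as in the question's code
    aspect="auto",
    origin="lower",
    extent=[0, nperseg, f[1], f[-1]],  # the one segment spans the first nperseg samples
)
ax.set_xlabel("obs")
ax.set_ylabel("frequency")
fig.savefig("single_frame_spectrogram.png")  # hypothetical output file name
```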

My question is: what are some possible side-effects from increasing noverlap a lot?

Longer compute and greater memory use: with a hop of 1, the output has len(sig) - nperseg + 1 time steps instead of a handful.
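For a rough sense of the memory cost, the output size can be estimated before computing anything. This is a sketch under assumptions: the helper spectrogram_bytes is a hypothetical name, it assumes a real-valued float64 input with a one-sided spectrum and no padding, and the signal is synthetic noise of the same length as the data in the question.

```python
import numpy as np
from scipy import signal


def spectrogram_bytes(n_samples, nperseg, hop_size, itemsize=8):
    """Estimated size of Sxx in bytes (hypothetical helper, float64 output assumed)."""
    n_freqs = nperseg // 2 + 1                      # one-sided spectrum, nfft == nperseg
    noverlap = nperseg - hop_size
    n_frames = (n_samples - noverlap) // hop_size   # unpadded segment count
    return n_freqs * n_frames * itemsize


n_samples, nperseg = 2200, 2000
estimates = {hop: spectrogram_bytes(n_samples, nperseg, hop) for hop in (1, 10, 100)}
for hop, nbytes in estimates.items():
    print(f"hop_size={hop:3d}  estimated Sxx size: {nbytes / 1e6:.3f} MB")

# Cross-check the hop_size == 1 estimate against an actual computation.
x = np.random.default_rng(0).standard_normal(n_samples)
_, _, Sxx = signal.spectrogram(x, fs=1, nperseg=nperseg, noverlap=nperseg - 1)
print(Sxx.nbytes == estimates[1])
```

For a series this short the cost is negligible; the estimate matters more for long recordings, where a hop of 1 produces roughly one column per sample.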

In what ways could the spectrogram be "wrong"?

None; greater noverlap is superior in every way. Less overlap bears risks -- see here.

For time-frequency analysis, over scipy I recommend librosa, or ssqueezepy for advanced methods and hardware acceleration.

The code you see is all I have. I start with noverlap undefined, and nperseg small (cca 200). Then I increase nperseg. At some point (nperseg approx 1000) I get a blank. That's when I need to tweak noverlap. — Florin Andrei, Aug 27 '21 at 08:27
Your explanations made the proverbial lightbulb go off in my head. This is stuff I've studied years ago and the details were a bit soft - but now it's clear again. Thank you so much! — Florin Andrei, Aug 27 '21 at 15:39
