I'm using a simulated data set which can be downloaded from the link in the first paragraph on this page (it's an Excel file):

https://www.analyticbridge.datasciencecentral.com/forum/topics/challenge-of-the-week-detecting-multiple-periodicity-in-time-seri

import pandas as pd
from matplotlib import pyplot as plt
from scipy import signal
import numpy as np

plt.rcParams["figure.figsize"] = (16, 9)

I'm looking to visualize all spectral components of the signal, in an exploratory way.

Loading the dataset:

df = pd.read_excel('stars.xlsx', sheet_name=1, usecols=[1, 2], index_col=0, names=['obs', 'mag'])
df.plot();

graph

I am trying to figure out all spectral components, including possible frequencies with a very long period. With scipy.signal.spectrogram() I can increase nperseg to catch the long periods.

But if I increase it too much, I get nothing, the plot is empty. To work around that, I need to also increase noverlap to the highest value possible.

sig = df['mag'].to_numpy()
sf = 1
nseg = 2000
f, t, Sxx = signal.spectrogram(sig, sf, nperseg=nseg, noverlap=nseg-1)

flen = len(f)
fshort = f[1:flen//10]
Sxxshort = Sxx[1:flen//10]
per = np.rint(1 / fshort).astype(int)

plt.yticks(ticks=fshort[::2], labels=per[::2])
plt.ylabel('period')
plt.xlabel('obs')
plt.pcolormesh(t, fshort, np.log10(Sxxshort), shading='gouraud')
plt.show()

spectrum

My question is: what are some possible side-effects from increasing noverlap a lot? In what ways could the spectrogram be "wrong"? (for lack of a better term)

Would it create artifacts? (information that doesn't exist in the original)

Would it lose any information from the original? (other than not being able to visualize the beginning and the end of the time interval, which is fine by me, because the spectrum doesn't change a whole lot in time)

OverLordGoldDragon

1 Answer

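noverlap = nperseg - 1 provides the maximum possible information; it is the 'ideal' configuration. A spectrogram is $|\text{STFT}|$ (scipy's default mode='psd' returns the squared magnitude, scaled as a power spectral density), and the $\text{STFT}$ is the input convolved with windowed complex sinusoids.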
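One way to write the convolution view out, as a sketch in notation assumed here rather than taken from scipy's documentation ($x$ is the input, $w$ the window of length $N$ = nperseg, $H$ the step in samples between successive segments, $k$ the frequency bin, $m$ the segment index):

$$\text{STFT}[k, m] = \sum_{n=0}^{N-1} x[n + mH]\, w[n]\, e^{-j 2\pi k n / N}$$

For a fixed $k$ this is the convolution of $x$ with the time-reversed kernel $w[-n]\, e^{j 2\pi k n / N}$, evaluated every $H$ samples. With $H = 1$ every output sample of that convolution is kept; a larger $H$ keeps only every $H$-th one.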

  • noverlap is a surrogate for hop_size: hop_size == nperseg - noverlap
  • hop_size is the stride of the convolution
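The relation between noverlap, hop_size and the number of time steps can be sketched in a few lines. The helper names below are hypothetical, and the count assumes no padding and no boundary extension (which, as far as I know, matches how scipy.signal.spectrogram segments its input); the 2200-sample length is an approximation.

```python
def hop_size(nperseg: int, noverlap: int) -> int:
    """Stride between the starts of consecutive segments."""
    if not 0 <= noverlap < nperseg:
        raise ValueError("noverlap must satisfy 0 <= noverlap < nperseg")
    return nperseg - noverlap


def n_segments(n_samples: int, nperseg: int, noverlap: int) -> int:
    """Number of full segments that fit in the input (no padding assumed)."""
    if n_samples < nperseg:
        return 0  # simplification: no full segment fits
    return 1 + (n_samples - nperseg) // hop_size(nperseg, noverlap)


if __name__ == "__main__":
    n = 2200  # approximate input length, an assumption
    print(n_segments(n, 2000, 2000 // 8))  # 1
    print(n_segments(n, 2000, 2000 - 1))   # 201
```

With the maximum overlap the stride is one sample, so each possible window position yields one spectrogram column.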

> But if I increase it too much, I get nothing, the plot is empty.

This should only happen if the window is too time-localized (narrow), which isn't the case with scipy's default of ('tukey', .25).

Update: plt.pcolormesh is to blame. With noverlap left at its default, the output shape is (n_freqs, 1), where n_freqs == nperseg//2 + 1: there is only a single time step, and pcolormesh is effectively asked to plot 1D data. That's because scipy's default is noverlap = nperseg // 8, so hop_size == 2000 - 2000//8 == 1750, while the entire input is only ~2200 samples long, so only one segment fits. It can still be plotted with e.g. plt.imshow(..., aspect='auto').
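The shape issue can be reproduced without the Excel file. This is a minimal sketch on a synthetic signal; the 2200-sample length, the 50-sample period and the noise level are assumptions chosen only to mimic the situation, not properties of the original data.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
n = 2200                       # assumed length, roughly that of the dataset
obs = np.arange(n)
# Synthetic stand-in for df['mag']: one sinusoid plus a little noise
sig = np.sin(2 * np.pi * obs / 50) + 0.1 * rng.standard_normal(n)

nseg = 2000

# Default overlap (nperseg // 8): only one segment fits in the input
f, t_default, Sxx_default = signal.spectrogram(sig, 1, nperseg=nseg)

# Maximum overlap: one column per possible window position
f, t_max, Sxx_max = signal.spectrogram(sig, 1, nperseg=nseg, noverlap=nseg - 1)

print(Sxx_default.shape)  # (1001, 1)
print(Sxx_max.shape)      # (1001, 201)
```

A single-column array like Sxx_default gives pcolormesh nothing to interpolate along the time axis, which matches the empty plot described in the question.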

> My question is: what are some possible side-effects from increasing noverlap a lot?

Longer compute and greater memory use
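To get a rough feel for the memory cost, the output size can be estimated before computing anything. This is a back-of-the-envelope sketch with a hypothetical helper name; it assumes a real-valued input (one-sided spectrum, nperseg//2 + 1 frequency bins), float64 output, and no padding, and it ignores intermediate buffers.

```python
def spectrogram_bytes(n_samples: int, nperseg: int, noverlap: int,
                      bytes_per_value: int = 8) -> int:
    """Estimated size in bytes of the spectrogram array alone (float64 assumed)."""
    hop = nperseg - noverlap
    n_seg = 1 + (n_samples - nperseg) // hop   # full segments, no padding
    n_freqs = nperseg // 2 + 1                 # one-sided spectrum of real input
    return n_freqs * n_seg * bytes_per_value


if __name__ == "__main__":
    # Input of roughly the size discussed here: negligible either way
    print(spectrogram_bytes(2200, 2000, 250))    # 8008
    print(spectrogram_bytes(2200, 2000, 1999))   # 1609608
    # Hypothetical million-sample input: about 8 GB at maximum overlap
    print(spectrogram_bytes(1_000_000, 2000, 1999))  # 7991992008
```

For a signal as short as the one in the question the cost is trivial; it only starts to matter for much longer inputs.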

> In what ways could the spectrogram be "wrong"?

None; greater noverlap is superior in every way. Less overlap bears risks -- see here.

For time-frequency analysis, over scipy I recommend librosa, or ssqueezepy for advanced methods and hardware acceleration.

OverLordGoldDragon