Given a spectrogram calculated using the following code:
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
from scipy.io import wavfile
def stft(x, fs, framesz, hop):
    framesamp = int(framesz * fs)
    hopsamp = int(hop * fs)
    w = np.hanning(framesamp)  # scipy.hanning has been removed; use NumPy's Hann window
    X = np.array([np.fft.fft(w * x[i:i + framesamp])
                  for i in range(0, len(x) - framesamp, hopsamp)])
    return X

def istft(X, fs, T, hop):
    x = np.zeros(int(T * fs))
    framesamp = X.shape[1]
    hopsamp = int(hop * fs)
    for n, i in enumerate(range(0, len(x) - framesamp, hopsamp)):
        x[i:i + framesamp] += np.real(np.fft.ifft(X[n]))
    return x
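As a quick sanity check (not part of the original post), the overlap-add in istft works because a Hann window at 50% overlap sums to an approximately constant gain of 1, so the inverse-FFT frames can simply be added without extra normalisation:

```python
import numpy as np

# Hann windows at 50% overlap sum to ~1 everywhere away from the edges
# (constant overlap-add), which is the property istft above relies on.
N, hop = 400, 200              # 400-sample frames, 50% overlap
w = np.hanning(N)
cover = np.zeros(N * 10)
for i in range(0, len(cover) - N, hop):
    cover[i:i + N] += w
interior = cover[N:-N]         # ignore the ramp-up/ramp-down at the edges
print(interior.min(), interior.max())   # both very close to 1
```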
audio_data = '../recordings/SDRuno_20220309_140138Z_1kHz.wav' # y movement as test...
fs, data = wavfile.read(audio_data)
data = np.mean(data, axis=1) # convert to mono by avg l+r channels
N = data.shape[0]
L = N / fs
amp = 2 / N * np.abs(data)
print(f'Audio length: {L:.2f} seconds')
f, ax = plt.subplots()
ax.plot(np.arange(N) / fs, data)
ax.set_xlabel('Time [s]')
ax.set_ylabel('Amplitude [unknown]');
plt.show()
M = int(fs * 0.001 * 20) # originally we had 1024, however now we use this for 20ms window resolution
amp = 2 * np.sqrt(2)  # colour-scale ceiling; assumes data scaled to roughly unit amplitude, so int16 WAV data should be normalised first or the plot saturates
stft_sig_f, stft_sig_t, stft_sig_Zxx = signal.stft(data, fs=fs, window='hann', nperseg=M, detrend=False)
plt.pcolormesh(stft_sig_t, stft_sig_f, np.abs(stft_sig_Zxx), vmin=0, vmax=amp, shading='gouraud')
plt.title('STFT Magnitude')
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
freqs, times, Sx = signal.spectrogram(data, fs=fs, window='hann', nperseg=M, detrend=False, scaling='spectrum')  # 'hanning' was removed as a window name in newer scipy
f, ax = plt.subplots(figsize=(10,5))
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx), cmap='inferno')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
plt.show()
How can one use the spectrogram's "cleaner signal" to filter the audio? I read somewhere that Sx is effectively abs(Zxx)**2, and that Zxx can be passed to an ISTFT to reconstruct the audio, but my knowledge here is pretty lacking. Any help would be appreciated!
EDIT: The current code for plotting the STFT just shows a black window.
The IQ WAV file used in this example can be found here: https://www.dropbox.com/s/at31myl5oufvfa3/SDRuno_20220309_140138Z_1kHz.wav?dl=0
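For reference, here is one common way to do what the question asks: take the STFT, zero out low-magnitude bins (a simple spectral-gating mask), and invert with scipy.signal.istft. This is a minimal sketch using a synthetic noisy 1 kHz tone in place of the WAV file (which isn't bundled here); the threshold of 0.2 is an arbitrary illustrative choice:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs * 2) / fs                      # 2 s of audio
clean = np.sin(2 * np.pi * 1000 * t)            # 1 kHz tone
noisy = clean + 0.5 * rng.standard_normal(t.size)

M = int(fs * 0.020)                             # 20 ms window, as in the question
f, tt, Zxx = signal.stft(noisy, fs=fs, window='hann', nperseg=M)

# Keep only bins whose magnitude exceeds a threshold; zero the rest.
# (abs(Zxx)**2 is, up to window scaling, the spectrogram Sx.)
mask = np.abs(Zxx) > 0.2 * np.abs(Zxx).max()
Zxx_filtered = Zxx * mask

# Invert the masked STFT back to a time-domain signal.
_, filtered = signal.istft(Zxx_filtered, fs=fs, window='hann', nperseg=M)
filtered = filtered[:noisy.size]

# The filtered signal should be much closer to the clean tone.
err_noisy = np.mean((noisy - clean) ** 2)
err_filt = np.mean((filtered - clean) ** 2)
print(err_noisy, err_filt)
```

The same pattern applies to the real recording: pass `data` instead of `noisy` (with the same `window` and `nperseg` in both calls so the inverse is consistent), and tune the mask to whatever "cleaner signal" is visible in the spectrogram.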
Comments:

scipy.stft/istft to keep it simple - no point re-inventing the wheel heh. So correct me if I am wrong, but for say a file of length n seconds (i.e. 120s), I should split this up into say 5s chunks (or even my operation intervals of 2s) and then do the STFT + ISTFT on each of these "audio"/signal chunks (i.e. via PyDub) and then extract the features from the result of the ISTFT? – rshah Mar 11 '22 at 14:21

x + 1 - 1. Otherwise yes. But it depends what "features" means; if you have a large enough dataset, then a raw waveform (ISTFT) may work; abs(STFT) (the spectrogram) is a "feature", and a better one with limited data. – OverLordGoldDragon Mar 11 '22 at 14:53

abs(STFT) is a more robust and discriminative alternative to a raw waveform if data's limited. – OverLordGoldDragon Mar 11 '22 at 16:07
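The chunking idea from the comments can be sketched with plain NumPy slicing (no PyDub needed), taking abs(STFT) of each chunk as the feature. This assumes a synthetic 10 s signal and the 2 s chunk length mentioned above:

```python
import numpy as np
from scipy import signal

fs = 8000
chunk_sec = 2
x = np.sin(2 * np.pi * 1000 * np.arange(fs * 10) / fs)  # 10 s test tone

chunk_len = fs * chunk_sec
n_chunks = len(x) // chunk_len
features = []
for k in range(n_chunks):
    chunk = x[k * chunk_len:(k + 1) * chunk_len]
    _, _, Zxx = signal.stft(chunk, fs=fs, window='hann',
                            nperseg=int(fs * 0.020))   # 20 ms frames
    features.append(np.abs(Zxx))   # spectrogram-like feature per chunk

print(n_chunks, features[0].shape)
```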