I have a collection of signals (IQ .wav files) split into ~2 s samples at a sampling rate of 2 MHz, and I can collect the STFT information from these samples through the following code:
import numpy as np
import scipy.io.wavfile
from scipy import signal

# The following is in a for loop over the directories which hold the samples
fs, x = scipy.io.wavfile.read(f'../category/signal_sample_{i}.wav')
# Once the recording is in memory, we normalise it to +1/-1
x = x / np.max(np.abs(x))
# We convert to mono by averaging the left and right channels.
x = np.mean(x, axis=1)
x = np.asarray(x, dtype=np.float32)
# Segment length for a 10 ms Hann window; the hop defaults to nperseg // 2
M = int(fs * 0.001 * 10)
# Number of samples and audio length in seconds
N = x.shape[0]
L = N / fs
f, t, Zxx = signal.stft(x, fs=fs, window='hann', nperseg=M)
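For reference, the shape of Zxx follows directly from these parameters; a quick sanity check of the arithmetic, assuming fs = 2 MHz and a 2 s clip (N = 4,000,000 samples, which divides evenly by the hop):

M = int(2_000_000 * 0.001 * 10)  # nperseg = 20000, i.e. a 10 ms window at 2 MHz
n_bins = M // 2 + 1              # one-sided spectrum -> 10001 frequency bins
hop = M // 2                     # scipy.signal.stft default overlap is nperseg // 2
n_frames = 4_000_000 // hop + 1  # with the default zero-padded boundary -> 401 frames
print(n_bins, n_frames)          # 10001 401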
From what I understand, the STFT information is in Zxx, which in my case typically has the shape (10001, 401). Unfortunately, while a subset of my entire sample set for each category can be stored in memory, the collection as a whole is too big for that!
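One workaround I'm considering is to never materialise the whole set: compute each STFT lazily inside a generator (or a framework Dataset) so only the current batch is resident. A minimal sketch along those lines, assuming every clip has the same length so the STFTs stack; stft_batches, paths, and labels are placeholder names:

import numpy as np
import scipy.io.wavfile
from scipy import signal

def stft_batches(paths, labels, batch_size=8):
    """Yield (Zxx batch, label batch) pairs without holding the full set in memory."""
    for start in range(0, len(paths), batch_size):
        batch = []
        for path in paths[start:start + batch_size]:
            fs, x = scipy.io.wavfile.read(path)
            x = np.mean(x / np.max(np.abs(x)), axis=1).astype(np.float32)
            _, _, Zxx = signal.stft(x, fs=fs, window='hann', nperseg=int(fs * 0.01))
            batch.append(Zxx)  # assumes identical clip lengths, so shapes match
        yield np.stack(batch), np.asarray(labels[start:start + batch_size])

Most deep-learning frameworks can consume a generator like this directly (e.g. a Keras Sequence or a PyTorch IterableDataset wraps the same idea).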
I've looked into using CVNNs etc. for classifying the complex ndarrays (Zxx), which is fine; however, I am still struggling to figure out what approach to take for training (and ultimately holding some samples out for testing/validation).
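A fallback I've considered (an assumption on my part, not necessarily the right approach) is to skip the CVNN and split Zxx into its real and imaginary parts, treating them as two real-valued channels for an ordinary CNN, with the train/validation/test split made at the file level so no clip appears in more than one split. A sketch, where the glob pattern and split ratios are placeholders:

import glob
import numpy as np

def to_real_channels(Zxx):
    """Stack real and imaginary parts into a (2, n_bins, n_frames) float32
    array that a standard real-valued CNN can consume."""
    return np.stack([Zxx.real, Zxx.imag]).astype(np.float32)

# File-level 70/15/15 split (placeholder ratios) before any training
paths = sorted(glob.glob('../category/*.wav'))
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(paths))
n_train, n_val = int(0.7 * len(idx)), int(0.15 * len(idx))
train_paths = [paths[i] for i in idx[:n_train]]
val_paths   = [paths[i] for i in idx[n_train:n_train + n_val]]
test_paths  = [paths[i] for i in idx[n_train + n_val:]]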