
I am building a one-dimensional convolutional neural network (CNN) and want to feed it raw audio, but the number of samples is very large. Each clip was sampled at 48 kHz and lasts 20 s, which gives 48,000 × 20 = 960,000 samples. Is there a way to reduce the number of samples?

From what I understand, the sampling rate determines the highest frequency a sampled signal can represent: the Nyquist frequency, which is half the sampling rate (24 kHz at 48 kHz). If I reduce the number of samples by resampling, the sampling rate drops and so does that maximum frequency. So if I resample the audio, I think that I will lose the high-frequency content.

Is my thinking correct? Are there other things that I need to consider?

1 Answer


"If I resample the audio, I think that I will lose the high-frequency content."

If you just resample, you will get aliasing, i.e. the high-frequency content will wrap around and be added as noise to the low-frequency content. The better procedure is to low-pass filter first, so that you control how the high-frequency content is eliminated, and then resample.
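As a rough illustration of the difference, here is a minimal Python sketch. It assumes NumPy and SciPy are available, uses a synthetic two-tone signal in place of a real recording, and picks 16 kHz as a purely hypothetical target rate. `scipy.signal.resample_poly` is just one convenient way to filter and downsample in a single call; its default filter is not necessarily the right design for your application.

```python
import numpy as np
from scipy.signal import resample_poly

fs_in = 48_000    # sampling rate from the question
fs_out = 16_000   # hypothetical target rate (assumption, not from the question)
duration = 20     # seconds, as in the question

# Synthetic test signal: a 1 kHz tone (below the new Nyquist frequency of 8 kHz)
# plus a 10 kHz tone (above it).
t = np.arange(fs_in * duration) / fs_in
x = np.sin(2 * np.pi * 1_000 * t) + np.sin(2 * np.pi * 10_000 * t)

factor = fs_in // fs_out  # integer decimation factor (3 here)

# Naive approach: keep every 3rd sample with no filtering.
# The 10 kHz tone aliases to 16 kHz - 10 kHz = 6 kHz.
y_naive = x[::factor]

# Filter-then-resample: resample_poly applies a low-pass FIR filter
# before discarding samples.
y_filtered = resample_poly(x, up=1, down=factor)


def tone_amplitude(y, fs, freq):
    """Approximate amplitude of a sinusoid at `freq` Hz, read from the FFT."""
    spectrum = np.abs(np.fft.rfft(y)) / (len(y) / 2)
    return spectrum[int(round(freq * len(y) / fs))]


alias_naive = tone_amplitude(y_naive, fs_out, 6_000)
alias_filtered = tone_amplitude(y_filtered, fs_out, 6_000)
kept_tone = tone_amplitude(y_filtered, fs_out, 1_000)

print("samples before / after:", len(x), len(y_filtered))
print("6 kHz alias, naive vs. filtered:", alias_naive, alias_filtered)
print("1 kHz tone after filtering:", kept_tone)
```

In the naive version the 10 kHz tone reappears at 6 kHz at essentially full strength, while the filtered version suppresses it and leaves the 1 kHz tone intact.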

The best design for this low-pass filter is highly dependent on the specific requirements of your application.

Assuming that CNN refers to a convolutional neural network, you may be better off doing some sort of feature extraction on the audio before feeding it to the network. This could be different types of spectrograms, the envelope, pitch, formant maps, etc. Raw audio data is rarely useful in machine learning because it contains a huge amount of information, most of which is often irrelevant. Some application-specific pre-processing can help a lot.
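To give a feel for what such a front end might look like, here is a minimal sketch of one option, a coarse log-spectrogram, again assuming NumPy and SciPy. The random noise stands in for a real recording, and the window length and the pooling into 64 linear bands are arbitrary illustrative choices, not a recommendation. Which features are suitable depends on the application.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 48_000     # sampling rate from the question
duration = 20   # seconds, as in the question

# Placeholder input: white noise standing in for a real 20 s recording.
rng = np.random.default_rng(0)
x = rng.standard_normal(fs * duration)

# Power spectrogram with non-overlapping 1024-sample windows (assumed settings).
freqs, times, Sxx = spectrogram(x, fs=fs, nperseg=1024, noverlap=0)

# Drop the DC bin, then average groups of 8 adjacent bins into 64 linear bands.
# This is a crude pooling chosen for illustration; a mel filter bank is a
# common alternative.
n_bands = 64
bands = Sxx[1:].reshape(n_bands, -1, Sxx.shape[1]).mean(axis=1)

# Log-compress so that quiet and loud regions are on a comparable scale.
features = 10 * np.log10(bands + 1e-12)

print("raw samples:", x.size)
print("feature array shape (bands, frames):", features.shape)
```

The resulting two-dimensional array is far smaller than the raw waveform and can be fed to a CNN as a sequence of frames or as an image.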

Hilmar