I want to make a log-frequency spectrogram out of this audio. Later, I need this spectrogram for pitch sequence analysis.

This is a sample sequence I want to achieve:

[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0 18 18 19 19 19 19 19 19 19 19 18  0  0  0  0 19 18 18 18 19
 19 19 19 19 19 19 20 20  0  0  0 25 26 26 26 26 26 26 26 26 25  0  0  0
  0  0 26 25 25 25 25 25 25 25 25 25 25 26 26 27 27 27 27 28 28 28 28 28
 28 28 28 28 27 26 27  0 28 28 28 28 28 28 28 28 28 27 26  0  0 26 26 26
 26 26 26 26 26 26 26 26 26 26 26 26 25 25  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0 24 24 24 24 24 24 24 24 24 24 24 24 24 24  0  0  0 25 24
 24 23 23 23 23 24 24 24 24 23 23 23 23 23 22 22 22 23 23 23 23 23 23 23
 22 23 23  0  0  0 23 23 23 22 23 23 23 23 23 23 22 22 23 22 23 22 21 21
 21 21 21 21 21 21 21 21 22  0  0  0 21 21 21 21 21 21 21 21 21 21 21 20
  0  0  0  0 19 19 19 19 19 19]

From what I am able to do now, I can only plot the audio as a linear-scale spectrogram, with this code:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

window_size = 1024
hop_length = 256

def make_spectrogram_b(songname, titles, filename):
    x, sr = librosa.load(songname, sr=None)  # sr=None to keep the original sample rate
    duration = librosa.get_duration(filename=songname)

    print("Audio shape: ", x.shape)
    print("Sample rate: ", sr)
    print("Duration of audio: ", duration)

    # compute STFT
    window = np.hanning(window_size)
    stft = librosa.stft(x, n_fft=window_size, hop_length=hop_length, window=window)
    out = 2 * np.abs(stft) / np.sum(window)

    # plot result
    plt.figure(figsize=(12, 4))
    ax = plt.axes()
    ax.set_axis_off()
    librosa.display.specshow(librosa.amplitude_to_db(out, ref=np.max),
                             y_axis='log', x_axis='time', sr=sr)
    plt.savefig(f'spectrogram_data_B/{titles}/{filename[:-3].replace(".", "")}.png',
                bbox_inches='tight', transparent=True, pad_inches=0.0)
    plt.clf()
```

Here is an output sample from the code above: [output spectrogram image]. My testing result is not satisfying, as it detects too many zero pitch values, and I think I want to change the spectrogram type.

I read in a book (Müller, Fundamentals of Music Processing, 2015) that if we want to make a spectrogram for music analysis, we should make a log-frequency spectrogram, as quoted:

To emphasize musical or tonal relationships, the frequency axis is often plotted in a logarithmic fashion, which yields a log-frequency representation. A logarithmic frequency axis also accounts for the fact that human perception of pitch is logarithmic in nature. Finally, in the case of audio signals, the amplitude values are also often visualized using a logarithmic scale, for example, by using a decibel scale. In this way, small intensity values of perceptual relevance become visible in the image. In the following, if not specified otherwise, we use in our visualizations a linear frequency axis and a logarithmic scale to represent amplitudes. The specific scale is not of importance, but only serves the purpose of enhancing the qualitative properties of the visualization.

In Python, how can I plot this log-frequency spectrogram? Or is there a better way to 'convert' the audio above into a visual representation for pitch analysis?

  • librosa.display.specshow(librosa.amplitude_to_db(out, ref=np.max), y_axis='log', x_axis='time', sr=sr) — in this line you have converted the output to dB, i.e. a log scale.

    Your y-axis is logarithmic also.

    Then, how do you say that "I can only plot the audio using a linear scale spectrogram"?

    – Duck Dodgers Feb 24 '21 at 10:12
  • @DuckDodgers: The problem is about the frequencies, not the amplitudes. – JRE Feb 24 '21 at 10:23
  • @JRE, indeed. He has both things in log-scale. No? The amplitudes, as represented by the intensity of the color on the image, as well as the y-axis (which is not labelled/shown there in the image), but going by the code, it should be in "log-frequency". (My bad, I should not have mentioned the amplitudes at all. It is not relevant to the OP's question. I agree.) – Duck Dodgers Feb 24 '21 at 10:30
  • The display uses log frequencies. The underlying spectrogram data uses linearly spaced frequencies. Ideally, you'd want log spaced frequencies feeding a log spaced display. That's the real question here: How can you calculate a spectrogram with log spaced bins? – JRE Feb 24 '21 at 10:40
  • Previously I got commented on that spectrogram like this: "Your spectrogram does indeed use a log axis when plotted, but it appears that the actual underlying data is still denominated in a linear frequency scale, meaning that there's not actually any more resolution around the fundamental frequencies". That's why I said "I can only plot the audio using a linear scale spectrogram". I am actually afraid that I have some kind of misunderstanding that I can't clarify, so I post a question here, in hope I can improve the audio visualization. – Dionisius Pratama Feb 24 '21 at 10:48
  • I agree with @JRE on "How can you calculate a spectrogram with log spaced bin", is it possible to do it with Python? If possible, how? Or, another question, aside from spectrogram, what kind of "good" visualization I can use to substitute spectrogram, if I can't calculate spectrogram with log spaced bin? – Dionisius Pratama Feb 24 '21 at 10:52
  • 2
    @DionisiusPratama, the good Prof. Dr. Müller (who you refer to in your post as well) provides such an example on this (probably his) website :). From the 2nd heading onwards. I hope I understood correctly and this is what you were looking for. – Duck Dodgers Feb 24 '21 at 11:07
  • 2
  • 1
    It boggles my mind that people who are doing music stick to the FFT with its linear spacing. Why not simply calculate the Fourier transform for just the frequencies corresponding to the center frequencies of the notes? Instead, everyone does an FFT at an extremely high resolution then tries to map the bins to fit the notes. Just make the bins fit the notes to start with. That's 88 bins for the whole piano range. As efficient as the FFT is, at some point computing fewer bins wins out. – JRE Feb 24 '21 at 11:16
  • 1
    As an additional win, computing each bin individually lets you tailor the width of bin and the time resolution to each tone individually. Wider bins with better time resolution for high frequencies, narrower bins with worse time resolution for the lower tones. – JRE Feb 24 '21 at 11:18
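JRE's suggestion in the comments above — computing the transform only at the note center frequencies themselves — can be sketched in plain NumPy. The frame size, hop, and 88-key piano range (MIDI 21–108) are assumptions for illustration, not a definitive implementation:

```python
import numpy as np

def note_dft(x, sr, midi_lo=21, midi_hi=108, frame=4096, hop=1024):
    """Project each frame of x onto complex exponentials at the 88 piano
    note frequencies (MIDI midi_lo..midi_hi), giving one log-spaced 'bin'
    per note instead of linearly spaced FFT bins."""
    # Equal-tempered note frequencies: 440 Hz at MIDI 69
    freqs = 440.0 * 2.0 ** ((np.arange(midi_lo, midi_hi + 1) - 69) / 12.0)
    n = np.arange(frame)
    window = np.hanning(frame)
    # One windowed complex sinusoid per note, shape (notes, frame)
    kernel = np.exp(-2j * np.pi * freqs[:, None] * n[None, :] / sr) * window
    cols = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = x[start:start + frame]
        cols.append(np.abs(kernel @ seg))  # magnitude per note for this frame
    return np.array(cols).T  # shape: (notes, n_frames)
```

The row index of the maximum per frame is then directly a note index, and (as JRE notes) the frame length could be varied per note to trade frequency resolution for time resolution.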

0 Answers