I have a series of tones, each no more than a second long (around 0.6–0.7 seconds each). I am recording them through a soundcard and want to be able to match which tone has been played. I am quite new to this, but I have tried a pre-made audio-fingerprinting library; I do not think my clips are long enough for its approach, so I think the only way to get reliable results is a more hands-on approach. I was hoping someone with more audio expertise could outline the best approach I could take? I have attached an image of an example audio file (I can't seem to upload the file itself).
2 Answers
Not really an answer, but too long for a comment:
You will need some sort of feature detection. Which features to use depends a lot on what your specific sounds are: what's different about them and what's the same? Once you understand what's different about them, you can write an algorithm to detect those distinguishing features.
Depending on what exactly the features are and how different they are, this can be trivial or turn into a major science project.
EDIT after listening
The first two are easily distinguishable. They both consist of the two notes C5 (~523 Hz) and G5 (~784 Hz): the first one pitches up (C5 then G5), the other one is reversed, i.e. it pitches down.
The third one sounds weird. It's constant pitch, but it's not a pitch on the musical scale (it lies between C5 and C#5), and it seems heavily clipped/distorted, with a fair bit of noise as well. It's not similar to the other two.
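Given those differences, a crude classifier could just compare the dominant FFT frequency of the first and second halves of each clip. A minimal sketch (the helper names, the 5% thresholds, and the fs argument are my assumptions, not something established above):

import numpy as np

def dominant_freq(x, fs):
    # Frequency (Hz) of the strongest bin in a windowed FFT
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    return np.fft.rfftfreq(len(x), d=1 / fs)[np.argmax(spectrum)]

def classify(tone, fs):
    # Compare the dominant frequency of the two halves of the clip
    half = len(tone) // 2
    f1 = dominant_freq(tone[:half], fs)
    f2 = dominant_freq(tone[half:], fs)
    if f2 > 1.05 * f1:
        return "pitch up"    # e.g. C5 then G5
    if f2 < 0.95 * f1:
        return "pitch down"  # the reversed clip
    return "steady"          # the constant-pitch third tone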
Thanks for your response :) I have two tones that are fairly similar (although I believe distinguishable). I attached a Dropbox link containing the tones; would you be able to give them a listen and let me know if this is going to be a headache or not? XD I think I might have some luck if I calculate the FFT and compare? Or potentially a DTW (or maybe I'm overcomplicating it :/)? – Charlie Sep 06 '21 at 09:16
Why listen to something that you could actually describe with numbers, Charlie? We'll need numbers anyway. Which tones are they, and what duration do they have? What's different about them and what is the same? Please answer these questions in your question. – Marcus Müller Sep 06 '21 at 11:48
Hi Marcus, what numbers from the audio files would you need in order to answer my question? I will add some more detail about the sound files. Cheers – Charlie Sep 06 '21 at 12:02
OK, so let's import your tones into Python and have a look.
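Assuming the clips are mono WAV files (the file names below are placeholders for the three clips from the Dropbox link), they can be read with scipy.io.wavfile:

import numpy as np
from scipy.io import wavfile

# Placeholder file names; wavfile.read returns (sample rate, samples)
fs, tone1 = wavfile.read("tone1.wav")
fs, tone2 = wavfile.read("tone2.wav")
fs, tone3 = wavfile.read("tone3.wav")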
import matplotlib.pyplot as plt
# Plot the three raw waveforms, one per axis
fig, axs = plt.subplots(3, figsize=(6, 8))
axs[0].plot(tone1)
axs[1].plot(tone2)
axs[2].plot(tone3)
We can then plot the spectrograms of the tones and zoom in on the appropriate (low) part of the spectrum.
import numpy as np
import matplotlib.pyplot as plt
# Spectrograms of the three tones; matplotlib's specgram defaults to Fs=2,
# so the y-axis is normalized frequency and [0, 0.1] zooms in on the low end.
fig, axs = plt.subplots(3, figsize=(6, 8))
output1 = axs[0].specgram(tone1)
axs[0].set_ylim([0, 0.1])
output2 = axs[1].specgram(tone2)
axs[1].set_ylim([0, 0.1])
output3 = axs[2].specgram(tone3)
axs[2].set_ylim([0, 0.1])
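As an aside, specgram returns a (spectrum, freqs, t, image) tuple, so if a rough frequency track is all that's needed, the per-frame peak frequency can be read straight out of output1:

Pxx, freqs, bins, im = output1              # specgram returns the data it plotted
peak_track = freqs[np.argmax(Pxx, axis=0)]  # dominant frequency per time frame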
The first two can then be seen to be a sinusoid that changes frequency, and the third a single (noisy, with harmonics) tone.
My favorite frequency estimation technique is the Quinn-Fernandes method. I had a Matlab implementation, but decided to translate it to Python. Note that this is an early version that has yet to be fully debugged (it is included below).
So I plotted the QNF estimate on top of each of the spectrograms and it seems to do the right thing.
import numpy as np
import matplotlib.pyplot as plt
fig, axs = plt.subplots(3, figsize=(6, 8))
# Overlay the QNF estimate (divided by pi to match specgram's normalized
# frequency axis) as a horizontal segment over each tone segment.
output1 = axs[0].specgram(tone1)
axs[0].set_ylim([0, 0.1])
axs[0].plot([2000, 6000], qnf(tone1[4000:12000]) * [1 / np.pi, 1 / np.pi], 'r')
axs[0].plot([8000, 12000], qnf(tone1[16000:24000]) * [1 / np.pi, 1 / np.pi], 'k')
output2 = axs[1].specgram(tone2)
axs[1].plot([2000, 6000], qnf(tone2[4000:12000]) * [1 / np.pi, 1 / np.pi], 'r')
axs[1].plot([9000, 15000], qnf(tone2[18000:30000]) * [1 / np.pi, 1 / np.pi], 'k')
axs[1].set_ylim([0, 0.1])
output3 = axs[2].specgram(tone3)
axs[2].plot([9000, 12000], qnf(tone3[18000:24000]) * [1 / np.pi, 1 / np.pi], 'r')
axs[2].set_ylim([0, 0.1])
All that is required is to convert the qnf estimate (which is normalized: it comes back as an angular frequency in radians per sample) using the appropriate sampling frequency, and this will give a frequency in Hz.
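For example, assuming the recording's sampling rate is fs (44.1 kHz here as a placeholder):

fs = 44100                     # placeholder; use the soundcard's actual rate
w = qnf(tone1[4000:12000])[0]  # QNF estimate: angular frequency in rad/sample
f_hz = w * fs / (2 * np.pi)    # corresponding frequency in Hz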
Quinn-Fernandes in Python
#
# QNF - The Quinn-Fernandes frequency estimator.
#
# Inputs:  signal - T x N matrix where
#                   T = data length
#                   N = number of signals
#                   (i.e. N signals in columns).
#
# Outputs: est - N Quinn-Fernandes frequency estimates.
#
# [1] B.G. Quinn & J.M. Fernandes, "A fast technique for the estimation
#     of frequency," Biometrika, Vol. 78(3), pp. 489--497, 1991.
#
# $Id: qnf.m 1.1 2000/06/07 18:57:16 PeterK Exp PeterK $
# File: qnf.m
#
# Copyright (C) 1993 CRC for Robust & Adaptive Systems
#
# This software provided under the GNU General Public Licence, which
# is available from the website from which this file originated. For
# a copy, write to the Free Software Foundation, Inc., 675 Mass Ave,
# Cambridge, MA 02139, USA.
import numpy as np
import scipy.signal as signal

def qnf(sig):
    if type(sig) is not np.ndarray:
        raise Exception("signal must be an np.ndarray")
    #
    # Initializations
    #
    shape = sig.shape
    t = shape[0]
    if t < 4:
        raise Exception("signal must be at least 4 points long")
    if len(shape) == 1:
        ns = 1
        sig = np.array([sig]).T  # treat a 1-D signal as a single column
    else:
        ns = shape[1]
    xb = np.mean(sig, axis=0)    # remove the mean from each signal
    if len(xb.shape) == 1:
        xbm = np.multiply(np.ones([t, 1]), xb)
    else:
        xbm = np.matmul(np.ones([t, 1]), xb)
    sig = np.subtract(sig, xbm)
    t3 = t + 1
    y = np.fft.fft(sig, n=2 * t, axis=0)
    z = np.multiply(y, np.conj(y))  # periodogram
    z = z[2:t3, ]
    # Initial estimate: location of the periodogram maximum
    m, j = z[2:t - 1, ].max(axis=0), z[2:t - 1, ].argmax(axis=0)
    j = j + 1  # TODO: Not needed because of zero indexing?
    a = 2 * np.cos(np.pi * j / t)
    y = y[1:2 * t:2]
    #
    # Quinn-Fernandes method
    #
    b = [1]
    nm = t - 1
    for jjj in [1, 2]:  # two refinement iterations
        for q in np.arange(ns):
            c = [1, -a[q], 1]
            y[:, q] = signal.lfilter(b, c, sig[:, q])
        v = np.sum(np.divide(np.multiply(sig[2:t, ], y[1:nm, ]),
                             np.sum(np.multiply(y[1:nm, ], y[1:nm, ]))))
        a = np.add(a, 2 * v)
    return np.real(np.arccos(a / 2))
# Author: SJS 1992; Adapted from code within ttinpie.m (author PJK)
#
# Based on: P.J. Kootsookos, S.J. Searle and B.G. Quinn,
# "Frequency Estimation Algorithms," CRC for Robust and
# Adaptive Systems Internal Report, June 1993.
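As a quick sanity check (my own synthetic test, assuming a 44.1 kHz rate), feeding a pure C5 sinusoid through the function and converting back to Hz should recover the tone's frequency, at least once the remaining bugs are ironed out:

import numpy as np

fs = 44100                               # assumed sampling rate
n = np.arange(8000)
x = np.sin(2 * np.pi * 523.25 * n / fs)  # synthetic C5 test tone
w = qnf(x)[0]                            # normalized angular frequency (rad/sample)
print(w * fs / (2 * np.pi))              # should print something close to 523.25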