pitch extraction for speech

Question

I am building an LPC analysis tool and have reached the point where I need pitch period analysis. I was initially interested in using the Gold & Rabiner algorithm for this, and got hold of a FORTRAN implementation. Before I start porting that implementation to a modern programming language, I'd like to know whether anyone here can suggest other pitch detection algorithms that are more modern and robust. Even better, are there any open source libraries with APIs for pitch period analysis that my tool could hook into?

EDIT: I am building an LPC analysis tool for speech waveforms.

robert bristow-johnson · Answer 1 · 2015-01-12T19:54:12.643

it's not "modern", but if done right, correlation methods are quite robust. i personally like the Average Squared Difference Function (ASDF)

$$ Q_x[n_0,k] = \frac{1}{2L+1} \sum\limits_{n=n_0 - \lfloor k/2 \rfloor - L}^{n_0 - \lfloor k/2 \rfloor + L} \left( x[n]-x[n+k] \right)^2 $$

but the more familiar Average Magnitude Difference Function (AMDF)

$$ Q_x[n_0,k] = \frac{1}{2L+1} \sum\limits_{n=n_0 - \lfloor k/2 \rfloor - L}^{n_0 - \lfloor k/2 \rfloor + L} \Big| x[n]-x[n+k] \Big| $$

might be more to your liking.

$x[n_0]$ is the middle of the neighborhood of samples where you are determining the pitch. $2L+1$ is the number of terms being averaged. you can skip the division by $2L+1$ if you want, since only the relative values of $Q_x[n_0,k]$ across different $k$ matter. $\lfloor k/2 \rfloor = k/2$ for even $k$ and $\lfloor k/2 \rfloor = (k-1)/2$ for odd $k$.
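The two definitions above can be sketched in plain Python as follows. This is only an illustration, not anyone's reference implementation: the function names `asdf` and `amdf`, the synthetic sine test signal, and the values chosen for `n0`, `L`, and the lag range are all assumptions made for the example.

```python
import math

def asdf(x, n0, k, L):
    """Average Squared Difference Function Q_x[n0, k], averaged over 2L+1 terms."""
    start = n0 - k // 2 - L            # k // 2 equals floor(k/2) for k >= 0
    stop = n0 - k // 2 + L             # inclusive upper limit of the sum
    if start < 0 or stop + k >= len(x):
        raise ValueError("analysis window runs past the ends of x")
    total = sum((x[n] - x[n + k]) ** 2 for n in range(start, stop + 1))
    return total / (2 * L + 1)

def amdf(x, n0, k, L):
    """Average Magnitude Difference Function Q_x[n0, k], averaged over 2L+1 terms."""
    start = n0 - k // 2 - L
    stop = n0 - k // 2 + L
    if start < 0 or stop + k >= len(x):
        raise ValueError("analysis window runs past the ends of x")
    total = sum(abs(x[n] - x[n + k]) for n in range(start, stop + 1))
    return total / (2 * L + 1)

# Assumed test signal: a pure sine with a period of exactly 50 samples.
period = 50
x = [math.sin(2 * math.pi * n / period) for n in range(1000)]

n0, L = 500, 100          # window centre and half-width (illustrative values)
lags = range(20, 81)      # candidate lags k; this range excludes 2 * period

best_asdf = min(lags, key=lambda k: asdf(x, n0, k, L))
best_amdf = min(lags, key=lambda k: amdf(x, n0, k, L))
print(best_asdf, best_amdf)
```

For real speech the lag range would be chosen from the expected pitch range and the sample rate, and the search would be repeated for each analysis frame centred on a new `n0`.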

look for a value of $k$ that minimizes $Q_x[n_0,k]$. you will likely want to interpolate around that integer value of $k$ to find a fractional value that is the "true" minimum. i usually just use simple quadratic interpolation. if $Q_x[n_0,k_m]<Q_x[n_0,k_m-1]$ and $Q_x[n_0,k_m] \le Q_x[n_0,k_m+1]$ then the interpolated minimum (and a candidate for the period) is

$$ P_m = k_m + \frac{1}{2} \frac{Q_x[n_0,k_m+1] - Q_x[n_0,k_m-1]}{2Q_x[n_0,k_m] - Q_x[n_0,k_m+1] - Q_x[n_0,k_m-1]} $$
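As a quick check of the interpolation formula, here is a minimal Python sketch. The function name `interpolate_minimum` and the test parabola are assumptions for illustration. If $Q$ is exactly a parabola with its minimum at 50.3, the formula should recover 50.3 from the three samples at lags 49, 50, and 51.

```python
def interpolate_minimum(q_prev, q_min, q_next, k_m):
    """Quadratic interpolation of a local minimum at integer lag k_m.

    q_prev, q_min, q_next are Q_x[n0, k_m - 1], Q_x[n0, k_m], Q_x[n0, k_m + 1].
    """
    # Same local-minimum test as in the text; it guarantees denom < 0 below.
    if not (q_min < q_prev and q_min <= q_next):
        raise ValueError("k_m is not a local minimum")
    denom = 2 * q_min - q_next - q_prev
    return k_m + 0.5 * (q_next - q_prev) / denom

# Assumed example: Q(k) = (k - 50.3)^2 sampled at k = 49, 50, 51.
true_minimum = 50.3
q = {k: (k - true_minimum) ** 2 for k in (49, 50, 51)}
p_m = interpolate_minimum(q[49], q[50], q[51], 50)
print(p_m)
```

On real data $Q$ is only approximately parabolic near the minimum, so the result is an estimate of the fractional period rather than an exact value.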

if the input $x[n]$ is almost perfectly periodic, there may be many values of $k$ that locally minimize $Q_x[n_0,k]$, because

if $x[n+P]=x[n] \quad \forall n$

then $x[n+2P]=x[n]$ and $x[n+3P]=x[n]$

and $Q_x[n_0,P]$, $Q_x[n_0,2P]$, and $Q_x[n_0,3P]$ are all equally minimal (and close to zero). in that case you usually want to pick the minimum that has the smallest $k$.
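One way to apply the "smallest $k$" rule is sketched below in Python. Note the assumptions: the answer does not say how close two minima must be to count as "equally minimum", so the `tolerance` parameter (a fraction of the spread between the largest value and the best minimum) is a hypothetical choice, as are the names `pick_period` and `asdf` and the synthetic test signal.

```python
import math

def asdf(x, n0, k, L):
    """Average Squared Difference Function Q_x[n0, k] over 2L+1 terms."""
    start = n0 - k // 2 - L
    return sum((x[n] - x[n + k]) ** 2
               for n in range(start, start + 2 * L + 1)) / (2 * L + 1)

def pick_period(q, k_first, tolerance=0.1):
    """Return the smallest lag whose local minimum is nearly as deep as the best one.

    q[i] holds Q_x[n0, k_first + i]. `tolerance` is a hypothetical tuning
    parameter, not something specified in the text.
    """
    # Local minima, using the same test as for the interpolation step.
    candidates = [i for i in range(1, len(q) - 1)
                  if q[i] < q[i - 1] and q[i] <= q[i + 1]]
    if not candidates:
        return None
    q_best = min(q[i] for i in candidates)
    threshold = q_best + tolerance * (max(q) - q_best)
    for i in candidates:               # candidates are in ascending lag order
        if q[i] <= threshold:
            return k_first + i
    return None

# Assumed test signal: period of 40 samples, so minima appear at k = 40, 80, 120.
x = [math.sin(2 * math.pi * n / 40) for n in range(1000)]
k_first, k_last = 20, 130
q = [asdf(x, 500, k, 100) for k in range(k_first, k_last + 1)]
print(pick_period(q, k_first))
```

The integer lag returned here would then be refined with the quadratic interpolation described earlier.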

martinweiss · Answer 2 · 2015-01-13T08:40:48.727

If you have university access, Christensen and Jakobsson - Multi-Pitch Estimation (2009, Morgan & Claypool) is a great book [1]. The focus is on statistical methods. There's also a MATLAB toolbox. Since you didn't specify the type of signal that you're working with, it's a bit difficult to recommend a specific method. Auto-correlation methods (some references can be found in Section 1.2) are simple, but the solution may not be unique, see for instance the YIN method [2].

References
[1] http://www.morganclaypool.com/doi/abs/10.2200/S00178ED1V01Y200903SAP005
[2] http://www.cs.tut.fi/~digaudio/htyo/lahteet/2002_JASA_YIN.pdf

edited Jan 13 '15 at 08:40

answered Jan 12 '15 at 10:18

I edited my question above. The signal I am working with is speech. – patrick Jan 12 '15 at 19:06
Martin, it's more likely that you'll get the error of finding $2\frac{1}{f_0}$ when the period is $\frac{1}{f_0}$. getting an octave high error (or $2 f_0$) happens in the case where the energy in the odd-numbered harmonics is very low compared to the total energy. so, it's more likely that you'll get a $\frac12 f_0$ error than a $2 f_0$ error. – robert bristow-johnson Jan 12 '15 at 20:11
