
So here is the "seminal" YIN paper:

de Cheveigné A, Kawahara H. - YIN, a fundamental frequency estimator for speech and music

and the new, improved probabilistic YIN:

Mauch M, Dixon S. - PYIN: A fundamental frequency estimator using probabilistic threshold distributions

Now, I am already at a point where I am unimpressed with the only new idea in YIN, this "cumulative mean normalized difference function" (CMNDF), whose only purpose seems to be to avoid picking the minimum of the "average squared difference function" (ASDF) that always exists at zero lag. Big deal. There are other, better ways of avoiding that obvious error.
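For reference, here is a minimal sketch of YIN's difference function d(τ) (the ASDF above, as a sum rather than an average) and the CMNDF d'(τ) = d(τ) / ((1/τ) Σ_{j=1}^{τ} d(j)), with d'(0) = 1. It assumes a fixed summation window W so every lag sums the same number of terms; the function names are illustrative, not taken from the paper or any library.

```python
def difference_function(x, max_lag):
    """YIN's squared-difference function d(tau) for tau = 0..max_lag.

    d(tau) = sum_{j=0}^{W-1} (x[j] - x[j + tau])**2, with the window W
    fixed so that every lag uses the same number of terms.
    """
    W = len(x) - max_lag
    if W <= 0:
        raise ValueError("frame too short for max_lag")
    return [sum((x[j] - x[j + tau]) ** 2 for j in range(W))
            for tau in range(max_lag + 1)]


def cmndf(d):
    """Cumulative mean normalized difference d'(tau).

    d'(0) = 1 and d'(tau) = d(tau) / ((1/tau) * sum_{j=1}^{tau} d(j)).
    Dividing by the running mean removes the trivial dip at tau = 0.
    """
    out = [1.0]
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out
```

On a pure sine with a 20-sample period (200 samples, max_lag = 50), d'(τ) comes out essentially zero at τ = 20 and τ = 40 and about 1.8 at τ = 10.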

What I am trying to figure out with this pYIN is the pitch tracking and pitch candidate evaluation model. I understand Markov processes and I think I understand what the Hidden Markov Model (HMM) is. And I believe that they are trying to connect pitch candidates of the previous frame to pitch candidates of the current frame. I have done this with my pitch detectors (but not so much using an HMM). We do this to avoid glitches and bleeps arising from an octave error, when either the presence of a subharmonic causes the period to appear to be twice as long as the desired period, or weak odd harmonics cause the period to appear to be half as long as the desired period. I am quite familiar with this problem and have attacked it with my own bonehead technique, which seems to work pretty well, is simple, and makes very few assumptions about the model of the quasi-periodic signal of which we wanna know the period.
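To make the "connect candidates across frames" idea concrete, here is a minimal sketch of Viterbi decoding over a small, variable-length list of pitch candidates per frame. It is a generic illustration, not the pYIN model: the emission score is assumed to be the log of a per-candidate probability, and the transition cost is a hypothetical penalty proportional to the pitch jump in octaves (`jump_penalty` per octave).

```python
import math


def viterbi_candidates(frames, jump_penalty=8.0):
    """Pick one candidate per frame, maximizing total log-score.

    frames: list of frames; each frame is a non-empty list of
            (f0_hz, prob) pairs with prob > 0.
    jump_penalty: hypothetical cost (log-probability units) per octave
                  of pitch change between consecutive frames.
    Returns the chosen f0 values, one per frame.
    """
    # score[i] = best log-score of any path ending at candidate i
    score = [math.log(p) for _, p in frames[0]]
    back = []  # back[t-1][i] = best predecessor of candidate i in frame t
    for t in range(1, len(frames)):
        prev, cur = frames[t - 1], frames[t]
        new_score, ptr = [], []
        for f, p in cur:
            # transition cost grows linearly with |log2(f / f_prev)|
            costs = [jump_penalty * abs(math.log2(f / prev[j][0]))
                     for j in range(len(prev))]
            best_j = max(range(len(prev)), key=lambda j: score[j] - costs[j])
            new_score.append(score[best_j] - costs[best_j] + math.log(p))
            ptr.append(best_j)
        score = new_score
        back.append(ptr)
    # trace back from the best final candidate
    i = max(range(len(score)), key=score.__getitem__)
    path = [i]
    for ptr in reversed(back):
        i = ptr[i]
        path.append(i)
    path.reverse()
    return [frames[t][i][0] for t, i in enumerate(path)]
```

With a strong octave penalty, a single frame where the octave-up candidate scores slightly higher does not flip the track; with `jump_penalty=0` the decoder reduces to a per-frame argmax.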

I haven't grokked from the pYIN paper how a finite (hopefully small) number of pitch candidates are chosen in the current frame and how they are related to pitch candidates of the previous frame.
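One reading of the paper is that the candidate set is simply every trough (local minimum) of the CMNDF in the frame. The "probabilistic" part replaces YIN's single threshold with a prior over thresholds s_i (a beta distribution): for each threshold, YIN's rule picks the first trough below it, and that trough collects the threshold's probability mass; if no trough falls below a threshold, the global minimum gets that mass scaled by a small constant p_a. The sketch below follows that reading only. The beta parameters (a = 2, b = 18, mean 0.1), the 100 thresholds, and p_a = 0.01 are assumptions, and the beta prior is approximated by sampling its density at bin midpoints rather than differencing its CDF. This is not the authors' code.

```python
def beta_threshold_weights(n=100, a=2.0, b=18.0):
    """Discretized beta prior over thresholds s_i in (0, 1).

    Samples the unnormalized beta density at bin midpoints and
    normalizes (an approximation to differencing the beta CDF).
    """
    mids = [(i + 0.5) / n for i in range(n)]
    w = [s ** (a - 1) * (1 - s) ** (b - 1) for s in mids]
    total = sum(w)
    return mids, [v / total for v in w]


def troughs(dprime):
    """Lags tau >= 1 that are local minima of d'(tau)."""
    return [t for t in range(1, len(dprime) - 1)
            if dprime[t] < dprime[t - 1] and dprime[t] <= dprime[t + 1]]


def candidate_probs(dprime, p_a=0.01, **beta_kw):
    """Map each trough lag tau to a probability mass (assumed scheme)."""
    ts = troughs(dprime)
    probs = {t: 0.0 for t in ts}
    if not ts:
        return probs
    global_min = min(ts, key=lambda t: dprime[t])
    for s, w in zip(*beta_threshold_weights(**beta_kw)):
        below = [t for t in ts if dprime[t] < s]
        if below:
            probs[below[0]] += w          # first trough under this threshold
        else:
            probs[global_min] += p_a * w  # fallback: global minimum, scaled
    return probs
```

Because the beta prior concentrates on low thresholds, deep troughs collect most of the mass, while shallower troughs at shorter lags still get a small nonzero probability instead of being discarded outright.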

And then I wanna grok how they eventually score each pitch candidate for the final decision.
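One reading of the paper and of the librosa port linked in the comments is that the final decision is not a per-frame score at all: candidate probabilities are binned onto a fixed log-frequency grid, each pitch bin gets a "voiced" state and an "unvoiced" twin, and a Viterbi pass over that HMM picks the path. Below is a minimal sketch of the emission side only, under stated assumptions: a hypothetical grid of 10-cent bins starting at 55 Hz, voiced emission = candidate mass landing in the bin, and the leftover mass (1 minus the total voiced mass) spread evenly over the unvoiced states. The transition side (a local pitch-jump window plus a small voicing-switch probability) would be combined with these in an ordinary Viterbi decoder.

```python
import math


def pitch_bin(f0_hz, fmin_hz=55.0, cents_per_bin=10.0):
    """Index of the log-frequency bin containing f0 (hypothetical grid)."""
    return int(round(1200.0 * math.log2(f0_hz / fmin_hz) / cents_per_bin))


def emission_vector(candidates, n_bins, fmin_hz=55.0, cents_per_bin=10.0):
    """Observation probabilities for the 2 * n_bins states of one frame.

    candidates: list of (f0_hz, prob) with probs summing to <= 1.
    States 0..n_bins-1 mean 'voiced at bin q'; states n_bins..2*n_bins-1
    are the unvoiced twins, which share whatever mass is left over.
    """
    obs = [0.0] * (2 * n_bins)
    for f0, p in candidates:
        q = pitch_bin(f0, fmin_hz, cents_per_bin)
        if 0 <= q < n_bins:
            obs[q] += p  # several candidates may land in the same bin
    voiced = min(1.0, sum(obs[:n_bins]))
    for q in range(n_bins, 2 * n_bins):
        obs[q] = (1.0 - voiced) / n_bins
    return obs
```

The unvoiced twins are what let the decoder "give up" on a frame whose candidates are all weak, instead of being forced to pick one of them.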

In my pitch detectors, I use a variant of autocorrelation computed directly from the ASDF, and maxima of the autocorrelation always correspond to minima of the ASDF. Avoiding the peak at zero lag is trivial if you guarantee that the input to the pitch detector is DC free. Then, among the peaks that exceed a very gross threshold, I handicap peaks at larger lags only slightly, so that I do not spuriously pick the lag at two periods simply because it looks slightly better than the lag at one period.
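A minimal sketch of that idea, assuming a fixed window W: the lag-product sum is recovered exactly from the ASDF via Σ(a−b)² = Σa² + Σb² − 2Σab, so its peaks line up with the ASDF's dips. The gross threshold (30% of r(0)) and the linear handicap (5% at the largest lag) are hypothetical values for illustration, not the ones used in the detector described here.

```python
def autocorr_from_asdf(x, max_lag):
    """r(tau) = sum_j x[j] * x[j + tau], recovered exactly from the ASDF."""
    W = len(x) - max_lag
    if W <= 0:
        raise ValueError("frame too short for max_lag")

    def energy(start):
        return sum(v * v for v in x[start:start + W])

    e0 = energy(0)
    r = []
    for tau in range(max_lag + 1):
        # the ASDF at this lag, as a sum over the window rather than an average
        d = sum((x[j] - x[j + tau]) ** 2 for j in range(W))
        r.append(0.5 * (e0 + energy(tau) - d))
    return r


def pick_lag(r, gross_threshold=0.3, handicap=0.05):
    """Choose a period lag from peaks of r, gently penalizing long lags.

    Peaks at or below gross_threshold * r[0] are ignored; each surviving
    peak is scaled by (1 - handicap * tau / max_lag) before comparison.
    """
    max_lag = len(r) - 1
    best, best_score = None, float("-inf")
    for tau in range(1, max_lag):
        is_peak = r[tau] > r[tau - 1] and r[tau] >= r[tau + 1]
        if is_peak and r[tau] > gross_threshold * r[0]:
            score = r[tau] * (1.0 - handicap * tau / max_lag)
            if score > best_score:
                best, best_score = tau, score
    return best
```

On a sine with a 20-sample period, the peaks at lags 20 and 40 are equally tall, and the handicap breaks the tie in favor of one period.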

Not shown in that SE answer of mine is what I do with each peak that exceeds the gross threshold: I determine the exact location (to fractional-sample precision) and height of the peak using parabolic interpolation, and if that peak is one of the, say, five highest peaks, I call it a "pitch candidate" and associate it with the candidate in the previous frame that is closest to it in log frequency (which is pitch). Then each current candidate gets a score that is a heuristic function of the current peak height and of the score of the previous-frame candidate it is tied to. And for "stickiness": if a candidate in the current frame is tied to the previous frame's winner, that candidate gets an extra boost to its score. So a slight preference is given to the candidate that I had previously chosen.
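A minimal sketch of the candidate bookkeeping described above. The parabolic interpolation is the standard three-point fit. The scoring rule (peak height, plus a fraction `carry` of the tied previous candidate's score, plus a `boost` if that previous candidate was the winner) and its weights are hypothetical stand-ins, since the exact heuristic is not spelled out here.

```python
import math


def parabolic_peak(y, i):
    """Refine a peak at integer index i with a parabola through i-1, i, i+1.

    Returns (fractional_location, interpolated_height).
    """
    a, b, c = y[i - 1], y[i], y[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(i), b
    delta = 0.5 * (a - c) / denom
    return i + delta, b - 0.25 * (a - c) * delta


def score_candidates(peaks, prev, prev_winner, n_keep=5, boost=0.1, carry=0.5):
    """Heuristic candidate scoring with frame-to-frame 'stickiness'.

    peaks: list of (lag, height) for the current frame, after interpolation.
    prev: list of (lag, score) candidates from the previous frame.
    prev_winner: index into prev of last frame's chosen candidate, or None.
    carry, boost: hypothetical weights for the inherited score and for
                  the bonus given to the candidate tied to last winner.
    Returns (list of (lag, score), index of the winning candidate).
    """
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:n_keep]
    scored = []
    for lag, height in top:
        score = height
        if prev:
            # tie to the previous candidate nearest in log frequency (= log lag)
            j = min(range(len(prev)),
                    key=lambda k: abs(math.log(lag / prev[k][0])))
            score += carry * prev[j][1]
            if j == prev_winner:
                score += boost
        scored.append((lag, score))
    winner = max(range(len(scored)), key=lambda k: scored[k][1])
    return scored, winner
```

In the usage below, the slightly lower peak near lag 100 still wins because it is tied to the previous winner; with no history the taller peak near lag 50 wins.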

But what are Mauch and Dixon doing? If there was some code outlining this, I would be interested. I want to ditch the YIN thing but consider their candidate scoring method. But I cannot decode it from the paper.

robert bristow-johnson
  • I ain't got no answers for you, but I think if ever there were evidence that journals have no business publishing algorithm papers without attached working implementations at least linked to a zip file on a university server (let alone a GitHub page), this is it. On a side note, I had good success using the Harmonic Product Spectrum, it's effective for ruling out octave errors, and the implementation is dead simple. – panthyon May 06 '19 at 02:29
  • HPS is -> take the magnitude spectrum of a frame plus 2 or 3 successively downsampled copies of it (spectrum, decimated x2, decimated x3, decimated x4), multiply them, and the harmonics should line up and reinforce each other. Take the argmax of that product. Multiply that bin index by the sampling rate over the frame size and that should give you the fundamental f0. – panthyon May 06 '19 at 02:39
  • Question: In your 3rd to last paragraph, first sentence, you have "maxima of the autocorrection always correspond... " Maybe you got autoincorrected? I'll delete this if you meant 'autocorrelation' and make the change :) – BrianO May 04 '22 at 08:02
  • The Python library librosa implements both YIN and pYIN. Clean, commented code: https://github.com/librosa/librosa/blob/main/librosa/core/pitch.py – BrianO May 04 '22 at 08:44
  • PPS The link for the pYIN paper isn't good anymore. It's easy to find by title. Here's a probably stable URL: https://www.eecs.qmul.ac.uk/~simond/pub/2014/MauchDixon-PYIN-ICASSP2014.pdf – BrianO May 05 '22 at 18:57
  • Finally, P*S, the pYIN article authors published their C++ source code ("We make the method freely available online as an open source C++ library for Vamp hosts") at http://code.soundsoftware.ac.uk/projects/pyin – BrianO May 07 '22 at 18:28
  • Any new input for reading the paper? I'm also finding it very difficult to decipher, because there are some symbols that have not been declared, like $P^{*}_k$ in eq. 6, and the implementation details seem to get hidden in discussion about details. – mavavilj May 10 '22 at 09:14
  • I'm only interested in the pitch candidate scoring method. i.e. when there are several pitch candidates and, associated with each one, a vector containing several "goodness of fit" parameters and then the "p" part of the algorithm makes a choice. – robert bristow-johnson May 10 '22 at 19:01
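A minimal sketch of HPS as described in the comments above, with the assumptions spelled out: the decimation is applied to the magnitude spectrum (|X[k]| becomes |X[rk]| for r = 1..R), and the argmax bin k converts to frequency as k·fs/N. A direct O(N²) DFT keeps it dependency-free; in practice you would use an FFT.

```python
import cmath
import math


def dft_magnitude(x):
    """|X[k]| for k = 0..N/2 via a direct DFT (O(N^2), fine for a demo)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N // 2 + 1)]


def hps_f0(x, fs, n_harmonics=4):
    """Harmonic product spectrum f0 estimate.

    Multiplies |X[k]|, |X[2k]|, ..., |X[Rk]| (the spectrum decimated by
    1..R) and converts the argmax bin to Hz with fs / N.
    """
    mag = dft_magnitude(x)
    n = len(mag) // n_harmonics  # bins where every decimated copy exists
    prod = [1.0] * n
    for r in range(1, n_harmonics + 1):
        for k in range(n):
            prod[k] *= mag[k * r]
    k0 = max(range(1, n), key=prod.__getitem__)  # skip the DC bin
    return k0 * fs / len(x)
```

In the test case (a 250 Hz tone whose fundamental is weaker than its 2nd to 4th harmonics), the tallest single spectral peak is a harmonic, but the product still peaks at 250 Hz.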

1 Answer


If there was some code outlining this, I would be interested

I don't have the expertise to describe pYIN precisely, but I can point you to the original code for that paper: https://code.soundsoftware.ac.uk/projects/pyin/repository

There's also a more recent implementation which made it into the librosa library since this question was asked. The code and pull request discussing the implementation are here: https://github.com/librosa/librosa/pull/1063/files#diff-f4245b657ccbb33a7bf56a5f7754585dfbbb80d2e545e840f678aceb548aa206R735

xavriley