I am trying to implement a karaoke scoring algorithm. For a particular song, I have note information (annotated by a professional) that indicates which note occurs in which part of the lyrics.
When I run a pitch detector and plot the pitch values over the portion of the lyrics where a single note is held, the pitch varies a lot at the beginning (for the first few pitch values), settles on a steady value (which roughly corresponds to the note that should be sung), and then varies a lot again before it transitions to the next note. I want to derive a single pitch value that is representative of the pitch while the note is held steadily.
How do I go about it? Is this a problem of pitch smoothing (median)?
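To make the median idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name `representative_pitch`, the `trim_fraction` parameter, and the convention that a value of 0 marks an unvoiced frame are all hypothetical and not taken from any particular pitch detector. It discards a fixed fraction of frames at each end of the note (the unstable onset and release) and takes the median of what remains.

```python
from statistics import median


def representative_pitch(pitches, trim_fraction=0.2):
    """Median pitch of the central part of one note's pitch track.

    pitches: per-frame pitch estimates (e.g. in Hz) for a single note.
    trim_fraction: share of frames dropped at EACH end (assumed value).
    """
    # Assumption: the detector reports 0 (or less) for unvoiced frames.
    voiced = [p for p in pitches if p > 0]
    if not voiced:
        return None

    # Drop the unstable onset and release frames.
    k = int(len(voiced) * trim_fraction)
    core = voiced[k:len(voiced) - k] or voiced  # fall back if trimming empties it

    return median(core)


track = [180, 250, 221, 220, 219, 220, 221, 220, 190, 260]
print(representative_pitch(track))
```

A fixed trim only helps if the unstable part is short relative to the note; for very short notes the fraction would probably need tuning.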
I am using a pitch algorithm based on the G.729 pitch detector, together with a pitch smoothing algorithm that seems to work really well (so far) on speech and on male singers singing over very low-volume background music.
windowed-mode operation to isolate the steady regions. It seems to work decently well given that the window size and shift are figured out. – Sriram Oct 19 '11 at 15:19
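One way a windowed-mode operation could look is sketched below. This is a rough illustration, not necessarily the method the comment refers to: the function names, the quantization of pitch to the nearest MIDI semitone, and the default `window`, `shift`, and `min_share` values are all assumptions. Each window is kept as "steady" only if a single note accounts for at least `min_share` of its frames.

```python
import math
from collections import Counter


def hz_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def windowed_mode(pitches_hz, window=5, shift=1, min_share=0.8):
    """Return (start_index, midi_note) for every window dominated by one note.

    Assumes every frame is voiced (all values > 0); unvoiced frames
    would have to be filtered out beforehand.
    """
    notes = [hz_to_midi(f) for f in pitches_hz]
    steady = []
    for start in range(0, len(notes) - window + 1, shift):
        # Mode of the quantized notes inside this window.
        note, count = Counter(notes[start:start + window]).most_common(1)[0]
        if count / window >= min_share:
            steady.append((start, note))
    return steady


track = [180.0, 250.0, 221.0, 220.0, 219.0, 220.0, 221.0, 220.0, 190.0, 260.0]
print(windowed_mode(track))
```

The frames covered by the returned windows can then be reduced to one value, for example with a median, and the result depends heavily on the window size and shift chosen.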