The Zero-Crossing rate threshold for a voiced/unvoiced decision

Question

I've implemented a function that calculates the zero-crossing rate (ZCR) of a given signal, and I've used the same function to estimate the pitch. For distinguishing voiced from unvoiced signals, a high ZCR indicates that the signal is unvoiced and a low ZCR indicates that it is voiced. My question: is there a threshold above which a signal can be considered unvoiced?

score 0 · Accepted Answer · answered Mar 04 '20 at 14:18

A plain ZCR criterion is not enough for robust and accurate voiced/unvoiced separation. That said, if you do use it, your threshold should be adaptive: no fixed threshold works well for all speech waveforms. The details depend on the approach you follow, but a statistical threshold (one derived from the ZCR values of the signal itself) should do the job most of the time. You can search for relevant papers on the topic and see whether their approach suits your purpose.
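To make the idea of a statistical threshold concrete, here is a minimal sketch in plain Python. It is only one possible reading of "adaptive", not a reference method: the function names, the frame length and hop, and the rule "threshold = mean ZCR of the recording + k * standard deviation" are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def frame_zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    signs = [1 if x >= 0 else -1 for x in frame]
    crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return crossings / (len(frame) - 1)


def zcr_per_frame(signal, frame_len=400, hop=200):
    """ZCR of each full-length frame; frame_len and hop are illustrative defaults."""
    return [frame_zcr(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


def adaptive_threshold(zcrs, k=0.0):
    """Assumed rule: mean of the recording's own ZCR values plus k standard deviations."""
    return statistics.mean(zcrs) + k * statistics.pstdev(zcrs)


def classify(zcrs, threshold):
    """Label a frame 'unvoiced' when its ZCR exceeds the threshold, else 'voiced'."""
    return ["unvoiced" if z > threshold else "voiced" for z in zcrs]


# Synthetic test signal (an assumption, not real speech): 0.5 s of a 150 Hz
# sine as a stand-in for voiced speech, then 0.5 s of white noise as a
# stand-in for unvoiced speech, both at 16 kHz.
fs = 16000
random.seed(0)
tone = [math.sin(2 * math.pi * 150 * n / fs) for n in range(fs // 2)]
noise = [random.uniform(-1.0, 1.0) for _ in range(fs // 2)]
signal = tone + noise

zcrs = zcr_per_frame(signal)
threshold = adaptive_threshold(zcrs)
labels = classify(zcrs, threshold)
print(f"threshold = {threshold:.3f}")
print(labels[0], labels[-1])
```

Because the threshold is recomputed from each recording, it moves with the speaker, sampling rate, and noise level. On real speech, silence and noisy frames will still be misclassified by ZCR alone, which is why it is usually combined with another feature such as short-time energy.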

Just a note: ZCR is not the best choice for pitch estimation; it only gives a rough estimate.

GKH

Thank you. In fact, I'm supposed to do a comparison between this method and other methods: autocorrelation, short-time energy and pitch detection algorithms (Yin and SWIPE). – Anass Naqqad Mar 04 '20 at 15:12
I see. That's fine then. – GKH Mar 04 '20 at 17:21
