I'm working with speech signals and my aim is to estimate the fundamental frequency
$F_0$ of the signal, often called the "pitch".
The main idea is to take short blocks of the speech signal over which stationarity can be assumed, compute the autocorrelation function (ACF) of each block, and find the index of the global maximum of the ACF (excluding lag zero). The lag at that index corresponds to the fundamental period $T_0$, and the fundamental frequency is its reciprocal, $F_0 = 1/T_0$.
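To make the procedure concrete, here is a minimal Python/NumPy sketch of it as I understand it. The function name `estimate_f0_acf`, the search range `fmin`/`fmax`, and the synthetic test frame are assumptions for illustration and do not come from the text. The lag search is restricted to a plausible pitch range because the ACF is also large at the lags right next to zero, so "excluding lag zero" alone is not enough in practice.

```python
import numpy as np

def estimate_f0_acf(frame, fs, fmin=80.0, fmax=400.0):
    """Estimate F0 of one short frame from the peak of its autocorrelation.

    fmin/fmax (Hz) are assumed bounds on the pitch; they limit the lag search.
    Returns (f0_in_hz, peak_lag_in_samples, acf_for_nonnegative_lags).
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove DC so it does not dominate the ACF

    # Full linear autocorrelation; keep only lags 0 .. len(frame)-1.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]

    # Convert the assumed pitch range into a lag range (in samples).
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(acf) - 1)

    # Global maximum of the ACF inside the allowed lag range.
    peak_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / peak_lag, peak_lag, acf

# Synthetic "voiced" frame: 200 Hz fundamental plus one harmonic (illustrative only).
fs = 16000
frame_len = 640  # 40 ms at 16 kHz
t = np.arange(frame_len) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)

f0, lag, acf = estimate_f0_acf(frame, fs)
print(f"peak lag = {lag} samples, F0 estimate = {f0:.1f} Hz")
```

Note that the ACF of a periodic frame also has peaks at $2T_0$, $3T_0$, and so on; with `np.correlate` these are attenuated only because fewer samples overlap at larger lags.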
But the text states:
> The global maximum might not be at the lag corresponding to the true fundamental frequency but can possibly be an integer multiple of it. Due to this, the maximum can jump in consecutive frames between lags corresponding to multiples of $T_0$ leading also to jumps in the $F_0$-estimate. These effects are called octave-jumps.
My questions arise at this point: How do octave jumps occur? What is the possible cause? I know that the ACF is a periodic function since the original time sequence is periodic, and in my opinion this period equals the block length of the original speech signal we are working on. When I examine the first period of the ACF, how can I decide whether the maximum corresponds to the pitch or is a maximum shifted in from the consecutive period (block)? How can I prevent this effect?