Audio time scaling using a phase vocoder

Question

I've implemented audio time scaling using a phase vocoder approach because I wanted to answer this question here (and learn something in the process). Since that question has been closed, I may as well ask a new question about audio time scaling.

Here are some examples of what you get with the approach I used (format: 8 kHz, 32-bit float, 1 channel). I can post the source code here too if anyone is interested.

Examples (all of them are time scaling without pitch shifting; the labels "magnitude/phase modified" and "raw fft modified" indicate whether I achieved the time scaling by modifying the magnitude/phase representation or the raw FFT data):

original

2x slowdown - magnitude/phase modified

2x slowdown - raw fft modified

2x speedup - magnitude/phase modified

2x speedup - raw fft modified

I still have a couple of questions, though:

1) Does anybody know of software that does time scaling right using a phase vocoder (by right I mean with as few digital artifacts as possible)?

2) How do we preserve phase coherence/continuity while time scaling the FFT spectrum? Some code or pseudocode would be great.

Regarding question #2, I know there are a lot of papers on this, but they're usually so convoluted that I'm having a hard time understanding them.
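For question #2, here is a minimal sketch of the textbook phase-propagation step, assuming each frame's spectrum is already available as magnitude and phase for bins 0..N/2, with analysis hop Ha and synthesis hop Hs (stretch factor Hs/Ha). The class name `PhasePropagator` and its interface are hypothetical. For each bin it measures how far the phase advance over one analysis hop deviates from the advance expected at the bin center, turns that into a true frequency, and advances the synthesis phase by that frequency over one synthesis hop. Magnitudes are left unchanged, and windowing, the FFTs and overlap-add are not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Wrap a phase into [-pi, pi) (the "princarg" function used in the literature).
inline double princarg(double phase) {
    return phase - 2.0 * kPi * std::floor((phase + kPi) / (2.0 * kPi));
}

// Hypothetical helper: maps analysis-frame phases (hop Ha) to synthesis-frame
// phases (hop Hs) so each bin keeps advancing at its estimated true frequency.
class PhasePropagator {
public:
    PhasePropagator(std::size_t fftSize, std::size_t analysisHop, std::size_t synthesisHop)
        : n_(fftSize), ha_(analysisHop), hs_(synthesisHop),
          prevAnalysis_(fftSize / 2 + 1, 0.0), synth_(fftSize / 2 + 1, 0.0) {}

    // analysisPhase: phases (radians) of bins 0..N/2 of the current analysis frame.
    // Returns the phases to combine with the unmodified magnitudes before the inverse FFT.
    const std::vector<double>& next(const std::vector<double>& analysisPhase) {
        assert(analysisPhase.size() == synth_.size());
        if (first_) {
            synth_ = analysisPhase;  // first frame: keep the analysis phases as they are
            first_ = false;
        } else {
            for (std::size_t k = 0; k < synth_.size(); ++k) {
                const double omegaK = 2.0 * kPi * k / n_;  // bin-center frequency, rad/sample
                // Deviation of the measured advance from the expected advance over Ha samples.
                const double dev = princarg(analysisPhase[k] - prevAnalysis_[k] - omegaK * ha_);
                const double trueFreq = omegaK + dev / ha_;  // estimated frequency, rad/sample
                // Accumulate the synthesis phase over Hs samples at that frequency.
                synth_[k] = princarg(synth_[k] + trueFreq * hs_);
            }
        }
        prevAnalysis_ = analysisPhase;
        return synth_;
    }

private:
    std::size_t n_, ha_, hs_;
    std::vector<double> prevAnalysis_, synth_;
    bool first_ = true;
};
```

This only keeps each bin's phase continuous from frame to frame; it does nothing about coherence between neighboring bins, which is what phase-locking refinements (for example Laroche and Dolson's identity phase locking) address.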

You may want to clarify in your question what you mean by "time scaling right". I understand from your other question that you want to stretch the time without changing the pitch, and I assume from your links that you have managed to change the pitch without changing the time. So I think you are most of the way there: can you get to where you want by resampling your pitch-shifted waveform? For example, if you are able to change the pitch by 2x while keeping the time with the vocoder, can you then resample that result to twice the number of samples and play it at the same sampling rate? — Dan Boschen, May 04 '20 at 23:19
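The resampling step suggested in this comment can be sketched with a naive linear-interpolation resampler (the function name `resampleLinear` is hypothetical). With `factor` = 2 it produces about twice as many samples, so playing the result at the same 8 kHz rate doubles the duration and halves the pitch; applied to a signal whose pitch was first raised by 2x, that would give a 2x slowdown at the original pitch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: resample x to about factor * x.size() samples using
// linear interpolation between neighboring input samples.
std::vector<float> resampleLinear(const std::vector<float>& x, double factor) {
    assert(factor > 0.0);
    std::vector<float> y;
    if (x.empty()) return y;
    const std::size_t outLen = static_cast<std::size_t>((x.size() - 1) * factor) + 1;
    y.reserve(outLen);
    for (std::size_t m = 0; m < outLen; ++m) {
        const double pos = m / factor;                     // fractional position in the input
        const std::size_t i = static_cast<std::size_t>(pos);
        const double frac = pos - i;
        const float a = x[i];
        const float b = (i + 1 < x.size()) ? x[i + 1] : x[i];  // clamp at the last sample
        y.push_back(static_cast<float>(a + frac * (b - a)));
    }
    return y;
}
```

Linear interpolation has no anti-aliasing filter, so for `factor` < 1 (speedup) it will alias; a windowed-sinc or polyphase resampler is the usual choice when quality matters.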
@Dan Boschen, the audio files pointed to by those links are actually examples of time scaling, not pitch shifting so I'm merely interested in improving the quality of time scaling that I've already achieved. — dsp_user, May 05 '20 at 05:55
Did you do it the way I suggested or another way? If so, can you detail how you did the resampling? — Dan Boschen, May 05 '20 at 11:34
@Dan Boschen, no pitch shifting was done at all. The time scaling was done purely with a relatively simple interpolation/decimation procedure, followed by an inverse FFT using the new "scaled" time basis (the interpolation/decimation is basically equivalent to resampling). If you're interested, I can post some code (C++). — dsp_user, May 05 '20 at 11:48
Oh OK, yes, that is what I meant by resampling. Why not implement the phase vocoder specifically, since that has all been worked out? — Dan Boschen, May 05 '20 at 11:50
@Dan Boschen, I still might do that but the question of preserving the phase coherence still remains. — dsp_user, May 05 '20 at 11:55
That would be cleared up by understanding how the phase vocoder works. Have you already seen this: http://www.panix.com/~jens/pvoc-dolson.par ? If so, what specifically confuses you? Maybe you can narrow it down to that. — Dan Boschen, May 05 '20 at 12:00
@ederwander, there's this equation in your link: $q[i] = \frac{N}{2\pi H}\,\operatorname{princarg}\left[\varphi_l[i] - \varphi_{l-1}[i] - \frac{2\pi H}{N} i\right]$. What's the purpose of the first term $\frac{N}{2\pi H}$? I'm already doing everything else, so I'm pretty close. — dsp_user, May 06 '20 at 06:47
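In the quoted equation, the princarg[...] term is the deviation, in radians, of the measured phase advance of bin i over one hop of H samples from the advance expected at the bin center. Dividing by H gives radians per sample, and multiplying by N/(2π) converts radians per sample into bins, so q[i] is the frequency offset of bin i in bins and i + q[i] is its estimated frequency. A sketch under those assumptions, where `phiPrev` and `phiCur` hold φ_{l−1} and φ_l (the function name `binOffsets` is hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Wrap a phase into [-pi, pi).
inline double princarg(double phase) {
    return phase - 2.0 * kPi * std::floor((phase + kPi) / (2.0 * kPi));
}

// q[i] = N / (2*pi*H) * princarg(phi_l[i] - phi_{l-1}[i] - 2*pi*H*i / N)
// Returns the per-bin frequency offset in bins; bin i's estimated frequency
// is (i + q[i]) bins, i.e. (i + q[i]) * fs / N Hz.
std::vector<double> binOffsets(const std::vector<double>& phiPrev,
                               const std::vector<double>& phiCur,
                               std::size_t N, std::size_t H) {
    assert(phiPrev.size() == phiCur.size());
    std::vector<double> q(phiCur.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        const double expectedAdvance = 2.0 * kPi * H * i / N;  // radians per hop at bin center
        const double dev = princarg(phiCur[i] - phiPrev[i] - expectedAdvance);  // radians per hop
        q[i] = N / (2.0 * kPi * H) * dev;  // radians per hop -> radians per sample -> bins
    }
    return q;
}
```

For example, with N = 1024 and H = 256, a stationary sinusoid at 10.3 bins gives q[10] ≈ 0.3 and q[11] ≈ −0.7, both pointing at 10.3 bins (about 80.5 Hz at 8 kHz). The estimate is only unambiguous while |q[i]| < N/(2H), which is 2 bins here.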
