Real time variable speed audio playback with interpolation

Question

I'm trying to implement real-time variable-speed playback of a buffer in order to shift its pitch. I don't need time stretching to preserve the original length.

The buffer is generated at a 48000 Hz sampling rate.

The code I have so far works well until the buffer contains high-frequency content. Then pitching up or down produces awful aliasing.

float MyClass::GetOutputSample() {
    // Split the read position into its integer and fractional parts.
    int integral = static_cast<int32_t>(interpolation_index_);
    float frac = interpolation_index_ - static_cast<float>(integral);

    // Advance the read position by the playback ratio and wrap it, keeping
    // the fractional overshoot so the phase stays continuous across the wrap.
    interpolation_index_ += pitch_;
    if (interpolation_index_ >= BUF_LEN) {
        interpolation_index_ -= BUF_LEN;
    }

    // Offset by BUF_LEN so (t - 1) and (t - 2) stay non-negative before the modulo.
    int t = (integral + BUF_LEN);
    float y0 = output_buffer_[(t) % BUF_LEN];
    float ym1 = output_buffer_[(t - 1) % BUF_LEN];
    float ym2 = output_buffer_[(t - 2) % BUF_LEN];
    float y1 = output_buffer_[(t + 1) % BUF_LEN];
    float y2 = output_buffer_[(t + 2) % BUF_LEN];
    float y3 = output_buffer_[(t + 3) % BUF_LEN];

    // 6-point, 5th-order Hermite (x-form)
    float eighthym2 = 1/8.0*ym2;
    float eleventwentyfourthy2 = 11/24.0*y2;
    float twelfthy3 = 1/12.0*y3;
    float c0 = y0;
    float c1 = 1/12.0*(ym2-y2) + 2/3.0*(y1-ym1);
    float c2 = 13/12.0*ym1 - 25/12.0*y0 + 3/2.0*y1 - eleventwentyfourthy2 + twelfthy3 - eighthym2;
    float c3 = 5/12.0*y0 - 7/12.0*y1 + 7/24.0*y2 - 1/24.0*(ym2+ym1+y3);
    float c4 = eighthym2 - 7/12.0*ym1 + 13/12.0*y0 - y1 + eleventwentyfourthy2 - twelfthy3;
    float c5 = 1/24.0*(y3-ym2) + 5/24.0*(ym1-y2) + 5/12.0*(y1-y0);

    // Evaluate the polynomial at frac using Horner's scheme.
    float result = ((((c5*frac+c4)*frac+c3)*frac+c2)*frac+c1)*frac + c0;

    return result;
}

BUF_LEN is 144000 samples of f32.

How can I mitigate aliasing in my case? If I understand correctly, the original buffer should be upsampled to something like 96000 Hz with another interpolation before the pitch-shifting interpolation is done. I can do that when the buffer is calculated, so it only has to be performed once.

So what next? Should the upsampled buffer be low-pass filtered? I do not fully understand this step.

Also, how do I correctly downsample in real time after the pitch-shifting interpolation? Should another interpolation be executed?

An important note: this runs on an embedded platform, so computational resources are limited.

I have some questions about what your intended end result is. Are you trying to vary the speed of playback without changing pitch? Or is it your intent to change the pitch to somehow accommodate the change in playback speed? Or do you want the pitch to change along with the playback speed as it would with analog tape? — robert bristow-johnson, Sep 08 '23 at 16:15
@robertbristow-johnson my intended result is to change the pitch of the buffer in the real-time playback, at least around +-2 times up/down. So variable playback is pretty much == pitch change in this case, and that is what I'm trying to do. — coldmind, Sep 08 '23 at 16:19
So would the pitch change be what naturally occurs by speeding up or slowing down playback? I.e. if you sped it up to 2x, the tempo would be twice as fast and the pitch an octave higher? — robert bristow-johnson, Sep 08 '23 at 16:26
@robertbristow-johnson Yes, and with the interpolation I have now that is happening and is intended. But when the audio buffer contains some high-frequency content it starts to produce very apparent aliasing when speeding up or slowing down — coldmind, Sep 08 '23 at 16:30
Okay, so now I know there isn't splicing or "time-compression" involved. So it is only a matter of interpolation. There should be no problem of aliasing when you're slowing down, but there can be one when you're speeding up. If aliasing is a problem, you need to low-pass filter those high frequency components before striding through the audio samples with your interpolator. — robert bristow-johnson, Sep 08 '23 at 16:34
In this answer I discuss the math in agonizing detail in using windowed-sinc based interpolation to accomplish an arbitrary delay with fractional-sample precision. You can vary that delay to speed things up or slow them down. This does not deal with any pre-filtering you may want to do to prevent aliasing when speeding up. — robert bristow-johnson, Sep 08 '23 at 16:42
@robertbristow-johnson lowpassing is the part I don't fully understand. If my sampling rate is 48000, should I calculate the new sampling rate and then place a low-pass filter with the cutoff at the new Nyquist frequency, NewSR/2? So in the case of 2x speed, the new SR would be 24000, and the cutoff 12000? — coldmind, Sep 08 '23 at 16:44
Yes. I believe you have that correctly. Because if you're speeding up to 2x, your 10 kHz sinusoid will become 20 kHz. You want to kill anything above 12 kHz, because it will be shifted to above 24 kHz and that will alias to below 24 kHz. — robert bristow-johnson, Sep 08 '23 at 16:46
@robertbristow-johnson thank you! But will it be better to oversample the buffer, do pitch shift and downsample it? It sounds like a solution, but I’m not sure how to downsample correctly — coldmind, Sep 08 '23 at 23:14
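
The cutoff rule worked out in the comments above (keep only what stays below 24 kHz after the speed-up, so 12 kHz at 2x) can be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: the names kSampleRate, AntiAliasCutoffHz, Biquad and MakeLowPass are hypothetical, and the low-pass uses the well-known Audio EQ Cookbook second-order coefficients. A single second-order section only rolls off at 12 dB per octave, so a real anti-aliasing stage would probably need several cascaded sections or a steeper design.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical names for illustration; none of them appear in the original post.
constexpr float kSampleRate = 48000.0f;  // buffer sampling rate from the question
constexpr float kPi = 3.14159265358979f;

// Highest source frequency that still lands below Nyquist after playback at
// speed ratio `pitch` (2.0 = one octave up). Ratios <= 1 return Nyquist itself,
// meaning no pre-filtering is required when slowing down.
inline float AntiAliasCutoffHz(float pitch) {
    const float nyquist = 0.5f * kSampleRate;
    return nyquist / std::max(pitch, 1.0f);
}

// Direct-form I second-order filter with coefficients normalized so a0 == 1.
struct Biquad {
    float b0 = 0.0f, b1 = 0.0f, b2 = 0.0f, a1 = 0.0f, a2 = 0.0f;
    float x1 = 0.0f, x2 = 0.0f, y1 = 0.0f, y2 = 0.0f;

    float Process(float x) {
        const float y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }
};

// Low-pass coefficients in the Audio EQ Cookbook form.
// The cutoff must lie strictly between 0 and Nyquist.
inline Biquad MakeLowPass(float cutoff_hz, float q = 0.7071f) {
    assert(cutoff_hz > 0.0f && cutoff_hz < 0.5f * kSampleRate);
    const float w0 = 2.0f * kPi * cutoff_hz / kSampleRate;
    const float cosw = std::cos(w0);
    const float alpha = std::sin(w0) / (2.0f * q);
    const float a0 = 1.0f + alpha;

    Biquad f;
    f.b0 = ((1.0f - cosw) / 2.0f) / a0;
    f.b1 = (1.0f - cosw) / a0;
    f.b2 = f.b0;
    f.a1 = (-2.0f * cosw) / a0;
    f.a2 = (1.0f - alpha) / a0;
    return f;
}
```

For example, MakeLowPass(AntiAliasCutoffHz(2.0f)) builds a 12 kHz low-pass for 2x playback. The filter has to run over the source samples at the buffer's own rate, before the interpolator strides through them; for a fixed ratio that could be done once on a copy of the buffer, which may suit a platform with limited resources. For ratios of 1 or below the cutoff equals Nyquist, so the filter should simply be bypassed.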
