Background
About half a year ago, while learning about spectrograms in an Image Processing course, I was told you can speed up audio using spectrograms as follows:
- Calculate the spectrogram of the signal (using the short-time Fourier transform).
- Get rid of every 2nd column in the spectrogram (to double the speed, for example; otherwise use a different ratio).
- Calculate the inverse transform to turn the spectrogram back into a signal.
Since we are deleting every 2nd column, we remove values only along the time axis and not along the frequency axis, so the pitch shouldn't change (in contrast to methods such as deleting every second sample from the original audio, which raises the pitch because every frequency is doubled).
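To make the three steps concrete, here is a minimal sketch of the naive (no phase correction) version as I understand it. It assumes SciPy's stft/istft; the test tone, sample rate, window length, and overlap are arbitrary values I picked for illustration, not values from the course.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8000                                  # sample rate in Hz (arbitrary choice)
t = np.arange(fs) / fs                     # one second of audio
x = np.sin(2 * np.pi * 440 * t)            # 440 Hz test tone

nperseg, noverlap = 256, 192               # window length and overlap (assumed values)

# 1. Spectrogram via the short-time Fourier transform.
_, _, spec = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)

# 2. Keep only every 2nd column (time frame) to double the speed.
spec_fast = spec[:, ::2]

# 3. Inverse STFT back to a time-domain signal.
_, y = istft(spec_fast, fs=fs, nperseg=nperseg, noverlap=noverlap)

print(len(x), len(y))                      # y is roughly half as long as x
```

The output is about half the length of the input, but the phases of the remaining frames are untouched, which is the part we were told still needs fixing.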
Even so, we were told this method is incomplete, since we also have to correct (fix) the phase.
The Question
Why do you need to correct the phase, and how do you do that (algorithm with mathematical explanation)? I tried looking online but couldn't find anything about this! Also, below is the code from my course, which may be helpful, and I'd like to know how it works.
If I remember correctly, we were given this image, which shows the following:
- The top signal is the original signal
- The middle signal is after deleting columns from the spectrogram
- The last signal is after fixing the phase. (I believe this is called phase vocoding or something).
Code Example
I found this piece of code we were given, which speeds up or slows down a signal.
spec is the spectrogram of the signal. ratio is the ratio by which we speed up or slow down the sound.
import numpy as np
from scipy.ndimage import map_coordinates

def phase_vocoder(spec, ratio):
    num_timesteps = int(spec.shape[1] / ratio)
    time_steps = np.arange(num_timesteps) * ratio

    # interpolate magnitude
    yy = np.meshgrid(np.arange(time_steps.size), np.arange(spec.shape[0]))[1]
    xx = np.zeros_like(yy)
    coordinates = [yy, time_steps + xx]
    # np.complex was removed from NumPy; the builtin complex is equivalent
    warped_spec = map_coordinates(np.abs(spec), coordinates, mode='reflect', order=1).astype(complex)

    # phase vocoder
    # Phase accumulator; initialize to the first sample
    spec_angle = np.pad(np.angle(spec), [(0, 0), (0, 1)], mode='constant')
    # copy so that the in-place += below does not overwrite spec_angle[:, 0]
    phase_acc = spec_angle[:, 0].copy()

    for (t, step) in enumerate(np.floor(time_steps).astype(int)):
        # Store to output array
        warped_spec[:, t] *= np.exp(1j * phase_acc)

        # Compute phase advance
        dphase = (spec_angle[:, step + 1] - spec_angle[:, step])

        # Wrap to -pi:pi range
        dphase = np.mod(dphase - np.pi, 2 * np.pi) - np.pi

        # Accumulate phase
        phase_acc += dphase

    return warped_spec
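To convince myself what the "Wrap to -pi:pi range" line does, I tried it in isolation. This is only a small check of that one expression; the helper name wrap_phase is mine (hypothetical), not something from the course code.

```python
import numpy as np

def wrap_phase(dphase):
    # Same expression as in phase_vocoder: map any angle to its
    # equivalent value in the interval [-pi, pi).
    return np.mod(dphase - np.pi, 2 * np.pi) - np.pi

raw = np.array([0.5, 1.5 * np.pi, -1.5 * np.pi, 2 * np.pi + 0.25])
wrapped = wrap_phase(raw)
print(wrapped)
```

Angles already inside the range come back unchanged, and the others are shifted by a multiple of 2*pi, so the complex exponential exp(1j * angle) is the same before and after wrapping.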
