alright, conceptually, for one frame (which is centered at $n=0$ and has non-zero width of $N-1$ samples):
$$\begin{align}
X[k] &= \mathcal{DFT}\Big\{ x[n] w(n) \Big\} \\
\\
&= \sum\limits_{n=-\tfrac{N}{2}}^{\tfrac{N}{2}-1} \big( x[n] w(n) \big) \, e^{-j \frac{2 \pi}{N} nk} \qquad |k| \le \tfrac{N}2 \\
\end{align}$$
here, our frame hop is $\tfrac{N}2$ samples, which is half of the window width.
$w(t)$ is a continuous-time complementary window with non-zero width of $N$ samples and centered at $t=0$, like a Hann window:
$$ w(t) = \begin{cases} \tfrac12 + \tfrac12 \cos\left(\tfrac{2\pi}{N}t \right) \qquad & |t| < \tfrac{N}2 \\
0 \qquad & |t| \ge \tfrac{N}2 \\
\end{cases} $$
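as a quick sanity check of the "complementary" property, here is a minimal sketch (the helper name `hann` and the small value of `N` are just illustrative assumptions) that adds the window of the current frame to the window of the previous frame, offset by the hop of $\tfrac{N}2$, and confirms that they sum to 1 throughout the overlap:

```python
import math

def hann(t, N):
    """Hann window of width N centered at t = 0, as defined above."""
    if abs(t) >= N / 2:
        return 0.0
    return 0.5 + 0.5 * math.cos(2.0 * math.pi * t / N)

N = 16          # frame length (small, illustrative value)
hop = N // 2    # frame hop of N/2 samples

# the current frame is centered at n = 0 and the previous frame at n = -hop,
# so the previous frame's window in the current time index is hann(n + hop).
overlap_sums = [hann(n, N) + hann(n + hop, N) for n in range(-hop, 1)]
max_error = max(abs(s - 1.0) for s in overlap_sums)
print(max_error)
```

the sum is constant because the two raised cosines are offset by half a period, so their cosine terms cancel.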
the output spectrum is a frequency-scaled copy of the input spectrum. this scaling of the frequency axis is what shifts the pitch.
$$ Y[k] = \begin{cases}
X(\alpha^{-1} k) \qquad & |k| \le \tfrac{N \alpha}{2} \\
0 \qquad & |k| > \tfrac{N \alpha}{2} \\
\end{cases} $$
where $\alpha$ is the output-to-input frequency ratio. $X(f)$ is a continuous-frequency function that is interpolated from the discrete-frequency spectrum $X[k]$ in such a way that equality exists when the argument is an integer.
$$ X(f) \bigg|_{f=k} = X[k] \qquad k \in \mathbb{Z} $$
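the text above does not say which interpolation is used to get $X(f)$, so the following is only a rough sketch that assumes plain linear interpolation between neighboring bins (the function names `interp_spectrum` and `scale_spectrum` and the storage layout are hypothetical). it does satisfy the requirement that $X(f)$ equals $X[k]$ at integer arguments:

```python
import math

def interp_spectrum(X, f, N):
    """Linearly interpolate X[k] at real-valued f (an assumed interpolation).

    X holds bins k = -N/2 .. N/2-1, stored at list index k + N/2.
    Bins outside that range are treated as zero.
    """
    def bin_value(k):
        return X[k + N // 2] if -(N // 2) <= k <= N // 2 - 1 else 0j

    lo = math.floor(f)
    frac = f - lo
    return (1.0 - frac) * bin_value(lo) + frac * bin_value(lo + 1)

def scale_spectrum(X, alpha, N):
    """Return Y[k] = X(k / alpha) for |k| <= N*alpha/2, and 0 otherwise."""
    Y = []
    for k in range(-(N // 2), N // 2):
        if abs(k) <= N * alpha / 2:
            Y.append(interp_spectrum(X, k / alpha, N))
        else:
            Y.append(0j)
    return Y

N = 16
X = [0j] * N
X[3 + N // 2] = 1 + 0j      # a single spectral line at k = 3
X[-3 + N // 2] = 1 + 0j     # and its mirror image at k = -3

Y = scale_spectrum(X, 2.0, N)     # alpha = 2 moves the line to k = 6
print(Y[6 + N // 2], Y[-6 + N // 2])
```

with linear interpolation the neighboring output bins (here $k=5$ and $k=7$) also pick up half of the line's value, so a better interpolator may be preferable in practice.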
the discrete-time-domain output for this frame (before adjusting for stretching or scrunching the window) is:
$$\begin{align}
y[n] &= \mathcal{iDFT}\Big\{ Y[k] e^{j \phi[k]} \Big\} \\
\\
&= \tfrac{1}{N} \sum\limits_{k=-\tfrac{N}{2}}^{\tfrac{N}{2}-1} \big( Y[k] e^{j \phi[k]} \big) \, e^{j \frac{2 \pi}{N} nk} \qquad |n| \le \tfrac{N}2 \\
\end{align}$$
and the frame of output (centered at time $n=0$) is
$$ y[n] \frac{w(n)}{w(\alpha \, n)} $$
the $\frac{w(n)}{w(\alpha \, n)}$ factor is undoing the stretched or scrunched input window and re-applying the original window at the same scale (since we're not time-scaling and the frame hop length remains unchanged from input to output).
$\phi[k]$ is a phase adjustment (with odd symmetry in $k$) that is constant for each sinusoidal component (a la Miller Puckette) rather than a changing phase adjustment for each FFT bin (a la Portnoff). that per-component phase adjustment is what is required to keep each sinusoidal component continuous from frame to frame despite the frequency shift. that's how the phase vocoder does glitch-free pitch shifting, even if the input is not periodic or a single harmonic note.
this is why this phase adjustment is needed. consider a single sinusoid with normalized angular frequency $\omega_0$.
$$\begin{align}
x[n] &= \cos(\omega_0 n + \theta_0) \\
&= \tfrac12\big(e^{j (\omega_0 n + \theta_0)} + e^{-j (\omega_0 n + \theta_0)} \big) \\
&= \tfrac12\big(e^{j\theta_0}e^{j \omega_0 n} + e^{-j\theta_0}e^{-j \omega_0 n} \big) \\
\end{align}$$
now let's consider only the positive frequency complex component.
$$ \hat{x}[n] = e^{j\theta_0}e^{j \omega_0 n} $$
this has an instantaneous angle of $\theta_0$ at time $n=0$. now the instantaneous value of this sinusoid in the center of the previous frame is
$$ \hat{x}[-\tfrac{N}{2}] = e^{j\theta_0}e^{-j \omega_0 \frac{N}{2}} $$
and the instantaneous angle is $\theta_0 - \tfrac{N}{2} \omega_0$. and it shouldn't surprise us that halfway between the two frame centers (which is exactly in the middle of the crossfade from the previous frame to the current frame) the instantaneous angle is $\theta_0 - \tfrac{N}{4} \omega_0$.
now if the output spectrum is mapped from the interpolated input spectrum as
$$ Y[k] = X(\alpha^{-1} k) $$
then this component's input frequency, $\omega_0$, gets mapped to $\alpha \omega_0$, and the current frame output (before the phase adjustment) is
$$\begin{align}
y[n] &= \cos(\alpha\omega_0 n + \theta_0) \\
&= \tfrac12\big(e^{j (\alpha\omega_0 n + \theta_0)} + e^{-j (\alpha\omega_0 n + \theta_0)} \big) \\
&= \tfrac12\big(e^{j\theta_0}e^{j \alpha\omega_0 n} + e^{-j\theta_0}e^{-j \alpha\omega_0 n} \big) \\
\end{align}$$
and the positive frequency component is
$$ \hat{y}[n] = e^{j\theta_0}e^{j \alpha \omega_0 n} $$
and its value at $n=-\tfrac{N}{4}$, the midpoint of the crossfade where the current frame is joined to the previous frame, is
$$ \hat{y}[-\tfrac{N}{4}] = e^{j\theta_0}e^{-j \alpha \omega_0 \frac{N}{4}} $$
now the instantaneous angle of the output sinusoid at the center of the previous frame is the same as the instantaneous angle of the input sinusoid at the center of the previous frame, which is $\theta_0 - \tfrac{N}{2} \omega_0$. this makes the output sinusoid (the positive frequency component) of the previous frame, when "justified" or expressed in terms of the time index of the current frame
$$ \hat{y}[n] = e^{j(\theta_0 - \frac{N}{2}\omega_0)}e^{j \alpha \omega_0 (n + \frac{N}{2})} $$
(note that when $n=-\tfrac{N}{2}$, which is the center of the previous frame, the instantaneous angle is $\theta_0 - \tfrac{N}{2} \omega_0$.) now at the output sample midway between the previous frame and the current frame (which is $n=-\tfrac{N}{4}$), the sinusoid of the previous frame is at angle
$$\theta_0 - \tfrac{N}{2}\omega_0 + \alpha \omega_0 (-\tfrac{N}{4} + \tfrac{N}{2})
= \theta_0 + (\alpha - 2)\omega_0\tfrac{N}{4}$$
but, at that same sample, the angle of the sinusoid of the current frame is
$$ \theta_0 - \alpha \omega_0 \tfrac{N}{4} $$
note that when there is no pitch shift and $\alpha = 1$, then the two angles are the same and the splice is seamless. but when $\alpha \ne 1$, the phase of one of the frames must be adjusted to make the splice seamless. the previous frame is already a "done deal" and will not be modified, so it's the phase of the sinusoid of the current frame that must be adjusted to make the splice seamless. that phase is modified for all DFT bins associated with this particular sinusoid as
$$\begin{align}
\theta_0 - \alpha \omega_0 \tfrac{N}{4} + \phi[k] &= \theta_0 + (\alpha - 2)\omega_0\tfrac{N}{4} \\
-\alpha \omega_0 \tfrac{N}{4} + \phi[k] &= (\alpha - 2)\omega_0\tfrac{N}{4} \\
\phi[k] &= (\alpha - 1)\omega_0\tfrac{N}{2} \\
\end{align}$$
so the phase adjustment for the current frame necessary to align it with the previous frame is the normalized angular frequency of the sinusoid, $\omega_0$, times the frame hop displacement $\tfrac{N}{2}$ times the factor $(\alpha - 1)$. you should apply the same phase adjustment to all DFT bins of $Y[k]$ that are associated with this particular sinusoidal component at frequency $\omega_0$. the values of $k$ will be around $\tfrac{\omega_0}{2 \pi} N$.
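the derivation above can be checked numerically. this is a small sketch, with arbitrarily chosen values for $N$, $\alpha$, $\omega_0$ and $\theta_0$ (all of them assumptions for illustration), that evaluates the previous-frame and current-frame complex sinusoids at the splice point $n=-\tfrac{N}{4}$ with and without the adjustment $\phi = (\alpha - 1)\omega_0\tfrac{N}{2}$:

```python
import cmath
import math

N = 1024                             # frame length (illustrative)
alpha = 1.5                          # output-to-input frequency ratio
omega0 = 2.0 * math.pi * 10.3 / N    # input frequency, deliberately off bin center
theta0 = 0.7                         # input phase at the current frame center

def prev_frame(n):
    """Previous frame's shifted sinusoid, in the current frame's time index."""
    return cmath.exp(1j * (theta0 - omega0 * N / 2)) * cmath.exp(1j * alpha * omega0 * (n + N / 2))

def curr_frame(n, phi=0.0):
    """Current frame's shifted sinusoid with an optional phase adjustment."""
    return cmath.exp(1j * (theta0 + phi)) * cmath.exp(1j * alpha * omega0 * n)

phi = (alpha - 1.0) * omega0 * N / 2    # the adjustment derived above

splice = -N / 4                         # midpoint of the crossfade
mismatch_before = abs(prev_frame(splice) - curr_frame(splice))
mismatch_after = abs(prev_frame(splice) - curr_frame(splice, phi))
print(mismatch_before, mismatch_after)
```

without the adjustment the two frames disagree at the splice; with it, the difference drops to floating-point rounding error. because both frames have the same frequency $\alpha \omega_0$, they then agree at every $n$, not only at the splice point.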
so Miller Puckette says
$$\phi[k] = (\alpha - 1) \omega_0 \tfrac{N}{2} \qquad \text{for } k \approx \tfrac{\omega_0}{2 \pi} N$$
and Portnoff would say that the adjustment would be
$$\phi[k] = (\alpha - 1) \tfrac{2 \pi k}{N} \tfrac{N}{2} = (\alpha - 1) \pi k$$
but, in my opinion, Puckette is correct and Portnoff is wrong.
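to see how the two rules differ numerically, here is a small sketch using the two formulas exactly as written above. the peak location of 10.3 bins and the other values are hypothetical; `wrap` is just a helper to bring phases into the range $[-\pi, \pi]$ for comparison.

```python
import math

N = 1024
alpha = 1.5
peak_bin = 10.3                          # hypothetical fractional peak location
omega0 = 2.0 * math.pi * peak_bin / N

def wrap(phase):
    """Wrap a phase into the range [-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))

# per-component rule: one value shared by every bin around the peak
phi_component = (alpha - 1.0) * omega0 * N / 2

# per-bin rule: the value changes from bin to bin across the same peak
phi_per_bin = {k: (alpha - 1.0) * math.pi * k for k in (9, 10, 11, 12)}

for k, phi_k in phi_per_bin.items():
    print(k, round(wrap(phi_component), 4), round(wrap(phi_k), 4))
```

the two rules give the same value only when the sinusoid sits exactly on a bin center, $\omega_0 = \tfrac{2 \pi k}{N}$. otherwise the per-bin rule rotates adjacent bins of the same peak by different amounts, while the per-component rule rotates them all together.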