from the POV of sinusoidal modeling or identifying sinusoids, two very good basic reasons why a Gaussian window is good are:
- The Fourier Transform of a Gaussian is a Gaussian. (and Gaussians have essentially no side lobes.)
$$ \mathscr{F} \{ e^{-\pi t^2} \} = e^{-\pi f^2} $$
- The Gaussian function is just like the linearly-swept chirp, except for an imaginary unit, so they share lotsa stuff in common and can be modeled together elegantly in the math.
just FYI, the exact definition of (unitary) Fourier Transform used is:
$$ X(f) \triangleq \mathscr{F} \{ x(t) \} = \int\limits_{-\infty}^{\infty} x(t) \, e^{-j 2 \pi f t} \ dt $$
and inverse Fourier Transform:
$$ x(t) \triangleq \mathscr{F}^{-1} \{ X(f) \} = \int\limits_{-\infty}^{\infty} X(f) \, e^{+j 2 \pi f t} \ df $$
i am taking advantage of the symmetry of the forward and inverse Fourier Transforms. they are identical except that one swaps $-j$ and $j$, and $-j$ and $j$ are qualitatively identical. they both have equal claim to squaring to $-1$ and to calling themselves "the imaginary unit".
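just to make that definition concrete, here's a minimal numerical sketch (NumPy, not from the paper) that approximates the forward integral with a plain Riemann sum and checks that $e^{-\pi t^2}$ transforms to $e^{-\pi f^2}$. the helper name `fourier_transform` and the grid sizes are my own assumptions, chosen only so the Gaussian has decayed to nothing at the edges of the grid.

```python
import numpy as np

def fourier_transform(x, t, f):
    """Riemann-sum approximation of X(f) = integral of x(t) exp(-j 2 pi f t) dt.

    Assumes t is a uniform grid and x(t) is negligible outside it.
    """
    dt = t[1] - t[0]
    kernel = np.exp(-2j * np.pi * np.outer(f, t))  # one row per frequency
    return kernel @ x * dt

# assumed grids: wide and fine enough for a unit-width Gaussian
t = np.linspace(-8.0, 8.0, 4001)
f = np.linspace(-3.0, 3.0, 121)

X = fourier_transform(np.exp(-np.pi * t**2), t, f)

# compare against the claimed closed form exp(-pi f^2)
max_error = np.max(np.abs(X - np.exp(-np.pi * f**2)))
print(max_error)
```

the printed error should be down around floating-point roundoff, since a Riemann sum on a smooth, fully decayed function is about as good as numerical integration gets.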
this old paper of mine spells out some of this, but i'll add a little math to this answer to back up the two reasons above.
Gaussian window of width $\sqrt{\frac{1}{\alpha}}$ :
$$ \mathscr{F} \{ e^{-\pi \alpha t^2} \} = \sqrt{\frac{1}{\alpha}} e^{-(\pi/\alpha) f^2} $$
we gotta restrict $\alpha > 0$.
Linearly-swept chirp with sweep rate of $\beta$ :
$$ \mathscr{F} \{ e^{j \pi \beta t^2} \} = \sqrt{\frac{j}{\beta}} e^{-j (\pi/\beta) f^2} $$
so here's a linearly-swept chirp windowed with a Gaussian window:
$$\begin{align}
\mathscr{F} \{ e^{-\pi \alpha t^2} e^{j \pi \beta t^2} \} &= \mathscr{F} \{ e^{-\pi (\alpha - j \beta) t^2} \} \\
\\
&= \sqrt{\tfrac{1}{\alpha - j \beta}} e^{-\pi \frac{1}{\alpha - j \beta} f^2} \\
\\
&= \sqrt{\tfrac{\alpha + j \beta}{\alpha^2 + \beta^2}} e^{-\pi \frac{\alpha + j \beta}{\alpha^2 + \beta^2} f^2} \\
\end{align}$$
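here's a quick sanity check of that last line, as a sketch under stated assumptions: the values $\alpha = 1$ and $\beta = 2$ are just examples i picked, the integral is approximated by a Riemann sum, and the square root is taken on the principal branch (which is what `np.sqrt` does for complex input).

```python
import numpy as np

alpha, beta = 1.0, 2.0  # assumed example values; alpha must be > 0

# time grid wide enough that the Gaussian window has fully decayed
t = np.linspace(-8.0, 8.0, 4001)
dt = t[1] - t[0]
x = np.exp(-np.pi * (alpha - 1j * beta) * t**2)  # Gaussian-windowed chirp

# numerical Fourier Transform: X(f) = integral x(t) exp(-j 2 pi f t) dt
f = np.linspace(-6.0, 6.0, 241)
X_numeric = np.exp(-2j * np.pi * np.outer(f, t)) @ x * dt

# closed form: sqrt(K) exp(-pi K f^2), K = (alpha + j beta)/(alpha^2 + beta^2)
K = (alpha + 1j * beta) / (alpha**2 + beta**2)
X_closed = np.sqrt(K) * np.exp(-np.pi * K * f**2)

chirp_error = np.max(np.abs(X_numeric - X_closed))
print(chirp_error)
```

note that $K$ is just $1/(\alpha - j\beta)$ with the denominator rationalized, so the same check covers both forms of the result.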
now here's a linearly-swept chirp windowed with a Gaussian window, where the sinusoid has a specific frequency $f_0$ at the center of the window:
$$\begin{align}
\mathscr{F} \{ e^{-\pi \alpha t^2} e^{j \pi \beta t^2} e^{j 2 \pi f_0 t} \} &= \mathscr{F} \{ e^{-\pi \alpha t^2} e^{j \pi \beta t^2} \} \Bigg|_{f \leftarrow f-f_0} \\
\\
&= \sqrt{\tfrac{\alpha + j \beta}{\alpha^2 + \beta^2}} e^{-\pi \frac{\alpha + j \beta}{\alpha^2 + \beta^2} (f-f_0)^2} \\
\end{align}$$
finally we can generalize it a little more by adding a ramp in the amplitude in addition to the linearly swept frequency. we can think of it as sorta a linear ramp, but we're really gonna use an exponential ramp because it makes the math so much easier.
$$ 1 + 2 \pi \lambda t \ \approx \ e^{2 \pi \lambda t} \qquad \text{for } |\lambda t| \ll 1 $$
$$\begin{align}
\mathscr{F} \{ e^{-\pi \alpha t^2} e^{j \pi \beta t^2} e^{j 2 \pi f_0 t} e^{2 \pi \lambda t}\} &= \mathscr{F} \{ e^{-\pi \alpha t^2} e^{j \pi \beta t^2} e^{j 2 \pi (f_0 - j \lambda) t} \} \\
\\
&= \mathscr{F} \{ e^{-\pi \alpha t^2} e^{j \pi \beta t^2} \} \Bigg|_{f \leftarrow f-(f_0-j\lambda)} \\
\\
&= \sqrt{\tfrac{\alpha + j \beta}{\alpha^2 + \beta^2}} e^{-\pi \frac{\alpha + j \beta}{\alpha^2 + \beta^2} (f-f_0 + j\lambda)^2} \\
\\
&= \sqrt{\tfrac{\alpha + j \beta}{\alpha^2 + \beta^2}} e^{-\pi \frac{\alpha + j \beta}{\alpha^2 + \beta^2} (f-f_0)^2} e^{\pi \frac{\alpha + j \beta}{\alpha^2 + \beta^2} \lambda (\lambda - j 2 (f-f_0) )} \\
\end{align}$$
so, if you use a Gaussian window, you can model each sinusoidal component with frequency $f_0$ and sweep rate of $\beta$ and ramp rate of $2 \pi \lambda$. and you have a function of the very same form in the frequency domain. the paper pointed to above says how you can extract $f_0$ and $\beta$ and $\lambda$ out of the $\log(\cdot)$ of each Gaussian lobe in the frequency domain data.
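to illustrate the idea (this is my own minimal sketch of the principle, not necessarily the procedure in the paper): the $\log(\cdot)$ of the lobe is exactly a quadratic in $f$ with complex coefficients, so fitting a parabola to the log-magnitude and another to the unwrapped phase gives everything back. the parameter values, the grids, and the use of `np.polyfit` are all assumptions made for the example.

```python
import numpy as np

# assumed "true" parameters of one component (example values only)
alpha, beta, f0, lam = 1.0, 0.5, 3.0, 0.1

# Gaussian-windowed, swept, ramped sinusoid
t = np.linspace(-8.0, 8.0, 4001)
dt = t[1] - t[0]
x = np.exp(-np.pi * (alpha - 1j * beta) * t**2
           + 2j * np.pi * (f0 - 1j * lam) * t)

# spectrum sampled across the lobe (Riemann-sum Fourier Transform)
f = np.linspace(f0 - 1.0, f0 + 1.0, 41)
X = np.exp(-2j * np.pi * np.outer(f, t)) @ x * dt

# complex log of the lobe: log|X| + j * unwrapped phase, fit with a quadratic
log_mag = np.log(np.abs(X))
phase = np.unwrap(np.angle(X))
c2, c1, c0 = np.polyfit(f, log_mag, 2) + 1j * np.polyfit(f, phase, 2)

# log X = const - pi K (f - z)^2 with K = 1/(alpha - j beta), z = f0 - j lam
z = -c1 / (2.0 * c2)     # complex "center frequency"
w = -np.pi / c2          # equals alpha - j beta
f0_est, lam_est = z.real, -z.imag
alpha_est, beta_est = w.real, -w.imag

print(f0_est, beta_est, lam_est, alpha_est)
```

in a real STFT you only get the lobe at discrete bins and with other components and noise leaking in, so this clean recovery is the best case. the window's own $\alpha$ is known in advance, so `alpha_est` is just a consistency check.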
this is why you might consider using the Gaussian window with the Short Time Fourier Transform.
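what's really neat is that this is basically true for any exponential of a quadratic, as long as $\operatorname{Re}\{a\} < 0$ so the integral converges (the pure chirp, with $\operatorname{Re}\{a\} = 0$, works as a limiting case):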
$$ \mathscr{F} \{ e^{a t^2 + b t + c} \} = e^{A f^2 + B f + C} $$
where $A, B, C$ are constants that are some deterministic functions of $a, b, c$.
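for what it's worth, those functions can be written out by completing the square in the exponent of the integrand. this is a sketch assuming $\operatorname{Re}\{a\} < 0$ and the principal branch of the logarithm, using the same Fourier Transform definition as above:

$$\begin{align}
\int\limits_{-\infty}^{\infty} e^{a t^2 + (b - j 2 \pi f) t + c} \ dt &= \sqrt{\tfrac{\pi}{-a}} \ e^{c - \frac{(b - j 2 \pi f)^2}{4a}} \\
\\
A = \frac{\pi^2}{a} \qquad B &= \frac{j \pi b}{a} \qquad C = c - \frac{b^2}{4a} + \tfrac{1}{2} \log\left( \frac{\pi}{-a} \right) \\
\end{align}$$

as a check, $a = -\pi$ and $b = c = 0$ gives $A = -\pi$ and $B = C = 0$, which is the Gaussian transforming to itself.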