I'm not a veteran in signal processing, so I would greatly appreciate help understanding the idea/heuristic behind the STFT convention that
$$\text{nfft} \ge \text{window length}$$
At least from a statistical perspective, I believe this overfits the estimate of each coefficient in each STFT frame, for the following reason:
For samples $X_1, X_2, \dots, X_n \sim D$ with support in $[-1, 1]$, we can use a basis expansion to estimate the probability density function: $$p(x)=\sum\limits_{i=0}^{\infty}\theta_i \phi_i(x)$$ where the $\phi_i(x)$ are orthonormal functions, which satisfy: $$\int_{-1}^{1}\phi_i(x)\phi_j(x)\,dx=\begin{cases} 0 & i\neq j\\ 1 & \text{otherwise} \end{cases}$$
In essence, each coefficient of one STFT frame is an estimate ($\hat{\theta}_i$) of the coefficient of one of the first $\text{nfft}$ basis functions, computed from $\text{window length}$ audio samples.
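To make the setup concrete, here is a minimal sketch of how I understand one STFT frame to be computed. It assumes NumPy's `np.fft.rfft`, a Hann window, and made-up sizes (`win_len = 8`, `nfft = 16`); real STFT libraries may differ in details such as padding placement and scaling.

```python
import numpy as np

win_len = 8   # number of audio samples in one frame (illustrative value)
nfft = 16     # DFT length, here nfft >= win_len (illustrative value)

rng = np.random.default_rng(0)
# One windowed frame of stand-in "audio" (random noise, not real data).
frame = rng.standard_normal(win_len) * np.hanning(win_len)

# DFT of the frame with no padding: win_len // 2 + 1 bins.
spec = np.fft.rfft(frame)

# With n > len(frame), NumPy zero-pads the frame to n samples first.
spec_padded = np.fft.rfft(frame, n=nfft)

# With nfft = 2 * win_len, every second padded bin equals an unpadded bin.
same_bins = np.allclose(spec_padded[::2], spec)

# With n < len(frame), NumPy crops the frame, discarding samples.
spec_cropped = np.fft.rfft(frame, n=4)
crop_equiv = np.allclose(spec_cropped, np.fft.rfft(frame[:4]))

print(spec.shape, spec_padded.shape, same_bins, crop_equiv)
```

In this sketch, `nfft > win_len` only zero-pads the frame and samples the same spectrum on a denser frequency grid, while `nfft < win_len` drops samples from the frame.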
Let $N = \text{window length}$ play the role of the sample size $n$, and let $\hat{p}(x)=\sum\limits_{i=0}^{\text{nfft}}\hat{\theta}_i\phi_i(x)$, where each coefficient is estimated by $$\hat{\theta}_j = \frac{1}{N}\sum\limits_{i=1}^{N}\phi_j(X_i)$$ It can be shown that this is an unbiased estimator of $\theta_j$, and it has a nice property: choosing $\text{nfft}=C_0 N^{1/5}$ gives the optimal convergence rate $O(N^{-4/5})$ for the mean squared error between $p(x)$ and $\hat{p}(x)$.
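A minimal sketch of this estimator, assuming normalized Legendre polynomials as the orthonormal basis on $[-1, 1]$ (my choice for illustration; any orthonormal basis would do) and uniform samples as stand-in data. The names `phi`, `estimate_coefficients`, `density_estimate` and `num_terms` are hypothetical, with `num_terms` playing the role of $\text{nfft}$:

```python
import numpy as np
from numpy.polynomial import legendre

def phi(j, x):
    """j-th Legendre polynomial, scaled to unit L2 norm on [-1, 1]."""
    coeffs = np.zeros(j + 1)
    coeffs[j] = 1.0
    return np.sqrt((2 * j + 1) / 2.0) * legendre.legval(x, coeffs)

def estimate_coefficients(samples, num_terms):
    """theta_hat_j = (1/N) * sum_i phi_j(X_i) for j = 0 .. num_terms - 1."""
    return np.array([phi(j, samples).mean() for j in range(num_terms)])

def density_estimate(x, theta_hat):
    """p_hat(x) = sum_j theta_hat_j * phi_j(x)."""
    return sum(t * phi(j, x) for j, t in enumerate(theta_hat))

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=2000)   # N = 2000 stand-in samples

num_terms = 5                                 # number of basis functions kept
theta_hat = estimate_coefficients(samples, num_terms)

grid = np.linspace(-1.0, 1.0, 201)
p_hat = density_estimate(grid, theta_hat)     # estimated density on a grid
```

Here `num_terms` is kept much smaller than the number of samples, which is the regime my argument refers to; raising it toward the sample count adds more noisy coefficients to the estimate.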
So when
$$\text{nfft} \ge \text{window length}$$
either $C_0$ is quite large (so that $\text{nfft}=\text{window length}$ actually makes sense), or we are not following the optimal relationship between $\text{nfft}$ and $\text{window length}$ ($\text{nfft}=C_0(\text{window length})^{1/5}$) for estimating $p(x)$.
$\text{nfft}$, more here. Also relevant. Note `nfft >= win_len` isn't a preference but necessity. – OverLordGoldDragon Nov 04 '22 at 04:59

relevant post directly to this question: any further clarification would be appreciated. – LambdaDelta34 Nov 04 '22 at 07:27