I need to summarize a set of samples representing an audio to achieve a smooth zoom animation for a spectrogram in my app.
In the time domain (waveform) I achieve this by resummarizing the samples each time the zoom changes. My current summarization works like this: I know the audio's sampling frequency and I have a target summarized-sample count that depends on the zoom level, so that there are $$\text{hop} = \frac{\#\text{\{audio samples\}}}{\#\text{\{target samples\}}} = \frac{\#\text{\{audio samples\}}}{\text{screenWidthInPt}*\text{zoom}}$$ samples of original audio for each summarized sample. The summarized sample $i$ is then obtained by averaging the original samples over a window of $w$ samples, which I set to $w = 128$, so that:
$$\text{Summarized}[i] = \frac{1}{w}\sum\limits_{k=i\cdot\text{hop}}^{i\cdot\text{hop}+w-1}\text{samples}[k],\quad\quad \text{for } i \text{ such that } i\cdot\text{hop}+w \le \#\{\text{audio samples}\}$$
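For concreteness, here is a minimal sketch of that summarization in Python/numpy (the function name `summarize` and the use of float indexing for `hop` are my own choices, not from any particular library):

```python
import numpy as np

def summarize(samples, target_count, w=128):
    """Average w original samples per output point, hop = len/target_count."""
    hop = len(samples) / target_count  # samples of original audio per output sample
    out = []
    i = 0
    # Only emit points whose full w-sample window fits inside the buffer.
    while i * hop + w <= len(samples):
        start = int(i * hop)
        out.append(samples[start:start + w].mean())
        i += 1
    return np.array(out)
```

For example, `summarize(np.arange(1024.0), 4)` gives `hop = 256` and four outputs, each the mean of a 128-sample window starting at multiples of 256.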
Now, I was wondering what operation this corresponds to in the frequency domain. If this were simply sampling without averaging over that window, it would be multiplication by a Dirac comb of period $T = \text{hop}$ in the time domain, and therefore convolution with a Dirac comb of period $T = \frac{1}{\text{hop}}$ in the frequency domain (less expensive than summarizing in the time domain and recomputing the DFT from scratch). But how does the averaging in the time domain affect the frequency domain? In other words, what does summarizing in the time domain correspond to in the frequency domain? (I'm a bit rusty; it's been years since I took my signals theory class.)
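To make the plain-sampling case concrete, the comb convolution shows up in the DFT as aliasing: decimating by $M$ in time replicates the spectrum at multiples of $N/M$ and scales it by $1/M$. The sketch below checks this identity numerically, and also applies a length-$w$ box (moving average) before decimating, which in the frequency domain multiplies the spectrum by the box's DFT (a Dirichlet kernel / periodic sinc) before the aliasing. The sizes `N`, `M`, `w` are arbitrary illustrative choices, and the averaging here is circular (wrap-around) for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 1024, 8                 # signal length, decimation factor (the "hop")
x = rng.standard_normal(N)
X = np.fft.fft(x)
L = N // M                     # length of the decimated signal

# Plain sampling: keep every M-th sample.
Y = np.fft.fft(x[::M])
# Comb convolution in frequency = spectrum replicas summed (aliased), scaled by 1/M.
Y_alias = sum(X[m * L:(m + 1) * L] for m in range(M)) / M
assert np.allclose(Y, Y_alias)

# Averaging before sampling: multiply the spectrum by the box window's DFT first.
w = 16
H = np.fft.fft(np.ones(w) / w, N)   # Dirichlet kernel of the length-w box
xf = np.fft.ifft(X * H)             # circular moving average of x
Yf = np.fft.fft(xf[::M])
Yf_alias = sum((X * H)[m * L:(m + 1) * L] for m in range(M)) / M
assert np.allclose(Yf, Yf_alias)
```

So, under these assumptions, the averaged-then-sampled case is still a comb convolution in frequency, just applied to a spectrum that was first attenuated by the window's sinc-shaped transfer function.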
EDIT: Of course I'm keeping the original audio samples in memory, and the downsampling isn't done in place but into a separate buffer.
– Baffo rasta Mar 12 '23 at 16:07