My question has to do with the difference between the frequencies of a single note, and the frequencies of an entire song.
If I have a 5-second signal of the form
$x(t)=\sin(8\pi t)$, here is its magnitude spectrum with zero-padding:

For a signal of the form $x(t)=\sin(8\pi t)\sin^2(\pi t/5)$, here is how it looks:
Here is its magnitude spectrum with zero-padding:
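For reference, this is roughly how I generated the plots above (a minimal sketch assuming NumPy/Matplotlib; the sample rate and zero-padding factor are arbitrary choices of mine):

```python
import numpy as np
import matplotlib.pyplot as plt

fs = 100                       # sample rate in Hz (assumed, just for plotting)
t = np.arange(0, 5, 1/fs)      # 5-second time axis
x1 = np.sin(8*np.pi*t)                         # pure 4Hz tone
x2 = np.sin(8*np.pi*t) * np.sin(np.pi*t/5)**2  # 4Hz tone with a slow envelope

nfft = 16 * len(t)             # zero-pad to nfft samples before the FFT
f = np.fft.fftshift(np.fft.fftfreq(nfft, 1/fs))
for x, label in [(x1, "pure 4Hz tone"), (x2, "4Hz tone with envelope")]:
    X = np.fft.fftshift(np.fft.fft(x, n=nfft))   # fft(x, n=nfft) appends zeros
    plt.plot(f, np.abs(X), label=label)
plt.xlim(3, 5)                 # zoom in around 4Hz to see the sidelobes
plt.xlabel("Frequency (Hz)")
plt.ylabel("|X(f)|")
plt.legend()
plt.show()
```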
My intuition about the second signal is that it is a 4Hz tone that gets louder and then quieter again. In both cases, the highest frequency present in the spectrum is above 4Hz, even though each signal is just a 4Hz tone. This indicates to me that the frequencies we hear are not the same as the frequencies in the Fourier transform. It further indicates that just because the highest note of a song may be at 16kHz does not mean that the bandwidth of the song is -16kHz to +16kHz, or that a 32kHz sampling rate is sufficient.

Music is typically recorded at 44.1kHz, but is the bandwidth of a song really -22.05kHz to +22.05kHz, even if every individual note falls within that band? I took the FFT of Bad Habits by Ed Sheeran out of curiosity. The highest frequency component actually appears to be at 22.05kHz. Doesn't this indicate that if we had a sampling rate higher than 44.1kHz, we would have seen higher frequency components in the music? In other words, the FFT looks like higher frequency components got "cut off" by a low sampling rate.
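This is roughly how I looked at the spectrum of the song (a sketch assuming NumPy/SciPy; "bad_habits.wav" is just a placeholder path for my local copy):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

fs, x = wavfile.read("bad_habits.wav")   # fs is 44100 for a CD-quality file
x = x[:, 0].astype(float) if x.ndim > 1 else x.astype(float)  # keep one channel

X = np.fft.fftshift(np.fft.fft(x))
f = np.fft.fftshift(np.fft.fftfreq(len(x), 1/fs))  # axis only reaches +/- fs/2

plt.plot(f, 20*np.log10(np.abs(X) + 1e-12))        # magnitude in dB
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude (dB)")
plt.show()
```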
My second question is about understanding how a note played during a song affects the FT of the song. Without zero-padding, the first signal is purely 4Hz and the second signal has nonzero components at 4Hz and the two adjacent bins. With zero-padding, there are a large number of nonzero bins in each, and the second signal actually appears as the first with sidelobe suppression. This seems significant to me, because if an 8kHz tone played for one second appears in a 3-minute song, it would not show up in the FT of the song as a pure 8kHz tone. I think it would appear as an 8kHz tone of one-second duration, zero-padded to a 3-minute duration (and therefore including the sidelobes), since the sidelobes are needed to destructively interfere with the tone outside the interval in which it is actually played. Is this correct?
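Here is a small numerical check of what I mean, scaled down to a short burst inside an otherwise silent 10-second "song" so it runs quickly (a sketch assuming NumPy; an 8kHz tone in a 3-minute song behaves the same way):

```python
import numpy as np

fs = 100
burst = np.sin(2*np.pi*8*np.arange(0, 1, 1/fs))   # 1-second 8Hz burst
N = 10 * fs                                       # length of the 10-second "song"

# Place the burst somewhere inside an otherwise silent song
song = np.zeros(N)
start = 4 * fs
song[start:start+len(burst)] = burst

# DFT of the song vs DFT of the zero-padded burst: same magnitude (including
# the sidelobes); they differ only by a linear phase encoding the start time.
X_song  = np.fft.fft(song)
X_burst = np.fft.fft(burst, n=N)
print(np.allclose(np.abs(X_song), np.abs(X_burst)))   # True
```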
Edit: I just remembered something probably critical. Any signal of finite duration necessarily has infinite bandwidth. If the highest tone in a song is 16kHz, then the highest frequency component of the whole song would be a "smeared" 16kHz, and some of the sidelobes would be cut off when sampling at 44.1kHz. Therefore the DFT is lossy. Part of my confusion is probably because I read elsewhere on the internet that the DFT is lossless, but I am now thinking that must be wrong, since all real signals have finite duration and therefore infinite bandwidth, so a truly lossless representation would require an infinite sampling rate. Is this correct?
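For what it's worth, this is the kind of round-trip check that made me believe the "lossless" claim in the first place (a sketch assuming NumPy); it recovers the samples exactly, which is part of what I am trying to reconcile:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(44100)              # one second of arbitrary samples at 44.1kHz
x_roundtrip = np.fft.ifft(np.fft.fft(x)).real
print(np.max(np.abs(x - x_roundtrip)))      # ~1e-13, i.e. exact up to rounding error
```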
Edit #2: Envidia pointed out that I had forgotten to fftshift the spectrum of Bad Habits. It definitely looks better now.
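For reference, this is the centering step I had missed (assuming NumPy's fft module):

```python
import numpy as np

fs = 44100
x = np.sin(2*np.pi*1000*np.arange(0, 1, 1/fs))     # any test signal

# Without fftshift, the FFT output runs 0..fs with the negative frequencies
# stored in the second half; fftshift reorders the bins to -fs/2..+fs/2.
X = np.fft.fftshift(np.fft.fft(x))
f = np.fft.fftshift(np.fft.fftfreq(len(x), 1/fs))  # matching frequency axis
```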