Which (Fourier transform?) processing method to use for time variant audio processing? (newbie)

Question

Edit: I've since come to understand how the relevant parts of the Fourier transform work and how they relate to my problem. I have concluded that the Fourier transform is not the best fit for what I want to do, and that I have an alternative approach that is much simpler to control and behaves the way I want in practice.

Since I'm new here I don't know what's the right thing to do with my question on this site. Just in case anybody was following this / was preparing an answer, I'm leaving this question up for one more day with this edit and after that I'll delete the question.

I've developed a basic algorithm for processing audio. I have some programming experience but none with Fourier transforms (though I will dedicate all of 2020 to learning this full time if need be). My question is about how best to implement my algorithm in practice.

What I need to do is the following:

Analyze the frequency content at exact locations of a 44.1kHz PCM audio stream / file.
Detect and modify the amplitude of specific frequencies at specific time locations (I do not need to modify phase or have knowledge of phase to do this).
Then turn this modified version back into audio without audible artifacts from processing errors/limitations.

As I said, I currently do not know much about the Fourier transform. So far my understanding is that an FFT (with windowing) will allow me to analyze the frequency spectrum of a block and to detect and modify the amplitude of specific frequencies. But it will not let me modify the amplitude of a specific frequency at one time in that block without also modifying the same frequency at a different time in the same block? For instance, with a 44.1kHz sample rate and a block size of 8192, as far as I understand it a normal FFT will not allow me to detect two separate 10kHz transients in that block as separate and modify only one of them?
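To put numbers on the concern above, here is a minimal sketch in plain Python. The helper names (`block_resolution`, `dft_magnitudes`, `burst`) are made up for this illustration, and the tiny 64-sample block is only there so a naive DFT runs quickly. It shows that a single 8192-sample block at 44.1kHz lasts roughly 186 ms, and that a short burst placed early or late in a block gives the same magnitude spectrum; the position is encoded only in the phase.

```python
import cmath
import math

def block_resolution(sample_rate_hz, block_size):
    """Return (block duration in ms, FFT bin spacing in Hz) for one block."""
    duration_ms = 1000.0 * block_size / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / block_size
    return duration_ms, bin_spacing_hz

def dft_magnitudes(x):
    """Magnitudes of a naive O(N^2) DFT; fine for a tiny demo block."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def burst(block_size, start, length, cycles_per_block):
    """A short sine burst inside an otherwise silent block."""
    return [math.sin(2 * math.pi * cycles_per_block * t / block_size)
            if start <= t < start + length else 0.0
            for t in range(block_size)]

duration_ms, bin_hz = block_resolution(44100, 8192)
print(f"8192 samples at 44.1 kHz: {duration_ms:.1f} ms per block, {bin_hz:.2f} Hz per bin")

# Same burst, once early and once late in a 64-sample block.
early = burst(64, start=4, length=8, cycles_per_block=8)
late = burst(64, start=36, length=8, cycles_per_block=8)
max_diff = max(abs(a - b) for a, b in zip(dft_magnitudes(early), dft_magnitudes(late)))
print("largest difference between the two magnitude spectra:", max_diff)
```

The difference printed at the end is at rounding-error level, which is why a single long block cannot tell the two transients apart from magnitudes alone.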

If my understanding above is correct, can anybody suggest proper (Fourier-transform-like?) processing method(s) to use or investigate to achieve my goal?

I've thought of a rough way to do what I want with FFT/IFFT: divide the frequency spectrum (5Hz to Nyquist, for instance) into separate octave bands that are each processed separately. For instance, the first band would be 11.025kHz to 22.05kHz at the 44.1kHz sample rate with a block size of 4 (I'm not sure of the right block size, this is just an estimate), so that I get the transients in one block and still have sufficient frequency resolution for that band. The next band would be 5.5125kHz to 11.025kHz at a 22.05kHz sample rate (downsampled), also with a block size of 4 as an estimate, and so on, for a total of 12 bands down to about 5Hz. But this will be a lot of work and will take me a lot of time to develop, and surely someone must have done something similar and smarter a long time ago to do the type of processing I wish to do :) If anybody can point me in the right direction (and correct me if I got some assumptions wrong) I will be very thankful!
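The band layout described above can be written down as a short sketch. This only lists the band edges and the sample rate each band would run at after repeated halving; it does not do any filtering or decimation (which would need anti-alias filters), and the function name `octave_bands` is hypothetical.

```python
def octave_bands(sample_rate_hz, num_bands):
    """List (low_hz, high_hz, band_sample_rate_hz), from the top octave down."""
    bands = []
    high = sample_rate_hz / 2.0      # start at Nyquist
    rate = float(sample_rate_hz)
    for _ in range(num_bands):
        low = high / 2.0             # one octave below the upper edge
        bands.append((low, high, rate))
        high, rate = low, rate / 2.0 # next band runs at half the sample rate
    return bands

bands = octave_bands(44100, 12)
for low, high, rate in bands:
    print(f"{low:10.3f} Hz to {high:10.3f} Hz at {rate:10.3f} Hz sample rate")
```

With 12 bands starting from 44.1kHz, the lowest band edge comes out at 22050 / 4096, about 5.4Hz, which matches the "down to about 5Hz" estimate.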

Some questions before answering:

Does it have to be real time?

You mention in the beginning that you have developed a basic algorithm. What is the basic algorithm you have developed? Can you describe it?

Is it mandatory to be FFT-based?

Finally, what exactly do you want to do? I mean, what is the purpose of modifying the amplitudes? What is the target application (let's say)? — GKH, Jan 04 '20 at 22:31
Hello @GKH

In the end I do want it to be real time (though a large latency is acceptable). It may be processor intensive: if it takes a whole computer or a separate powerful DSP unit to run it, that's ok.

I cannot describe the algorithm in detail as it is something new. If I describe it there's a good chance someone else will write it before me. I'm certain though that the algorithm itself is correct.

No, it does not have to be FFT based as long as I can modify the amplitude of specific frequencies at specific points in time. — Pythagorean, Jan 05 '20 at 14:14
@GKH 4. Part of my algorithm needs to 'subtract' certain audio in one file from another. If a certain frequency is in audio file 1 and at the same time in audio file 2, I need to reduce the volume of that frequency in audio file 1 by the amount of that frequency present in audio file 2. This is regardless of the phase of those frequencies, and if specific frequencies are (individually) louder in audio file 2 than in audio file 1 then they should be silent (not reversed in phase or anything). — Pythagorean, Jan 05 '20 at 14:22
@GKH Think for instance of two independent channels of pink noise. If channel one and channel two have an equal frequency at the same time I want to be able to reduce the amplitude of this in channel 1 by the amount it is present in channel 2. I think to be able to do this there's a balance between time resolution and frequency resolution (and possibly amplitude detection accuracy)? I think I will need it to have fairly high time resolution and only moderate frequency resolution? (maybe even as wide as a quarter octave would do?) I do really need the artifacts of the processing to be inaudible. — Pythagorean, Jan 05 '20 at 14:27
@GKH Never mind my comment "(maybe even as wide as a quarter octave would do?)", that doesn't make sense. I have no idea yet what frequency resolution I would need or how FFT handles this. Thinking about it, for short transients the frequency resolution wouldn't need to be fine, but for long sounds the resolution would need to be fine as the ear/brain can then detect them as individual frequencies? — Pythagorean, Jan 05 '20 at 14:42
how are you expecting to implement your idea? in what programming language? — robert bristow-johnson, Jan 08 '20 at 03:01
@robertbristow-johnson I'm not sure yet which language. Probably C/C++, maybe I'll start in the JUCE environment for ease of use (but no idea yet). I haven't done any programming in a long time (I used to have an internet business 15 years ago) and this is the first time I'm programming for audio, so I'm still orienting myself. After success I'll likely want to make a version that runs on some DSP as well, as a standalone commercial product, but that's for later. — Pythagorean, Jan 08 '20 at 03:32
well JUCE is not free. when i used JUCE, alls i used was the methods for getting audio from a file or from a stream, and then i used the class AudioSampleBuffer(). other than that, all of the code was my own C code contained in a C++ method. regarding the Phase Vocoder, i have some MATLAB code, but no C/C++ code for the Phase Vocoder and STFT. about 18 years ago i implemented this in C, but it was for a company of which i was an employee. hence the company owns the C code, not me. but i own my MATLAB code. — robert bristow-johnson, Jan 08 '20 at 03:59
@robertbristow-johnson Ah, MATLAB, yes, I was looking at that too. Maybe that would be the best environment for me to develop this. MATLAB seems to help with exporting to DSP code and to JUCE too. I do not mind paying some for software (the home version of MATLAB, that is). I'll be spending a lot of time developing this into a good commercial product if the result is what I hope for. Should I not get success with the classic / channel vocoder method (though I'm pretty sure now it will work perfectly for what I'm doing) I will try to bug you for STFT code ;-) — Pythagorean, Jan 08 '20 at 04:34
well, i can send you MATLAB code that does a phase vocoder (with mods) that does time-scaling, but it won't export easily to C++ using JUCE or not. You will have to rewrite it. — robert bristow-johnson, Jan 08 '20 at 16:52
Thank you for the offer/help but I'm going with a perfect reconstruction FIR filterbank. Should this somehow not work out (though I don't see why it wouldn't) then I'll contact you again :) — Pythagorean, Jan 08 '20 at 17:26
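The subtraction rule described in the comments above (reduce each frequency in file 1 by the amount present in file 2, ignore phase, and clamp at silence) is close in spirit to what is usually called magnitude spectral subtraction. A minimal sketch, assuming the per-band magnitudes or envelopes have already been measured by some analysis stage (STFT bins, filterbank bands, or anything else); the function name `subtraction_gains` and the example numbers are made up for illustration.

```python
def subtraction_gains(mag_a, mag_b):
    """Per-band gain for signal A so its level drops by B's level, never below zero."""
    gains = []
    for a, b in zip(mag_a, mag_b):
        if a <= 0.0:
            gains.append(0.0)                    # nothing to keep in this band
        else:
            gains.append(max(a - b, 0.0) / a)    # clamp: louder in B means silence
    return gains

# Three bands: B is quieter, equal, and louder than A respectively.
gains = subtraction_gains([1.0, 0.5, 0.2], [0.25, 0.5, 0.9])
print(gains)
```

Applying a real, non-negative gain per band changes only the amplitude of signal A in that band, which fits the requirement that phase is never touched or inverted.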

score 0 · Accepted Answer · answered Jan 07 '20 at 23:00

I believe the answer to your question is the Short-time Fourier Transform (STFT), also called the Short-term Fourier Transform. There is a Wikipedia article on it, and I tried to show the essential math in this answer.

Both the Phase-Vocoder and Sinusoidal Modeling are done with the STFT of some form or another. The two methods sorta merge in concept at the STFT level. Remember that the analysis window for STFT need not be the same as the synthesis window in constructing the processed audio for output.
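As a rough illustration of the analysis-window / synthesis-window idea, here is a small weighted overlap-add sketch in plain Python. It is not the code mentioned in this answer; it assumes a naive DFT on tiny 16-sample frames with 50% overlap, and it uses a square-root periodic Hann window for both analysis and synthesis (they could differ, as long as their product overlap-adds to a constant at the chosen hop). The names `stft_process` and `modify` are hypothetical.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def stft_process(signal, frame=16, hop=8, modify=lambda spec: spec):
    """Analysis window -> DFT -> modify bins -> inverse DFT -> synthesis window -> overlap-add."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    analysis = [math.sqrt(w) for w in hann]
    synthesis = analysis  # same window here for simplicity; it need not be
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame + 1, hop):
        block = [signal[start + n] * analysis[n] for n in range(frame)]
        rebuilt = idft(modify(dft(block)))
        for n in range(frame):
            out[start + n] += rebuilt[n] * synthesis[n]
    return out

signal = [math.sin(0.3 * t) for t in range(64)]
out = stft_process(signal)  # identity modification
# Samples covered by two overlapping frames are reconstructed; the edges are not.
worst = max(abs(out[t] - signal[t]) for t in range(8, 56))
print("worst interior reconstruction error:", worst)
```

With the identity `modify`, the interior of the signal comes back essentially unchanged; any per-bin amplitude change would go inside `modify`, one frame at a time, which is what gives the time localization a single long FFT lacks.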

robert bristow-johnson


Thank you for your answer! Did you see the edit at the top of my question? I have since found an approach that seems best suited to my specific application (audio). Fourier-transform-based solutions are not naturally well suited to it: I would have to fight things like spectral leakage and do strange things to get the right frequency bandwidth for detection (I realized it needs to be very large), etc. I will simply use a FIR filterbank to split the signal into many bands, detect the amplitude of each band, and modify the amplitude of the other signal based on this. – Pythagorean Jan 08 '20 at 03:13
That way I have precise control over the bandwidth of the detection and can make it fit the more logarithmic way our ears hear instead of the linear resolution of Fourier based solutions. I do not need all that detection resolution in the treble (which comes at a great cost in other areas) I realized. Will delete the question tomorrow as I made it a bit of a newbie mess sorry. But thank you again very much for your answer! – Pythagorean Jan 08 '20 at 03:19
But you are right, my solution is exactly the same as how a classic vocoder works! :) Since I do not need to do anything with phase for my particular algorithm I think I'm best served to do it in this way instead of in a Fourier transform related way. – Pythagorean Jan 08 '20 at 03:24
//" I will simply use a FIR filterbank and split the signal into many bands and detect the amplitude of them and modify the amplitude of the other signal based on this."// ----- That sounds like what we used to call a channel vocoder. – robert bristow-johnson Jan 08 '20 at 04:02
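For readers wondering what the filterbank-plus-envelope idea from these comments looks like in code, here is a very small sketch in plain Python. It is only an assumption-laden illustration: windowed-sinc bandpass filters, rectify-and-smooth envelope detection, no downsampling, and no claim of perfect reconstruction. All names, band edges, and parameter values are hypothetical.

```python
import math

def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps):
    """Hamming-windowed sinc lowpass (num_taps should be odd)."""
    mid = (num_taps - 1) / 2.0
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def bandpass_taps(low_hz, high_hz, sample_rate_hz, num_taps=101):
    """Bandpass as the difference of two lowpass filters."""
    hi = lowpass_taps(high_hz, sample_rate_hz, num_taps)
    lo = lowpass_taps(low_hz, sample_rate_hz, num_taps)
    return [h - l for h, l in zip(hi, lo)]

def fir_filter(taps, signal):
    """Direct-form FIR convolution; output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * signal[i - k]
        out.append(acc)
    return out

def envelope(signal, smoothing=0.99):
    """Rectify, then smooth with a one-pole lowpass."""
    env, level = [], 0.0
    for s in signal:
        level = smoothing * level + (1.0 - smoothing) * abs(s)
        env.append(level)
    return env

fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(800)]  # 1 kHz test tone
band_edges = [(707.0, 1414.0), (1414.0, 2828.0)]                    # two octave-wide bands
envelopes = [envelope(fir_filter(bandpass_taps(lo, hi, fs), tone)) for lo, hi in band_edges]
print("final envelope per band:", [env[-1] for env in envelopes])
```

The 1kHz tone shows up almost entirely in the first band's envelope. In a channel-vocoder-style arrangement, envelopes measured like this on one signal would drive per-band gains applied to the other.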
