How to best modify FFT bin amplitudes before IFFT (DFT, windowing)?

I wish to do the following:
Read a mono 44.1kHz audio file.
Chop this audio into short overlapping (windowed?) segments.
Do FFT on these segments.
Read the amplitudes of the frequency bins as accurately as possible.
Modify some of the amplitudes of some of these frequency bins (based on an algorithm I wrote).
With IFFT, reconstruct the audio segments from these modified bin amplitudes.
Stitch together these audio segments to get an audio file which has the modified amplitudes at certain frequencies at certain points in time, with minimal side effects.
Now I'm mostly just beginning with Matlab and am looking for any relevant examples from which I can learn on how to do the above.
Also, some things are not yet clear to me regarding windowing and FFT.
For windowing. Am I correct in thinking that for the above example I can best window and overlap the short segments in such a way that by simply adding the windowed overlapping segments I get the original audio again? So for instance if I use triangular windowing with 50% overlap on both sides, that I will get the original audio back once I stitch these segments together again? Are there other windows that will work in this way? (for instance Hann?) Or am I altogether thinking wrong on how to best use windowing for what I want to do?
For FFT. I understand that the first half of the resulting frequency bins are the bins with the relevant amplitudes (for FFT length of 512, bins 0 to 255 represent the relevant frequencies and contain their amplitudes, bin 256 contains the Nyquist if I understood correctly). The second half of the bins (257 to 512), can I just ignore those when modifying the amplitude of the first half? For instance, if I have a 1kHz sine wave, do the FFT, modify the amplitude of the bin that contains the 1kHz tone by dividing the amplitude in half, then do an IFFT, will the end result be that 1kHz sine reduced in amplitude by 6dB, or am I missing something?
Many thanks for any help / pointers!

Accepted Answer

William Rose on 20 Sep 2021
You say "For FFT. I understand that the first half of the resulting frequency bins are the bins with the relevant amplitudes (for FFT length of 512, bins 0 to 255 represent the relevant frequencies and contain their amplitudes, bin 256 contains the nyquist if I understood correctly)."
That is not correct. For the FFT of a 512-point segment, bin 0 is the scaled mean value of the signal. Its imaginary part will always be zero if the original signal is real. Bins 1-255 are the complex numbers representing half of the FFT. Let's call it the bottom half. We could also call it the positive frequency part of the FFT. Bin 256 contains the scaled amplitude of the component sinusoid at the Nyquist frequency (fs/2, half the sampling rate). For a real signal its imaginary part will always be 0, for any FFT with an even number of samples. Bins 257-511 are the other half ("top half", or negative frequency part) of the FFT. If the original signal is real, and yours is, then the top half values will be the complex conjugates of the values in bins 1-255, where bin 257=conj(bin 255), bin 258=conj(bin 254), ..., bin 511=conj(bin 1). (These are zero-based bin numbers; in MATLAB, bin k is element k+1 of the fft output.) Whatever you do to a bin in the bottom half, you must make the corresponding (complex-conjugate) change to its mirror bin in the top half. Before you do the inverse FFT, be sure that the top half of the modified FFT is the complex conjugate of the flipped-around bottom half. If that is not true, then the inverse FFT will return complex numbers, and that indicates an error.
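As a minimal sketch of the 1 kHz case from the question: the segment length N = 441 is an assumption chosen only so that 1 kHz falls exactly on a bin at fs = 44100 Hz, and the variable names are illustrative. The same real gain is applied to the bin and to its mirror bin, which keeps the spectrum conjugate-symmetric.

```matlab
% Halve a 1 kHz sine by scaling its FFT bin and the mirrored bin.
fs = 44100;                 % sampling rate in Hz (from the question)
N  = 441;                   % assumed length: 1 kHz lands exactly on bin 10
n  = (0:N-1).';
x  = sin(2*pi*1000*n/fs);

X = fft(x);
k = round(1000*N/fs);       % zero-based bin number of the 1 kHz component

% MATLAB indices are one-based: bin k is X(k+1), its mirror bin N-k is X(N-k+1).
X(k+1)   = 0.5 * X(k+1);
X(N-k+1) = 0.5 * X(N-k+1);

y = ifft(X);
maxImag = max(abs(imag(y)));    % rounding-error level if the symmetry was kept
y = real(y);

gain_dB = 20*log10(norm(y) / norm(x));
fprintf('max imaginary part: %g, gain: %.2f dB\n', maxImag, gain_dB);
```

If only the bottom-half bin were scaled, the result of ifft would have a clearly non-zero imaginary part. MATLAB's ifft also accepts a 'symmetric' flag, which treats the input as conjugate-symmetric and returns a real result.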
The other part of your question is: May I segment the signal, do FFTs, manipulate the FFTs, invert the manipulated FFTs, and paste the results back together, to get a signal whose frequencies have been "shaped", as if with a graphic equalizer? The answer is you may, but you will probably end up with glitches at the segment boundaries. Initially, the signal is smooth across the segment boundaries. If you do an FFT and inverse FFT of each segment, without mean or trend removal, and without any frequency adjustments, you can paste the inverse FFT segments together and get back the original signal exactly. But if you do mean or trend removal or other adjustment of particular frequencies, then the pasted-together signal will have glitches, or discontinuities, at the segment boundaries. This is true for both overlapping and non-overlapping segmentation.
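The two cases described above can be sketched with non-overlapping segments. The test signal, segment length, and variable names below are assumptions made for illustration; mean removal is done by zeroing bin 0 of each segment.

```matlab
% Segment, FFT, IFFT, and paste back together (non-overlapping segments).
fs   = 44100;
Nseg = 512;                       % assumed segment length
nSeg = 8;                         % assumed number of segments
n    = (0:Nseg*nSeg-1).';
x    = sin(2*pi*20*n/fs) + 0.3*sin(2*pi*1000*n/fs);   % arbitrary test signal

segs = reshape(x, Nseg, nSeg);    % one segment per column
S    = fft(segs);                 % fft works down each column

% Case 1: no modification. The pasted result matches the input to rounding error.
y = real(ifft(S));
roundTripErr = max(abs(y(:) - x));

% Case 2: remove each segment's mean by zeroing bin 0 (MATLAB row 1).
Smod = S;
Smod(1, :) = 0;
z = real(ifft(Smod));
z = z(:);

% Compare the sample-to-sample step across the first segment boundary.
stepOrig = x(Nseg+1) - x(Nseg);
stepMod  = z(Nseg+1) - z(Nseg);
fprintf('round-trip error: %g\n', roundTripErr);
fprintf('boundary step, original: %g, after mean removal: %g\n', stepOrig, stepMod);
```

The boundary step changes by the difference between the two segment means, which is the kind of discontinuity described above.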
Another way of understanding the issue is that the sampling of the signal in the frequency domain is different with segmented signals than with the original signal. You lose samples of the "in-between" frequencies, including the lowest frequencies. Example: Suppose the original signal is sampled at Fs=1000 Hz, for N=1000 samples. Then the frequencies of the FFT, up to the Nyquist frequency, are 0, 1, 2, ..., 498, 499, 500 Hz (spacing Fs/N = 1 Hz). Now I divide it into 10 segments of Nseg=100 points each. The frequencies of the FFT of each segment are 0, 10, 20, ..., 480, 490, 500 Hz (spacing Fs/Nseg = 10 Hz).
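For the Fs, N, and Nseg values of this example, the two frequency grids (in Hz, up to the Nyquist frequency) can be computed directly. This is only a small sketch; the variable names fFull and fSeg are illustrative.

```matlab
% Frequency grids of the full-length FFT and of one segment's FFT.
Fs   = 1000;                     % sampling rate in Hz
N    = 1000;                     % samples in the full signal
Nseg = 100;                      % samples per segment

fFull = Fs*(0:N/2)/N;            % bins 0..N/2 of the full-length FFT
fSeg  = Fs*(0:Nseg/2)/Nseg;      % bins 0..Nseg/2 of a segment FFT

fprintf('full signal: spacing %g Hz, %d bins up to %g Hz\n', ...
        fFull(2) - fFull(1), numel(fFull), fFull(end));
fprintf('one segment: spacing %g Hz, %d bins up to %g Hz\n', ...
        fSeg(2) - fSeg(1), numel(fSeg), fSeg(end));
```

Every segment frequency also appears in the full grid, but the frequencies in between are no longer sampled.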
  14 Comments
William Rose on 24 Sep 2021
If fs=sampling rate in Hz, and N=number of samples in signal x(i), and y=fft(x), then y is a vector of complex numbers with N elements. The vector of frequencies corresponding to the elements of y is
f=fs*(0:N-1)/N;
About half the frequencies in vector f are higher than the Nyquist frequency (fs/2). Those are the "top half" frequencies of the fft. An alternate name for Nyquist frequency is "folding frequency", since the spectrum above fs/2 is the folded-over copy of the spectrum from 0 to fs/2.
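A short sketch of this folding, assuming fs = 44100 Hz, N = 512, and a random real test signal (all illustrative choices): the top-half entries of f can be relabelled as negative frequencies, and each top-half value of y is the complex conjugate of its mirror in the bottom half.

```matlab
% Frequency vector of an N-point FFT and the folded "top half".
fs = 44100;                      % assumed sampling rate in Hz
N  = 512;                        % assumed FFT length
x  = randn(N, 1);                % any real signal will do
y  = fft(x);

f = fs*(0:N-1)/N;                % frequency of each element of y

% Relabel the frequencies above fs/2 as negative frequencies.
fSigned = f;
top = f > fs/2;
fSigned(top) = f(top) - fs;

% Zero-based bins k = 1..N/2-1 mirror onto bins N-k (one-based: k+1 and N-k+1).
k = 1:N/2-1;
mirrorErr = max(abs(y(k+1) - conj(y(N-k+1))));

fprintf('Nyquist bin frequency: %g Hz\n', f(N/2+1));
fprintf('last bin: %g Hz, relabelled as %g Hz\n', f(end), fSigned(end));
fprintf('max mirror mismatch: %g\n', mirrorErr);
```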
Pythagorean on 24 Sep 2021
@William Rose Ah yes thanks. I understand how to plot the right frequencies but this isn't relevant to my plugin so I was lazy with the plots.
Still experimenting with different ways of tapering / windowing regarding precise frequency resolution vs spectral leakage. And the amount of resolution I actually need for my algorithm to work best.
I'm doing the FFT on the band outputs of a linear phase perfect reconstruction filter bank (already made this in Matlab). So I can do shorter FFTs on the higher frequency bands and longer FFTs on the lower frequency bands. The end result should be good enough frequency resolution and good enough time resolution. Trying to find the optimal balance for audio processing results.

Release: R2021a
