Why are 8 STFT vectors used for the predictor input in the "Denoise Speech Using Deep Learning Networks" example?

Daniel Graham on 23 Aug 2021
Answered: Sahil Jain on 1 Sep 2021
In the MATLAB example on denoising speech with deep learning, I am having a hard time understanding why 8 STFT segments are used for the predictor input.
This is stated and underlined in the relevant section of the example.
Can anyone explain the reasoning?

Answers (1)

Sahil Jain on 1 Sep 2021
Hi Daniel. The example states "The predictor input consists of 8 consecutive noisy STFT vectors, so that each STFT output estimate is computed based on the current noisy STFT and the 7 previous noisy STFT vectors". In other words, the network sees the current noisy frame plus a short history of the 7 frames before it. The example does not give an explicit justification, but the likely reason is that the authors of this approach expect the extra temporal context to give better denoising performance than the current segment alone. I would suggest reading the research articles listed in the references at the end of the example to understand the motivation. You can also test this yourself: train the network using only the current segment as input and compare its performance with the 8-segment version.
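
To make the quoted sentence concrete, here is a minimal sketch in plain Python (not the MATLAB code from the example) of how such predictors could be assembled. The function name `make_predictors`, the list-of-frames layout, and the choice to pad the start by repeating the first 7 frames are all assumptions for illustration; check the example's own code for how it treats the first few frames.

```python
def make_predictors(noisy_frames, num_segments=8):
    """Pair each STFT frame with the (num_segments - 1) frames before it.

    noisy_frames: list of frames, each frame a list of magnitudes per
                  frequency bin (hypothetical layout, time-major).
    Returns one window per input frame; the last entry of each window is
    the "current" frame, the earlier entries are its history.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    history = num_segments - 1
    if len(noisy_frames) < history:
        raise ValueError("need at least num_segments - 1 frames")

    # Assumption: give the earliest frames a full history by prepending
    # copies of the first `history` frames.
    padded = noisy_frames[:history] + noisy_frames

    # Slide a window of num_segments consecutive frames, one step at a time.
    return [
        padded[i:i + num_segments]
        for i in range(len(padded) - num_segments + 1)
    ]


if __name__ == "__main__":
    # Ten toy frames with a single frequency bin each.
    frames = [[float(t)] for t in range(10)]
    windows = make_predictors(frames, num_segments=8)
    print(len(windows), "windows of", len(windows[0]), "frames each")
```

Setting `num_segments=1` produces the single-frame baseline suggested above, so the same routine can feed both configurations when comparing them.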

Release

R2019a
