How to do gender recognition with fft
Hello everyone!
I'm new to MATLAB, and for a project I need to do a "simple" exercise.
I need to record a voice and decide if it's a male or female voice.
How?
Simple:
- record a wave
- use fft on wave
- use statistics that count the frequencies
- if there are more lower frequencies, the voice is male
- if there are more higher frequencies, the voice is female
(I know it's not very accurate, but this is my task)
I searched the internet for content and I was able to:
- record voice from microphone
- convert audiorecorder file to wav
- use fft on wav file
- but I don't know how to count the frequencies and decide the gender
I put a link here to the .m file where I'm at right now:
Advice, tips, and code are welcome.
Thanks!
PS: I found out that the MATLAB fft function only returns one vector of amplitudes. Is this a problem?
3 Comments
Hey! @kalsoom fatima I am currently stuck on the same project. Can you please guide me with this? I would be very grateful for any help.
kalsoom fatima
on 10 Jun 2022
Umr Nawaz
on 10 Jun 2022
@kalsoom fatima thanks for the consideration. My email address is rumarnawaz@gmail.com
Answers (3)
Star Strider
on 24 Nov 2014
2 votes
Interesting problem!
A free online article ‘Phonetic differences between male and female speech’ goes into significant detail. Also ‘The frequency range of the voice fundamental in the speech of male and female adults’ will give you some general guidance. I would certainly do a PubMed search for more information.
See the documentation for fft to understand how to calculate a frequency vector for your fft. You have to know your sampling frequency ‘Fs’, and the rest is straightforward.
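A minimal sketch of that frequency-vector calculation, followed by a crude low-versus-high comparison of the kind the question describes. fft returns a single complex vector; abs gives the amplitudes, and the frequency of each bin follows from its index, Fs, and the signal length. The synthetic test signal, the 50-400 Hz range, the 165 Hz split point, and all variable names are assumptions for illustration, not values taken from this thread or from the cited articles.

```matlab
% Sketch: build the frequency axis for a one-sided FFT and compare two bands.
% The test signal, the 165 Hz split, and all variable names are assumptions.
Fs = 8000;                          % sampling frequency in Hz (assumed)
t  = (0:Fs-1)/Fs;                   % one second of samples
x  = sin(2*pi*120*t) + 0.3*sin(2*pi*240*t);   % synthetic stand-in for a recording

x  = x - mean(x);                   % remove the DC offset
N  = numel(x);
X  = fft(x);                        % complex spectrum, N bins
nHalf = floor(N/2) + 1;             % number of bins from 0 Hz up to Fs/2
amp   = abs(X(1:nHalf))/N;          % one-sided amplitude (relative scale)
f     = (0:nHalf-1)*Fs/N;           % frequency in Hz for each bin

splitHz = 165;                      % hypothetical boundary between "low" and "high"
lowEnergy  = sum(amp(f >= 50 & f < splitHz).^2);
highEnergy = sum(amp(f >= splitHz & f <= 400).^2);

if lowEnergy > highEnergy
    guess = 'male';
else
    guess = 'female';
end
disp(guess)
```

With a real recording, x would come from audioread or the audiorecorder data, and the split point would need to be chosen from example voices.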
If this was my project, my initial approach would use two appropriately-designed bandpass filters (Signal Processing Toolbox), and then compare the RMS values of the outputs of the respective filters to determine the gender. Your final design will depend on how robust you want your classification scheme to be.
19 Comments
Star Strider
on 24 Nov 2014
Peter’s ‘Answer’ moved here:
Thanks for the feedback, but I don't want to dive too deep into it,
I just need to determine are there more lower or upper frequencies.
I've found out that there's a 'findpeaks' function in MatLab,
but I can't get it to work just now.
Star Strider
on 24 Nov 2014
Edited: Star Strider
on 24 Nov 2014
I doubt findpeaks will be very useful with a complex speech signal.
An approach that might work (I haven’t tried this) to find an acceptable frequency to distinguish the two would be to do a progressive summation (use cumsum) separately on the abs(fft) results of the male and female voices, normalise each by dividing by its last value (the total, so that both curves end at 1), then subtract one progressive summation curve from the other. The crossover point would provide the frequency where the cumulative energies of the two spectra differ. This works in some situations; I will leave it to you to see if it provides relevant information with your signals.
AFTERTHOUGHT —
For this to work optimally, the two signals have to be sampled at the same rate and be the same length. The lengths of the two signals will affect the length of the resulting fft, so you may want to truncate the longer signal to the length of the shorter one.
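A rough sketch of the cumulative-sum comparison described above, using two synthetic signals in place of real recordings. The tone frequencies and variable names are made up for illustration. Taking the frequency of the largest gap between the two normalised curves as the split point is one possible reading of the idea, not necessarily the exact crossover meant above.

```matlab
% Sketch of the cumulative-sum comparison; synthetic tones replace real voices.
Fs = 8000;                               % common sampling rate (assumed)
t  = (0:Fs-1)/Fs;                        % both signals have the same length
male   = sin(2*pi*120*t) + 0.5*sin(2*pi*240*t);   % hypothetical "male" signal
female = sin(2*pi*220*t) + 0.5*sin(2*pi*440*t);   % hypothetical "female" signal

N     = numel(t);
nHalf = floor(N/2) + 1;
f     = (0:nHalf-1)*Fs/N;                % frequency axis in Hz

Xm = abs(fft(male   - mean(male)));      % remove DC, then magnitude spectrum
Xf = abs(fft(female - mean(female)));
Xm = Xm(1:nHalf);                        % keep the one-sided half
Xf = Xf(1:nHalf);

Cm = cumsum(Xm);  Cm = Cm/Cm(end);       % normalise so each curve ends at 1
Cf = cumsum(Xf);  Cf = Cf/Cf(end);

d = Cm - Cf;                             % difference of the two curves
[~, idx] = max(abs(d));
splitHz = f(idx);                        % frequency of the largest gap
fprintf('Largest gap at %.0f Hz\n', splitHz);
```

Plotting f against Cm, Cf, and d is probably the quickest way to judge whether the curves separate cleanly for real recordings.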
Star Strider
on 24 Nov 2014
I’m certain you can do it! I probably wasn’t as clear in my explanation as I might have been, but it’s fairly straightforward.
One other thing you will need to do is subtract the mean of each signal (male, female) from the respective signal before you do the fft. This eliminates the DC offset, which would otherwise throw off your cumulative sum calculation. I should have mentioned that earlier, but forgot to include it.
I didn’t see any attached code, but the documentation for the fft function explains how to calculate and plot the one-sided fft. (That’s the one you’ll need to do the cumulative sum as well.) Everything you need is explained in the code and text between the first two figures.
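To illustrate why the mean matters, here is a small sketch comparing the normalised cumulative spectrum with and without a DC offset. The 150 Hz tone, the 0.5 offset, and the variable names are invented for the example.

```matlab
% Sketch: effect of a DC offset on the normalised cumulative spectrum.
% The signal and the 0.5 offset are made-up values for illustration.
Fs = 8000;
t  = (0:Fs-1)/Fs;
x  = 0.5 + sin(2*pi*150*t);         % 150 Hz tone riding on a DC offset

N     = numel(x);
nHalf = floor(N/2) + 1;

Araw = abs(fft(x));                 % spectrum with the offset left in
Araw = Araw(1:nHalf);
Adet = abs(fft(x - mean(x)));       % spectrum after subtracting the mean
Adet = Adet(1:nHalf);

Craw = cumsum(Araw)/sum(Araw);      % starts high because of the 0 Hz bin
Cdet = cumsum(Adet)/sum(Adet);      % starts near zero once the mean is removed

fprintf('First value with offset: %.2f, without: %.2g\n', Craw(1), Cdet(1));
```

In this example the 0 Hz bin alone accounts for a large share of the cumulative curve until the mean is subtracted.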
Peter
on 24 Nov 2014
Star Strider
on 24 Nov 2014
The fundamental frequency will generalise between genders reasonably well, but the resonances are due to individual variations in the vocal tract and are likely not useful for gender identification. That is the reason I suggested the cumulative sum approach.
No classification scheme is ever 100% specific and 100% sensitive, so expect random misclassifications.
Star Strider
on 26 Nov 2014
I also provided you with some literature references on male and female speech in my original answer.
I can’t get anything from that link. Use the ‘paperclip’ (or ‘staple’) icon to upload your code here. Upload your data files as well if you want me to run your code with them.
There should be several peaks corresponding to the resonances of the individual vocal tract of the person creating each record. The findpeaks function can probably handle such noisy signals, but you will have to set its parameters appropriately. This may require some experimentation. I would have to have your data to help you analyse it.
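As a starting point for that experimentation, a sketch of findpeaks applied to a one-sided amplitude spectrum. It needs the Signal Processing Toolbox. The synthetic signal, the noise level, and the two threshold values are assumptions and would have to be tuned for real speech.

```matlab
% Sketch: locate spectral peaks with findpeaks (Signal Processing Toolbox).
% The signal, noise level, and both thresholds are assumptions to be tuned.
rng(0);                                  % repeatable noise
Fs = 8000;
t  = (0:Fs-1)/Fs;
x  = sin(2*pi*120*t) + 0.5*sin(2*pi*240*t) + 0.25*sin(2*pi*360*t) ...
     + 0.05*randn(size(t));
x  = x - mean(x);                        % remove the DC offset

N     = numel(x);
nHalf = floor(N/2) + 1;
amp   = abs(fft(x))/N;
amp   = amp(1:nHalf);                    % one-sided amplitude spectrum
f     = (0:nHalf-1)*Fs/N;                % frequency axis in Hz

[pks, locs] = findpeaks(amp, f, ...
    'MinPeakProminence', 0.05, ...       % ignore small noise-level bumps
    'MinPeakDistance',   50);            % at least 50 Hz between peaks
disp([locs(:) pks(:)])                   % peak frequency and height per row
```

Because f is passed as the second argument, locs and MinPeakDistance are both expressed in Hz rather than in bin indices.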
Star Strider
on 26 Nov 2014
I was hoping to have your .wav-files so that we are working on the same data. Analysing the same data is usually the best way to work on these problems.
I would also like to see the code you are using to calculate your FFTs and analyse your data.
Peter
on 26 Nov 2014
Peter
on 30 Nov 2014
Star Strider
on 30 Nov 2014
I was hoping (and waiting) for a female voice saying the same words to do the comparison with. I got some samples off the Internet and wrote the code to find the crossover, but without matching data, I’m not certain how robust it would be.
From what I’ve discovered, a couple of Butterworth bandpass filters with passbands of about 10-200 Hz and 200-400 Hz (with some necessary overlap in the stopbands) would work; you would then compare the RMS values of the output from each to classify the gender. If you have the Signal Processing Toolbox or a good DSP book and a few minutes to code them, the transfer function representations are easy to create, and would probably be stable. If you don’t or don’t want to, I can calculate the coefficients (SOS representation) and send them to you. I will assume you’re using the 44100 Hz Fs you used in the ‘test_voice’ file.
Peter
on 1 Dec 2014
Star Strider
on 3 Dec 2014
I apologise for the delay. Life intrudes...
The code that calculated them:
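% Design two Butterworth bandpass filters (requires Signal Processing Toolbox)
Fs = 44100;                                   % sampling frequency (Hz)
Fn = Fs/2;                                    % Nyquist frequency (Hz)
Rp = 1;                                       % passband ripple (dB)
Rs = 10;                                      % stopband attenuation (dB)
% Lower ("male") band
Wpm = [10 200];                               % passband edges (Hz)
Wsm = [5 225];                                % stopband edges (Hz)
[nm,Wnm] = buttord(Wpm/Fn, Wsm/Fn, Rp, Rs);   % minimum order and cutoff frequencies
[bm,am] = butter(nm,Wnm);                     % transfer function coefficients
[sosm,gm] = tf2sos(bm,am);                    % second-order sections and gain
% Upper ("female") band
Wpf = [200 400];                              % passband edges (Hz)
Wsf = [185 425];                              % stopband edges (Hz)
[nf,Wnf] = buttord(Wpf/Fn, Wsf/Fn, Rp, Rs);
[bf,af] = butter(nf,Wnf);
[sosf,gf] = tf2sos(bf,af);
% Note: for high filter orders the [b,a] form can lose numerical precision;
% [z,p,k] = butter(...) followed by zp2sos is the more robust route.
% Plot both frequency responses at 512 points from 0 Hz to Fn
fv = linspace(0, Fn, 512);
figure(1)
freqz(sosm, fv, Fs)
figure(2)
freqz(sosf, fv, Fs)
% Save the coefficients for later use
save('M-F Filter Coeffs.mat', 'Fs', 'Fn', 'bm', 'am', 'sosm', 'gm', 'bf', 'af', 'sosf', 'gf')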
The file is attached.
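For anyone wanting to try the filter-and-compare-RMS idea, here is a rough sketch under a few assumptions. It needs the Signal Processing Toolbox. It reuses the band edges from the post but designs the filters in zero-pole-gain form and converts them with zp2sos, which is a swap from the tf2sos route used above. The 120 Hz test tone and the variable names are invented for illustration.

```matlab
% Sketch: classify one signal by comparing band-limited RMS values.
% Requires Signal Processing Toolbox. The test tone and names are assumptions.
Fs = 44100;  Fn = Fs/2;
Rp = 1;  Rs = 10;

% Same band edges as the post, but designed in zero-pole-gain form and
% converted with zp2sos instead of going through [b,a] and tf2sos.
[nm,Wnm] = buttord([10 200]/Fn,  [5 225]/Fn,   Rp, Rs);
[zm,pm,km] = butter(nm, Wnm);
[sosm,gm] = zp2sos(zm, pm, km);

[nf,Wnf] = buttord([200 400]/Fn, [185 425]/Fn, Rp, Rs);
[zf,pf,kf] = butter(nf, Wnf);
[sosf,gf] = zp2sos(zf, pf, kf);

t = (0:2*Fs-1).'/Fs;                % two seconds, as a column vector
x = sin(2*pi*120*t);                % synthetic 120 Hz tone standing in for a voice
x = x - mean(x);                    % remove the DC offset

ym = gm*sosfilt(sosm, x);           % output of the 10-200 Hz filter
yf = gf*sosfilt(sosf, x);           % output of the 200-400 Hz filter

rmsM = sqrt(mean(ym.^2));           % RMS of each filtered signal
rmsF = sqrt(mean(yf.^2));

if rmsM > rmsF
    guess = 'male';
else
    guess = 'female';
end
fprintf('RMS low band %.3g, high band %.3g -> %s\n', rmsM, rmsF, guess);
```

With a recorded voice, x would be the recording (one channel) sampled at the same Fs the filters were designed for.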
Manora Mony
on 18 Jun 2017
Can anyone please send me the code for gender identification?
Akshat Dashore
on 17 May 2018
Edited: Walter Roberson
on 18 May 2018
"""import matplotlib.pyplot as plt"""
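import numpy as np
from scipy.io import wavfile

# scipy.io.wavfile.read only reads WAV files, so the MP3 from the original
# post has to be converted to WAV first; the path below assumes that was done.
fs, data = wavfile.read('/home/ubuntu/Downloads/4829251_male-voice-hello_by_urbazon_preview.wav')
print(fs)
print(data.shape)

x = data.astype(float)
if x.ndim > 1:
    x = x.mean(axis=1)                        # mix stereo down to mono
x = x - x.mean()                              # remove the DC offset

spectrum = np.abs(np.fft.rfft(x))             # one-sided amplitude spectrum
freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)   # matching frequencies in Hz

# The original post averaged the complex FFT output, which is not a frequency.
# Here the strongest component between 50 and 400 Hz (an assumed search range)
# stands in for the fundamental, expressed in kHz so that the 0.14 threshold
# from the original post is read as 140 Hz. That reading is an assumption.
band = (freqs >= 50) & (freqs <= 400)
meanfunfreq = freqs[band][np.argmax(spectrum[band])] / 1000.0
print(meanfunfreq)

def voice(meanfun):
    if meanfun < 0.14:
        return "male"
    else:
        return "female"

print(voice(meanfunfreq))

# Optional plot (enable the matplotlib import at the top first):
# plt.plot(freqs, spectrum)
# plt.show()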
Akshat Dashore
on 17 May 2018
Please replace the file name passed to the read call with the path to your own downloaded audio file.
Richard Tony
on 22 May 2020
Hi, can you please tell me why your code uses the value 0.14 to compare with the meanfreq? How did you arrive at that value?
Brian Hemmat
on 12 Apr 2019
0 votes
The Audio Toolbox includes an example on gender identification using LSTM networks:
The example requires Audio Toolbox and Deep Learning Toolbox.
kalsoom fatima
on 20 Dec 2021
0 votes
Hi, can I get the complete source code of your project, please?
Thank you.

