How to do gender recognition with FFT

Hello everyone!
I'm new to MATLAB and for a project I need to do a "simple" exercise.
I need to record a voice and decide if it's a male or female voice.
How?
Simple:
  • record a wave
  • use fft on wave
  • use statistics that count the frequencies
  • if there are more low frequencies, the voice is male
  • if there are more high frequencies, the voice is female
(I know it's not very accurate, but this is my task)
I searched the internet for content and I was able to:
  • record voice from microphone
  • convert audiorecorder file to wav
  • use fft on wav file
  • but I don't know how to count the frequencies and decide the gender
I put a link here to the .m file where I'm at right now:
Advice, tips, and code are welcome.
Thanks!
PS: I found out that the MATLAB fft function only returns one vector of amplitudes. Is this a problem?

3 Comments

Umr Nawaz on 9 Jun 2022
Edited: Umr Nawaz on 10 Jun 2022
Hey! @kalsoom fatima I am currently stuck on the same project. Can you please guide me with this? I will be very grateful for any help you can offer.
Hey @kalsoom fatima my gmail ID is rumarnawaz@gmail.com.
Hi! @Umr Nawaz please share your email address. Thank you.
@kalsoom fatima thanks for the consideration. My email address is rumarnawaz@gmail.com


Answers (3)

Star Strider on 24 Nov 2014
Interesting problem!
A free online article ‘Phonetic differences between male and female speech’ goes into significant detail. Also ‘The frequency range of the voice fundamental in the speech of male and female adults’ will give you some general guidance. I would certainly do a PubMed search for more information.
See the documentation for fft to understand how to calculate a frequency vector for your fft. You have to know your sampling frequency ‘Fs’, and the rest is straightforward.
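As a rough illustration of the frequency vector mentioned above, here is a minimal Python/NumPy sketch (not the MATLAB documentation example). The sampling rate, the synthetic 120 Hz tone, and the helper name `frequency_axis` are all assumptions made for the demonstration.

```python
import numpy as np

def frequency_axis(n_samples, fs):
    """Frequencies in Hz for each bin of the one-sided FFT of a real signal."""
    return np.fft.rfftfreq(n_samples, d=1.0 / fs)

fs = 8000                          # assumed sampling rate in Hz
t = np.arange(fs) / fs             # one second of sample times
x = np.sin(2 * np.pi * 120 * t)    # synthetic 120 Hz test tone

freqs = frequency_axis(x.size, fs)       # bin k corresponds to k * fs / N Hz
amplitude = np.abs(np.fft.rfft(x))       # magnitude of each bin
peak_hz = freqs[np.argmax(amplitude)]    # frequency of the strongest bin
print(peak_hz)
```

The FFT itself only returns values per bin, so the frequency of each bin has to be reconstructed from the signal length and Fs, which is what `rfftfreq` does here.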
If this was my project, my initial approach would use two appropriately-designed bandpass filters (Signal Processing Toolbox), and then compare the RMS values of the outputs of the respective filters to determine the gender. Your final design will depend on how robust you want your classification scheme to be.
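The band-comparison idea can be sketched as follows in Python/NumPy. Note that this swaps in a simple FFT mask in place of properly designed bandpass filters, so it only illustrates the decision rule. The band edges (10-200 Hz and 200-400 Hz) are the ones suggested later in this thread, and the function names are hypothetical.

```python
import numpy as np

def band_rms(x, fs, f_lo, f_hi):
    """RMS of the part of x between f_lo and f_hi (Hz), using an FFT mask."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                   # remove DC offset
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs >= f_hi)] = 0.0   # keep only the band
    band_signal = np.fft.irfft(spectrum, n=x.size)
    return np.sqrt(np.mean(band_signal ** 2))

def classify_by_band_rms(x, fs):
    """Label the signal by whichever band carries the larger RMS."""
    low = band_rms(x, fs, 10.0, 200.0)     # assumed 'male' band
    high = band_rms(x, fs, 200.0, 400.0)   # assumed 'female' band
    return "male" if low > high else "female"

fs = 8000
t = np.arange(fs) / fs
print(classify_by_band_rms(np.sin(2 * np.pi * 120 * t), fs))
print(classify_by_band_rms(np.sin(2 * np.pi * 230 * t), fs))
```

Real speech has energy in both bands, so the comparison will be much less clear-cut than with these synthetic tones.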

19 Comments

Peter’s ‘Answer’ moved here:
Thanks for the feedback, but I don't want to dive too deep into it.
I just need to determine whether there are more lower or upper frequencies.
I've found out that there's a 'findpeaks' function in MATLAB,
but I can't get it to work just now.
Star Strider on 24 Nov 2014
Edited: Star Strider on 24 Nov 2014
I doubt findpeaks will be very useful with a complex speech signal.
An approach that might work (I haven’t tried this) to find an acceptable frequency to distinguish the two would be to do a progressive summation (use cumsum) separately on both male and female voice abs(fft) results, normalise them by dividing each of the male and female data by its last value (the sum, so that both curves end at 1), then subtract one progressive summation curve from the other. The crossover point would provide the frequency where the cumulative energies of the two spectra differ. This works in some situations. I will leave it to you to see if it provides relevant information with your signals.
AFTERTHOUGHT —
For this to work optimally, the two signals have to be sampled at the same rate and be the same length. The lengths of the two signals will affect the length of the resulting fft, so you may want to truncate the longer signal to the length of the shorter one.
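A minimal sketch of the cumulative-sum comparison, written in Python/NumPy and tried only on synthetic tones. One assumption to flag: instead of a literal crossover, it reports the frequency at which the two normalised cumulative curves are furthest apart, which is only one possible reading of the suggestion. The function names are hypothetical.

```python
import numpy as np

def normalized_cumulative_spectrum(x):
    """Cumulative sum of the one-sided amplitude spectrum, scaled to end at 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the DC offset first
    c = np.cumsum(np.abs(np.fft.rfft(x)))
    return c / c[-1]

def separating_frequency(male, female, fs):
    """Frequency (Hz) where the two cumulative curves differ the most."""
    n = min(len(male), len(female))           # truncate to the shorter signal
    cm = normalized_cumulative_spectrum(male[:n])
    cf = normalized_cumulative_spectrum(female[:n])
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    diff = cm - cf
    return freqs[np.argmax(np.abs(diff))], diff

fs = 8000
t = np.arange(fs) / fs
male = np.sin(2 * np.pi * 120 * t)            # stand-in for a male recording
female = np.sin(2 * np.pi * 220 * t)          # stand-in for a female recording
split_hz, diff = separating_frequency(male, female, fs)
print(split_hz)
```

Both signals must share the same sampling rate; the truncation in `separating_frequency` takes care of the equal-length requirement.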
Peter on 24 Nov 2014
Edited: Peter on 24 Nov 2014
Uhhhh, sounds awesome but I'm in doubt that I can do it with my poor math skills.
EDIT: after reading your comment again, I think I can do that!
I'll try it tomorrow.
I have one question...
In every example I see a clean frequency axis showing the 'real' frequencies after the fft,
but my figure shows something like this:
it looks mirrored along the X axis. Any idea what I'm doing wrong?
(see attached code)
I’m certain you can do it! I probably wasn’t as clear in my explanation as I might have been, but it’s fairly straightforward.
One other thing you will need to do is subtract the mean of each signal (male, female) from the respective signal before you do the fft. Subtracting the mean eliminates the DC offset, which would otherwise throw off your cumulative sum calculation. I should have mentioned that earlier, but forgot to include it.
I didn’t see any attached code, but the documentation for the fft function explains how to calculate and plot the one-sided fft. (That’s the one you’ll need to do the cumulative sum as well.) Everything you need is explained in the code and text between the first two figures.
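To show why the plot looks mirrored and what the one-sided spectrum is, here is a small Python/NumPy sketch on a synthetic tone with a DC offset. The sampling rate and the tone are assumptions, and the scaling step assumes an even number of samples.

```python
import numpy as np

fs = 8000                                     # assumed sampling rate in Hz
t = np.arange(fs) / fs
x = 0.5 + np.sin(2 * np.pi * 150 * t)         # 150 Hz tone with a DC offset

# For a real signal the two-sided magnitude spectrum is symmetric:
# bin k and bin N-k have the same magnitude, hence the mirrored plot.
full = np.abs(np.fft.fft(x))
mirrored = np.allclose(full[1:], full[1:][::-1])

# One-sided spectrum: remove the mean, keep bins 0..N/2, rescale.
x0 = x - x.mean()
n = x0.size
one_sided = np.abs(np.fft.rfft(x0)) / n
one_sided[1:-1] *= 2                          # fold in the mirrored half (even n)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

print(mirrored, freqs[np.argmax(one_sided)])
```

Only the first half of the spectrum carries independent information for a real recording, so plotting and summing the one-sided version is sufficient.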
The one-sided fft is now working.
I'm getting more and more interested in this. I need to try your approach on the task.
As a first step, I might record the same word spoken by a man and by a woman, store the two recordings in separate wav files, and work with those in MATLAB.
After that I can experiment with recording directly from the program.
The fundamental frequency will generalise between genders reasonably well, but the resonances are due to individual variations in the vocal tract and are likely not useful for gender identification. That is the reason I suggested the cumulative sum approach.
No classification scheme is ever 100% specific and 100% sensitive, so expect random misclassifications.
Peter on 26 Nov 2014
Edited: Peter on 26 Nov 2014
Okay according to the internet: "The voiced speech of a typical adult male will have a fundamental frequency from 85 to 180 Hz, and that of a typical adult female from 165 to 255 Hz."
I link you my code here: mycode
(notify me if you can't reach it)
I tried a couple of things, but I don't want to get really fancy with it right now. I got my frequencies on a figure showing nice results for a man's and a woman's voice.
I want to find the peak values of what is plotted on the figure by the code, for example: 150 Hz, 2*10^-3.
Can you help me with that? I think I'm searching the wrong data with findpeaks.
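One way to read off a peak as a (frequency, amplitude) pair and turn it into a guess is sketched below in Python/NumPy. The search range of 85 to 255 Hz comes from the ranges quoted above; the 172.5 Hz split (the middle of the 165 to 180 Hz overlap) and the function names are assumptions, not an established threshold.

```python
import numpy as np

def strongest_peak(x, fs, f_lo=85.0, f_hi=255.0):
    """Frequency (Hz) and amplitude of the largest spectral bin in [f_lo, f_hi]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                              # remove DC offset
    n = x.size
    amp = 2.0 * np.abs(np.fft.rfft(x)) / n        # one-sided amplitude scaling
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    k = np.argmax(amp[band])
    return freqs[band][k], amp[band][k]

def guess_gender(peak_hz, split_hz=172.5):
    """Assumed rule: below the split is 'male', otherwise 'female'."""
    return "male" if peak_hz < split_hz else "female"

fs = 8000
t = np.arange(fs) / fs
x = 2e-3 * np.sin(2 * np.pi * 150 * t)            # synthetic 150 Hz tone
peak_hz, peak_amp = strongest_peak(x, fs)
print(peak_hz, peak_amp, guess_gender(peak_hz))
```

The search has to run on the amplitude spectrum paired with its frequency vector, not on the raw time-domain samples; searching the samples would be one way to end up looking in the wrong data.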
I also provided you with some literature references on male and female speech in my original answer.
I can’t get anything from that link. Use the ‘paperclip’ (or ‘staple’) icon to upload your code here. Upload your data files as well if you want me to run your code with them.
There should be several peaks corresponding to the resonances of the individual vocal tract of the person creating each record. The findpeaks function can probably handle such noisy signals, but you will have to set its parameters appropriately. This may require some experimentation. I would have to have your data to help you analyse it.
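For readers following along in Python, a bare-bones local-maximum search with a minimum height can stand in for the peak search. This is not MATLAB's findpeaks, only a sketch of the same idea; the function name `find_spectral_peaks` and the threshold value are assumptions.

```python
import numpy as np

def find_spectral_peaks(freqs, amp, min_height):
    """Return frequencies and amplitudes of local maxima at or above min_height."""
    amp = np.asarray(amp, dtype=float)
    inner = amp[1:-1]
    is_peak = (inner > amp[:-2]) & (inner >= amp[2:]) & (inner >= min_height)
    idx = np.flatnonzero(is_peak) + 1             # shift back to full-array index
    return freqs[idx], amp[idx]

fs = 8000
t = np.arange(fs) / fs
# Synthetic signal with three components of decreasing strength.
x = (np.sin(2 * np.pi * 150 * t)
     + 0.5 * np.sin(2 * np.pi * 300 * t)
     + 0.25 * np.sin(2 * np.pi * 450 * t))

amp = 2.0 * np.abs(np.fft.rfft(x - x.mean())) / x.size
freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
peak_freqs, peak_amps = find_spectral_peaks(freqs, amp, min_height=0.1)
print(peak_freqs, peak_amps)
```

With real speech the spectrum is far noisier, so the height threshold (and possibly some smoothing beforehand) would need experimentation, as noted above.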
Peter on 26 Nov 2014
Edited: Peter on 26 Nov 2014
Here's the code. It's only one .m file because it records a 2-second audio clip from the microphone.
If you don't have one, please tell and I'll attach a .wav and fix code according to that.
EDIT: wrong attachment sorry
I was hoping to have your .wav-files so that we are working on the same data. Analysing the same data is usually the best way to work on these problems.
I would also like to see the code you are using to calculate your FFTs and analyse your data.
Ok I modified it, here you go.
You need to unzip the file and set the .wav directory inside the code. The commented parts are necessary if you record something from the microphone.
Do you have any updates on my code?
I was hoping (and waiting) for a female voice saying the same words to do the comparison with. I got some samples off the Internet and wrote the code to find the crossover, but without matching data, I’m not certain how robust it would be.
From what I’ve discovered, a couple Butterworth bandpass filters with cutoffs of about 10-200 Hz and 200-400 Hz (passbands, with some necessary overlap in the stopbands) would work, then compare the RMS values of the output from each to classify the gender. If you have the Signal Processing Toolbox or a good DSP book and a few minutes to code them, the transfer function representations are easy to create, and would probably be stable. If you don’t or don’t want to, I can calculate the coefficients (SOS representation) and send them to you. I will assume you’re using the 44100 Hz Fs you used in the ‘test_voice’ file.
I would be thankful if you could calculate the coefficients for me
I apologise for the delay. Life intrudes...
The code that calculated them:
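Fs = 44100;                                   % sampling frequency (Hz)
Fn = Fs/2;                                    % Nyquist frequency (Hz)
Rp = 1;                                       % passband ripple (dB)
Rs = 10;                                      % stopband attenuation (dB)
% 'Male' bandpass filter
Wpm = [10 200];                               % passband edges (Hz)
Wsm = [05 225];                               % stopband edges (Hz)
[nm,Wnm] = buttord(Wpm/Fn, Wsm/Fn, Rp, Rs);   % lowest order meeting the specs
[bm,am] = butter(nm,Wnm);                     % transfer function coefficients
[sosm,gm] = tf2sos(bm,am);                    % second-order-section form
% 'Female' bandpass filter
Wpf = [200 400];                              % passband edges (Hz)
Wsf = [185 425];                              % stopband edges (Hz)
[nf,Wnf] = buttord(Wpf/Fn, Wsf/Fn, Rp, Rs);
[bf,af] = butter(nf,Wnf);
[sosf,gf] = tf2sos(bf,af);
% Plot both frequency responses from 0 Hz to the Nyquist frequency
fv = linspace(0, Fn, 512);
figure(1)
freqz(sosm, fv, Fs)
figure(2)
freqz(sosf, fv, Fs)
% Save the coefficients for later use
save('M-F Filter Coeffs.mat', 'Fs', 'Fn', 'bm', 'am', 'sosm', 'gm', 'bf', 'af', 'sosf', 'gf')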
The file is attached.
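For anyone working in Python instead, applying two such bandpass filters and comparing the output RMS values might look roughly like this with SciPy. This is a sketch under assumptions: it uses a fixed filter order of 2 where the MATLAB code derives the order with buttord, it does not load the attached coefficients, and the function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def band_rms_butter(x, fs, band, order=2):
    """RMS of x after a Butterworth bandpass (second-order sections)."""
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    y = sosfilt(sos, np.asarray(x, dtype=float))
    return np.sqrt(np.mean(y ** 2))

def classify(x, fs=44100):
    """Compare the RMS in the 10-200 Hz and 200-400 Hz passbands."""
    male_rms = band_rms_butter(x, fs, (10.0, 200.0))
    female_rms = band_rms_butter(x, fs, (200.0, 400.0))
    return "male" if male_rms > female_rms else "female"

fs = 44100
t = np.arange(fs) / fs                            # one second of samples
print(classify(np.sin(2 * np.pi * 150 * t), fs))  # synthetic low tone
print(classify(np.sin(2 * np.pi * 300 * t), fs))  # synthetic higher tone
```

The second-order-section form is used here because very low band edges relative to Fs can make transfer-function coefficients numerically fragile.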
Can anyone send me the code for gender identification, please?
"""import matplotlib.pyplot as plt"""
import numpy as np
from scipy.io import wavfile

# scipy.io.wavfile reads WAV files only, so convert the MP3 to WAV first.
path = '/home/ubuntu/Downloads/4829251_male-voice-hello_by_urbazon_preview.wav'
fs, data = wavfile.read(path)
print(data.size)
print(fs)

if data.ndim > 1:
    data = data.mean(axis=1)                    # mix stereo down to mono
data = data - data.mean()                       # remove the DC offset

spectrum = np.abs(np.fft.rfft(data))            # one-sided amplitude spectrum
freqs = np.fft.rfftfreq(data.size, d=1.0 / fs)  # matching frequencies in Hz

# Strongest component in a rough pitch range (70-300 Hz is an assumption),
# converted to kHz so that it is comparable with the 0.14 threshold below.
band = (freqs >= 70) & (freqs <= 300)
meanfunfreq = freqs[band][np.argmax(spectrum[band])] / 1000.0
print(meanfunfreq)

def voice(meanfun):
    # 0.14 is the threshold from the original post; it is treated here
    # as 0.14 kHz (140 Hz), which is an assumption.
    if meanfun < 0.14:
        return "male"
    else:
        return "female"

print(voice(meanfunfreq))

# Optional plot of the spectrum:
# import matplotlib.pyplot as plt
# plt.plot(freqs, spectrum)
# plt.show()
Please replace the file path in the code with the path to your own audio file.
Hi, can you please tell me why your code uses the value 0.14 to compare with the meanfreq? How did you arrive at that value?


Brian Hemmat on 12 Apr 2019
The Audio Toolbox includes an example on gender identification using LSTM networks:
The example requires Audio Toolbox and Deep Learning Toolbox.
kalsoom fatima on 20 Dec 2021
Hi. Can I get the complete source code of your project, please?
Thank you.

Asked: 24 Nov 2014

Edited: 10 Jun 2022
