Is Dynamic Time Warping a good idea as a classifier?

ahmed elmehdawi on 27 May 2016
Answered: Bert on 4 Oct 2016
Hi everyone,
Before you start reading, please forgive my bad English, thanks.
I am in the final year of a computer engineering course in Libya.
My graduation project is titled "Speech Recognition System for isolated words using classifier fusion method". The basic idea is that I input a 1-second recording of a spoken digit (0-9), and it is displayed on the screen as text. My steps are:
* Input the word.
* Pre-process the speech signal.
* Extract features using Mel Frequency Cepstral Coefficients (MFCC).
* Classify the word using:
  * MED classifier.
  * Dynamic Time Warping (DTW) classifier.
  * Bayes classifier.
* Classifier fusion: combine the above classifiers, hoping to compensate for weak classifier performance.
After extracting my features with MFCC, I used the MED classifier just to look at the whole ASR system and visualize how it should work. Then I started on the DTW classifier, and to be honest I am not sure I am doing it right. Here is the code; if anyone has used DTW as a classifier before, please tell me whether using DTW is a good idea and, if so, whether I am doing it right.
test.mat contains two variables: 'm' is a recording of the spoken word "one", and 'b' is another recording of the word "one", made separately. I then keep 'm' and compare it to a recording of the word "two". The cost of 1 vs 1 should be smaller than the cost of 1 vs 2, but that is not what I get. Why is that?
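clear;
load('test.mat')   %loads the two recordings 'm' and 'b'
%note: do not assign b=m or m=b here, that would make both sequences identical
dis=zeros(length(m),length(b));      %local distance between every pair of samples
ac_cost=zeros(length(m),length(b));  %accumulated cost matrix
cost=0;                              %total cost along the warping path
p=[];                                %reserved for the warping path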
%create the local distance matrix: squared difference between all pairs of
%samples
for i = 1 : length(m)
    for j = 1 : length(b)
        dis(i,j)=(b(j)-m(i))^2;
    end
end
ac_cost(1,1)=dis(1,1);
%calculate first row
for i = 2 : length(b)
    ac_cost(1,i)=dis(1,i)+ac_cost(1,i-1);
end
%calculate first column
for i = 2 : length(m)
    ac_cost(i,1)=dis(i,1)+ac_cost(i-1,1);
end
%calculate the rest of the matrix
for i = 2 : length(m)
    for j = 2 : length(b)
        ac_cost(i,j)=min([ac_cost(i-1,j-1),ac_cost(i-1,j),ac_cost(i,j-1)])+dis(i,j);
    end
end
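%find the best path by backtracking from the last cell to the first
i=length(m);
j=length(b);
p=[i j];          %warping path, one [i j] pair per row
cost=dis(i,j);    %start with the local distance of the last cell
while i>1 || j>1
    if i==1
        j=j-1;    %on the first row, only a move to the left is possible
    elseif j==1
        i=i-1;    %on the first column, only a move up is possible
    else
        %step to the predecessor with the smallest accumulated cost
        [~,k]=min([ac_cost(i-1,j-1),ac_cost(i-1,j),ac_cost(i,j-1)]);
        if k==1
            i=i-1;
            j=j-1;
        elseif k==2
            i=i-1;
        else
            j=j-1;
        end
    end
    p=[i j; p];
    cost=cost+dis(i,j);   %add the distance of the cell actually visited
end
%cost now matches ac_cost(end,end), up to rounding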
Thank you all in advance

Answers (1)

Bert on 4 Oct 2016
Hi,
First, your implementation seems OK at first glance. To be sure, note that the latest versions of MATLAB include a DTW implementation, so you do not have to write this yourself (one can also be downloaded at: https://nl.mathworks.com/matlabcentral/fileexchange/43156-dynamic-time-warping--dtw-)
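For a quick sanity check of a hand-written version, a minimal sketch of the standard DTW recurrence on toy 1-D sequences might look like the following. The names x, y, d, D and dtwCost and the sample values are made up for illustration and are not the contents of test.mat; bsxfun is used only so the sketch also runs on older releases. Since y is just x with its first sample repeated, the warped cost should come out as zero.

```matlab
% Toy 1-D sequences (illustrative values, not the data from test.mat)
x = [0 1 2 3 2 1 0];
y = [0 0 1 2 3 2 1 0];   % same shape as x, first sample repeated

n = numel(x);
m = numel(y);

% Local cost: squared difference between every pair of samples
d = bsxfun(@minus, x(:), y(:).').^2;

% Accumulated cost, padded with Inf so the borders need no special cases
D = inf(n + 1, m + 1);
D(1, 1) = 0;
for i = 1:n
    for j = 1:m
        D(i + 1, j + 1) = d(i, j) + min([D(i, j), D(i, j + 1), D(i + 1, j)]);
    end
end

% The DTW cost is the bottom-right entry of the accumulated matrix
dtwCost = D(n + 1, m + 1);
```

If your release ships dtw in the Signal Processing Toolbox, dtw(x, y, 'squared') could serve as a cross-check and should give a comparable number.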
Secondly, DTW is quite an old approach to speech recognition, but for a limited number of words it might work. The biggest weakness is that you use only one example for each word. If this example is spoken by one person, DTW might not recognize the word when it is spoken by another person (or even by the same person with a different accentuation). For that reason (if applicable) you might want to have a look at model-based approaches such as hidden Markov models, but these are immensely more complex.
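One cheap way to soften the single-example problem, short of a model-based approach, is to store several templates per word and pick the label of the closest one. The sketch below is only an illustration under assumptions: templates, labels and query are hypothetical toy 1-D sequences (a real system would use feature sequences per utterance), and dividing the cost by the combined length is one possible normalisation, not something taken from the code above.

```matlab
% Hypothetical template set: two toy "recordings" for each of two words
templates = {[0 1 2 1 0], [0 1 1 2 1 0], [3 2 1 2 3], [3 3 2 1 2 3]};
labels    = [1 1 2 2];          % word label of each template
query     = [0 0 1 2 2 1 0];    % unknown utterance to classify

best = inf;        % smallest normalised DTW cost seen so far
predicted = NaN;   % label of the closest template
for k = 1:numel(templates)
    t = templates{k};
    n = numel(query);
    m = numel(t);

    % Accumulated cost matrix, padded with Inf on the borders
    D = inf(n + 1, m + 1);
    D(1, 1) = 0;
    for i = 1:n
        for j = 1:m
            c = (query(i) - t(j))^2;   % local squared difference
            D(i + 1, j + 1) = c + min([D(i, j), D(i, j + 1), D(i + 1, j)]);
        end
    end

    % Normalise by the combined length so long templates are not penalised
    score = D(n + 1, m + 1) / (n + m);
    if score < best
        best = score;
        predicted = labels(k);
    end
end
```

With these toy values the query warps exactly onto the first template, so predicted ends up as 1. With real recordings the scores will not be zero, and how many templates per word are needed is something to find out by experiment.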
Hope this helps.
