How to evaluate my kNN results using a confusion matrix

Sir, although I can't replace the old data, I have tried several times to use the Apps toolbox (Classification Learner) to evaluate the confusion matrix with the old data, but I have been failing for a month already. Can you help me with code that computes this confusion matrix, so that I can see which data are well classified and which are badly classified? Thank you for your continued support.
Attached is my code.
Thank you!!

 Accepted Answer

Rajeev on 16 Jan 2023
Edited: Rajeev on 16 Jan 2023
There are two requirements of the "confusionmat" function that your MATLAB script does not fulfill:
  • Each input must be a grouping variable with one element (or one row) per observation, for example a numeric vector, a character array, or a cell array of character vectors. Your inputs are matrices of double coordinates. You can use the "num2str" function to convert each row of coordinates into one row of a character array, for example:
char_test_Coords = num2str(test_Coords)
  • Both inputs must contain the same number of observations. In your script, train_Coords has 120 rows and test_Coords has 30.
Making the above changes should fix the problem.
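A minimal sketch of a valid call, assuming the Statistics and Machine Learning Toolbox is licensed: the two inputs are class-label vectors (here made-up numeric labels y_true and y_pred, not variables from your script) with one entry per test observation, so they have the same length.

```matlab
% Made-up labels for 6 test observations (classes 1, 2, 3).
y_true = [1; 1; 2; 2; 3; 3];   % known classes of the test points
y_pred = [1; 2; 2; 2; 3; 1];   % classes predicted by the classifier

% Rows of C = known classes, columns = predicted classes,
% both in the order listed in 'order'.
[C, order] = confusionmat(y_true, y_pred);
disp(order(:)')
disp(C)
```

Here C is [1 1 0; 0 2 0; 1 0 1]: for example C(1,2) = 1 means one observation of class 1 was predicted as class 2.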


Thanks for your help and the clarification on the script.
I read it carefully, but I don't fully understand it (excuse me, I'm still learning machine learning).
Regarding the size of the input vectors: I partitioned my data into training data (80%) and test data (20%). Does this mean that the confusion matrix does not take both sets into account?
Please, sir, can you explain more? Thank you and see you soon.
No worries.
As you have mentioned, the data has been partitioned into training (80%) and testing (20%) sets.
Ideally, the next step is to train your model on the training data. Once it is trained, you test your model on the rest of the data, i.e. the test data (20%).
Let us take an example to understand it better, I have taken this image from the website https://builtin.com/data-science/train-test-split.
For this example, I will use the variable names given in the image below. The steps one should follow are:
  1. Train your model using the data 'X_train' and 'y_train'. In your case, (X_train, y_train) is 80% of the total data.
  2. Predict the results for 'X_test' using the model trained in step 1. Store the results in 'y_test_pred'.
  3. Pass 'y_test_pred' and 'y_test' to the confusionmat function. In this case, 'y_test_pred' and 'y_test' have the same dimensions.
The predictions that you obtain by running your model on the test data, and the true labels that you already have for the test data, are the two inputs to the 'confusionmat' function.
NOTE: confusionmat works only for classification problems.
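The three steps above can be sketched as follows. This assumes the Statistics and Machine Learning Toolbox (for knnsearch and confusionmat) and uses a made-up data set of two well-separated clusters; X, y and the 1-nearest-neighbour rule are illustrative choices, not your script. bsxfun is used instead of implicit expansion so that it also runs on R2015a.

```matlab
rng(0);                                   % reproducible example
% Hypothetical data: two well-separated 2-D clusters, 50 points each.
X = [randn(50,2); bsxfun(@plus, randn(50,2), [10 10])];
y = [ones(50,1); 2*ones(50,1)];           % class label of every row of X

% Random 80/20 split; the same indices select rows of X and y.
idx     = randperm(size(X,1));
nTrain  = round(0.8*numel(idx));          % 80 training points
X_train = X(idx(1:nTrain),:);     y_train = y(idx(1:nTrain));
X_test  = X(idx(nTrain+1:end),:); y_test  = y(idx(nTrain+1:end));

% Steps 1 and 2: for a 1-nearest-neighbour classifier the "model" is the
% training set itself; the prediction is the class of the closest training point.
nearest     = knnsearch(X_train, X_test); % one index into X_train per test row
y_test_pred = y_train(nearest);

% Step 3: compare predicted and true test labels (same length).
C = confusionmat(y_test, y_test_pred)
```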
Hi mister!
My confusion is really fading little by little thanks to your patience with me, even though I am only a beginner. Thank you so much.
However, I think my code already does steps 1 and 2. Could you please take a look at my code below to check whether I have trained the model on my data and stored the results? My code is attached.
Please forgive me for disturbing you so often; I'm new to ML and I have to use it to solve my problem. Thank you.
Hi, I went through your code and I think there are some pieces of code that need modification. Here are my suggestions:
You are trying to sort the array based on distances. Correct me if I am wrong: are you looking for the n elements with the smallest distances, along with their classes? If so, you can replace these lines in your code
distanceofIndex=[];
temp=0;
gemp=0;
for i=1:length(distanceofIndex)
    for j=1:(length(distanceofIndex)-i)
        if(distanceofIndex(j)>distanceofIndex(j+1))
            temp=distanceofIndex(j);
            distanceofIndex(j)=distanceofIndex(j+1);
            distanceofIndex(j+1)=temp;
            gemp=trainClass(j);
            trainClass(j)=trainClass(j+1);
            trainClass(j+1)=gemp;
        end
    end
end
%4.take first k element from the c array now
k=5;
classy=[];
for i=1:k
    classy=[classy trainClass(i)];
end
with
distanceofIndex = distancesOfTheIndexes(:,1);
pred_classes = trainClass(indexes(:,1));
dist_class_aug_matrix = [distanceofIndex, pred_classes'];
dist_class_aug_matrix = sortrows(dist_class_aug_matrix);
% to get the top n elements from the sorted c array
n = 7 % for example
classy = dist_class_aug_matrix(1:n,2)
Now, for the part where you are trying to calculate the confusion matrix, you should feed the classes to the confusionmat function instead of the coordinates. For more details, refer to the documentation: Compute confusion matrix for classification problem - MATLAB confusionmat (mathworks.com).
But it seems that you do not have the classes for the test_Coords data: the trainClass vector has 120 elements instead of 150. If you also had the classes for test_Coords, you could simply pass them together with the predicted ones (pred_classes) to get the desired confusion matrix.
Hello my dear! Your explanation is masterful and educational, thank you very much. I read your suggestion with joy and I fully agree. However, when I replaced the piece of code as you suggested, here is the message that appears! Could you please help me? Apologies for the inconvenience; I need this to keep moving forward and, above all, to understand, since I'm still learning. Thank you and see you soon, my dear.
You are getting the error because of the line number 60.
That line is not required and is incorrect as well. Removing that line will fix the error.
If you look at the error, it says that the variable 'distanceofIndex' is undefined. This means that the variable is being used before any value has been assigned to it. Line number 61 declares and initializes the variable in one go.
The knnsearch function returns the indexes of (and distances to) the points of train_Coords that are closest to each input coordinate in test_Coords. Each row of the output corresponds to one test point and has one column per requested neighbour (the 'K' value passed to knnsearch, apparently 5 in your script), not one column per class. Within each row the neighbours are sorted by ascending distance, so the first column contains the nearest training point of each test point. Indexing the trainClass vector with the first column of indexes therefore gives the predicted class of each test point.
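To illustrate the difference between using only the first column and using all K columns, here is a small sketch with made-up points (train_Coords, trainClass and test_Coords mirror the names in your script but hold hypothetical data; it requires the Statistics and Machine Learning Toolbox). The last step shows one way, assumed here, to turn the K neighbours into a majority vote with mode, which is the usual kNN decision rule.

```matlab
% Hypothetical training data: 6 labelled points and 2 query points.
train_Coords = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
trainClass   = [1 1 1 2 2 2];            % row vector, one class per training point
test_Coords  = [0.2 0.3; 5.5 5.2];

K = 3;
% Each row of indexes/dists lists the K nearest training points of one
% test point, sorted by ascending distance (column 1 = nearest).
[indexes, dists] = knnsearch(train_Coords, test_Coords, 'K', K);

% 1-NN prediction: class of the nearest neighbour only (first column).
% trainClass is a row vector, so pred_1nn is a row vector too.
pred_1nn = trainClass(indexes(:,1));

% k-NN prediction: majority vote over the K neighbours of each test point.
% reshape keeps one row per test point even when K = 1.
neighbourClasses = reshape(trainClass(indexes), size(indexes));
pred_knn = mode(neighbourClasses, 2);    % column vector; ties go to the smallest class
disp(pred_knn')
```

Note that pred_1nn comes out as a row vector because trainClass is a row vector, which is why a transpose is needed before placing it next to a column of distances.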
thank you for your prompt response!
In fact, I inserted line 60 because the same error message was displayed at line 61. I have now deleted that line, but the same error message still appears. I'm really sorry to bother you.
Thank you for your help.
Hi, thanks for pointing it out. I have edited the code to remove the error.
I forgot to take the transpose of the pred_classes matrix.
At which line did you make that change, please?
I have changed the third line of the code snippet from
dist_class_aug_matrix = [distanceofIndex, pred_classes];
to
dist_class_aug_matrix = [distanceofIndex, pred_classes'];
To transpose a matrix in MATLAB, add an apostrophe after its name: a' is the transpose of a.
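A small illustration of why the transpose matters when building [distanceofIndex, pred_classes']: horizontal concatenation needs both parts to have the same number of rows. The vectors a and d are just examples.

```matlab
a = [1 2 3];       % 1-by-3 row vector
b = a';            % 3-by-1 column vector (transpose of a)
d = [10; 20; 30];  % 3-by-1 column vector

M = [d, b];        % OK: both parts have 3 rows, so M is 3-by-2
% M2 = [d, a];     % would fail: 3 rows and 1 row cannot be placed side by side

% Note: for complex numbers, a' also takes the complex conjugate;
% a.' is the plain (non-conjugate) transpose.
disp(M)
```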
Thank you for your prompt response, sir @Rajeev.
I did add the transpose of pred_classes, but I still get an error message. I don't understand, sir. Is it related to my older R2015a version, or do you have other alternatives?
Thank you again for your unwavering support. In line 63, I added an apostrophe at the end of pred_classes.
This is because of typos in the variable names.
In lines 61 and 63, distancesofTheIndexe must be replaced with distancesofTheIndexes; you have missed an 's' at the end.
In line 62, replace indexe with indexes.
I am glad it worked. If you found the answer useful, you can mark it as accepted so that if others also have the same issue, they can be reassured that the answer worked for the OP.
Thank you very much @Rajeev
Indeed, with your help that line now works.
Sorry to disturb you again about the rest; please excuse me.
Regarding the size of trainClass in my code, you're right: it has 120 elements instead of 150. I figured that, since I partitioned the dataset into training and test data, the classes should also be partitioned into trainClass and testClass (which is not the case in my code). Maybe with those we could determine the confusion matrix.
How can I get trainClass and testClass, please?
Thank you again for your availability and your patience.
Since the data set is just random points in the 2-D plane, there may or may not be well defined clusters.
If I understand correctly, your script is assigning random classes to random coordinates. Instead of doing this, you can generate the data in clusters and then partition it.
I have written a simple script that you can run and use as a reference. I have also attached the helper function that creates the coordinates.
In the script the clusters of coordinates do not overlap; you can make them overlap to get a non-ideal confusion matrix.
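Since the attached script and helper function are not reproduced here, the following is only a hypothetical sketch of the same idea: generate 5 clusters of 30 points around assumed centres, then split them 80/20 so that every coordinate keeps its class. This yields trainClass (120 elements) and testClass (30 elements) vectors for confusionmat. Increasing spread makes the clusters overlap. It uses only base MATLAB functions available in R2015a.

```matlab
rng(1);                                     % reproducible random numbers
nPerClass = 30;
centers   = [0 0; 10 0; 0 10; 10 10; 5 5];  % assumed cluster centres, one row per class
spread    = 1;                              % std. deviation; increase it to make clusters overlap
nClasses  = size(centers, 1);

X = zeros(nPerClass*nClasses, 2);           % coordinates (150-by-2)
y = zeros(nPerClass*nClasses, 1);           % class of each coordinate
for c = 1:nClasses
    rows = (c-1)*nPerClass + (1:nPerClass);
    X(rows,:) = bsxfun(@plus, spread*randn(nPerClass, 2), centers(c,:));
    y(rows)   = c;
end

% Random 80/20 split; the same indices are used for X and y,
% so every coordinate keeps its class label.
perm     = randperm(numel(y));
nTrain   = round(0.8*numel(y));             % 120 training points
trainIdx = perm(1:nTrain);
testIdx  = perm(nTrain+1:end);              % 30 test points

train_Coords = X(trainIdx,:);  trainClass = y(trainIdx);
test_Coords  = X(testIdx,:);   testClass  = y(testIdx);
```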
confusionchart is in the stats toolbox. Do you have that?
No, sir, I don't have it. Can you guide me, please?
But I tried to do it with Classification Learner, and I couldn't.
Classification Learner is also in the Statistics and Machine Learning Toolbox. If you don't have the toolbox, you won't even see Classification Learner, let alone try to use it. When you type
>> ver
do you see the stats toolbox listed?
I said that because I did not find a toolbox named stats toolbox.
However, it is in the Apps toolbox that I find Classification Learner.
There is no "App" toolbox. Apps is a tab on the tool ribbon that has applets for the various toolboxes. If you have a toolbox installed, its applets will be listed on the Apps tab. Because you can see Classification Learner there, you have the stats toolbox installed; but because you cannot run any stats functions, it means you do not have a valid license for it, even though it's installed. What does this show:
hasLicenseForToolbox = license('test', 'Statistics_Toolbox'); % Check for Statistics and Machine Learning Toolbox.
which -all confusionchart % See if this function is installed.
The response is 1. Does this mean that the stats toolbox is installed? I believe so, or am I wrong?
That means you have it, and it means you should have the confusionchart function. What does this show?
>> which -all confusionchart
C:\Program Files\MATLAB\R2022b\toolbox\shared\mlearnlib\confusionchart.m
C:\Program Files\MATLAB\R2022b\toolbox\stats\bigdata\@tall\confusionchart.m % tall method
'confusionchart' not found. This is the message that I get when I enter which -all confusionchart in the console.
Hi all
Could anyone please review my code below and provide suggestions, if possible?
Indeed, I want to classify five faults whose features are voltage and current data, so I have 5 classes. My class data are very scattered, and I don't understand why.
Attached is my code.
thank you for all the effort you put in for me
The error in the code was because of the transpose on 'y_est'. Changing the line from
Cm=confusionchart(y_test, y_est');
to
Cm=confusionchart(y_test, y_est);
will solve the issue.
Also, in the plot, the cyan dots are plotting data_class_3 instead of data_class_5. This looks like a copy-paste typo.
The data is scattered because of the constraints that were given to the random number generator. There is nothing wrong with the plot, as it follows the constraints as expected.
What kind of plot are you expecting for your problem?
What is the actual problem statement?
Thank you, @Rajeev.
As for 'confusionchart', it seems that it is not included in my R2015a version, since before transposing y_est I tried without the transpose, and the same error message appeared.
However, you're right, that's the typo for the cyan dots.
Thank you for your two questions; they allow me to explain my problem to you again.
I want to classify new data into a class by majority vote after calculating the distances (this is kNN, I believe).
My problem is the detection of 5 faults on the basis of two parameters (current and voltage measurements).
Your ability to detect and solve problems fascinates me, sir @Rajeev; there is a lot of pedagogy in your approach. Thank you again for your help.
Can you update MATLAB to R2018b or later so that you can use the 'confusionchart' function?
If, for some reason, you cannot update MATLAB, you can simply use the 'confusionmat' function to get the matrix as output and view it in the Command Window.
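If the toolbox functions are not usable at all, the same matrix can also be computed with base MATLAB. The sketch below assumes the classes are numbered 1..nClasses and uses made-up label vectors; it relies on accumarray and follows the confusionmat convention (rows = true class, columns = predicted class). The diagonal counts the well-classified points and the off-diagonal entries the misclassified ones.

```matlab
% Hypothetical labels; in your script these would be testClass and the kNN predictions.
testClass  = [1 1 2 2 3 3 3]';   % true classes of the test points
pred_class = [1 2 2 2 3 1 3]';   % predicted classes
nClasses   = 3;

% C(i,j) = number of test points of true class i predicted as class j
% (same row/column convention as confusionmat).
C = accumarray([testClass, pred_class], 1, [nClasses nClasses]);

nCorrect = trace(C);                 % diagonal: well classified
nWrong   = sum(C(:)) - nCorrect;     % off-diagonal: misclassified
accuracy = nCorrect / sum(C(:));
misclassifiedIdx = find(testClass ~= pred_class);   % which test points are wrong
disp(C)
```

For these labels, C is [1 1 0; 0 2 0; 1 0 2], 5 of the 7 points are correct, and test points 2 and 6 are the misclassified ones.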
The problem can be solved using kNN if the data points that belong to the same fault are close to each other, that is, if the points of the same class form clusters.
Provided this assumption holds, you can simply use the script 'myknnessay' after correcting the typo.
@Rajeev, thank you again.
I think I will try this version, thanks.
Yes, I used the 'confusionmat' function, and the output matrix showed up.
As an expert, can you reassure me that this code is good for detecting my 5 faults, and that there are no more errors, if I understand you correctly? Also, can I get the same result using Classification Learner?
I tried to plot the x_train and x_test points and got this image, but I don't understand it.
Excuse me, I forgot this question!
How do I plot x_train and x_test for each data class? I tried, but I get this error:
Subscript indices must either be real positive integers or logicals.
Error in myknnessay (line 98)
Thank you.
I do not get the error when I run your script, although I did get the plot that you attached in the comment. This plot is incorrect because
plot(x_train,'b.','linewidth',2,'MarkerSize',10)
plots each column of x_train against its row index (1 to size(x_train,1)) instead of plotting the points themselves. To plot the coordinates of x_train, you need to give the x-axis and y-axis data separately, like:
plot(x_train(:,1),x_train(:,2),'b.')
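Building on that, here is a sketch that plots the training and test points of each class in a separate colour using only base MATLAB (gscatter would also work but requires the Statistics and Machine Learning Toolbox). The small data set is made up; replace x_train, y_train, x_test and y_test with your own variables. It assumes the classes are numbered 1 to 5.

```matlab
% Hypothetical data; replace with your own x_train, y_train, x_test, y_test.
x_train = [0 0; 0 1; 5 5; 5 6];   y_train = [1; 1; 2; 2];
x_test  = [0.5 0.5; 5.5 5.5];     y_test  = [1; 2];

colors  = 'brgcm';                % one colour per class (classes 1..5)
classes = unique(y_train)';       % row vector so the for-loop visits each class
labels  = {};

figure; hold on;
for c = classes
    tr = (y_train == c);          % training points of class c
    te = (y_test  == c);          % test points of class c
    plot(x_train(tr,1), x_train(tr,2), '.', 'Color', colors(c), 'MarkerSize', 12);
    plot(x_test(te,1),  x_test(te,2),  'o', 'Color', colors(c), 'MarkerSize', 8);
    labels = [labels, {sprintf('class %d train', c), sprintf('class %d test', c)}];
end
hold off;
xlabel('first column'); ylabel('second column');
legend(labels);
```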
Here is the output for the test data and the train data. As shown in the figure, the test data contains points from every class. The confusion chart shows that all the predictions made with 'knnsearch' are correct.
Fig 1: The original dataset as per your constraints.
Fig 2: Visual representation of split of test and train data (i.e. x_train and x_test).
Fig 3: The confusion matrix for the faults predicted by the function knnsearch.
I have also attached the file that plots the results shown in the screenshots. Since confusionchart doesn't work for you, you can simply use confusionmat and view the matrix in the Command Window.
Perfect, thank you very much, it works! But I have a concern: I don't get the same visualization in Figure 1 and Figure 2.
Look at what I get with my R2015a version:
figure1
figure2
The plots that you attached do not seem to follow the constraints that you specified before.
Can you share the script that generates the data, along with any helper function that you might have used?
I ran your script on my system and I get the expected output. If possible, try updating MATLAB and running it again.
Thanks for your feedback, @Rajeev.
Please, if I don't apply constraints and just use my own dataset, is the approach still valid for my fault detection? I want to see how it works without constraints. My dataset is attached.
Thank you very much for your help.
Upon plotting and observing your dataset, it can be seen that the data points that belong to the same class are not close to each other. They need to form clusters for kNN classification to work well.
Regardless, if the dataset is passed to the knnsearch function, the results are:
As you can see, the confusion matrix now shows many errors (misclassifications) because the data is not well clustered.
I have also attached the code for your reference.
What can I say beyond thank you, sir @Rajeev? Please accept all my gratitude for your competence and your pedagogy in passing on knowledge to those of us who need it. You have a big heart! Unfortunately, my R2015a version fails to run the code you sent. While I wait to upgrade from R2015a, may I continue to turn to you and your team if something still bothers me?
One concern, please: is it necessary to use a threshold for fault detection with kNN? If yes, how do I do it?
Thank you again and again for all you do for me.


Release

R2015a
