Is there a way to identify which dataset a value belongs to for overlapping datasets?

I have three datasets. When plotted, `data a` appears to have lower values than `data b` and `data c`. A box plot comparison shows that the datasets differ, but their ranges overlap. The code below demonstrates this:
clc; clear all; close all
load("dataset.mat")
figure
hold on
xlabel('index of points')
ylabel('data value')
plot(a,'.',DisplayName='data1')
plot(b,'.',DisplayName='data2')
plot(c,'.',DisplayName='data3')
figure;
boxplot([a b c],'Notch','on','Labels',{'data1','data2','data3'})
grid on
Given these datasets, I have a set of values, say [4 7 40 8 4], and I want to predict which dataset these values may belong to. Is there a way to do that? With only a very basic knowledge of statistics, I could not come up with a solution. I found one approach that compared kernel density estimates (KDE), but the data in that case were clearly separable. My datasets overlap much more. Is there a way to make a prediction in this case? Please forgive my basic knowledge and suggest a solution. I would appreciate it.
Thanks in advance.
figure
hold on
[fn,xfn,bwn] = kde(a);
plot(xfn,fn)
[fn,xfn,bwn] = kde(b);
plot(xfn,fn)
[fn,xfn,bwn] = kde(c);
plot(xfn,fn)

2 Comments

You might look into logistic regression and discriminant function analysis. These are both techniques for predicting category membership.
Thank you for the idea. I am looking into these.
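
As a rough illustration of the discriminant analysis suggestion, the sketch below fits a quadratic discriminant model to a single predictor using `fitcdiscr` from the Statistics and Machine Learning Toolbox. The vectors `a`, `b`, and `c` here are synthetic stand-ins with made-up means and spreads (an assumption, not the values in dataset.mat), so the printed posteriors only show the shape of the output.

```matlab
% Synthetic stand-ins for a, b, c (assumed; replace with load("dataset.mat")).
rng(0)                                  % reproducible random numbers
a = 5  + 3*randn(90,1);                 % hypothetical lower-valued group
b = 12 + 5*randn(90,1);                 % hypothetical overlapping group
c = 15 + 8*randn(90,1);                 % hypothetical overlapping group

data  = [a; b; c];                      % single predictor column
label = [ones(size(a)); 2*ones(size(b)); 3*ones(size(c))];

% Quadratic discriminant analysis: each class gets its own mean and variance.
Mdl = fitcdiscr(data, label, "DiscrimType", "quadratic");

x = [4 7 40 8 4]';                      % query values from the question
[predictedClass, posterior] = predict(Mdl, x);

% One row per query value: predicted class and the three class posteriors.
disp(table(x, predictedClass, posterior))
```

With overlapping groups, the posterior columns are often more informative than the hard label, because they show how close the call was for each value.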

 Accepted Answer

websave("dataset.mat", "https://www.mathworks.com/matlabcentral/answers/uploaded_files/1650591/dataset.mat")
ans = '/users/mss.system.pbnsl/dataset.mat'
load("dataset.mat")
figure
hold on
xlabel('index of points')
ylabel('data value')
plot(a,'.',DisplayName='data1')
plot(b,'.',DisplayName='data2')
plot(c,'.',DisplayName='data3')
whos
  Name              Size            Bytes  Class             Attributes

  a                 90x1              720  double
  ans               1x35               70  char
  b                 90x1              720  double
  c                 90x1              720  double
  cmdout            1x33               66  char
  gdsCacheDir       1x14               28  char
  gdsCacheFlag      1x1                 8  double
  i                 0x0                 0  double
  managers          1x0                 0  cell
  managersMap       0x1                 8  containers.Map
figure;
boxplot([a b c],'Notch','on','Labels',{'data1','data2','data3'})
grid on
x = [4 7 40 8 4]';
% K Nearest neighbour (KNN) classification
data = [a; b; c];
label = [ones(size(a)); 2*ones(size(b)); 3*ones(size(c))]; % class 1 = a, 2 = b, 3 = c
Mdl = fitcknn(data, label, "NumNeighbors", 80); % larger number of neighbours
predictedClass = predict(Mdl, x) % predicted class
predictedClass = 5x1
1 1 3 2 1
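
Because the classes overlap, it may help to look at how confident each KNN prediction is and how the neighbour count affects accuracy. The sketch below requests the posterior scores from `predict` and compares a few values of `NumNeighbors` with 5-fold cross-validation. The data are synthetic stand-ins (an assumption, not the contents of dataset.mat), and the candidate list `kValues` is arbitrary.

```matlab
% Synthetic stand-ins for a, b, c (assumed; replace with load("dataset.mat")).
rng(0)                                  % reproducible random numbers
a = 5  + 3*randn(90,1);
b = 12 + 5*randn(90,1);
c = 15 + 8*randn(90,1);

data  = [a; b; c];
label = [ones(size(a)); 2*ones(size(b)); 3*ones(size(c))];
x     = [4 7 40 8 4]';                  % query values from the question

% Posterior scores: each row gives the estimated probability of classes 1, 2, 3.
Mdl = fitcknn(data, label, "NumNeighbors", 80);
[predictedClass, score] = predict(Mdl, x);
disp(table(x, predictedClass, score))

% Compare neighbour counts by 5-fold cross-validated misclassification rate.
kValues = [5 20 80];                    % arbitrary candidates (assumption)
cvLoss  = zeros(size(kValues));
for i = 1:numel(kValues)
    M = fitcknn(data, label, "NumNeighbors", kValues(i));
    cvLoss(i) = kfoldLoss(crossval(M, "KFold", 5));
end
disp(table(kValues', cvLoss', 'VariableNames', {'NumNeighbors', 'CVLoss'}))
```

A cross-validated loss well above zero is expected here: when the distributions overlap, no classifier can assign every single value correctly.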

1 Comment

Thank you for your answer. I will check whether the most frequently occurring prediction can be taken as the predicted class or whether I need to perform some additional analysis. This probably works. Thank you. Good day.
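
The "most frequently occurring prediction" idea can be sketched as a simple majority vote over the per-value predictions. This assumes that all five values come from the same dataset, which the thread does not establish. The vector below is the `predictedClass` output reported in the answer.

```matlab
% Per-value predictions reported in the accepted answer.
predictedClass = [1 1 3 2 1]';

% Count the votes for classes 1, 2, and 3.
counts = accumarray(predictedClass, 1, [3 1])';

% The most frequent label is the majority-vote class.
majorityClass = mode(predictedClass);

fprintf('Votes per class: %s\n', mat2str(counts));
fprintf('Majority-vote class: %d\n', majorityClass);
```

Here class 1 receives three of the five votes. With so few values and overlapping classes, a narrow majority is weak evidence, so the vote counts are worth reporting alongside the winning class.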

Release: R2024a

Asked: UH on 25 Mar 2024

Commented: UH on 25 Mar 2024
