K-means clustering with k=10

I have a 256 x 1707 matrix and I want to cluster it with k-means (k=10) and plot the result. How can I do this?
I appreciate any help you can provide.

Answers (1)

njj1 on 18 Apr 2018
Edited: njj1 on 18 Apr 2018

1) Randomly initialize 10 cluster centroids. This can be done by simply randomly selecting 10 points from your dataset.

2) Compute the distance (Euclidean, presumably) from each data point to these 10 centroids.

3) Assign each point to the cluster whose centroid is the closest.

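4) Recompute the centroid of each cluster as the mean of the points assigned to it.

5) Recompute the distance from each data point to the 10 updated centroids.

6) Repeat steps 3 to 5 until the cluster assignments stop changing.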
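
The steps above can be sketched by hand as follows. This is only a minimal illustration, not how the built-in kmeans function is implemented. The toy data, k = 2, and the cap of 100 iterations are assumptions made for the example; for the real problem you would use your own n x p matrix and k = 10.

```matlab
% Minimal hand-written k-means sketch on hypothetical toy data.
rng(0);                                  % fix the seed so the run is repeatable
X = [randn(50,2); randn(50,2) + 5];      % toy data: rows are observations
k = 2;                                   % number of clusters (10 in the question)
n = size(X,1);

% Step 1: pick k distinct data points as the initial centroids
C = X(randperm(n,k),:);
cluster = zeros(n,1);                    % no assignments yet

for iter = 1:100                         % arbitrary cap on the number of iterations
    % Steps 2 and 5: squared Euclidean distance from each point to each centroid
    D = zeros(n,k);
    for j = 1:k
        d = bsxfun(@minus, X, C(j,:));
        D(:,j) = sum(d.^2, 2);
    end

    % Step 3: assign each point to the nearest centroid
    [~, newCluster] = min(D, [], 2);

    % Step 6: stop once the assignments no longer change
    if isequal(newCluster, cluster)
        break
    end
    cluster = newCluster;

    % Step 4: recompute each centroid as the mean of its assigned points
    for j = 1:k
        if any(cluster == j)             % leave an empty cluster's centroid as is
            C(j,:) = mean(X(cluster == j,:), 1);
        end
    end
end
```

After the loop, cluster holds one label per row of X, so it can be used directly in the plotting loop below in place of the output of kmeans.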

Plotting:

for i=1:10
     plot(matrix(cluster==i,dim1),matrix(cluster==i,dim2),'o')
     hold on
end

In this plot, you have to choose two dimensions to plot against each other. From the looks of it, you have either 256 or 1707 dimensions (aka features).

17 Comments

I wrote this code, but it seems that something is missing. How do I label the clusters and continue up to 10? Is it done by iteration or one by one?
opts = statset('Display','final');
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
%%Plotting
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'o')
hold on
end
plot(C(:,1),C(:,2),'kx','MarkerSize',8,'LineWidth',2)
hold off
title 'K-means with 10 Clusters and Centroids'
It appears that your cluster plotting code is OK. The centroids part is not quite correct. Try something like this:
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'o')
hold on
plot(C(i,1),C(i,2),'kx','MarkerSize',8)
end
hold off
title('K-means with 10 Clusters and Centroids')
The 'LineWidth' property is not necessary when you only plot points.
Yes, but it doesn't appear like that clustering groups of dots? I appreciate if you can help me with this.
njj1 on 18 Apr 2018
Edited: njj1 on 18 Apr 2018
OK, first, I was wrong about the 'LineWidth' property when you plotted your centroids. You can and should use this when plotting the centroids.
Second, I'm not sure what you mean by "it doesn't appear like that clustering groups of dots". Are there any dots plotting? I've just replicated this code for a simpler dataset and it seems to be working fine. Here's what my code looks like right now. Bear in mind that the data matrix, X, should be laid out as an n x p matrix, where n is the number of observations and p is the number of dimensions/features.
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','replicates',12);
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'o')
hold on
plot(C(i,1),C(i,2),'kx','markersize',8,'linewidth',2)
end
Ali Ali on 18 Apr 2018
Edited: Ali Ali on 18 Apr 2018
I meant something like the attached picture.
If you would like a plot like this, then there are a few changes we can make.
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'.','MarkerSize',9)
hold on
plot(C(i,1),C(i,2),'kx','markersize',8,'linewidth',2)
end
You can change the property 'MarkerSize' in the first plot() call if you want the dots to be larger.
However, judging from the plot you attached, there are only 5 clusters... Is this what you want or is this wrong?
Don't worry about the picture, it was just to give the idea. I'm still not convinced by the result. Please see what I got.
Also, why do the centroids change on every run? The values in the matrix are fixed!
Unfortunately, your data is quite high dimensional, which means that picking out any 2 dimensions for plotting is very likely going to produce an odd looking plot.
K-means is based on an iterative optimization routine that converges to a local, not necessarily global, optimum. Further, each of your 'replicates' starts the centroids at different randomly selected locations. Because these initial conditions differ from run to run, the optimization can settle into a different local optimum, so the centroids can change location each time you run k-means.
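If you want the same result on every run, one option is to seed MATLAB's random number generator with rng before calling kmeans. The sketch below uses hypothetical toy data and an arbitrary seed value of 1; it only makes the run repeatable and does not make the solution a global optimum.

```matlab
% Hypothetical toy data: 200 observations with 5 features each
X = rand(200,5);

rng(1);                                   % seed the global random number generator
[idx1,C1] = kmeans(X,10,'Replicates',12);

rng(1);                                   % same seed, so the same random starts
[idx2,C2] = kmeans(X,10,'Replicates',12);

% With identical seeds, the two runs should produce identical labels and centroids
sameLabels    = isequal(idx1,idx2);
sameCentroids = isequal(C1,C2);
```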
A further difficulty comes from the high-dimensionality of your data. Look up "curse of dimensionality" to get some understanding of why working in high dimensions can be tricky.
I also encourage you to make sure that you are using the correct parts of your data matrix. You said your matrix was 256 x 1707. Are the rows the observations or the columns? My guess is that you have 1707 observations, each of which has 256 dimensions/features. If so, you need to input the transpose of your matrix into kmeans, e.g.,
[idx,C] = kmeans(X',10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
Yes, got it.. I really appreciate your help.
Yes, it is 256 x 1707, and when I transposed the input, I got:
Index exceeds matrix dimensions.
Error in main_03 (line 13)
plot(X(idx==i,1),X(idx==i,2),'.','MarkerSize',9)
njj1 on 18 Apr 2018
Edited: njj1 on 18 Apr 2018
This is because you used the transpose when computing the clusters, so the vector idx is length 1707, not 256. It might be easier to just enter X=X' before you do any other operations. Try this:
opts = statset('Display','final');
X = X';
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
%Plotting
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'.','MarkerSize',9)
hold on
plot(C(i,1),C(i,2),'kx','markersize',8,'linewidth',2)
end
hold off
title('K-means with 10 Clusters and Centroids')
I don't know, seems to me there is a problem even with this..!!
Try plotting with other dimensions. Like I said, having a dimensionality of 256 is high and may lead to odd results for a few reasons.
To plot in other dimensions try something like this:
opts = statset('Display','final');
X = X';
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
%Plotting
dim1 = 1; %x-axis in your plot
dim2 = 12; %y-axis in your plot
for i=1:10
plot(X(idx==i,dim1),X(idx==i,dim2),'.','MarkerSize',9)
hold on
plot(C(i,dim1),C(i,dim2),'kx','markersize',8,'linewidth',2)
end
hold off
title('K-means with 10 Clusters and Centroids')
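Instead of picking two of the original dimensions, another option (not used elsewhere in this thread) is to project the data onto its first two principal components and plot those. The sketch below is only an illustration on hypothetical toy data with k = 2. It assumes the pca function from the Statistics and Machine Learning Toolbox is available and that X is already laid out as observations x features.

```matlab
% Hypothetical toy data: 200 observations, 20 features, two shifted groups
rng(0);
X = [randn(100,20); randn(100,20) + 3];
k = 2;                                         % 10 in the question
[idx,C] = kmeans(X,k,'Replicates',5);

% Project the data onto its principal components; mu is the column mean of X
[coeff,score,~,~,~,mu] = pca(X);

% Project the centroids into the same space (center first, then rotate)
Cproj = bsxfun(@minus, C, mu) * coeff(:,1:2);

figure
for i = 1:k
    plot(score(idx==i,1),score(idx==i,2),'.','MarkerSize',9)
    hold on
    plot(Cproj(i,1),Cproj(i,2),'kx','MarkerSize',8,'LineWidth',2)
end
hold off
xlabel('Principal component 1')
ylabel('Principal component 2')
title('K-means clusters in the first two principal components')
```

The clustering itself is still done in the full feature space; only the plot is reduced to two dimensions, so clusters that are separate in the original space may still overlap in the picture.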
Maybe it will work, but this is an image that was converted to a matrix, and I have to plot all of its pixels.
Ali, attach your data in a .mat file if you want more help, to make it easier for people to help you.
Also, you've marked it solved/accepted, so are you all done with this question?
Hi,
this is my input.


Asked:

on 18 Apr 2018

Commented:

on 21 Apr 2018
