K-means clustering with k=10
I have a 256 x 1707 matrix that I want to cluster with k-means (k=10) and then plot. How can I do this?
I appreciate any help you can provide.
Answers (1)
1) Randomly initialize 10 cluster centroids. This can be done by simply randomly selecting 10 points from your dataset.
2) Compute the distance (Euclidean, presumably) from each data point to these 10 centroids.
3) Assign each point to the cluster whose centroid is closest.
4) Re-compute the centroid of each cluster as the mean of the points assigned to it.
5) Compute the distance from each data point to the 10 updated centroids.
6) Repeat steps 3-5 until the cluster assignments stop changing.
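The steps above can be sketched in base MATLAB roughly as follows. This is only a minimal illustration, not a replacement for the built-in kmeans function: the random placeholder data, the iteration cap of 100, and the variable names are all assumptions, and X is taken to be n-by-p with one observation per row.

```matlab
% Minimal k-means sketch (Lloyd's algorithm), base MATLAB only.
% Assumption: X is n-by-p (rows = observations); random data stands in for the real matrix.
rng(0);                          % fix the seed so the run is repeatable
X = rand(200, 5);                % placeholder data: 200 observations, 5 features
k = 10;
n = size(X, 1);

% Step 1: pick k distinct data points as the initial centroids
C = X(randperm(n, k), :);
idx = zeros(n, 1);

for iter = 1:100                 % hypothetical iteration cap
    % Step 2: squared Euclidean distance from every point to every centroid
    D = zeros(n, k);
    for j = 1:k
        D(:, j) = sum((X - C(j, :)).^2, 2);   % implicit expansion, R2016b or later
    end

    % Step 3: assign each point to its nearest centroid
    [~, newIdx] = min(D, [], 2);

    % Stop once the assignments no longer change
    if isequal(newIdx, idx)
        break
    end
    idx = newIdx;

    % Step 4: move each centroid to the mean of its members
    for j = 1:k
        members = (idx == j);
        if any(members)          % leave an empty cluster's centroid where it is
            C(j, :) = mean(X(members, :), 1);
        end
    end
end
```

After the loop, idx holds one cluster label per row of X and C holds one centroid per row, which is the same layout the built-in kmeans returns.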
Plotting:
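% matrix: n-by-p data (rows = observations); cluster: n-by-1 vector of labels
% dim1, dim2: the two columns (features) to plot against each other
for i=1:10
plot(matrix(cluster==i,dim1),matrix(cluster==i,dim2),'o')
hold on
end
hold off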
In this plot, you have to choose two dimensions to plot against each other. From the looks of it, you have either 256 or 1707 dimensions (aka features).
17 Comments
Ali Ali
on 18 Apr 2018
njj1
on 18 Apr 2018
It appears that your cluster plotting code is OK. The centroids part is not quite correct. Try something like this:
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'o')
hold on
plot(C(i,1),C(i,2),'kx','MarkerSize',8)
end
hold off
title('K-means with 10 Clusters and Centroids')
The 'LineWidth' property is not necessary when you only plot points.
Ali Ali
on 18 Apr 2018
OK, first, I was wrong about the 'LineWidth' property when you plotted your centroids. You can and should use this when plotting the centroids.
Second, I'm not sure what you mean by "it doesn't appear like that clustering groups of dots". Are any dots being plotted? I've just replicated this code on a simpler dataset and it seems to work fine. Here's what my code looks like right now. Bear in mind that the data matrix, X, should be laid out as an n x p matrix, where n is the number of observations and p is the number of dimensions/features.
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','replicates',12);
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'o')
hold on
plot(C(i,1),C(i,2),'kx','markersize',8,'linewidth',2)
end
njj1
on 18 Apr 2018
If you would like a plot like this, then there are a few changes we can make.
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'.','MarkerSize',9)
hold on
plot(C(i,1),C(i,2),'kx','markersize',8,'linewidth',2)
end
You can change the property 'MarkerSize' in the first plot() call if you want the dots to be larger.
However, judging from the plot you attached, there are only 5 clusters... Is this what you want or is this wrong?
Ali Ali
on 18 Apr 2018
njj1
on 18 Apr 2018
Unfortunately, your data is quite high dimensional, which means that picking out any 2 dimensions for plotting is very likely going to produce an odd looking plot.
K-means is based on an iterative optimization routine that converges to a local, not necessarily global, optimum. Further, each of your 'replicates' starts the centroids at different randomly selected locations. Because the result depends on these random initial conditions, the centroids can end up in different locations each time you run k-means.
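To see this run-to-run variation for yourself, something like the sketch below may help. It uses random placeholder data and arbitrary seeds (both assumptions), and relies on the third output of kmeans, sumd, which holds the within-cluster sums of point-to-centroid distances. It requires the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: k-means results depend on the random initialization.
% Assumption: placeholder random data, not the matrix from the question.
rng(0);
X = rand(300, 20);                          % n-by-p, rows are observations
k = 10;
seeds = 1:5;                                % arbitrary seeds, for illustration only
totalSumd = zeros(numel(seeds), 1);
for s = 1:numel(seeds)
    rng(seeds(s));                          % different seed, different initial centroids
    [~, ~, sumd] = kmeans(X, k);            % sumd: within-cluster sums of distances
    totalSumd(s) = sum(sumd);
end
disp(totalSumd)                             % the totals usually differ from run to run

% With 'Replicates', kmeans returns the solution with the lowest total
rng(1);
[idx, C, sumd] = kmeans(X, k, 'Replicates', 12);
fprintf('Best of 12 replicates: %.4f\n', sum(sumd));
```

Fixing the seed with rng before calling kmeans makes a given run repeatable, which is useful when comparing plots.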
A further difficulty comes from the high-dimensionality of your data. Look up "curse of dimensionality" to get some understanding of why working in high dimensions can be tricky.
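One common alternative to picking two raw dimensions, which is not something proposed in this thread, is to project the data onto its first two principal components and plot the clusters there. The sketch below assumes random placeholder data and uses pca and kmeans from the Statistics and Machine Learning Toolbox. Treat it as an illustration of the idea rather than a guaranteed fix, since two components may still capture only a small part of the variance.

```matlab
% Sketch: plot clusters in the plane of the first two principal components.
% Assumption: placeholder data; replace X with your own n-by-p matrix.
rng(0);
X = rand(300, 50);                          % n-by-p, rows are observations
k = 10;
[idx, C] = kmeans(X, k, 'Replicates', 5);

[coeff, score] = pca(X);                    % score = centered X projected onto the components
Cproj = (C - mean(X, 1)) * coeff(:, 1:2);   % project the centroids the same way

figure
hold on
for i = 1:k
    plot(score(idx == i, 1), score(idx == i, 2), '.', 'MarkerSize', 9)
    plot(Cproj(i, 1), Cproj(i, 2), 'kx', 'MarkerSize', 8, 'LineWidth', 2)
end
hold off
xlabel('PC 1'); ylabel('PC 2')
title('K-means clusters projected onto the first two principal components')
```

Note that the clustering itself is still done in the full feature space; only the plot uses the projection.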
njj1
on 18 Apr 2018
I also encourage you to make sure that you are using the correct orientation of your data matrix. You said your matrix was 256 x 1707. Are the rows the observations, or the columns? My guess is that you have 1707 observations, each of which has 256 dimensions/features. If so, you need to input the transpose of your matrix into kmeans, e.g.,
[idx,C] = kmeans(X',10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
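A quick way to check the orientation is to compare the sizes of the outputs against the matrix you plan to index when plotting. The sketch below uses random placeholder data with the same shape as the matrix in the question; the name Xraw is hypothetical.

```matlab
% Sketch: confirm the orientation before plotting.
% Assumption: placeholder data shaped like the question's matrix (256 features x 1707 observations).
rng(0);
Xraw = rand(256, 1707);
X = Xraw';                                  % now 1707-by-256: rows are observations
[idx, C] = kmeans(X, 10);

fprintf('X is %d-by-%d\n', size(X, 1), size(X, 2));
fprintf('idx has %d entries, C is %d-by-%d\n', numel(idx), size(C, 1), size(C, 2));

% Logical indexing such as X(idx==i,1) only works if these sizes agree
assert(numel(idx) == size(X, 1), 'idx must have one label per row of X');
assert(size(C, 2) == size(X, 2), 'centroids must have one column per feature');
```

If the first assertion fails, the plotting code is most likely indexing the untransposed matrix with labels computed from the transposed one.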
Ali Ali
on 18 Apr 2018
Ali Ali
on 18 Apr 2018
This is because you used the transpose when computing the clusters, so the vector idx is length 1707, not 256. It might be easier to just enter X=X' before you do any other operations. Try this:
opts = statset('Display','final');
X = X';
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
%Plotting
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'.','MarkerSize',9)
hold on
plot(C(i,1),C(i,2),'kx','markersize',8,'linewidth',2)
end
hold off
title('K-means with 10 Clusters and Centroids')
Ali Ali
on 18 Apr 2018
njj1
on 18 Apr 2018
Try plotting with other dimensions. Like I said, having a dimensionality of 256 is high and may lead to odd results for a few reasons.
To plot in other dimensions try something like this:
opts = statset('Display','final');
X = X';
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
%Plotting
dim1 = 1; %x-axis in your plot
dim2 = 12; %y-axis in your plot
for i=1:10
plot(X(idx==i,dim1),X(idx==i,dim2),'.','MarkerSize',9)
hold on
plot(C(i,dim1),C(i,dim2),'kx','markersize',8,'linewidth',2)
end
hold off
title('K-means with 10 Clusters and Centroids')
Ali Ali
on 18 Apr 2018
Image Analyst
on 19 Apr 2018
Ali, attach your data in a .mat file if you want more help, to make it easier for people to help you.
Also, you've marked it solved/accepted, so are you all done with this question?
Ali Ali
on 21 Apr 2018

