How to manually set K-means centroids when classifying an image
Andreas Westergaard
on 27 Mar 2014
Commented: Image Analyst
on 29 Mar 2014
Hello World (wasn't that what the books told you to print way back when you started doing HTML?...)
I am exploring the kmeans function in MATLAB to classify an RGB image into three classes. I would like to force kmeans to use particular centroid locations. As far as I can understand from the documentation, I should use the 'start' option, but I cannot figure out how to set it correctly. In the images, I want to separate blue sky from water and land. Let's say that I find the sky to have an average RGB value of [120,130,190], water [110,150,150] and land [120,140,120]. Could any of you give an example of how to force kmeans to use these centroids? Thank you in advance for any input!
Accepted Answer
Shashank Prasanna
on 27 Mar 2014
If your data matrix X is n-by-p and you want to cluster the data into 3 clusters, then each centroid is 1-by-p. You can stack the centroids for the 3 clusters into a single 3-by-p matrix and pass it to kmeans as the starting centroids.
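C = [120,130,190; 110,150,150; 120,140,120];  % rows: sky, water, land
idx = kmeans(X, 3, 'start', C);               % X must be n-by-3 numeric (single or double)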
I am assuming here that your matrix X is n-by-3.
This is explained in the kmeans documentation, under the 'start' option.
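For an RGB image, the pixels first have to be reshaped into an n-by-3 matrix before they can be passed to kmeans. The following is a minimal sketch of that step, not a definitive recipe. It assumes the Statistics Toolbox is available, and it uses a small synthetic image (the variables truth and rgb are hypothetical stand-ins for a real photo) so that it runs on its own.

```matlab
% Starting centroids from the question, one row per class: sky, water, land.
C0 = [120 130 190; 110 150 150; 120 140 120];

% Hypothetical test image: each pixel is one of the three colors plus noise.
rng(0);                                    % make the example repeatable
truth = randi(3, 30, 30);                  % true class of every pixel
rgb = uint8(reshape(C0(truth(:), :), 30, 30, 3) + 3*randn(30, 30, 3));

% kmeans expects one observation per row, so flatten the image to n-by-3.
X = double(reshape(rgb, [], 3));

% Start the iterations from C0. kmeans is still free to move the centroids.
[idx, C] = kmeans(X, 3, 'Start', C0);

% Put the cluster indices back into image shape (1 = sky, 2 = water, 3 = land).
labels = reshape(idx, size(rgb, 1), size(rgb, 2));
```

Because the start matrix fixes the order of the clusters, index 1 corresponds to the first row of C0, and so on. The returned C holds the final centroids, which will generally differ from C0.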
More Answers (2)
Tom Lane
on 29 Mar 2014
If your goal is to specify the centroids in advance, and not just have kmeans start with them and adjust them as things go along, then I think you don't want to use kmeans at all. Just use pdist2, find the closest centroid for each point, and classify into the cluster defined by the closest centroid.
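A minimal sketch of this nearest-centroid approach is shown below, assuming the Statistics Toolbox (for pdist2) and using the centroid values from the question. The tiny 2-by-2 image is a hypothetical example made up for illustration.

```matlab
% Fixed centroids, one row per class: sky, water, land.
C = [120 130 190; 110 150 150; 120 140 120];

% Hypothetical 2-by-2 RGB image built from its red, green and blue planes.
rgb = uint8(cat(3, [121 111; 119 125], ...
                   [131 149; 141 128], ...
                   [188 151; 122 185]));

% One pixel per row, converted to double for the distance computation.
[nRows, nCols, ~] = size(rgb);
X = double(reshape(rgb, nRows*nCols, 3));

% Euclidean distance from every pixel to every centroid (n-by-3 matrix).
D = pdist2(X, C);

% Each pixel gets the index of its closest centroid.
[~, idx] = min(D, [], 2);
labels = reshape(idx, nRows, nCols);
```

Unlike kmeans, this never updates the centroids, so the class definitions stay the same from image to image.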
2 Comments
Image Analyst
on 29 Mar 2014
That is the main reason automatic thresholds are not always robust. If the thing you are looking for can occupy anywhere from 0% to 100% of an image, then thresholds that are picked automatically, or clustering that forces you to specify a certain number of clusters, are not robust. They will fail if the image does not contain the proper number of pixels belonging to those classes. For most or all of my color classification applications I use fixed values to determine the class. I use a training set to determine where the classes are, and once I decide on them, they are fixed for all images. That way I can get area fractions for all color classes, whether a class is absent, covers 100% of the image, or is somewhere in between. If an image contained only one real cluster and you told kmeans to find 4, it would chop the image up into 4 clusters when it should report only one. If there were 3 other "real" colors present, it would find all four accurately.
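To illustrate the point about area fractions, here is a small sketch that assumes the fixed class centers are the three RGB values from the question and that the Statistics Toolbox is available. The all-sky image is a hypothetical example.

```matlab
% Fixed class centers, decided once and reused for every image.
C = [120 130 190; 110 150 150; 120 140 120];   % sky, water, land

% Hypothetical 4-by-4 image that contains nothing but sky.
rgb = uint8(cat(3, 120*ones(4), 130*ones(4), 190*ones(4)));

% Assign every pixel to its nearest fixed center.
X = double(reshape(rgb, [], 3));
[~, idx] = min(pdist2(X, C), [], 2);

% Fraction of the image in each class. Classes with no pixels get 0.
areaFraction = accumarray(idx, 1, [size(C, 1) 1]) / numel(idx);
```

Here areaFraction is [1; 0; 0]: the whole image is reported as sky and the absent classes come out as zero, instead of the single color being split into three clusters.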
Image Analyst
on 27 Mar 2014
Here is the official MathWorks example: http://www.mathworks.com/products/demos/image/color_seg_k/ipexhistology.html
Please mark the Answer as accepted if that's what you were looking for. Thanks.
3 Comments
Image Analyst
on 27 Mar 2014
Why don't you just manually segment these things? kmeans is appropriate if you always have the same number of color classes but they move around in color space from image to image. If you have known classes, for example you know you'll always have clouds, sky, water, sand, and grass, then it's best to just define those regions in color space and segment according to them. What are you going to do if you have 5 classes like I said, and you tell it there are only 3 classes (sky, water, land)? It will fail.
Perhaps you'd like to use this approach (I haven't tried it):