MATLAB Answers


Initial centroids for K-means clustering

Asked by Salad Box on 16 Sep 2019 at 23:09
Latest activity: Commented on by Adam on 17 Sep 2019 at 13:41
If I have an array (e.g., a 5-by-3 matrix) that can serve as the initial centroids for k-means clustering, how can I properly initialize the kmeans algorithm with it?
(Matlab's kmeans function has more than 600 lines of code and I have no idea how to modify it...)
The purpose of supplying my own initial centroids, rather than having them randomly generated inside the kmeans function, is to remove the randomness in the outputs.
P.S. There is an answer to this for Python, but I don't know Python.

  1 Comment

Adam on 17 Sep 2019 at 10:45
You should always read the documentation before the code. The 'Start' option gives you the option to input your own initial cluster centres.
I always suggest using your embedded help though via
doc kmeans
and clicking on the 'Name','Value' hyperlink in the 2nd function signature to take you to the list of possible (Name,Value) pairs that are supported. If you always use the latest version of Matlab the online help is fine though.
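A minimal sketch of the 'Start' option mentioned above, assuming the Statistics and Machine Learning Toolbox is available. The data X, the matrix trueCentres and the initial centroids C0 are made-up placeholders, not anything from the original question; substitute your own data and your own 5-by-3 centroid matrix.

```matlab
% Toy data: 200 observations in 3 dimensions, scattered around 5 centres.
% (Hypothetical stand-in for the real data set.)
rng(0);                                   % only fixes the toy data below
trueCentres = [0 0 0; 5 5 5; -5 -5 -5; 5 -5 0; -5 5 0];
X = repelem(trueCentres, 40, 1) + randn(200, 3);

% Hypothetical initial centroids: one row per cluster, one column per variable.
C0 = trueCentres + 0.5;                   % 5-by-3 matrix

% Pass the matrix through the 'Start' name-value pair.
% The number of clusters must match the number of rows in C0.
[idx, C] = kmeans(X, size(C0, 1), 'Start', C0);

disp(C);                                  % final centroids, 5-by-3
```

Here idx holds the cluster index of each row of X and C holds the final centroid locations. No edit to the kmeans source file is needed for this.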


1 Answer

Answer by KALYAN ACHARJYA on 17 Sep 2019 at 9:42
Edited by KALYAN ACHARJYA on 17 Sep 2019 at 9:42

Before I share the helpful link, I would ask you to watch Andrew Ng's lecture on random initialization of k-means (Machine Learning).
To keep k-means from getting stuck in local minima and to get a better-optimized result, he suggests running it with multiple random initializations.
Manual Initialization
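As a rough illustration of the multiple-random-initializations idea, the sketch below uses the 'Replicates' option of kmeans, which reruns the clustering from several random starts and returns the solution with the lowest total within-cluster distance. The toy data X and the seed value 42 are arbitrary assumptions for the example. Calling rng with a fixed seed beforehand is one way to make the random starts repeat from run to run.

```matlab
% Toy data: 200 observations in 3 dimensions (hypothetical stand-in data).
rng(0);
trueCentres = [0 0 0; 5 5 5; -5 -5 -5; 5 -5 0; -5 5 0];
X = repelem(trueCentres, 40, 1) + randn(200, 3);

% First run: fix the seed, then let kmeans try 10 random initializations.
rng(42);                                  % arbitrary seed, chosen for the example
[idxA, CA, sumdA] = kmeans(X, 5, 'Replicates', 10);

% Second run with the same seed: the same random starts are drawn again.
rng(42);
[idxB, CB, sumdB] = kmeans(X, 5, 'Replicates', 10);

totalCost = sum(sumdA);                   % total within-cluster sum of distances
sameResult = isequal(idxA, idxB) && isequal(CA, CB);
```

If sameResult is true, the two runs agree exactly, so the seed has removed the run-to-run variation even though the starts are still chosen randomly.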


Salad Box on 17 Sep 2019 at 13:38
Thanks for your answer, Kalyan. I do appreciate it.
Andrew Ng's video only helps with the case where k-means gets stuck in a local optimum. His suggestion was to use 'multiple iterations' to have a better chance of finding the global optimum rather than a local one: compute the cost function for each run, choose the centroids with the minimum cost, and record those centroids. That still leaves my problem unsolved. If I run k-means again with 100 new iterations, the output will in most cases be slightly different from the first run of k-means with its initial 100 iterations.
I need to fix this: every time I run k-means, the output needs to be the same. That is why I prepared my own initial centroids. Starting k-means from them and moving the centroids at each step, I should theoretically get the same output at the end. I have other variables/parameters to look at in my research, and I can't let randomness in the k-means output be one of my variables. I need to remove this randomness. I hope that is understandable.
The second link in your answer is about 'how to set initial centroids for k-means'. However, I have already done that in my own way. It is irrelevant to my question.
My question is:
Once I have an array as my initial centroids, how do I embed them into Matlab's own k-means function?
Hope my question is clear.
Can anyone help directly with this question, please?
Adam on 17 Sep 2019 at 13:41
As I added in a comment above, the Matlab help is always the first place to go. This shows how you can do this.
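To check that fixed initial centroids really do remove the run-to-run variation, something like the following sketch could be used. It assumes the Statistics and Machine Learning Toolbox and the default kmeans options; X and C0 are hypothetical placeholder data, not the asker's actual arrays.

```matlab
% Toy data and hypothetical initial centroids (5 clusters, 3 variables).
rng(0);
trueCentres = [0 0 0; 5 5 5; -5 -5 -5; 5 -5 0; -5 5 0];
X = repelem(trueCentres, 40, 1) + randn(200, 3);
C0 = trueCentres - 0.5;                   % 5-by-3 matrix of starting centroids

% Run kmeans twice from exactly the same starting centroids.
[idx1, C1] = kmeans(X, 5, 'Start', C0);
rng('shuffle');                           % deliberately change the random state
[idx2, C2] = kmeans(X, 5, 'Start', C0);

% With a numeric 'Start' matrix no random initialization takes place,
% so both runs should produce identical assignments and centroids.
identicalRuns = isequal(idx1, idx2) && isequal(C1, C2);
disp(identicalRuns);
```

The rng('shuffle') call between the two runs is only there to show that the result does not depend on the state of the random number generator.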
