KMEANS delivers different results on the same data set?
I'm performing a cluster analysis on financial time series. The distance measure is correlation.
IDX = kmeans(data',2,'distance','correlation')
The call above returns different cluster assignments each time I run it on the same set of time series. I’m wondering how this is possible.
Thanks for your help!
Accepted Answer
Peter Perkins
on 10 Dec 2012
Christian, the kmeans function uses a randomly chosen starting configuration:
>> help kmeans
 kmeans K-means clustering.
 [snip]
    'Start' - Method used to choose initial cluster centroid positions,
       sometimes known as "seeds". Choices are:
       'sample'  - Select K observations from X at random (the default)
       'uniform' - Select K points uniformly at random from the range
                   of X. Not valid for Hamming distance.
       'cluster' - Perform preliminary clustering phase on random 10%
                   subsample of X. This preliminary phase is itself
                   initialized using 'sample'.
       matrix    - A K-by-P matrix of starting locations. In this case,
                   you can pass in [] for K, and kmeans infers K from
                   the first dimension of the matrix. You can also
                   supply a 3D array, implying a value for 'Replicates'
                   from the array's third dimension.
Like many iterative optimizations, the k-means algorithm converges to a local minimum, so it can end up with different solutions for different starting points. You can take advantage of the randomness built into the kmeans function by running several replicates from different starting points and keeping the best one:
    'Replicates' - Number of times to repeat the clustering, each with a
       new set of initial centroids. A positive integer, default is 1.
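For what it's worth, here is a rough sketch of how that could look for a call like the one in the question. The data below is made up purely for illustration (20 hypothetical series of 200 observations each, in two correlated groups), and seeding the generator with rng before each call is my own addition, not something the help text above prescribes. It assumes a release that has rng and the Statistics Toolbox.

```matlab
% Made-up stand-in for the real data: 200 observations (rows) of
% 20 series (columns), built as two groups of 10 correlated series.
rng(0);                                   % seed only so the example data is repeatable
base = randn(200, 2);
data = [repmat(base(:,1), 1, 10), repmat(base(:,2), 1, 10)] + 0.3*randn(200, 20);

% Run 10 replicates from different random starts; kmeans returns the
% replicate with the lowest total within-cluster distance.
% Resetting the generator to the same state before each call means
% both calls draw the same sequence of random starting centroids.
rng(1);
idx1 = kmeans(data', 2, 'distance', 'correlation', 'replicates', 10);
rng(1);
idx2 = kmeans(data', 2, 'distance', 'correlation', 'replicates', 10);

% Compare the two label vectors from the identically seeded runs.
sameResult = isequal(idx1, idx2);
disp(sameResult)
```

More replicates make it more likely that separate runs land on the same (best) solution, but only a fixed generator state or an explicit 'Start' matrix makes the result reproducible by construction. Also keep in mind that the cluster numbers themselves are arbitrary, so two runs can describe the same partition with labels 1 and 2 swapped.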
Hope this helps