Setting number of clusters on clustergram object

Hi, I'm using the clustergram tool to group some data. I know a priori how many groups I want to classify the data into. Is there a way to set the number of clusters that the clustergram object makes? I know it is possible to set the number of nodes with the dendrogram setting, but this is not working very well for me.
Thanks,
Avi

4 Comments

Please clarify what you mean by this: "I know it is possible to set the number of nodes with the dendrogram setting, but this is not working very well for me."
I mean that, because each cluster has a different number of nodes, setting a node threshold does not work very well as a way to set the number of clusters in large data sets. I would like to set the number of clusters generated in the dendrogram and plotted in the heat map of the clustergram.
Sorry, but I'm still a little confused by your question. I should have been clearer in my original comment. What do you mean by the "dendrogram setting"? Do you mean that you specify the 'Dendrogram' argument when you call clustergram? Are you happy with this coloring scheme as a way of identifying clusters? If I understand correctly, here's how the coloring works: each node that is colored black is a "cluster of 1", while other identically colored nodes belong to the same cluster (for the specified threshold). If I'm understanding everything correctly so far, I would rephrase your question as follows: "How can I determine the appropriate value of the Dendrogram option to clustergram so that I end up with a specific number of clusters?"
Yes, you can put it like that. How can I determine the dendrogram option setting so that the specified number of clusters is generated and plotted?


Answers (2)

I don't know the clustergram() function. Not in any of my toolboxes. Why not use kmeans() in the Statistics and Machine Learning Toolbox, where you can tell it how many clusters you want?

2 Comments

Clustergram is nice because it plots the dendrograms beside a clustered heat map of the data.
I don't know why clustergram does not have a flag similar to 'maxclust' in the clusterdata function. Presumably clustergram has an equivalent option, but I cannot locate it.
I don't know. Say "No" on the "Was this helpful" part of the help and someone will read your suggestion.


Arthur Goldsipe on 26 Aug 2016
Edited: Arthur Goldsipe on 26 Aug 2016
I'm not an expert with clustering, but I don't think clustergram was designed with this sort of functionality in mind. I think you have two options: (1) Use trial and error to adjust the Dendrogram property of the resulting clustergram. Or (2) implement your own function that behaves like clustergram but offers the option to set the number of clusters.
If you want to take approach 1, you could write MATLAB code that automates the trial-and-error process. For example, you could write code that inspects the figure, determines how many clusters were identified, and then increases or decreases the Dendrogram property value accordingly.
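As a possible shortcut for approach 1, the threshold could be derived from the linkage heights instead of by inspecting the figure. The following is only a sketch under assumptions: the data matrix is made up, and I am assuming that clustergram builds its row dendrogram with average linkage on Euclidean distances, so those settings (and 'Standardize', 'none') are passed explicitly to keep the two computations comparable. If two merges happen at exactly the same height, no threshold will give exactly k clusters.

```matlab
% Hypothetical example data: 30 observations (rows) x 5 variables.
rng(0);
data = [randn(10,5); randn(10,5) + 4; randn(10,5) - 4];
k = 3;                      % desired number of row clusters

% Assumption: same distance and linkage settings as passed to clustergram below.
Z = linkage(data, 'average', 'euclidean');

% Merge heights in ascending order. With n observations there are n-1
% merges; stopping after n-k of them leaves k clusters, so the threshold
% goes between the (n-k)-th and (n-k+1)-th heights.
n = size(data, 1);
h = sort(Z(:,3));
threshold = (h(n-k) + h(n-k+1)) / 2;    % valid for 2 <= k <= n-1

% Sanity check: count the clusters obtained at this cutoff.
T = cluster(Z, 'cutoff', threshold, 'criterion', 'distance');
numClusters = numel(unique(T));
disp(numClusters)

% Use the same threshold as the dendrogram color threshold.
cgo = clustergram(data, 'Standardize', 'none', ...
    'RowPDist', 'euclidean', 'Linkage', 'average', ...
    'Dendrogram', threshold);
```

Singleton clusters are still drawn in black, so the number of distinct colors in the plot can be smaller than k.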
If you want to take approach 2, you might find the cluster function useful, since it allows you to specify the number of clusters: http://www.mathworks.com/help/stats/cluster.html
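A rough starting point for approach 2 might look like the sketch below. It is not a replacement for clustergram: it only assigns each row to one of k clusters with cluster and shows a heat map with rows in dendrogram leaf order. The data, the value of k, and the linkage settings are illustrative assumptions.

```matlab
% Hypothetical example data: 30 observations (rows) x 5 variables.
rng(0);
data = [randn(10,5); randn(10,5) + 4; randn(10,5) - 4];
k = 3;                      % desired number of clusters

% Hierarchical tree (assumed settings: average linkage, Euclidean distance).
Z = linkage(data, 'average', 'euclidean');

% Cluster index (1..k) for every row of data.
T = cluster(Z, 'maxclust', k);

% Draw the full dendrogram (0 = show all leaves) and get the leaf order.
figure;
[~, ~, perm] = dendrogram(Z, 0);

% Heat map with rows reordered to match the dendrogram leaves.
figure;
imagesc(data(perm, :));
colorbar;

% Cluster labels in the same order as the heat map rows.
orderedLabels = T(perm);
```

Here orderedLabels can be used to annotate or color the rows by cluster membership.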

1 Comment

Thanks Arthur. Presumably, clustergram calls the cluster function. It has all of the options that cluster has with the exception of the maxclust option. That happens to be the option that is most important to what I am trying to accomplish!


Asked: 24 Aug 2016
Commented: 26 Aug 2016
