In many cases, FindClusters and ClusteringComponents give similar results, and both functions offer options for hierarchical and agglomerative clustering. Could you help me understand the differences between them?

Just to be clear, my main question is about choosing the appropriate function for different research questions and analyses. That is, it does not matter if the two functions (or other similar functions) produce different outputs with the same meaning; rather, I want to know when to use which function and why.

Also, please post links to any tutorials on cluster analysis in Mathematica that you are aware of.

Amin
  • Did you try http://reference.wolfram.com/mathematica/tutorial/PartitioningDataIntoClusters.html ? – Rod Jun 12 '13 at 16:58
  • @RodLm Yes, I did, but I couldn't figure out the difference, so I posted my question here. I know that there are different options, but when the results are similar in most cases and ClusteringComponents is faster than the other procedures, I wonder what the advantage of FindClusters is. – Amin Jun 12 '13 at 17:05
  • Take a look at the difference between ClusteringComponents[{1, 2, 3, 7, 8}, 2] and FindClusters[{1, 2, 3, 7, 8}, 2] ... – Rod Jun 12 '13 at 17:07
  • @RodLm Only the output formats were different. ClusteringComponents gives a cluster membership label for each element, whereas FindClusters groups the elements into two clusters. The results are the same! – Amin Jun 12 '13 at 17:15
  • "Just the outputs were different" means they are exactly intended to give different types of answer to the same problem... – Rod Jun 12 '13 at 17:21
  • It's not clear to me why FindClusters doesn't support the k-means and partitioning around medoids methods that are offered by ClusteringComponents. In that sense, and apart from the differences in output format, I think this is a fair question. – Oleksandr R. Jun 12 '13 at 17:27
  • @OleksandrR. Have you seen this post? – Rod Jun 12 '13 at 17:31
  • @RodLm yes, but it doesn't attempt any explanation of why there are two functions provided for the same operation that each support different methods and produce their output in a different format... – Oleksandr R. Jun 12 '13 at 17:34
  • @OleksandrR. I think this is an interesting question... However, it was asked to point "the differences between them", and, as you've said, the main difference lies in the output format... – Rod Jun 12 '13 at 17:58
  • @RodLm well, since (as you point out) the "what" is addressed in the documentation, while the "why" is unlikely to be answerable, I suppose there isn't anything left to say. As such, I'm voting to close as TL. – Oleksandr R. Jun 12 '13 at 18:03
  • I just made my post clearer. @OleksandrR, thanks for the edit! – Amin Jun 12 '13 at 19:07

1 Answer

Apart from the output format, the main differences are:

  • FindClusters can take a custom DistanceFunction, whereas ClusteringComponents can only use those listed in the documentation.

  • FindClusters works with strings and lists of True/False, but ClusteringComponents only takes numerical arrays.

  • FindClusters takes a 1D list as input, whereas ClusteringComponents can take arrays of any dimension and has a level argument to determine at what level to find clusters.
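
As a rough illustration of these points, here is a minimal sketch using the small list from the comments. The variable names, the squared-difference distance, and the word list are just illustrative choices of mine, not anything taken from the documentation:

```mathematica
data = {1, 2, 3, 7, 8};

(* FindClusters returns the elements themselves, grouped into sublists *)
clusters = FindClusters[data, 2];

(* ClusteringComponents returns one integer cluster label per element *)
labels = ClusteringComponents[data, 2];

(* Regrouping the data by label gives FindClusters-style output *)
regrouped = GatherBy[Transpose[{data, labels}], Last][[All, All, 1]];

(* FindClusters accepts an arbitrary pure function as DistanceFunction
   (the squared difference here is only an example) *)
custom = FindClusters[data, 2, DistanceFunction -> (Abs[#1 - #2]^2 &)];

(* FindClusters also accepts a list of strings *)
words = FindClusters[{"cat", "bat", "hat", "elephant", "elegant"}, 2];
```

So if you need per-element labels (for example, to color points or to keep the shape of an array), ClusteringComponents is the more direct choice; if you want the grouped elements themselves, or need non-numeric data or your own distance, FindClusters is.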

It is also worth noting that with the method options "Optimize" and "Agglomerate", ClusteringComponents uses FindClusters internally. For the "KMeans" and "PAM" methods there are separate implementations, using Image`KMeansClustering and Image`KMedoidsClustering.
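
For reference, a small sketch of how the methods are selected through the documented Method option. The sample points are made up, and the level argument 1 tells ClusteringComponents to treat each pair as one data point. I am deliberately not calling the internal Image` functions directly, since they are undocumented and their signatures may change between versions:

```mathematica
(* Two well-separated groups of 2D points (made-up sample data) *)
pts = {{1., 1.}, {1.2, 0.9}, {0.8, 1.1}, {5., 5.}, {5.1, 4.8}, {4.9, 5.2}};

(* Methods with separate implementations in ClusteringComponents *)
kmeansLabels = ClusteringComponents[pts, 2, 1, Method -> "KMeans"];
pamLabels = ClusteringComponents[pts, 2, 1, Method -> "PAM"];

(* Methods shared by both functions *)
aggLabels = ClusteringComponents[pts, 2, 1, Method -> "Agglomerate"];
aggClusters = FindClusters[pts, 2, Method -> "Agglomerate"];
```

The set of methods each function accepts has changed across versions, so it is worth checking the documentation page for the version you are running.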

Simon Woods