
How do I instruct agglomerative clustering to stop, that is, to NOT merge two particular clusters, if the distance between them is larger than a threshold value?

I know that we can implement a custom distance measure and a custom linkage function. A distance measure says how to measure the distance between any two data points, so even if I implement a custom one, it only specifies the distance between individual data points.

Where in the code can I specify a threshold value that, if not met, prevents the clusters from being merged? Thus, at the end, if only two clusters remain and the threshold is not met between them, these two clusters should not be merged into one global cluster.

Is this possible in Mathematica?

rcollyer
London guy

1 Answer


(After belisarius's comment)

The hierarchical clustering invoked by Method -> "Agglomerate" can be customized further with the undocumented "Linkage" suboption. I assume it ultimately provides the same functionality as the Linkage option in the HierarchicalClustering package, which accepts the following values:

"Single"          smallest intercluster dissimilarity
"Average"         average intercluster dissimilarity
"Complete"        largest intercluster dissimilarity
"WeightedAverage" weighted average intercluster dissimilarity
"Centroid"        distance from cluster centroids
"Median"          distance from cluster medians 
"Ward"            Ward's minimum variance dissimilarity
f                 a pure function
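
As a minimal sketch of how one of the named values above might be passed, the snippet below uses the HierarchicalClustering package directly rather than FindClusters. The point list pts is made-up illustrative data, and the package is a legacy one, so it may not load in every version:

```mathematica
(* Load the legacy package whose Linkage option is listed above *)
Needs["HierarchicalClustering`"]

(* Illustrative data only, not taken from the question *)
pts = {{0., 0.}, {0., 1.}, {5., 5.}, {5., 6.}, {20., 20.}};

(* Build the full merge tree using complete linkage *)
tree = Agglomerate[pts, Linkage -> "Complete"];

(* Inspect the merge heights visually *)
DendrogramPlot[tree]
```

The same option setting with a different string, for example "Single" or "Average", selects another of the listed linkages.
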
data = {{-1.1, 2.6}, {3.9, -0.8}, {4.2, -3.7}, {3.3, 3.5}, {3.9, 5.2},
    {4.1, -4.8}, {3.8, 3.7}, {5.6, 0.1}, {3.1, -5.2}, {-0.9, 2.3},
    {2.9, 4.1}, {-2.3, 3.9}, {-2.5, 3.}, {2.6, -5.5}, {5.2, 1.9},
    {-0.7, 1.3}, {0.9, 2.8}, {-1.5, 3.3}, {3.8, 1.2}, {2.6, -5.1},
    {-0.8, 3.2}, {4.7, 0.7}, {3., 3.}, {3.9, 3.6}, {4.5, 1.4},
    {4.2, 1.3}, {-1.1, 2.6}, {4.8, 2.4}, {3.3, -3.5}, {3.2, -4.6},
    {3.3, -4.9}, {3., 3.5}, {0.7, 2.1}, {3.2, -4.3}, {-2., 0.5},
    {-1.2, 2.}, {-1.6, 1.8}, {-3.5, 3.7}, {4.8, 0.2}, {3.3, 2.4},
    {-0.1, 2.1}, {-1.3, 2.5}, {4.4, 3.9}, {3.5, 0.2}, {0.1, 2.9},
    {-1., 1.6}, {-1.4, 4.5}, {3.2, 2.5}, {-1.6, 2.4}, {2.6, -5.1}};

{
 ListPlot[FindClusters[data], PlotStyle -> PointSize@.05],
 ListPlot[FindClusters[data, 
   Method -> {"Agglomerate",
      "Linkage" -> (If[#3 > 1.9`*^-6, #1 + #2, (#1 + #2)^2] &)}], 
  PlotStyle -> PointSize@.05]
 }

[Image: the two resulting cluster plots]
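
The threshold behaviour asked about in the question can also be sketched without any built-in clustering function. The code below is an illustration under stated assumptions, not the mechanism FindClusters uses internally: it hard-codes single linkage with Euclidean distance, and the names singleLinkDistance, thresholdAgglomerate, and maxDist are hypothetical helpers introduced here. Merging stops as soon as the closest pair of clusters is farther apart than maxDist.

```mathematica
(* Hypothetical helper: single-linkage distance between two clusters,
   i.e. the smallest point-to-point Euclidean distance.
   N forces machine numbers so that Ordering compares numerically. *)
singleLinkDistance[a_List, b_List] :=
  N[Min[Outer[EuclideanDistance, a, b, 1]]]

(* Hypothetical helper: repeatedly merge the two closest clusters,
   but stop once the closest pair is farther apart than maxDist. *)
thresholdAgglomerate[points_List, maxDist_?NumericQ] :=
  Module[{clusters = List /@ points, pairs, dists, best},
    While[Length[clusters] > 1,
      (* all index pairs of current clusters and their distances *)
      pairs = Subsets[Range[Length[clusters]], {2}];
      dists = singleLinkDistance[clusters[[#1]], clusters[[#2]]] & @@@ pairs;
      best = First[Ordering[dists, 1]];
      (* threshold not met: leave the remaining clusters unmerged *)
      If[dists[[best]] > maxDist, Break[]];
      (* remove the two clusters and append their union *)
      clusters = Append[
        Delete[clusters, List /@ pairs[[best]]],
        Join @@ clusters[[pairs[[best]]]]]
    ];
    clusters]

(* Illustrative data only *)
pts = {{0., 0.}, {0., 1.}, {5., 5.}, {5., 6.}, {20., 20.}};
thresholdAgglomerate[pts, 2.]
```

With this sample data and a threshold of 2, the loop stops with three clusters left, because the remaining groups are all more than 2 apart. The search over all pairs is recomputed on every pass, so the sketch is only practical for small data sets; swapping Min for Max in singleLinkDistance would give complete linkage instead.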

István Zachar