
I want to perform a cluster analysis using the HierarchicalClustering package. Is there a way to display the inter-cluster distances in a dendrogram plot?

An example of how the result should look: here.

cormullion
Frederik Ziebell

1 Answer


DendrogramPlot accepts Axes as an option. Although the front end highlights Axes, AxesOrigin, GridLines and similar options in red, they do seem to work with DendrogramPlot.

The inter-cluster distance is stored as the third element of a Cluster object.
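As a minimal sketch of reading those values directly, the following collects the third element of every nested Cluster. The variable names `clusters` and `distances` are only illustrative, and the pattern assumes that each Cluster carries further elements after the distance, as the pattern used later in this answer also does.

```mathematica
Needs["HierarchicalClustering`"]

(* Build the nested Cluster object for a small data set *)
clusters = Agglomerate[{1, 2, 10, 4, 8}];

(* Collect the third element (the merge distance) of every Cluster,
   at every nesting level; __ matches the remaining elements *)
distances = Cases[clusters, Cluster[_, _, d_, __] :> d, {0, Infinity}]
```

For five data points this should return four values, one per merge.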

enter image description here

Here are several combinations of DistanceFunction and Linkage. The inter-cluster distances are highlighted in red in the Cluster objects and drawn as green gridlines in the dendrogram plots:

Needs["HierarchicalClustering`"]

Grid[{{ToString@#[[1]] <> "--" <> #[[2]]}, 
  {Replace[ Agglomerate[{1, 2, 10, 4, 8},
    DistanceFunction -> #[[1]], Linkage -> #[[2]]], 
    Cluster[a_, b_, c_, d__] :> 
    Cluster[a, b, Style[c, 18, Red, Bold], d], {0, 
    Infinity}]}, {DendrogramPlot[{1, 2, 10, 4, 8},
   DistanceFunction -> #[[1]], Linkage -> #[[2]], 
   LeafLabels -> (# &), 
   GridLines -> {None, Cases[Agglomerate[{1, 2, 10, 4, 8},
       DistanceFunction -> #[[1]], Linkage -> #[[2]]], 
      Cluster[a_, b_, c_, d__] :> c, {0, Infinity}]}, 
   GridLinesStyle -> Green, ImageSize -> 500, 
   Axes -> {False, True}, AxesOrigin -> {.75, Automatic}]}}] & /@ 
 Tuples[{{Automatic, ManhattanDistance}, {"Complete",  "Centroid"}}] // Column

enter image description here

So the vertical axis does indeed measure the inter-cluster distances for a given DistanceFunction and Linkage.
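The same idea can be wrapped in a small helper so that the clustering is computed only once. This is a sketch rather than a definitive implementation: `dendrogramWithDistances` is a hypothetical name, not part of the package, and the code assumes that DendrogramPlot accepts a Cluster object in place of the raw data and applies the same options to it.

```mathematica
Needs["HierarchicalClustering`"]

(* Hypothetical helper: plot a dendrogram with a vertical axis and
   gridlines drawn at each merge distance *)
dendrogramWithDistances[data_List, df_, link_] :=
 Module[{clusters, heights},
  (* Cluster once and reuse the result for both the plot and the gridlines *)
  clusters = Agglomerate[data, DistanceFunction -> df, Linkage -> link];
  (* Merge distances are the third element of each nested Cluster *)
  heights = Cases[clusters, Cluster[_, _, d_, __] :> d, {0, Infinity}];
  (* Assumption: DendrogramPlot takes the Cluster object directly *)
  DendrogramPlot[clusters,
   LeafLabels -> (# &),
   GridLines -> {None, heights}, GridLinesStyle -> Green,
   Axes -> {False, True}, AxesOrigin -> {.75, Automatic}]]

dendrogramWithDistances[{1, 2, 10, 4, 8}, ManhattanDistance, "Complete"]
```

If the gridlines coincide with the horizontal bars of the dendrogram, the axis is reporting the merge distances.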

For various combinations of DistanceFunction and Linkage you get the following pictures:

{#, Agglomerate[{1, 2, 10, 4, 8}, DistanceFunction -> Automatic, Linkage -> #], 
 DendrogramPlot[{1, 2, 10, 4, 8},
 DistanceFunction -> Automatic, Linkage -> #, 
 Axes -> {False, True}, AxesOrigin -> {-1, Automatic}],
 Agglomerate[{1, 2, 10, 4, 8}, DistanceFunction -> ManhattanDistance, Linkage -> #],
 DendrogramPlot[{1, 2, 10, 4, 8},
 DistanceFunction -> ManhattanDistance, Linkage -> #, 
 Axes -> {False, True}, AxesOrigin -> {-1, Automatic}]} & /@
 {"Single", "Average","Complete", "WeightedAverage", "Centroid", "Median","Ward"} // 
 Grid[Prepend[#, {"", "EuclideanDistance-Clusters", 
 "EuclideanDistance-Dendrogram", "ManhattanDistance-Clusters",
 "ManhattanDistance-Dendrogram"}], 
  Dividers -> All, Alignment -> Bottom] &    

enter image description here

EDIT: What I get for Frederik's example in the comments:

DendrogramPlot[Prime[#] & /@ Range[30], Axes -> {False, True}, 
AxesOrigin -> {-1, Automatic}]

enter image description here

kglr
  • I posted the same answer, but deleted it. Are you sure the vertical axis represents the inter-cluster distance? – Dr. belisarius Aug 06 '12 at 14:02
  • Thank you very much. Unfortunately, drawing axes seems a little buggy. Try: DendrogramPlot[Prime[#]&/@Range[30],Axes->{False, True}] – Frederik Ziebell Aug 06 '12 at 14:09
  • @belisarius, inter-cluster distance should depend on DistanceFunction and Linkage. I tried various combinations of DistanceFunction and Linkage, and the vertical tick labels seem to vary in expected ways... but I have not been able to verify the exact mapping ... yet :) – kglr Aug 06 '12 at 14:21
  • Seems correlated, yes. But not sure if it exactly represents the distance. Oh, well :) – Dr. belisarius Aug 06 '12 at 14:25
  • @Frederik, I added the picture I get for your example. – kglr Aug 06 '12 at 14:27
  • @belisarius, please see new table for various distance linkage combinations :) – kglr Aug 06 '12 at 15:00
  • @kguler Nice work! Here goes my +1 then – Dr. belisarius Aug 06 '12 at 15:11