Bug introduced in 11.0 and fixed in 11.3


I have a problem with using a custom DistanceFunction in FindClusters. To keep the issue as simple as possible, consider clustering odd and even numbers in the following way:

cases=Table[i,{i,1,100}];
FindClusters[cases,DistanceFunction->(Mod[#1-#2,2]&)]

Error:

Set::shape: Lists {MachineLearning`file50GaussianMixture`PackagePrivate`n$530941,MachineLearning`file50GaussianMixture`PackagePrivate`m$530941} and {100} are not the same shape.
Transpose::nmtx: The first two levels of {77,26,1,93,83,66,41,98,30,25,65,85,36,49,48,80,69,29,45,97,94,14,99,76,56,37,87,46,86,34,5,78,90,12,9,91,3,64,19,38,53,70,96,92,67,39,57,7,44,17,<<50>>} cannot be transposed.

Also consider this case, with a totally meaningless error:

cases = Table[i, {i, 1, 100}];
FindClusters[cases, DistanceFunction -> ((Abs[Mod[#1 - #2, 2]] + 1) &)]

Error:

FindClusters::disnopos: The user-supplied distance is not positive definite. 
Alexey Popkov
ahrvoje
  • I can't confirm this behaviour on 10.4.1 for Linux x86 (64-bit). Try clearing the variables or quitting the kernel. What's your $Version? – corey979 Nov 28 '16 at 17:17
  • It works on version 10.0 for Windows x86 (64-bit) too. – mattiav27 Nov 28 '16 at 17:30
  • I can reproduce in 11.0. Looks like a bug, please report it to support@wolfram.com. – chuy Nov 28 '16 at 17:39
  • I see the same error in 11.0.1.0. Other custom distance functions seem to work on e.g. two-dimensional datasets, but something goes horribly wrong with the one shown here. – MarcoB Nov 28 '16 at 18:02
  • Reported it. As to the "meaningless" message, not exactly, but maybe not as good as it could be. At issue is that ((Abs[Mod[#1 - #2, 2]] + 1) &) lacks one of the basic properties of distance functions, namely $\operatorname{dist}(x,x) = 0$. – rcollyer Nov 28 '16 at 21:16
  • A workaround appears to be to use ClusterAnalysis`FindClusters`FindClustersOld, which I assume uses the pre-version-11 code. – Simon Woods Nov 28 '16 at 21:24
  • Thank you all. @SimonWoods Workaround works great! – ahrvoje Nov 29 '16 at 00:21
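
rcollyer's remark about $\operatorname{dist}(x,x) = 0$ can be checked directly for the two distance functions in the question. The following is only a minimal sketch: the helper names zeroSelfDistanceQ and symmetricQ are hypothetical and not part of FindClusters, and the checks say nothing about the Set::shape error from the first example.

```mathematica
(* The two distance functions from the question *)
d1 = Mod[#1 - #2, 2] &;
d2 = (Abs[Mod[#1 - #2, 2]] + 1) &;

(* Hypothetical helpers (not built in): test two basic distance properties
   on a small sample of the data *)
zeroSelfDistanceQ[d_, data_] := AllTrue[data, d[#, #] == 0 &];
symmetricQ[d_, data_] := AllTrue[Tuples[data, 2], d @@ # == d @@ Reverse[#] &];

sample = Range[10];

{zeroSelfDistanceQ[d1, sample], zeroSelfDistanceQ[d2, sample]}
(* {True, False} *)

{symmetricQ[d1, sample], symmetricQ[d2, sample]}
(* {True, True} *)
```

The second function returns 1 for identical arguments, which is consistent with the FindClusters::disnopos message; the first function passes both checks, so the failure it triggers in version 11 looks like the reported bug rather than a problem with the distance itself.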

2 Answers

You should specify Method -> "DBSCAN" or Method -> "Agglomerate", for example:

FindClusters[cases, DistanceFunction -> (Mod[#1 - #2, 3] &), 
 Method -> "Agglomerate"]

{{1,4,7,10,13,<<29>>},{2,3,5,6,8,<<61>>}}

yode
  • You are right, looks like assigning Method to "DBSCAN", "Agglomerate" or "JarvisPatrick" solves the problem, other methods raise errors. Also seems Method -> "JarvisPatrick" works the best, and correctly clusters odd vs. even numbers. – ahrvoje Mar 19 '17 at 11:06
  • @ahrvoje For the original example cases=Table[i,{i,1,100}]; methods "DBSCAN" and "JarvisPatrick" find identical clusters. – Alexey Popkov Mar 19 '17 at 11:44
  • "Optimize" works and finds good clusters, but I don't get good result with "DBSCAN", only a single cluster containing all the cases for Mod[#1 - #2, 2] &. Notice @yode changed the original DistanceFunction to Mod[x, 3]. – ahrvoje Mar 19 '17 at 15:27

As Simon Woods suggests in a comment, a workaround is to use ClusterAnalysis`FindClusters`FindClustersOld instead of FindClusters:

$Version
"11.1.0 for Microsoft Windows (64-bit) (March 13, 2017)"
cases = Table[i, {i, 1, 100}];
ClusterAnalysis`FindClusters`FindClustersOld[cases, 
 DistanceFunction -> (Mod[#1 - #2, 2] &)]
{{1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 
  47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 
  91, 93, 95, 97, 99}, {2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 
  36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 
  80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100}}
ClusterAnalysis`FindClusters`FindClustersOld[cases, 
 DistanceFunction -> ((Abs[Mod[#1 - #2, 2]] + 1) &)]
{{1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 
  47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 
  91, 93, 95, 97, 99}, {2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 
  36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 
  80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100}}

The above output is identical to what version 10.4.1 returns.

Simon's guess that ClusterAnalysis`FindClusters`FindClustersOld is the pre-version-11 FindClusters is supported by the fact that it accepts only the Method values "Agglomerate" and "Optimize" and supports none of the methods added in version 11:

cases = Table[i, {i, 1, 100}];
{#, Quiet@Check[
     Shallow@ClusterAnalysis`FindClusters`FindClustersOld[cases, 
       DistanceFunction -> (Mod[#1 - #2, 2] &), Method -> #], $Failed]} & /@ {"Optimize", 
  "Agglomerate", "DBSCAN", "NeighborhoodContraction", "JarvisPatrick", "KMeans", 
  "MeanShift", "KMedoids", "SpanningTree", "Spectral", "GaussianMixture"}

(* {{"Optimize", {{1, 3, 5, 7, 9, 11, 13, 15, 17, 19, <<40>>}, 
                  {2, 4, 6, 8, 10, 12, 14, 16, 18, 20, <<40>>}}}, 
    {"Agglomerate", {{1, 3, 5, 7, 9, 11, 13, 15, 17, 19, <<40>>}, 
                     {2, 4, 6, 8, 10, 12, 14, 16, 18, 20, <<40>>}}}, 
    {"DBSCAN", $Failed}, {"NeighborhoodContraction", $Failed}, 
    {"JarvisPatrick", $Failed}, {"KMeans", $Failed}, {"MeanShift", $Failed}, 
    {"KMedoids", $Failed}, {"SpanningTree", $Failed}, {"Spectral", $Failed}, 
    {"GaussianMixture", $Failed}}
*)
Alexey Popkov