4

I have set of coordinates. I want to make clusters in which every point is within 1.5 distance unit of it's neighbor.

ex of point coordinates:

{{-12.945, 20.6509, 12.5901}, {-13.4452, 20.307, 111.626}, 
{-12.9731, 22.8458, 12.4215}, {-13.2381, 24.8167, 10.7147}, 
{-11.3668, 23.3908,11.8499}, {-11.6828, 23.7311, 10.8839}, 
{-13.3929, 21.1835, 9.86324}, {-11.5016, 21.3324, 10.1392}, 
{-12.3079, 22.096, 8.57246}, {-12.5268, 20.9679, 10.5444}, 
{-12.1951, 24.5423, 10.1807}, {-11.8887, 22.3883, 10.0751}, 
{-14.2529, 20.4808, 9.81084}, {-11.9876, 21.8094, 11.0478}, 
{-12.3718, 23.6176, 11.8266}, {-11.6179, 20.8324, 11.2154}, 
{-12.5927, 21.7492, 12.5087}, {-12.1665, 24.6649, 11.2909}, 
{-12.3854, 21.5571, 9.51876}, {-12.2237, 23.4278, 9.9787}}

what is the quickest way in Mathematica for this (for large data sets).

I tried this to find all points that are within mentioned distance:

Table[Select[List, EuclideanDistance[List[[i]], #] < 1.5 &], {i, 1, Length[[List]]}]

but now I have troubles to join all sets that have common elements.

rcollyer
  • 33,976
  • 7
  • 92
  • 191
Sesna Secna
  • 169
  • 8
  • I formatted your code. As you have been here a while, you should learn how to do so, yourself. To see what changes I made, click on the "Edited ..." link above my gravitar. – rcollyer Aug 19 '13 at 16:25
  • @rcollyer: my apologize. I'll be more careful. – Sesna Secna Aug 19 '13 at 16:32
  • how'd you cluster, if you have say points {0,0,0}, {0,0,1} and {0,0,2}? I.e. the 2nd point could be clustered with both other, but the 1st not with the 3rd – Pinguin Dirk Aug 19 '13 at 16:40
  • 'two clusters' means there is no point in one being within 1.5 distance unit of any point from another cluster. so these three numbers are all in one cluster together. – Sesna Secna Aug 19 '13 at 16:43
  • No! in each cluster, for any points, there is at least one point to be within it's 1.5. like: {0,0,0},{0,0,1},{0,0,2} are in one cluster but 1st and 3rd are 2 units apart. – Sesna Secna Aug 19 '13 at 16:55
  • 1
    Something like Gather[data, EuclideanDistance[#1, #2] < 1.5 &] ? – b.gates.you.know.what Aug 19 '13 at 17:53

3 Answers3

4

Here's a different approach, though I think it's quite inefficient.

I treat the points as vertices in a graph. I check each pair of points and if the distance between them is less than 1.5 I connect them with an edge. The clusters are just the ConnectedComponents of the graph.

v = Range @ Length @ data;
e = UndirectedEdge @@@ Select[Subsets[v, {2}], EuclideanDistance @@ data[[#]] < 1.5 &];

ConnectedComponents @ Graph[v, e]
(* {{14, 8, 10, 12, 16, 19, 7, 20, 9, 13, 6, 11, 5, 15, 18, 4, 3, 17, 1}, {2}} *)
Simon Woods
  • 84,945
  • 8
  • 175
  • 324
  • 2
    SparseArray\StronglyConnectedComponents@SparseArray@UnitStep[1.5^2 - DistanceMatrix@data]` follows the same logic but uses an undocumented internal function first publicized by Carl Woll. He found it to be quite fast. – Ray Koopman Aug 22 '13 at 05:18
  • @RayKoopman I highly encourage you to pose an answer with this method. Also, other mentioned in this link are wort recalling. – Kuba Jan 19 '14 at 22:27
3

Here is a possible alteernative, I was working on while Kuba posted his answer :-) I also started by using FixedPoint and the inner loop seems to work but the outer one is easier with While.

c = {{-12.945, 20.6509, 12.5901}, {-13.4452, 20.307, 
    111.626}, {-12.9731, 22.8458, 12.4215}, {-13.2381, 24.8167, 
    10.7147}, {-11.3668, 23.3908, 11.8499}, {-11.6828, 23.7311, 
    10.8839}, {-13.3929, 21.1835, 9.86324}, {-11.5016, 21.3324, 
    10.1392}, {-12.3079, 22.096, 8.57246}, {-12.5268, 20.9679, 
    10.5444}, {-12.1951, 24.5423, 10.1807}, {-11.8887, 22.3883, 
    10.0751}, {-14.2529, 20.4808, 9.81084}, {-11.9876, 21.8094, 
    11.0478}, {-12.3718, 23.6176, 11.8266}, {-11.6179, 20.8324, 
    11.2154}, {-12.5927, 21.7492, 12.5087}, {-12.1665, 24.6649, 
    11.2909}, {-12.3854, 21.5571, 9.51876}, {-12.2237, 23.4278, 
    9.9787}};
MyClustering[data_List, distance_?NumericQ] :=
 Module[{dataoriginal = data, res = {}, temp = {}},
  While[dataoriginal =!= {},
   temp = {};
   AppendTo[res,
    FixedPoint[(
       Map[
        Function[p, 
         temp = Join[temp, 
           Select[dataoriginal, EuclideanDistance[#, p] < distance &]];
         dataoriginal = Complement[dataoriginal, temp]], #]; 
       temp) &, {dataoriginal[[1]]}]]];
  Return[res]]

Just few notes: dataoriginal is needed because I modify the original list and the argument of a function (data in that case) cannot be modified inside the function's body. For huge lists AppendTo is generally slow, so a possible alternative is

res = Join[{res}, FixedPoint[...]]

bobknight
  • 2,037
  • 1
  • 13
  • 15
  • It seems approach is the same :) but yours is tidy and compact +1. I will probably delete mine then, but later, I don't have time now to parse this. :) – Kuba Aug 19 '13 at 19:20
  • @Kuba: I was focused on a annoying job so when I saw this question I shifted my attention on Mathematica and spent a couple of hours just to have an alibi for the other job not completed ;-) – bobknight Aug 19 '13 at 19:25
  • haha, the same here, but now I have to focus back on work :P – Kuba Aug 19 '13 at 19:28
2

This is my interpretation:

your cluster is a set of points that for each one there is at least one within 1.5 distance.

I will not be surprised if there is some kind of one-liner but I haven't played with Clusters etc much.

This is straightforward approach:

data = (* your data*)

SetAttributes[f, HoldAll];
f[cluster_] := Module[{n = Length@cluster},
               Do[
                cluster = Join[cluster, 
                               Select[data, EuclideanDistance[#, cluster[[i]]] < 1.5 &]
                              ];
                           data = Complement[data, cluster];
                 , {i, n}];
               cluster
               ];

The the inner loop should be done with FixedPoint but I've failed in implementation so I used While:

i = 0;
While[Length[data] > 0,
      i++;
      clusters[i] = data[[{1}]];
      data = Rest@data;
      start = 0;
      end = 1;
      While[start != end,
            With[{i = i}, start = Length@clusters[i]; 
                          f[clusters[i]]; 
                          end = Length@clusters[i]];
           ];
     ]

set = clusters /@ Range@i

ListPointPlot3D[set, PlotStyle -> {Red, Blue}, BaseStyle -> AbsolutePointSize@10, 
                     PlotRange -> All]

enter image description here

Looks reasonable :)

I have to focus and it will take some time to write explanation, be patient. Or maybe it will be pointless if this is an overkill :)

Kuba
  • 136,707
  • 13
  • 279
  • 740