11

The following list has some elements that are labeled. For example {1, 2} -> 1, {-1, 3} -> 3, etc:

list = {{1, 2}, {-1, 3}, {5, 6}, {-3, 4}, {7, 8}, {-9, 1}, {0, 1}};
labels = {1, 3, 2, 1, 2, 1, 3};

What is a good way to gather list's elements clustered according to their labels?

clusters = {{{1 ,2}, {-3, 4}, {-9, 1}}, {{5, 6}, {7, 8}}, {{-1, 3}, {0, 1}}}
Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
tchronis
  • 2,445
  • 1
  • 14
  • 26
  • 2
    I think there was a question like this in the past but I can't recall it. Extract[list, Position[labels, #]] & /@ Union@labels – Kuba Jul 15 '13 at 13:10
  • @Kuba. Please write up your solution as an answer so we can get this question off the unanswered list. – m_goldberg Jul 15 '13 at 13:21
  • After 5 days i will run a benchmark for all answers (with reasonable data) and post here the results. – tchronis Jul 15 '13 at 13:58
  • 1
    @tchronis More than five days have passed, and you've got two new answers. I chose to add timings to my own. I'll be interested to see your own timings when you get around to it. – Mr.Wizard Jul 31 '13 at 10:47
  • @Mr.Wizard thank you and I am sorry I didn't fulfill my promise on publishing timing results. By the way you are too fast for me and sure your evaluations are more reliable than mine. – tchronis Nov 29 '13 at 14:27
  • 1
  • @tomd That looks like a duplicate to me. If you agree would you please cast a vote to close that question as a duplicate of this one? – Mr.Wizard Feb 12 '17 at 12:02

4 Answers4

10

I believe the best way is to use an Ordering function with recognition of duplicates.
Please see that (self) Q&A for an explanation.

myOrdering[a_List] := GatherBy[Ordering@a, a[[#]] &]

list[[#]] & /@ myOrdering[labels]
{{{1, 2}, {-3, 4}, {-9, 1}}, {{5, 6}, {7, 8}}, {{-1, 3}, {0, 1}}}

Benchmarking

And updated benchmark for recent versions, performed in 10.1.0.

Note: in version 7 Pick was orders of magnitude slower in this test. Now it is competitive but it still falls behind as the number of unique labels increases.

myOrdering[a_List] := GatherBy[Ordering@a, a[[#]] &]

f1[{list_, labels_}] :=
  Extract[list, Position[labels, #]] & /@ Union@labels

f2[{list_, labels_}] :=
  Pick[list, labels, #] & /@ Union@labels

f3[{list_, labels_}] :=
  GatherBy[Sort[Transpose@{labels, list}, OrderedQ[{#1[[1]], #2[[1]]}] &], 
   First][[All, All, 2]]

f4[{list_, labels_}] :=
  Reap[MapThread[Sow, {list, labels}], Union@labels][[2, All, 1]]

f5[{list_, labels_}] :=
  list[[#]] & /@ myOrdering[labels]

g[n_] := RandomInteger[⌈n/4⌉, #] & /@ {{n, 2}, n}


Needs["GeneralUtilities`"]

BenchmarkPlot[{f1, f2, f3, f4, f5}, g, 10]

enter image description here

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
  • Performance depends of labels variety. Try labels = RandomInteger[10, 3000], Pick should show it's strength :) – Kuba Jul 31 '13 at 10:39
  • @Kuba It's still nearly an order of magnitude slower than myOrdering on my system. I know Pick was improved on Packed Arrays in version 8 (I use v7). What timings do you get? Nevertheless I think the multiple-pass Pick/Position method has a higher complexity by nature. – Mr.Wizard Jul 31 '13 at 10:44
  • 0.000625001 for Pick and 0.00057500 for myOrdering :) – Kuba Jul 31 '13 at 10:51
  • @Kuba I added a note to my answer; I hope you approve. – Mr.Wizard Jul 31 '13 at 10:58
  • I fully agree :) I was aware of not efficient nature of this approach, I should have described it but I forgot :) – Kuba Jul 31 '13 at 11:02
  • @Mr.Wizard thanks for the comparisons and discussion re: Ordering and Pick. – ubpdqn Jul 31 '13 at 11:16
8

I'm always afraid in case of that there was a duplicate in the past. But I do not remember.

You can try this:

Extract[list, Position[labels, #]] & /@ Union@labels

{{{1 ,2}, {-3, 4}, {-9, 1}}, {{5, 6}, {7, 8}}, {{-1, 3}, {0, 1}}}

and this:

Pick[list, labels, #] & /@ Union@labels

{{{1, 2}, {-3, 4}, {-9, 1}}, {{5, 6}, {7, 8}}, {{-1, 3}, {0, 1}}}

GatherBy variation

GatherBy[Sort@Thread[Rule[labels, list]], First][[ ;; , ;; , 2]]

{{{-9, 1}, {-3, 4}, {1, 2}}, {{5, 6}, {7, 8}}, {{-1, 3}, {0, 1}}}

Kuba
  • 136,707
  • 13
  • 279
  • 740
7

My GatherBy variation:

GatherBy[Transpose@{labels, list}, First][[All, All, 2]]

{{{1, 2}, {-3, 4}, {-9, 1}}, {{-1, 3}, {0, 1}}, {{5, 6}, {7, 8}}}

A possible drawback is that the result is not sorted by label. This is easy to change by doing

GatherBy[Sort@Transpose@{labels, list}, First][[All, All, 2]]

{{{-9, 1}, {-3, 4}, {1, 2}}, {{5, 6}, {7, 8}}, {{-1, 3}, {0, 1}}}

which sorts by label but destroys the initial intra-label ordering or by

GatherBy[Sort[Transpose@{labels, list}, OrderedQ[{#1[[1]], #2[[1]]}] &], First][[All, All, 2]]

{{{1, 2}, {-3, 4}, {-9, 1}}, {{5, 6}, {7, 8}}, {{-1, 3}, {0, 1}}}

which keeps the initial order.

sebhofer
  • 2,741
  • 18
  • 25
  • Thank you @sebhofer. Yes the non sorted drawback matters in my case. – tchronis Jul 15 '13 at 13:39
  • 1
    @tchronis I realised this is easy to change, see my edit. – sebhofer Jul 15 '13 at 13:44
  • Another idea for the sorting might be: Sort[GatherBy[Transpose[{labels, list}], First], First@#1[[1]] < First@#2[[1]] &][[All, All, 2]] (possibly slower than your ideas, but at least it's rather short :) ) – Pinguin Dirk Jul 15 '13 at 16:55
  • @PinguinDirk Sure that also works, I would contest it's shorter though. You can do Sort[GatherBy[Transpose[{labels, list}], First], #1[[1, 1]] < #2[[1, 1]] &][[All, All, 2]] then it is shorter, but only by 7 keystrokes :) – sebhofer Jul 15 '13 at 17:07
  • ok :) it's not code golfing, and I'd have chosen the same approach! +1 – Pinguin Dirk Jul 15 '13 at 17:10
5

This also works:

Reap[MapThread[Sow, {list, labels}]][[2]]

or an alternatively ordering by tags:

Reap[MapThread[Sow, {list, labels}], Union @ labels][[2, All, 1]]
Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
ubpdqn
  • 60,617
  • 3
  • 59
  • 148
  • I really like Sow and Reap. (+1) I hope you don't mind, but I'm taking the liberty to streamline your code; you can revert the edit if you disapprove. – Mr.Wizard Jul 31 '13 at 10:11
  • Value your comment and edit. I always learn something. – ubpdqn Jul 31 '13 at 10:36
  • Thanks: Reap/Sow seemed a useful approach. I just failed to understand it well enough, My ignorance, I hope, is now further reduced. – ubpdqn Aug 01 '13 at 00:59
  • What you had was perfectly valid. It is simply that the default values work in this case so we aren't required to specify them. Regarding using Sow[#, #2]& rather than simply Sow, I've seen many people do that so you're in good company. :-) – Mr.Wizard Aug 01 '13 at 01:04