9

So let's say I want to group a list of points based on their distance from the origin. I would do something like:

list = {{1, 20}, {1, 6}, {1, 7}, {1, 3}, {3, 1}};
gathered = GatherBy[list, Norm[#]&]

and that would give me what I need, but that would not necessarily respect the ordering. I can apply SortBy[] first and then GatherBy[] the result:

sorted = SortBy[list, Norm[#] &]
partitionedBy = GatherBy[sorted, Norm[#] &]

But that is not very efficient. Is there a more efficient way of doing this using Mathematica's built-in functions? If not, what would be the most efficient way of achieving this?

update: if list is set to be a reduced accuracy version of an exact numerical expression, some of the methods suggested below will yield incorrect results. That is investigated in this question.

Shb
  • What would be your expected output for your example? – march Feb 13 '16 at 19:44
  • Also what do you mean by "efficient"? You want the fewest operations? Fastest code? Shortest code? – Quantum_Oli Feb 13 '16 at 19:45
  • @march The result given by GatherBy[SortBy[ ]]. – Shb Feb 13 '16 at 19:47
  • It can be written more succinctly as GatherBy[SortBy[list, Norm], Norm] – Bob Hanlon Feb 13 '16 at 19:48
  • @Quantum_Oli Yeah I guess I should have made that more clear. I meant fastest code. – Shb Feb 13 '16 at 19:48
  • @BobHanlon Yes that would do it, but this way I am making Mathematica evaluate the sorting function twice. I would like to avoid that. – Shb Feb 13 '16 at 19:50
  • SplitBy[SortBy[list, Norm], Norm] – Bob Hanlon Feb 13 '16 at 19:54
  • @BobHanlon Could you explain the difference please? – Shb Feb 13 '16 at 19:55
  • Regarding the update, you have a misplaced bracket. N[Norm[#] &] should be N[Norm[#]] & – Simon Woods Feb 13 '16 at 22:31
  • @SimonWoods Thanks. that was a typo. It still doesn't solve the problem though. – Shb Feb 13 '16 at 22:44
  • Well GatherBy[SortBy[list2, N[Norm[#]] &], Norm] gives the correct answer unlike the typo version. – Simon Woods Feb 13 '16 at 22:51
  • For the first one you need to take into account that GatherBy returns lists of points, so you need to apply Norm to the first element not the whole list, e.g. SortBy[GatherBy[list2, Norm], N[Norm[#[[1]]]] &] – Simon Woods Feb 13 '16 at 22:52
  • @SimonWoods Thanks. Both your new suggestions return the same answer for me, but that answer is not: {{{0.*10^-2, 0.*10^-2}}, {{0.6, 0.*10^-2}, {-0.5, -0.4}, {-0.5, 0.4}, {0.2, -0.6}, {0.2, 0.6}}, {{-1.6, 0.*10^-2}, {1.3, -1.}, {1.3, 1.}, {-0.5, 1.5}, {-0.5, -1.5}}} which is what I am expecting. – Shb Feb 13 '16 at 22:55
  • I'm not sure why that's what you expect. For example {-0.5, 0.4} and {0.2, -0.6} have different norms, so why are you expecting them in the same sublist? – Simon Woods Feb 13 '16 at 23:05
  • Oops. While testing, I accidentally altered the list2 values. So the values that I have typed in the question need updating. It should be: {{0.*10^-2, 0.*10^-2}, {1.3, 1.}, {0.2, 0.6}, {0.2, -0.6}, {1.3, -1.}, {-0.5, 1.5}, {-0.5, 0.4}, {0.6, 0.*10^-2}, {-1.6, 0.*10^-2}, {-0.5, -0.4}, {-0.5, -1.5}} which should have 0, 0.6 and 1.6 as their norms. – Shb Feb 13 '16 at 23:14
  • No, the norms are {0., 1.64012, 0.632456, 0.632456, 1.64012, 1.58114, 0.640312, 0.6, 1.6, 0.640312, 1.58114} – Simon Woods Feb 13 '16 at 23:19
  • I am using: Map[Norm, list2] and it returns {0, 1.6, 0.6, 0.6, 1.6, 1.6, 0.6, 0.6, 1.6, 0.6, 1.6} for me. What expression are you using? – Shb Feb 13 '16 at 23:23
  • Check those results with a calculator, they are wrong! – Simon Woods Feb 13 '16 at 23:38
  • Ha, I apologise. These were the results of a solve routine which was outputting SetAccuracy[N[sln],2] as the return expression, and I was expecting the results to be on three circles, hence the confusion. So, obviously, it seems that using N[] in the solutions is somehow problematic. Can you tell me why? – Shb Feb 13 '16 at 23:47
  • The issue of how N affects Accuracy is a quite different problem to the original question regarding efficient sort & gather algorithms. It should be a separate question. People took time and trouble to answer the original question, and it's rather unfair to claim that their (perfectly good) answers don't work because of special features in your data that you didn't reveal until several hours later. It will also be more useful to future visitors looking for efficient sort & gather algorithms if the question is unencumbered by the requirement to avoid using N. – Simon Woods Feb 14 '16 at 11:41
  • IMO the best thing would be to remove both updates from the question, and accept an answer which best addresses the problem as originally asked. Then create a new question about why two low-accuracy numbers can give True for a==b but False for N[a]==N[b]. – Simon Woods Feb 14 '16 at 11:42
  • @SimonWoods Thanks. I agree. I did as you suggested. Here is the question: http://mathematica.stackexchange.com/questions/106345/role-of-accuracy-in-numerical-evaluations-splitby-vs-gatherby – Shb Feb 14 '16 at 12:42
  • Somewhat related: (21458) – Mr.Wizard Feb 14 '16 at 14:58

3 Answers

10

As noted by @SimonWoods in the comments, using #.#& instead of Norm gives a huge speed up.

ClearAll[f1, f1b, f2, f2b, f3, f3b, f4, f5, f6, f7, f8]
f1 = GatherBy[SortBy[#, N@Norm@# &], N@Norm@# &] &;
f2 = SplitBy[SortBy[#, N@Norm@# &], N@Norm@# &] &;
f3 = SortBy[GatherBy[#, N[Norm@#] &], N[Norm[#[[1]]]] &] &;
f1b = GatherBy[SortBy[#, #.# &], #.# &] &;
f2b = SplitBy[SortBy[#, #.# &], #.# &] &;
f3b = SortBy[GatherBy[#, #.# &], #[[1]].#[[1]] &] &;
f4 = With[{norms = #.# & /@ #, lst = #}, 
    lst[[#]] & /@ SplitBy[Ordering[norms], norms[[#]] &]] &;
f5 = With[{norms = #.# & /@ #, lst = #}, 
    lst[[#]] & /@ GatherBy[Ordering[norms], norms[[#]] &]] &;
f6 = With[{gathered = GatherBy[#, #.# &]}, 
    gathered[[Ordering[#.# & /@ (First /@ gathered)]]]] &;
f7 = With[{gathered = GatherBy[#, #.# &]}, 
    With[{norms = #.# & /@ (First /@ gathered)}, 
     gathered[[Ordering[norms]]]]] &;

list0 = RandomInteger[{0, 20}, {100000, 2}];
timings = 
  First[AbsoluteTiming[# = #2@list0;]] & @@@ 
   Transpose[{{l1, l1b, l2, l2b, l3, l3b, l4, l5, l6, l7}, {f1, f1b, 
      f2, f2b, f3, f3b, f4, f5, f6, f7}}];
functions = {"f1", "f1b", "f2", "f2b", "f3", "f3b", "f4", "f5", "f6", 
   "f7"};
TableForm[Transpose[{functions, timings}], 
 TableHeadings -> {None, {"functions", "timings"}}]

[image: table of function names and timings]

Equal @@ ((Norm /@ (First /@ #) & /@ {l1, l1b, l2, l2b, l3, l3b, l4, l5, l6, l7}))
(* True *)
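
As a small illustration of the gather-then-order idea behind f6 and f7, here is a sketch run on the five points from the question. The name gatherSorted is made up for this example; it is not one of the functions timed above.

```mathematica
(* gatherSorted is a hypothetical name used only for this illustration *)
gatherSorted[pts_] :=
 With[{gathered = GatherBy[pts, #.# &]},
  (* one squared norm per group, taken from the group's first point *)
  gathered[[Ordering[#.# & /@ (First /@ gathered)]]]]

pts = {{1, 20}, {1, 6}, {1, 7}, {1, 3}, {3, 1}};
result = gatherSorted[pts]
(* {{{1, 3}, {3, 1}}, {{1, 6}}, {{1, 7}}, {{1, 20}}} *)
```

For real vectors the squared norm #.# increases with the norm, so ordering by it should give the same group order as ordering by Norm, while integer input stays in integer arithmetic.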

Additional timings:

list0 = RandomInteger[{0, 500}, {100000, 2}];

[image: timing table]

list0 = RandomInteger[{0, 20}, {100000, 5}];

[image: timing table]

kglr
  • Thanks. FYI, I just tested these with {{0.*10^-2, 0.*10^-2}, {1.3, 1.}, {0.2, 0.6}, {0.2, -0.6}, {1.3, -1.}, {-0.5, 1.5}, {-0.5, 0.4}, {0.6, 0.*10^-2}, {-1.6, 0.*10^-2}, {-0.5, -0.4}, {-0.5, -1.5}} and only f2 and f4 are giving me the results I am expecting. – Shb Feb 13 '16 at 22:40
  • The difference between using Norm and #.#& accounts for a lot of the speed differences. It would be interesting to compare the different approaches using the same norm function throughout. – Simon Woods Feb 13 '16 at 22:47
  • f7 too returns the wrong results for the example list in the updated question. – Shb Feb 13 '16 at 23:04
  • Thank you @Simon, I updated the post with your suggestions. – kglr Feb 13 '16 at 23:09
  • @Shb, the only difference between the outputs of the 10 functions is the ordering within each group; groups are ordered correctly. – kglr Feb 13 '16 at 23:15
  • f7 is lovely. One of those bits of code that seems obvious, once someone else has thought of it... – Simon Woods Feb 13 '16 at 23:17
  • again, f1[intersections] == f2[intersections] == f3[intersections] == f4[intersections] == f5[intersections] == f6[intersections] == f7[intersections] == f1b[intersections] == f2b[intersections] == f3b[intersections] returns false for me. – Shb Feb 13 '16 at 23:29
  • Feel free to check out the second update. – Shb Feb 13 '16 at 23:55
  • On my system f3b is as fast as f6 and f7 in your test. Using instead RandomInteger[{0, 500}, {100000, 2}] your function f1b is the fastest. (Mathematica 10.1.0 under Windows 7.) – Mr.Wizard Feb 14 '16 at 14:58
  • @Mr.Wizard, I added timings for a few more input structures. – kglr Feb 14 '16 at 15:23
6

This is more than 50% faster, since it evaluates only one Norm for each sublist:

AbsoluteTiming[list3 = SortBy[GatherBy[list, Norm], N[Norm[#[[1]]]] &];]
Simon Woods
Dr. belisarius
  • I had result = With[{n = #.# &}, list~GatherBy~n~SortBy~n@*First] - same principle but faster if list is integers. – Simon Woods Feb 13 '16 at 21:08
  • @SimonWoods Can't try it as it is, because V9 still here, but Norm is usually a slow thing, so I guess you're right :) – Dr. belisarius Feb 13 '16 at 21:10
  • see updated question. – Shb Feb 13 '16 at 22:09
  • @Shb The only difference is the order into the sublists list2 = {{0.*10^-2, 0.*10^-2}, {1.3, 1.}, {0.2, 0.6}, {0.2, -0.6}, {1.3, -1.}, {-0.5, 1.5}, {-0.5, 0.4}, {0.6, 0.*10^-2}, {-1.6, 0.*10^-2}, {-0.5, -0.4}, {-0.5, -1.5}}; Sort /@ SplitBy[SortBy[list2, Norm], Norm] == Sort /@ SortBy[GatherBy[list2, Norm], N[Norm[#[[1]]] &]] – Dr. belisarius Feb 13 '16 at 22:17
  • I don't understand why you have an additional sort in there, but no, your new suggestion too returns the old wrong answer. – Shb Feb 13 '16 at 22:29
  • @SimonWoods See updated question. – Shb Feb 13 '16 at 22:31
  • @Shb It returns {{{0., 0.}}, {{0.6, 0.}}, {{0.2, -0.6}, {0.2, 0.6}}, {{-0.5, -0.4}, {-0.5, 0.4}}, {{-0.5, -1.5}, {-0.5, 1.5}}, {{-1.6, 0.}}, {{1.3, -1.}, {1.3, 1.}}} which is good, AFAIK – Dr. belisarius Feb 13 '16 at 22:33
  • is it? I am expecting: {{{0.*10^-2, 0.*10^-2}}, {{0.6, 0.*10^-2}, {-0.5, -0.4}, {-0.5, 0.4}, {0.2, -0.6}, {0.2, 0.6}}, {{-1.6, 0.*10^-2}, {1.3, -1.}, {1.3, 1.}, {-0.5, 1.5}, {-0.5, -1.5}}} which is what I am getting with that new solution. – Shb Feb 13 '16 at 22:36
  • @Shb That's a completely different problem, related to the way Mathematica evaluates and formats the output. It is not related to your current question as it stands (and there are a slew of questions about that kind of thing on this site) – Dr. belisarius Feb 13 '16 at 22:39
  • Yes I understand, but it is a related question. I described my problem and received a few possible solutions. I am highlighting the fact that some of these solutions don't work and I would like to know why. But yes, it can get its own question if it ends up not being so obvious. – Shb Feb 13 '16 at 22:43
4
list = RandomInteger[{0, 20}, {10000, 2}];

AbsoluteTiming[list2 = GatherBy[SortBy[list, N[Norm[#]] &], Norm];]

(*  {0.149292, Null}  *)

SplitBy will partition without additional sorting; however, it is nonetheless slower.

AbsoluteTiming[list3 = SplitBy[SortBy[list, N[Norm[#]] &], Norm];]

(*  {0.212032, Null}  *)

Verifying that the two results are identical:

list2 === list3

(*  True  *)
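
The pre-sorting matters for SplitBy: it only breaks the list where the value of the function changes between neighbours, whereas GatherBy collects equal values wherever they occur. A minimal sketch on a made-up list (data is not part of the original question):

```mathematica
data = {1, 1, 2, 1};   (* made-up input with a repeated value that is not adjacent *)

split = SplitBy[data, Identity]
(* {{1, 1}, {2}, {1}} *)

gathered = GatherBy[data, Identity]
(* {{1, 1, 1}, {2}} *)

(* after sorting, equal values are adjacent, so the two agree *)
agree = SplitBy[Sort[data], Identity] === GatherBy[Sort[data], Identity]
(* True *)
```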

Sort and SortBy sort by canonical order. This is only equivalent to numeric order for numbers rather than numeric expressions. See Possible Issues under Sort: "Numeric expressions are sorted by structure as well as numerical value"
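
For the exact integer points in the question this shows up directly. A short sketch; the first output is the one reported in the comments on this answer (Mathematica 10.0), and the names byStructure and byValue are only for illustration:

```mathematica
list = {{1, 20}, {1, 6}, {1, 7}, {1, 3}, {3, 1}};

(* exact norms such as 5 Sqrt[2] and Sqrt[10] are compared by structure *)
byStructure = SortBy[list, Norm]
(* {{1, 7}, {1, 3}, {3, 1}, {1, 6}, {1, 20}}  as reported in the comments *)

(* machine-number norms are compared by value *)
byValue = SortBy[list, N[Norm[#]] &]
(* {{1, 3}, {3, 1}, {1, 6}, {1, 7}, {1, 20}} *)
```

Wrapping the key in N is one way to get numeric ordering; for integer input, a key such as #.# & that stays an exact integer avoids the issue as well.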

EDIT: To address the revised question

list4 = {{0.*10^-2, 0.*10^-2}, {1.3, 1.}, {0.2, 
    0.6}, {0.2, -0.6}, {1.3, -1.}, {-0.5, 1.5}, {-0.5, 0.4}, {0.6, 
    0.*10^-2}, {-1.6, 0.*10^-2}, {-0.5, -0.4}, {-0.5, -1.5}};

ans1 = GatherBy[SortBy[list4, N[Norm[#]] &], Norm];

Since you are using machine numbers in this case, N is not required:

ans2 = GatherBy[SortBy[list4, Norm], Norm];

ans3 = SplitBy[SortBy[list4, N[Norm[#]] &], Norm];

Again, since you are using machine numbers, N is not required:

ans4 = SplitBy[SortBy[list4, Norm], Norm];

These are all identical

ans1 === ans2 === ans3 === ans4

(*  True  *)

These are in the correct numeric order

OrderedQ[Norm[#[[1]]] & /@ ans1]

(*  True  *)
Bob Hanlon
  • I just noticed that GatherBy[SortBy[list, Norm], Norm] does not give the answer I was expecting. it gives me: {{{1, 7}}, {{1, 3}, {3, 1}}, {{1, 6}}, {{1, 20}}} I was expecting {{1,3},{3,1}} to come first. Also there is extra parenthesis. – Shb Feb 13 '16 at 20:17
  • I initially had: SortBy[GatherBy[list, Norm], Norm] but I realized that while it was giving me the right answer, it was only working because Norm[list]==Norm[{list}] which doesn't necessarily hold for a given function. – Shb Feb 13 '16 at 20:19
  • hmmm why is SortBy[list, Norm] giving me: {{1, 7}, {1, 3}, {3, 1}, {1, 6}, {1, 20}} – Shb Feb 13 '16 at 20:20
  • SortBy[list, N[Norm[#]] &] works though. – Shb Feb 13 '16 at 20:29
  • I suggest using #.#& in place of N@*Norm for integers - it omits the square root step but that doesn't change the ordering. – Simon Woods Feb 13 '16 at 21:10
  • see updated question. – Shb Feb 13 '16 at 22:09
  • Thanks. I am using Mathematica 10.0.0.0 and I have ans1==ans2 but not equal to (ans3==ans4) – Shb Feb 13 '16 at 23:06
  • btw, there was a typo in the list I was doing my tests with. It is corrected now, but my statement about ans1,ans2, ans3, ans4 remains valid. They are not equal on my machine. – Shb Feb 13 '16 at 23:21
  • Feel free to check out the second update. – Shb Feb 13 '16 at 23:56
  • I am finished chasing a moving target. – Bob Hanlon Feb 14 '16 at 01:53
  • @BobHanlon knowledge is a moving target! :) I have edited the question text to make it more readable and reproducible. Basically some of the provided solutions give the incorrect answer for a given input list. I would like to know why. Would be great if you could have another look. – Shb Feb 14 '16 at 11:00