9

I have a list of the form:

list={{0,...},{1,...},{1,...},{0,...},{3,...},{3,...},{0,...},{0,...},{5,...},{5,...},{5,...},{0,...},{5,...},{0,...},...}

So when we take all the first elements we get a run of integers:

list[[All,1]]
(* {0,1,1,0,3,3,0,0,5,5,5,0,5,0,...} *)

What I want to do is sort my list based on the first element of each sublist (the integers), gathering the nonzero integers while preserving the zeros between them. For this example, the first elements of the sorted list would look like this:

{0,1,1,0,3,3,0,0,5,5,5,5,0,0,...}

That is, the second and subsequent occurrences of "5" get moved to join the earlier occurrences. Likewise for all other integers: later occurrences get moved up to join the first occurrence or group of occurrences.

I am doing this in a roundabout way at the moment: I record a list of positions after the reordering and then return list[[positions]]. I can post what I am doing at the moment, but I am interested to know if anyone has a one- or two-line solution.

Also, I wasn't quite sure how to title this question to make it easier to find in searches. Any ideas on that?

Edit

The integers will not necessarily appear in order. So, for example, the first appearances of the nonzero integers could be ordered like

3, 1, 5, 4, 6, ...

The function below is what I am using to return the list of positions:

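sortedPositions[list_List] := 
  Module[{tmp = list[[All, 1]], length, tmp1, tmp2, tmp3},
   length = Length[tmp];
   (* each zero, paired with its position, forms its own group *)
   tmp1 = List /@ Cases[Transpose[{tmp, Range[length]}], {0, _}];
   (* nonzero entries, paired with their positions, are gathered by value *)
   tmp2 = DeleteCases[Transpose[{tmp, Range[length]}], {0, _}];
   tmp3 = GatherBy[tmp2, First];
   tmp2 = Join[tmp1, tmp3];
   (* order the groups by the position of their first member *)
   Flatten[SortBy[tmp2, #[[1, 2]] &], 1][[All, 2]]
   ];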

But it seems like a lot of code to get the result I need. Here is a test list:

num = 20;
testList = Join[List /@ RandomInteger[{0, 9}, num], RandomReal[{0, 1}, {num, 6}], 2]

(* 
{{6,0.456203,0.0900917,0.62677,0.638615,0.227849,0.61252},
{4,0.317069,0.44889,0.456945,0.05121,0.940742,0.495415},
{7,0.573698,0.381817,0.859495,0.517238,0.459022,0.957771},
{5,0.832945,0.867634,0.0843833,0.296803,0.944986,0.563913},
{1,0.598743,0.803861,0.082542,0.138926,0.630364,0.0445202},
{7,0.289183,0.257115,0.358083,0.677393,0.206347,0.987678},
{5,0.947487,0.320408,0.600928,0.0718489,0.976703,0.449376},
{0,0.0996927,0.210278,0.408291,0.861885,0.946081,0.0522955},
{0,0.537572,0.160541,0.212737,0.508406,0.353786,0.479605},
{7,0.0815373,0.0677839,0.388955,0.681041,0.795607,0.404398},
{4,0.18704,0.253819,0.141732,0.43889,0.931269,0.556534},
{2,0.262136,0.110553,0.60296,0.482498,0.693049,0.430039},
{5,0.569696,0.262133,0.397575,0.246202,0.499777,0.073326},
{6,0.487893,0.121165,0.413376,0.874849,0.836484,0.792685}, 
{0,0.677934,0.543956,0.593967,0.138832,0.896184,0.604194},
{2,0.138691,0.150235,0.614355,0.326924,0.615902,0.900494},
{0,0.0254698,0.258354,0.377134,0.569083,0.0925844,0.672802},
{7,0.354392,0.976598,0.658138,0.124943,0.39485,0.239671},
{2,0.622461,0.195612,0.997663,0.421797,0.130802,0.110463},
{2,0.136431,0.799215,0.698071,0.0599957,0.452992,0.378609}} *)

Find the position order you want in your final list:

positions = sortedPositions[testList]
(* {1, 14, 2, 11, 3, 6, 10, 18, 4, 7, 13, 5, 8, 9, 12, 16, 19, 20, 15, 17} *)

Reorder the list by those positions (a "sort" driven by the algorithm applied to the first elements):

testList[[positions]]
(* 
{{6,0.456203,0.0900917,0.62677,0.638615,0.227849,0.61252},
{6,0.487893,0.121165,0.413376,0.874849,0.836484,0.792685},
{4,0.317069,0.44889,0.456945,0.05121,0.940742,0.495415},
{4,0.18704,0.253819,0.141732,0.43889,0.931269,0.556534},
{7,0.573698,0.381817,0.859495,0.517238,0.459022,0.957771},
{7,0.289183,0.257115,0.358083,0.677393,0.206347,0.987678},
{7,0.0815373,0.0677839,0.388955,0.681041,0.795607,0.404398},
{7,0.354392,0.976598,0.658138,0.124943,0.39485,0.239671},
{5,0.832945,0.867634,0.0843833,0.296803,0.944986,0.563913},
{5,0.947487,0.320408,0.600928,0.0718489,0.976703,0.449376},
{5,0.569696,0.262133,0.397575,0.246202,0.499777,0.073326},
{1,0.598743,0.803861,0.082542,0.138926,0.630364,0.0445202},
{0,0.0996927,0.210278,0.408291,0.861885,0.946081,0.0522955},
{0,0.537572,0.160541,0.212737,0.508406,0.353786,0.479605},
{2,0.262136,0.110553,0.60296,0.482498,0.693049,0.430039},
{2,0.138691,0.150235,0.614355,0.326924,0.615902,0.900494},
{2,0.622461,0.195612,0.997663,0.421797,0.130802,0.110463},
{2,0.136431,0.799215,0.698071,0.0599957,0.452992,0.378609},
{0,0.677934,0.543956,0.593967,0.138832,0.896184,0.604194},
{0,0.0254698,0.258354,0.377134,0.569083,0.0925844,0.672802}}
 *)

So "sorting"/"gathering" based on the first elements does something like what I have tried to illustrate in the image below:

enter image description here

and creates a new ordering of the initial list (testList), probably best seen in the new order of the first elements:

enter image description here


As per Mr.Wizard's answer, what I want to do is gather the list based on the first elements, except that I don't want to gather the zeros: only nonzero first elements are grouped.

Carl Woll
Mike Honeychurch
  • are your integers all positive? – rm -rf Oct 03 '12 at 00:06
  • yes all positive – Mike Honeychurch Oct 03 '12 at 00:19
  • I think I have a working solution, but I have one more question: Barring the zeros, do the integers appear in order? i.e., 1, 3, 5,... (even if a 1 appears later after 5)? – rm -rf Oct 03 '12 at 00:29
  • will make an edit to clarify – Mike Honeychurch Oct 03 '12 at 00:31
  • Your sortedPositions function does NOT preserve the zeros from the original list. Thus it doesn't work as you say you want it to work. I recommend you get rid of the first List, and use a testList in which you specify exactly all of the elements. Then you ought to show exactly the output you expect. – DavidC Oct 03 '12 at 00:57
  • The current function preserves the occurrence of the zeros in between occurrences of the subsequently gathered integers. If the semantics are clumsy I think {0,1,1,0,3,3,0,0,5,5,5,5,0,0,...} shows what I mean. There was one zero after the first one or more sequential 1s, therefore there should be that same zero after any grouping of 1s. There were two zeros after the first one or more sequential 3s. I want to keep two zeros after all grouped 3s. There was one zero after the first occurrence of one or more sequential 5s. I want to keep one zero after all the gathered 5s. etc. – Mike Honeychurch Oct 03 '12 at 01:22
  • @MikeHoneychurch Your function is underspecified since you don't indicate what sorting actually means in your context. In other words, it is not clear from your description, how would one obtain a sorted list you started from, from an arbitrary unsorted list involving zeros and non-zero elements. Yet you want the result to work on arbitrary (generally unsorted) lists. – Leonid Shifrin Oct 03 '12 at 01:26
  • ...cont. However, if you had {1,...,0,1,0,...} the second occurrence of 1 moves up to join the first 1 and you would now have two zeros together. So okay, I think this is difficult for me to explain in writing. The test function delivers the output I want but I was wondering if a more efficient method was possible. – Mike Honeychurch Oct 03 '12 at 01:28
  • Before you withdraw a question, think about this: if you are unable to communicate what it should do, are you sure that what it currently does for you is correct, in all cases you are interested in? – Leonid Shifrin Oct 03 '12 at 01:38
  • @LeonidShifrin yes what it currently does is what I want, i.e. correct. My reason for asking the question was that it seemed like a long way of getting to the result. However if I am unable to communicate this adequately then obviously alternative, and/or more efficient, methods are unlikely to be offered as answers. Hence probably best to close and maybe I'll repost at a later date if I can figure out a better way to describe in writing what I want to do. – Mike Honeychurch Oct 03 '12 at 01:44
  • Why not just talk about the actual problem that is requiring you to sort your lists in this manner? – J. M.'s missing motivation Oct 03 '12 at 02:03
  • @J.M. Unfortunately I will be out for probably the rest of the day but I'll make a further edit with a worked example using the current code – Mike Honeychurch Oct 03 '12 at 03:11

4 Answers

7

It seems my understanding was correct.

Unique[] is concise and descriptive but runs slower every time it is used. A more robust method is:

group2[lst_] := 
  Module[{x, i = 1}, Join @@ GatherBy[lst, #[[1]] /. 0 :> x[i++] &]]

Compare these Timings:

big = RandomInteger[5, {10000, 3}];

Table[group[big] // Timing // First, {5}]

Table[group2[big] // Timing // First, {5}]

{0.172, 0.515, 0.842, 1.17, 1.529}

{0., 0.016, 0.015, 0., 0.016}

  • Note: The timings above were performed in Mathematica 7. In v10.1 I cannot reproduce the progressive slow-down so I believe this problem has been corrected.
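
As a quick check, here is a small sketch that applies group2 to the run of first elements given in the question. The name example and the single-element rows are assumptions made purely for illustration; real rows would carry more columns.

```mathematica
(* group2 as defined above, repeated so the snippet is self-contained *)
group2[lst_] := 
  Module[{x, i = 1}, Join @@ GatherBy[lst, #[[1]] /. 0 :> x[i++] &]]

(* hypothetical input: single-element rows built from the question's run *)
example = List /@ {0, 1, 1, 0, 3, 3, 0, 0, 5, 5, 5, 0, 5, 0};

group2[example][[All, 1]]
(* {0, 1, 1, 0, 3, 3, 0, 0, 5, 5, 5, 5, 0, 0} *)
```

Each zero is replaced by a distinct key x[1], x[2], ..., so GatherBy never merges two zeros, while equal nonzero integers share a key and are collected at the position of their first occurrence.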

I believe you just want a Gather where zeros are considered unique:

group[lst_] := Join @@ GatherBy[lst, #[[1]] /. 0 :> Unique[] &]

Test:

Join[List /@ RandomInteger[{0, 5}, 20], RandomReal[{0, 1}, {20, 6}], 2];

%[[All, 1]]

group[%%][[All, 1]]
{0, 0, 4, 5, 0, 3, 3, 3, 4, 2, 4, 2, 3, 2, 5, 0, 5, 0, 1, 4}

{0, 0, 4, 4, 4, 4, 5, 5, 5, 0, 3, 3, 3, 3, 2, 2, 2, 0, 0, 1}

If not, at least I tried. :-)

Mr.Wizard
  • "you just want a Gather where zeros are considered unique" ...wow, one sentence summarizes what I was trying to say unsuccessfully in multiple paragraphs! excellent! – Mike Honeychurch Oct 03 '12 at 08:00
  • @Mike Glad I could be of help. :-) (I don't always express myself well so I know the frustration!) – Mr.Wizard Oct 03 '12 at 08:11
  • and as is the case when you see it expressed correctly you immediately wonder why you couldn't think of that! Thanks again, that replacement with Unique is a good trick. – Mike Honeychurch Oct 03 '12 at 08:13
  • BTW I changed the title which hopefully now better reflects the problem and will be more useful for others to find your answer in the future. – Mike Honeychurch Oct 03 '12 at 08:14
  • @Mike it is a nice trick, but IIRC if using this on huge data sets it may cause a memory leak. You may want to use something like this: Module[{x, i = 1}, Join @@ GatherBy[lst, #[[1]] /. 0 :> x[i++] &]] and possibly even Remove should this prove to be a problem. I cannot remember where someone (perhaps Leonid) described this problem but I'll see if I can find it. – Mr.Wizard Oct 03 '12 at 08:18
  • ok. thanks for that clarification – Mike Honeychurch Oct 03 '12 at 08:21
  • @Mike I couldn't find the post I was thinking of but I put an illustration of the problem in my answer, along with the improved version. The magnitude of the problem is pretty severe in this application. – Mr.Wizard Oct 03 '12 at 08:35
  • Thanks. That is a major slowdown relative to i++. – Mike Honeychurch Oct 03 '12 at 08:48
3

Here is a refinement of @Mr.Wizard's answer using my GatherByList function. The function is short:

GatherByList[list_, representatives_] := Module[{func},
    func /: Map[func, _] := representatives;
    GatherBy[list, func]
]

And using GatherByList:

g[list_] := Module[{min = 0},
    Join @@ GatherByList[list, Replace[list[[All, 1]], 0 :> min--, {1}]]
]
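
To see what the representatives passed to GatherByList look like, here is a minimal sketch of just the key-generation step from g. The list firsts is a hypothetical first-element column, and the sketch relies on the nonzero integers being positive (as confirmed in the comments on the question), so that the keys generated for zeros cannot collide with real values.

```mathematica
(* hypothetical first-element column; nonzero entries assumed positive *)
firsts = {0, 1, 1, 0, 3, 0};

(* min-- returns the current value and then decrements it, *)
(* so successive zeros become 0, -1, -2, ... *)
Module[{min = 0}, Replace[firsts, 0 :> min--, {1}]]
(* {0, 1, 1, -1, 3, -2} *)
```

Because every zero receives a different integer key, gathering by these keys keeps the zeros separate while the repeated nonzero values are grouped together.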

Timing comparison:

list = RandomInteger[5, {10^5, 3}];

r1 = g[list]; //AbsoluteTiming
r2 = group2[list]; //AbsoluteTiming

r1 === r2

{0.030611, Null}

{0.219069, Null}

True

Carl Woll
2

Here's an easy way using //.:

list = {0, 1, 1, 0, 3, 3, 0, 0, 5, 5, 5, 0, 5, 0, 1};
Split[list] //. {h___, x : {a_, ___}, m___, y : {a_, ___}, t___} :> 
    {h, x ~Join~ y, m, t} /; a =!= 0 // Flatten

(* {0, 1, 1, 1, 0, 3, 3, 0, 0, 5, 5, 5, 5, 0, 0} *)
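
As a further check, here is a sketch that wraps the same rule in a function and applies it to the out-of-order list discussed in the comments below. The name mergeRuns is hypothetical and used only for illustration.

```mathematica
(* same replacement rule as above, wrapped in a hypothetical helper *)
mergeRuns[v_List] := Flatten[
  Split[v] //. {h___, x : {a_, ___}, m___, y : {a_, ___}, t___} :> 
    ({h, Join[x, y], m, t} /; a =!= 0)
  ]

(* first appearances are not in increasing order: 4, 1, 3, 5 *)
mergeRuns[{0, 4, 1, 1, 0, 3, 3, 0, 0, 5, 5, 5, 0, 5, 0, 1}]
(* {0, 4, 1, 1, 1, 0, 3, 3, 0, 0, 5, 5, 5, 5, 0, 0} *)
```

The rule only moves a later run onto the earliest run with the same nonzero head, so the result does not depend on the integers first appearing in increasing order.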
rm -rf
  • This implicitly assumes that the first appearance of each integer (other than 0) is in order. i.e., 1, 3, 5... and not 4, 1, 5, 3,... – rm -rf Oct 03 '12 at 00:37
  • Sorry I was a bit slow getting my edit up in reply to your comment. Unfortunately the order could be 4,1,5,3 ... – Mike Honeychurch Oct 03 '12 at 00:52
  • @MikeHoneychurch In that case it is underspecified... consider list = {0, 4, 1, 1, 0, 3, 3, 0, 0, 5, 5, 5, 0, 5, 0, 1}; Does the 4 go immediately after 3 or just before 5? – rm -rf Oct 03 '12 at 01:17
  • I apologize for my inability to explain in writing what I am looking for but the test function delivers the result i am after -- but it seems to be a long way to get the result. For your example above the reordered list (of first elements) would be {0, 4, 1, 1, 1, 0, 3, 3, 0, 0, 5, 5, 5, 5, 0, 0} – Mike Honeychurch Oct 03 '12 at 01:30
  • Mike, so your code doesn't sort the integers either... it merely collects later occurrences along with the first. Mine does the same too. – rm -rf Oct 03 '12 at 02:48
2

Here is one way.

list={{1,a},{1,b},{0,c},{2,d},{0,e},{2,f},{0,g},{4,h},{0,j}}

strangeSort[list_]:=Module[{r},
r=Split[list,#1[[1]]==0&];
r=GatherBy[r,#[[-1,1]]&];
r={#[[1]],Reverse@SortBy[Flatten[#[[2;;-1]],1],#[[1]]]}&/@r;
r=Flatten[r,2]]

strangeSort[list]

Result:

{{1,a},{1,b},{0,c},{2,d},{2,f},{0,e},{0,g},{4,h},{0,j}}

Take care when using pattern-based approaches on big lists; they are slow.

Murta