12

Say we have a list:

m1 = {2, 2, 7, 0, 7, 7, 2, 2, 2}

It can be split easily:

Split @ m1
(* {{2, 2}, {7}, {0}, {7, 7}, {2, 2, 2}}  *)

Wanted: a method to get the following list:

{{1, 2}, {3}, {4}, {5, 6}, {7, 8, 9}}

It should be as simple as possible and fast for long lists.

C. E.
garej

11 Answers

13

You can use

SplitBy[Range@Length@m1, m1[[#]] &]
Simon Woods
9

There is a trade-off between speed and compactness in this case, so I have decided to post the version with Range, which I consider simple enough (comprehensible for new users) and the second fastest among its counterparts (at least on my machine).

It is heavily based on @Mr.Wizard's solution, farsightedly provided by ChrisDegnen, so I do not claim originality:

dynS[p_] := Range @@@ Thread[{Accumulate@p - p + 1, Accumulate@p}]

Method

After looking at @Mr.Wizard's SparseArray solution, I finally realized that we can use the Listable attribute of Range to get an even more compact version (this time I prefer to keep the pure-function notation #). So this is my favorite method (though it is not mine :)!

dynSP[p_] := Range[# - p + 1, #]& @ Accumulate @ p
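To make the Listable step concrete, here is a small worked sketch of what dynSP computes for the run lengths of the question's list. The names starts and ends are illustrative only and do not appear in the answer.

```wolfram
(* Worked example; p holds the run lengths of {2, 2, 7, 0, 7, 7, 2, 2, 2} *)
p = {2, 1, 1, 2, 3};

ends = Accumulate[p]       (* last index of each run:  {2, 3, 4, 6, 9} *)
starts = ends - p + 1      (* first index of each run: {1, 3, 4, 5, 7} *)

(* Range is Listable, so it threads over both lists element by element *)
Range[starts, ends]
(* {{1, 2}, {3}, {4}, {5, 6}, {7, 8, 9}} *)
```

dynS builds the same {start, end} pairs explicitly with Thread and then applies Range to each pair, which is why the two definitions give the same result.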


Timing benchmarking

I use the long list for benchmarking:

m1 = Flatten[Table[#, # + 1] & /@ RandomInteger[{1, 200}, 10^5]];
Length[m1]
(* 10156647 *)

The packed version is used later (it gives the second timing output for each method):

m1 = Developer`ToPackedArray[m1];

Range[Prepend[# + 1, 1], Append[#, Length @ m1]] & @
  SparseArray[Differences @ m1]["AdjacencyLists"] // Length // RepeatedTiming
(* {0.403, 99530} *)
(* {0.274, 99489} *)

dynSP[Length /@ Split @ m1] // Length // RepeatedTiming
(* {0.476, 99439} *)
(* {0.626, 99439} *)

dynS[Length /@ Split @ m1] // Length // RepeatedTiming
(* {0.506, 99495} *)
(* {0.715, 99489} *)

Internal`PartitionRagged[Range[Length@m1], Length /@ Split@m1] // Length // RepeatedTiming
(* {0.589, 99495} *)
(* {0.78, 99489}  *)

dynP[Range@Length@m1, Length /@ Split @ m1] // Length // RepeatedTiming
(* {0.613, 99495} *)
(* {0.83, 99489}  *)

Module[{i = 1}, Replace[Split@m1, _ :> i++, {-1}]] // Length // RepeatedTiming
(* {3.845, 99439} *)
(* {4.1, 99439}   *)

Module[{i = 0}, Map[++i &, Split[m1], {-1}]] // Length // RepeatedTiming
(* {6.57, 99495} *)
(* {6.85, 99489} *)

SplitBy[Range @ Length @ m1, m1[[#]] &] // Length // RepeatedTiming
(* {24.6, 99495} *)
(* {25., 99489}  *)

Note: the fastest function, the one with SparseArray, was added a bit later, so its result in terms of length is slightly different. The same goes for the Module version with Replace and my favorite, dynSP.
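Since the reported lengths differ between runs (presumably because m1 was regenerated), they do not by themselves show that the methods agree. A minimal sanity check on the small list from the question, using only three of the approaches above, might look like this. The names test, lens, r1, r2 and r3 are placeholders for this sketch.

```wolfram
test = {2, 2, 7, 0, 7, 7, 2, 2, 2};
lens = Length /@ Split[test];   (* run lengths: {2, 1, 1, 2, 3} *)

r1 = SplitBy[Range @ Length @ test, test[[#]] &];      (* SplitBy version *)
r2 = Module[{i = 0}, Map[++i &, Split[test], {-1}]];   (* counter version *)
r3 = Range[# - lens + 1, #] & @ Accumulate @ lens;     (* same body as dynSP *)

r1 === r2 === r3
(* True *)
```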

garej
8

Perhaps:

s = Split@m1;
Internal`PartitionRagged[Range[Length@m1], Length /@ s]
ubpdqn
7

Internal`CopyListStructure is quite fast:

Internal`CopyListStructure[Split @ #, Range @ Length @ #] & @ m1
 {{1, 2}, {3}, {4}, {5, 6}, {7, 8, 9}}
kglr
6

Late to the party, so here's something old-style:

Module[{i = 0}, Map[++i &, Split[m1], {-1}]]

or

SplitBy[MapIndexed[Flatten@*List, m1], First][[;; , ;; , 2]]
Kuba
5

Using Mr.Wizard's ragged partition function here

dynP[l_, p_] := MapThread[l[[# ;; #2]] &,
  {{0}~Join~Most@# + 1, #} &@Accumulate@p]

m1 = {2, 2, 7, 0, 7, 7, 2, 2, 2};
m2 = Split@m1;

dynP[Range@Length@m1, Length /@ m2]

{{1, 2}, {3}, {4}, {5, 6}, {7, 8, 9}}

Chris Degnen
  • Thank you. Ironically, I saw this solution a year ago but was unable to grasp it :)) – garej Jan 23 '16 at 10:21
5
m1 = {2, 2, 7, 0, 7, 7, 2, 2, 2};
Module[{i = 1}, Replace[Split@m1, _ :> i++, {-1}]]
(* {{1, 2}, {3}, {4}, {5, 6}, {7, 8, 9}} *)
march
  • @Mr.Wizard, I remember warnings concerning Block, so never use(d) it. – garej Jan 24 '16 at 19:06
  • @Mr.Wizard. That's interesting. I should go read the use cases for different scoping constructs post again. – march Jan 24 '16 at 19:06
  • march, using Block incorrectly seems to get a lot of people; as stated I did it myself many times before more experienced users (Szabolcs or Leonid probably) pointed out my mistake. Even Wolfram developers do it! Using his code on numbering[{{a, b}, {m, n}, {x, y}}] for example note that n has been incorrectly replaced by 0 in the output. – Mr.Wizard Jan 24 '16 at 19:12
4

Adapted from my answer to a related question:

runs[a_List] := 
 Range[Prepend[# + 1, 1], Append[#, Length@a]] &@
  SparseArray[Differences@a]["AdjacencyLists"]

Now:

runs @ {2, 2, 7, 0, 7, 7, 2, 2, 2}
{{1, 2}, {3}, {4}, {5, 6}, {7, 8, 9}}
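To see why this works, the intermediate values can be traced on the same list. The sketch below swaps the SparseArray "AdjacencyLists" step for Pick and Unitize, on the assumption that for a vector that property returns the positions of the nonzero entries; the names d and breaks are illustrative only.

```wolfram
a = {2, 2, 7, 0, 7, 7, 2, 2, 2};
d = Differences[a]                              (* {0, 5, -7, 7, 0, -5, 0, 0} *)

(* positions where the value changes, i.e. the last index of every run but the final one *)
breaks = Pick[Range[Length[d]], Unitize[d], 1]  (* {2, 3, 4, 6} *)

(* each run starts right after the previous break and ends at the next one *)
Range[Prepend[breaks + 1, 1], Append[breaks, Length[a]]]
(* {{1, 2}, {3}, {4}, {5, 6}, {7, 8, 9}} *)
```

This stand-in is only meant to expose the logic; it is not claimed to match the SparseArray version in speed.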
Mr.Wizard
  • Oh, SparseArray!!! I was about to ask how to apply it here :))) – garej Jan 24 '16 at 19:32
  • @garej Glad I could be of help. :-) Please add this to your benchmark. – Mr.Wizard Jan 24 '16 at 19:34
  • @garej I see that a new syntax has been added for Table, after version 10.1.0 (which I use). Anyway (at least in 10.1) your m1 is not packed; would you please try your benchmark also with m1 = Developer`ToPackedArray[m1]; ? – Mr.Wizard Jan 24 '16 at 19:42
3

I'll add another option, but be warned: It's rather slow.

FoldPairList[TakeDrop, Range@Length@m1, Length /@ (Split @ m1)]
V.E.
  • Yes, it hangs my machine with m1 = Flatten[Table[#, # + 1] & /@ RandomInteger[{1, 100}, 10^5]]; but it is nice to have it here. It is as slow as using the new SequencePosition ;) – garej Jan 23 '16 at 22:00
3
SplitBy[Transpose[{m1, Range@Length@m1}], First][[;; , ;; , -1]]

or

m2 = Range@Length@m1;
i = 1; Split[m2, m1[[j = i++]] === m1[[j + 1]] &]
Basheer Algohi
3

Just wanted to belatedly add a new-style spin on Kuba's classic old-style answer, using the "Counter" data structure:

With[{counter = CreateDataStructure["Counter", 1]},
 Map[counter["Increment"] &, Split[m1], {2}]]
(* {{1, 2}, {3}, {4}, {5, 6}, {7, 8, 9}} *)
Pillsy