17

I have a list of numbers (say, for example, {1,2,3,4,3,2,1,2,3,4,3,2}), and I want to split it into sublists so that each sublist is the (smallest) sublist whose sum is larger than the preceding one. For the above example, my desired output is {{1},{2},{3},{4},{3,2},{1,2,3},{4,3},{2}} (the sums are 1, 2, 3, 4, 5, 6, 7, and the {2} at the end just uses the remaining leftover elements).

Neither Split nor SplitBy seems to be able to do what I need. Is there a primitive to do this, or would someone care to invent a clever function?

rogerl
  • 4,209
  • 3
  • 27
  • 42
  • How about showing what you've tried? (this is trivial to do...) – ciao Aug 28 '15 at 16:46
  • 2
    Well, the only thing I've tried is a complicated solution involving keeping track of the current sum and the tail of the list and using that to find the next greater head. – rogerl Aug 28 '15 at 16:54
  • I still think it's worth putting in what you've tried. Some here (me included, actually, but not everyone) like to see the effort that people have put into solving their problem. It could be that what you've tried is the right way to do it, but it just needs a little tweak. In correcting it, a couple of things can happen: you can learn a little more about MMA syntax, you can learn some coding tricks, you can better prepare for http://mathematica.stackexchange.com/questions/18393/what-are-the-most-common-pitfalls-awaiting-new-users, other users can learn from the post, etc – march Aug 28 '15 at 17:47
  • @march Fair enough. To be honest, I was hoping I was just missing something in SplitBy, or that another primitive would do the trick, and I wouldn't have to write anything. – rogerl Aug 28 '15 at 18:30
  • How big will the lists be? Is performance important (the methods so far can be handily beaten if it is), or will list be so small it does not matter? – ciao Aug 28 '15 at 18:34
  • Performance is not particularly important. You can assume the list will have under 1000 elements. – rogerl Aug 28 '15 at 18:45
  • @ciao But an elegant solution is always preferred. – rogerl Aug 28 '15 at 18:51

9 Answers9

9

Here is a purely functional solution (i.e. not using mutable state), based in FoldList (since the one based on linked lists has been already taken):

stepF = 
  Function @ With[{sum = First @ #1 + #2, len = #[[2]], prevsum = #1[[3]]},
    If[sum > prevsum, {0, 0, sum, len + 1}, {sum, len + 1, prevsum, 0}]
  ];

getLengths[lst_List] :=
  DeleteCases[0] @ Append[#, Length[lst] - Total[#]] & @
     FoldList[stepF, {0, 0, 0, 0}, lst][[All, -1]];

splitInc[lst_] := Internal`PartitionRagged[lst, getLengths[lst]]

For example:

splitInc[lst]

(* {{1}, {2}, {3}, {4}, {3, 2}, {1, 2, 3}, {4, 3}, {2}} *)

The idea is to use FoldList to determine the lengths of all sublists, and then split. At every iteration in FoldList the function stepF takes a list of 4 elements {currentSum, currentLength, previousSum, splitLengthOrZero} as a first argument, and a next element of the original list as a second argument, and returns a modified list of 4 elements - working basically as a state machine. We then just have to pick those states where the last element (splitLengthOrZero) is non-zero, and we get a list of lengths. In place of Internal`PartitionRagged one could also use Mr.Wizard's dynP.

Leonid Shifrin
  • 114,335
  • 15
  • 329
  • 420
  • @Xavier Thanks. Glad you liked it. You can search the site for dynP function of Mr.Wizard, which has comparable performance to Internal`PartitionRagged, while being implemented purely in top-level code. – Leonid Shifrin Aug 29 '15 at 01:45
  • Neat, but fails when negatives present. +1 though. Interestingly, dramatic slowdown with reals. – ciao Aug 29 '15 at 04:09
  • @ciao Thanks. It does work with negative numbers, for me. Can you provide an example where it fails for you? Re: slowdown - indeed. Probably related to the auto-compilation, although I still don't understand why - by default the arguments of compiled function are anyway reals. – Leonid Shifrin Aug 29 '15 at 13:35
  • @ciao One guess would be that, for integers, the list of 4 elements produced at each iteration of Fold, can still be packed, and is packed since Fold auto-compiles. For reals, the list contains 2 integers and 2 reals, and so can't be packed. Not sure if I am right, but I can't see any better explanation at the moment. One thing that kind of confirms my guess is that if you use N@{0, 0, 0, 0} as a starting value in FoldList, instead of {0, 0, 0, 0}, then the performance on integers also degrades just like it does for reals. – Leonid Shifrin Aug 29 '15 at 13:41
  • @LeonidShifrin: Range[-3,3] trivial example - s/b 7 distinct sets, but output is just the range... – ciao Aug 29 '15 at 19:20
  • @ciao This is not an error per se. The initial value of the sum, which the sum of elements in the first sublist of the result must be larger than, has not been specified by the OP. I took it to be zero. In this interpretation, the answer is right. Alternatative interpretation is that it it e.g. -Infinity (or, simply, any number smaller than the first element). The problem is simply underspecified in this respect. – Leonid Shifrin Aug 29 '15 at 19:24
  • @LeonidShifrin: Fair enough (depends on the what definition of "is" is, eh? ;-} ) - btw -cool way to do this for large lists is via bin search - quite fast... – ciao Aug 29 '15 at 19:42
  • @ciao You got me intrigued - how would you apply a binary search to this problem? – Leonid Shifrin Aug 29 '15 at 19:52
  • @LeonidShifrin: Shall I spoil the ending, or do you want the "ah-ha!" yourself? Hint: Look at the pattern of positions in the accumulation of the target list vs where the subset breaks happen... – ciao Aug 29 '15 at 20:02
  • @ciao As much as I'd like to solve this puzzle, I can't afford taking any more time from other things I need to take care of, at the moment. I did actually look at the accumulated original list before asking you, but I guess my pattern-recognition skills got seriously worse in the few last years. But the one thing that really stopped me from looking in that direction initially was that we can't exclude negative elements in the list, in which case accumulated list will not be necessarily sorted, and so binary search can't be used. But I see the direction of thought. – Leonid Shifrin Aug 29 '15 at 20:28
5

Split

With a special test function as the second argument Split does give the desired result:

ClearAll[f]
f = Module[{s = 0, t = 0}, Split[#, Or[(s += #) < t, t = s; s = 0] &]] &;

Examples:

f @ {1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2}

{{1}, {2}, {3}, {4}, {3, 2}, {1, 2, 3}, {4, 3}, {2}}

lst = RandomInteger[9, 10]

{{1, 0, 8, 7, 7, 5, 3, 6, 8, 0}

f @ lst

{{1}, {0, 8}, {7, 7}, {5, 3, 6}, {8, 0}}}

See this and this for the origin of Or trick.

kglr
  • 394,356
  • 18
  • 477
  • 896
5
a = {1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2};

f = Module[{b, c, d, n},
    b = {{First[#]}};
    c = Rest[#];
    Catch[
     While[True,
      n = 1; While[Total[d = Quiet@Check[Take[c, n],
            Throw[AppendTo[b, c]]]] <= Total@Last[b], n++];
      AppendTo[b, d];
      c = Drop[c, n]]];
    b] &;

f[a]

{{1}, {2}, {3}, {4}, {3, 2}, {1, 2, 3}, {4, 3}, {2}}

Chris Degnen
  • 30,927
  • 2
  • 54
  • 108
  • Nice solution, thanks. That's along the lines of what I was thinking in my comment above, but done more elegantly than I would probably have written it. I was hoping for a solution that didn't involve explicit loops. – rogerl Aug 28 '15 at 17:18
  • I think the Catch and inner While can be replaced by the following, which completely avoids an inner loop: While[Length@c > 0, d = Accumulate@c; n = Max@FirstPosition[d, x_ /; x > Total@Last@b, Length@d]; b = Append[b, Take[c, n]]; c = Drop[c, n]]; – rogerl Aug 28 '15 at 18:44
  • @Xavier - quite right. I have updated my answer. – Chris Degnen Aug 28 '15 at 20:48
5

A bit more concise, seems at least as fast as those posted so far:

setter[list_] := Module[{fs = 0, t = First@list - 1, f, u},
   f[x_] := If[(fs += x) > t, t = fs; fs = 0; True, False];
   u = Union[Pick[Range@Length@list, f /@ list], {Length@list}];
   MapThread[list[[#1 ;; #2]] &, {Prepend[Most@u, 0] + 1, u}]];
ciao
  • 25,774
  • 2
  • 58
  • 139
3

Here is another possible solution:

mySorting[{}, _] := {};

mySorting[list_, sum_] := 
  Block[{firstelem = First@list, listlength = Length@list,tempsum, poselem = 1},
   tempsum = firstelem;
   While[tempsum <= sum && poselem + 1 <= listlength, 
        tempsum = tempsum + list[[poselem++ + 1]]];
   {list[[Range@poselem]], Sequence @@ mySorting[Drop[list, poselem], tempsum]}];

mySorting[list_] := 
  With[{firstelem = First@list}, 
  {{firstelem}, Sequence @@ mySorting[Rest@list, firstelem]}];

For your example list = {1,2,3,4,3,2,1,2,3,4,3,2} we get

mySorting[list]
{{1}, {2}, {3}, {4}, {3, 2}, {1, 2, 3}, {4, 3}, {2}}
3

Yet another answer:

pos[list_] := Module[{currcounter = 0, currmax = 0},
   Map[(
    currcounter += #;
    If[currcounter > currmax, 
        currmax = currcounter; 
        currcounter = 0;
        False,
      currcounter += #;
       True]) &, 
    list]]

g[list_] := 
 With[{arr = pos[list]},
  With[{splitpos = Split[arr, #1 == True &]},
   ReplacePart[splitpos, 
    Rule @@@ Transpose@{Position[splitpos, True | False], list}]
  ]]

pos[{1,2,3,4,3,2,1}] is {False, False, False, False, True, False, True}, with True in any position that is not the end of a required sub-list. The function g then Splits that list, so as to get a resulting list of exactly the right shape but with False and True instead of the required list elements. Finally, we insert the list elements.

Patrick Stevens
  • 6,107
  • 1
  • 16
  • 42
  • 1
    The output starts {{1, 2} ...} but it should be {{1}, {2} ...}. – Chris Degnen Aug 28 '15 at 21:07
  • Thanks, I must have been asleep there. There's an irritating reason why that's true. – Patrick Stevens Aug 28 '15 at 21:11
  • @Xavier, this whole approach is badly flawed because SplitBy evaluates the function once for every pair of adjacent elements. Under the hood, it's just calling Split, which is exactly what I wanted to avoid happening. I was expecting it just to evaluate the function at each element, and then split according to that list. – Patrick Stevens Aug 28 '15 at 21:14
  • @Chris Algorithm now fixed. Slightly different method. – Patrick Stevens Aug 28 '15 at 21:28
2

I've made a very rough-and-ready one which uses linked lists:

toLinkedList = Fold[{#2, #1} &, {}, Reverse@#]&;
r[list_, currval_, currans_, curransval_] := 
 r[list[[2]], currval, {list[[1]], currans}, curransval + list[[1]]]
r[{}, c_, a_, v_] := a
r[list_, currval_, currans_, curransval_] /; curransval > currval := 
 {currans, r[list, curransval, {}, 0]}

partition[list_] := r[toLinkedList[list], 0, {}, 0]

Output:

{{1, {}}, {{2, {}}, {{3, {}}, {{4, {}}, {{2, {3, {}}}, {{3, {2, {1, {}}}}, {{3, {4, {}}}, {2, {}}}}}}}}}

which is linked-list speak for

{{1}, {2}, {3}, {4}, {3,2}, {1,2,3}, {4,3}, {2}}
Patrick Stevens
  • 6,107
  • 1
  • 16
  • 42
  • … although de-linked-listing it is more than my tired brain wants to handle at the moment, since Flatten won't work, being a linked list of linked lists. – Patrick Stevens Aug 28 '15 at 17:12
  • Here's a silly way: Flatten@*First /@ Drop[NestWhileList[Last@# &, output, Length@# == 2 &], -1] /. HoldPattern[Flatten[s_?AtomQ]] :> {s}. – march Aug 28 '15 at 18:16
  • Or: Drop[#, -1] & /@ Internal'PartitionRagged[#, Differences@Join[{0}, Flatten@Position[#, ll[]]]] &@ Flatten@(output /. {} -> ll[]) – march Aug 28 '15 at 19:02
  • 2
    This is better than the first (sorry if I'm being annoying; I'm just interested in how to do this): Flatten@*First /@ NestList[Last@# &, output, Count[output, {}, Infinity] - 1]/. HoldPattern[Flatten[s_?AtomQ]] :> {s}. – march Aug 28 '15 at 19:30
  • I have explained in a linked post, what to do when your elements are themselves lists - section named "Generalized linked lists". You have to use a symbolic container like ll (or any other symbol) instead of List, and then use Flatten with 3 args. I am a bit surprised that you didn't see that, given that you provided a link to that post. In any case, linked lists were the first thing that also came to my mind when I read the question. – Leonid Shifrin Aug 28 '15 at 22:07
  • @Leonid That involves making different heads for each linked list, does it not? I was tired and didn't want to think any harder. – Patrick Stevens Aug 29 '15 at 07:37
  • @PatrickStevens Yes, you are right. You pick a symbol which becomes the head of the linked list "cons cell", and in case when you work with several linked lists at the same time, you will indeed generally have to pick different symbols for them. In practice, however, it is pretty rare to have more than two linked lists interacting with each other at the same time. – Leonid Shifrin Aug 29 '15 at 13:37
2

I also thought immediatly about the Fold function using the #1 and #2 arguments of it. Here is my try:

Given the OP list:

mylist = {1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2}

then:

(for versions<10.2 replace Nothing with Sequence[]):

foo = (If[(sn = #2 + Tr@Last@#1) > s, s = sn; lst = {}, 
     lst = Nothing]; {Sequence @@ Most@#1, Join[Last@#1, {#2}], lst}) &;

Fold[foo, {{s = First@mylist}, {}}, Rest@mylist]

{{1}, {2}, {3}, {4}, {3, 2}, {1, 2, 3}, {4, 3}, {2}}

SquareOne
  • 7,575
  • 1
  • 15
  • 34
2

Here is an option using Reap and Sow:

splitInc[list_List]:=Block[{i=0,previousTotal=0,current={}},
    Last@Reap@Scan[
        (
        AppendTo[current,#];
        If[Total[current] > previousTotal
            ,Sow[#,i];previousTotal=Total[current];current={};i++
            ,Sow[#,i]]
        )&
        , list
    ]
]
splitInc[list]

{{1},{2},{3},{4},{3,2},{1,2,3},{4,3},{2}}

Murta
  • 26,275
  • 6
  • 76
  • 166