21
SeedRandom[1];
alist = RandomInteger[{1, 10}, 20]

{2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7}

I would like to divide this list into sublists (starting at the left) such that Total of each sublist remains less than 20. The output would look like:

{{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}}

What are some of the idiomatic ways of doing so in Mathematica? Thanks for your help.

Alexey Popkov
  • 61,809
  • 7
  • 149
  • 368
Syed
  • 52,495
  • 4
  • 30
  • 85
  • 2
    It would be nice if you could compare all the answer's performances on the same computer. Aside from the time benchmark, I value readability and tidiness (keep a clean kernel), and full use of available resources. – rhermans May 21 '22 at 11:22
  • 1
    What is most interesting about your question is that all the available tools that run tests over a list do element by element comparisons, and we have no (AFAIK) efficient built-in functions that performed the test over the whole sublist. I hope to see other answers that do direct sequential tests over sublists which may scale well over long lists. Something like a TakeDropWhile that where the test is non on the element but on the Take output. – rhermans May 21 '22 at 12:47
  • 2
    I am studying various solutions and there is quite a variety of them; please allow me some time. The variance between performance of various commands achieving the same task within this software is remarkable. Whereas your answer is the most natural way of phrasing this task, it turns out to be much slower showing that a fundamentally different algorithm is at work behind the scenes. – Syed May 21 '22 at 12:56
  • 4
    Please forgive me, since I will not be accepting an answer for this post. There are many excellent answers and I can't decide. But if I did, it would be rherman's answer for clarity of thinking or Alexey Popkov's answers for speed. I have learned much from all of you. – Syed May 23 '22 at 12:30
  • 2
    Thanks Syed, it is difficult, I would add 268494 using Split by @BenIzd for being a good balance between fast, short and simple. – rhermans May 23 '22 at 13:05
  • 3
  • 1
    I'm still getting reputation points from this Q&A, which is embarrassing given that my answer is the slowest solution by far. This Q&A is a fine example of why this site is so useful. 12 answers of curated expert knowledge! I love it when many people take on the challenge. – rhermans Aug 09 '23 at 08:27

14 Answers14

18

SequenceSplit, Longest, Plus

My approach is to take advantage of the built-in SequenceSplit and ask for the Longest pattern with the Condition(/;) that it adds up (Plus) to less than 20 (<20).

SequenceSplit[alist, {Longest[a__]} /; Plus[a] < 20 :> {a}]

Analysis

This solution is reasonably short and idiomatic and doesn't create any lingering definitions. Good enough for short lists, but as it's based on pattern matching will not perform well for huge lists. See performance analysis below. The evaluation is not element by element, but on the whole sub-list (Plus[a] < 20).

What is most interesting about your question is that all the available tools like Split and SplitBy make element by element comparisons, and we have no (AFAIK) efficient built-in functions that split based on a test that is over the whole sublist. I hope to see other answers that do direct sequential tests over sublists which may scale well over long lists.

I am looking forward to seeing various other answers. Nice question!

Performance

This is extremely slow for long lists, compare with the best alternatives

  • 268494 split by @BenIzd (fast, short and simple)
  • 268537 accumLess5 by @AlexeyPopkov (impressive best performance, moderate length, complicated)

Measuring with AbsoluteTiming at various lengths.

ListPlot[
 Table[
  {l, First@AbsoluteTiming[
     SequenceSplit[
       RandomInteger[{1, 10}, l]
       , {Longest[a__]} /; Plus[a] < 20 :> {a}
       ];
     ]
   }
  , {l, 10, 1200, 50}
  ]
 , PlotTheme -> "Scientific"
 , FrameLabel -> {"List length", "Time [s]"}
 ]

enter image description here

This is in a rather old i7-4770 CPU @ 3.40GHz @BenIzd shows both a better performing algorithm and a newer computer.

rhermans
  • 36,518
  • 4
  • 57
  • 149
16

Split[ls, fn] calls fn with items of Partition[ls,2,1], so if we create a temporary variable (here as i) that in every comparison, the first value gets added to the i and if i + second element is under 20, it returns True meaning keeping them together, otherwise, i=0 and return False, meaning adding a breakpoint.

Block[{i = 0},
 Split[alist, (i += #1; If[i + #2 < 20, True, i = 0; False]) &]
 ]

(* Out: {{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}} *)

Using @rhermans benchmark code on Ryzen 1700 (AbsoluteTiming):

enter image description here

Update 1

I never thought it would be this competitive. If you're looking to squeeze as much as possible, we could have:

Block[{i = 0}, Split[#, If[(i += #1) + #2 < 20, True, i = 0] &]] &

It doesn't return False probably thanks to a TrueQ in its internal.

Update 2

Reading @Syed answer, inspired me to add a general function just like his cSplit3 but faster ;)

It starts from i=1 to j and applies the function until it returns False, which then the current position will be Sowed and i=j. After that Differences is applied and the rest is in TakeList hands.

ClearAll[customSplit];

customSplit[data_, fn_] := Block[{i = 1}, TakeList[data, Differences@ Reap[Sow[1]; Do[If[fn[data[[i ;; j]]], Null, Sow[i = j];], {j, Length@data}]; Sow[Length@data + 1];][[2, 1]]] ]

It doesn't score from a readability point of view but it's fast.

Test

Assuming:

SeedRandom[1];
alist = RandomInteger[{1, 10}, 10000];
blist = RandomReal[{0, 3}, 10000];
wtdata = Transpose[{alist, blist}];
tex1 = cSplit3[wtdata, Total@(Times @@@ #) < 30 &]; // MaxMemoryUsed // RepeatedTiming

(* Out: {0.163039, 282136} *)

tex2 = customSplit[wtdata, Total@(Times @@@ #) < 30 &]; // MaxMemoryUsed // RepeatedTiming

(* Out: {0.062228, 278920} *)

tex2 == tex1

(* Out: True *)

And Finally, the most delicious part (courtesy of @AlexeyPopkov and @ChrisDegnen):

enter image description here

Ben Izd
  • 9,229
  • 1
  • 14
  • 45
14
alist = {2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7};

fun[alist_List] := Module[{sub = alist, a, len = 1},
  Most[Reap[While[len > 0,
      len = LengthWhile[Accumulate[sub], # < 20 &];
      {a, sub} = TakeDrop[sub, len];
      Sow[a]]][[2, 1]]]]

fun[alist]

{{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}}

Slightly slower than Ben Izd's Split method.

ben[alist_List] := Block[{i = 0}, Split[alist, (i += #1;
     If[i + #2 < 20, True, i = 0; False]) &]]

Catch[ListPlot[ Transpose[Table[alist = RandomInteger[{1, 10}, l]; a = {l, First@AbsoluteTiming[c = fun[alist];]}; b = {l, First@AbsoluteTiming[d = ben[alist];]}; If[c != d, Throw["mismatch"]]; {a, b}, {l, 10, 1200, 50}]], PlotTheme -> "Scientific", FrameLabel -> {"List length", "Time [s]"}, PlotLegends -> {"fun", "Split"}]]

enter image description here

Chris Degnen
  • 30,927
  • 2
  • 54
  • 108
13

Update

A more functional style might be

alist = {2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7};

accum = {}; theMax = 20; First@Last@Reap[ Scan[If[Total[accum] + # <= theMax, AppendTo[accum, #] , Sow[accum]; accum = {}; AppendTo[accum, #]] &, alist]; If[Length@accum != 0, Sow[accum]] ]

gives

{{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}}

This assumes all elements of lists are <= theMax to start with.

Nasser
  • 143,286
  • 11
  • 154
  • 359
  • Thank you for your reply and time. Similar tasks have been bothering me for some time and I will wait for more solutions. – Syed May 21 '22 at 09:13
13

An alternative way to use Split:

ClearAll[f]
f = Module[{s = First @ #, t = #2}, Split[#, Or[(s += #2) < t, s = #2] &]]&

Examples:

alist ~ f ~ 20
{{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}}
alist ~ f ~ 10
{{2, 5, 1}, {8, 1}, {1}, {9}, {7,1}, {5, 2}, {9}, {6, 2}, {2, 2, 4}, {3, 2}, {7}}
kglr
  • 394,356
  • 18
  • 477
  • 896
12

Here are five solutions:

accumLess[arr_List, thr_] := Module[{acc = 0, lst},
   lst = Table[
      If[(acc += arr[[i]]) < thr, Nothing, acc = arr[[i]]; i], {i, Length[arr]}] - 1;
   lst = If[lst[[1]] == 0, Differences[Append[lst, Length[arr]]],
     Prepend[Differences[Append[lst, Length[arr]]], lst[[1]]]];
   TakeList[arr, lst]];

accumLess2[arr_List, thr_] := Module[{acc = 0, lst, n = 1}, lst = Table[If[(acc += v) < thr, {n, v}, acc = v; ++n; {n, v}], {v, arr}]; SplitBy[lst, First][[All, All, 2]]];

accumLess3 = Function[{arr, thr}, SplitBy[FoldList[ If[#1[[1]] + #2 < thr, {#1[[1]] + #2, #1[[2]], #2}, {#2, #1[[2]] + 1, #2}] &, {arr[[1]], 1, arr[[1]]}, Rest[arr]], #[[2]] &][[All, All, 3]]];

accumLess4[arr_List, thr_] := Module[{acc = 0, lst}, lst = Reap[Do[If[(acc += arr[[i]]) >= thr, Sow[acc = arr[[i]]; i]], {i, Length[arr]}]][[2, 1]] - 1; lst = If[lst[[1]] == 0, Differences[Append[lst, Length[arr]]], Prepend[Differences[Append[lst, Length[arr]]], lst[[1]]]]; TakeList[arr, lst]];

accumLess5[arr_List, thr_] := Module[{acc = 0, lst, i = 0}, lst = Reap[Do[If[(acc += v) < thr, i++, acc = v; Sow[i++]], {v, arr}]][[2, 1]]; lst = If[lst[[1]] == 0, Differences[Append[lst, i]], Prepend[Differences[Append[lst, i]], lst[[1]]]]; TakeList[arr, lst]];

Comparison with solutions by Ben Izd (both), Chris Degnen, kglr (corrected), and user1066:

ben[alist_List] := Block[{i = 0}, Split[alist, (i += #1;
      If[i + #2 < 20, True, i = 0; False]) &]];
ben2 = Block[{i = 0}, Split[#, If[(i += #1) + #2 < 20, True, i = 0] &]] &;
fun[alist_List] := 
 Module[{sub = alist, a, len = 1}, 
  Most[Reap[While[len > 0, len = LengthWhile[Accumulate[sub], # < 20 &];
      {a, sub} = TakeDrop[sub, len];
      Sow[a]]][[2, 1]]]]
f = Module[{s = First@#, t = #2}, Split[#, Or[(s += #2) < t, s = #2] &]] &;
user1066[list_, limit_] := 
  Module[{i = 0, j = 0}, 
   TakeList[list, 
    Catenate[Reap[
       Scan[If[i + # < limit, i += #; j += 1, i = #; Sow[j]; j = 1] &, list], _, 
       DeleteCases[Join[#2, {j}], 0] &][[2]]]]];
Catch[ListLinePlot[Transpose[Table[alist = RandomInteger[{0, 10}, l];
    a = {l, First@RepeatedTiming[aa = fun[alist];, 0.2]};
    b = {l, First@RepeatedTiming[bb = ben[alist];, 0.2]};
    c = {l, First@RepeatedTiming[cc = accumLess[alist, 20];, 0.2]};
    d = {l, First@RepeatedTiming[dd = accumLess2[alist, 20];, 0.2]};
    e = {l, First@RepeatedTiming[ee = f[alist, 20];, 0.2]};
    ff = {l, First@RepeatedTiming[fff = ben2[alist];, 0.2]};
    g = {l, First@RepeatedTiming[gg = accumLess3[alist, 20];, 0.2]};
    i = {l, First@RepeatedTiming[ii = accumLess4[alist, 20];, 0.2]};
    j = {l, First@RepeatedTiming[jj = accumLess5[alist, 20];, 0.2]};
    k = {l, First@RepeatedTiming[kk = user1066[alist, 20];, 0.2]};
    If[Not[aa == bb == cc == dd == ee == fff == gg == ii == jj == kk], Throw["mismatch"]];
    {a, b, c, d, e, ff, g, i, j, k}, {l, 10, 2010, 200}]], ImageSize -> 700, 
  PlotMarkers -> All, PlotTheme -> "Scientific", 
  FrameLabel -> {"List length", "Time [s]"}, 
  PlotLabels -> {"fun", "ben", "accumLess", "accumLess2", "f", "ben2", "accumLess3", 
    "accumLess4", "accumLess5", "user1066"}]]

plot


P.S. Note on the implementations

All five implementations assume that the values in the input array are non-negative Real or Integer numbers, the threshold must be a Real or Integer number (may be negative). The maximum values are not limited.

Alexey Popkov
  • 61,809
  • 7
  • 149
  • 368
  • Excellent to see the comparative benchmark. +1. . I know my solution is too slow to compare with such long lists. – rhermans May 22 '22 at 10:50
  • @Alexey Can I please use your benchmarking code? I will be writing an answer tomorrow. – Syed May 22 '22 at 19:03
  • 1
    @Syed Of course, you can. My code is just an extension of the code from the answer by Chris Degnen. – Alexey Popkov May 22 '22 at 23:28
  • 1
    @Syed I've added two more solutions which are even faster. – Alexey Popkov May 23 '22 at 08:59
  • 2
    You write: If[(acc += v)... in the fifth version, and it works, but where did you define v. I am not familiar with this usage. Could you please say a few words about it. – Syed May 23 '22 at 09:37
  • 2
    @Syed v is a local variable of Do, it is defined in its second argument: Do[<...>, {v, arr}]. Try: Do[Print@v, {v, {1, 2, 3}}]. This syntax is document under "Generalizations&Extensions" on the Docs page for Do. – Alexey Popkov May 23 '22 at 09:39
9
SeedRandom[1];
alist = RandomInteger[{1, 10}, 20];
goal = {{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}};

Fold and Tagged Sow

Since Accumulate is a special case of FoldList we can tweak the inner workings to better suit our needs (and do without the list). Specifically, once the accumulation reaches a value greater than or equal to our threshold (in this case 20), instead of continuing the accumulation, we instead get the value that would have been added by itself. In either case, we Sow each value into a tag to keep them together, and increment the tag/"run number" when the accumulation reaches our threshold.

runSumSplit[list : {__?NumericQ}, threshold_?NumericQ, Method -> "Fold"] := Block[
  {tag = 1},
  Fold[
    If[Plus@## < threshold, Sow[#2, tag]; Plus@##, Sow[#2, (*PreIncrement*) ++tag]] &,
    0,
    list
  ]
] // Reap // Last

runSumSplit[alist, 20, Method -> "Fold"] == goal

True

This method also has the advantage (as seen below in the timing comparison) over the below because it doesn't do the accumulation each time.

While

Using Accumulate and Pick, we can get the first sublist that totals less than our threshold, in this case 20. Then we can simply drop those first elements from the list and repeat.

runSumSplit[list : {__?NumericQ}, threshold_?NumericQ, Method -> "While"] := Block[
 {tempList = list, sublist},
 While[
  tempList != {},
  sublist   = Sow@Pick[tempList, Accumulate@tempList, x_ /; x < threshold];
  tempList  = tempList[[Length@sublist + 1 ;;]]
  ]
 ] // Reap // #[[2, 1]] &

runSumSplit[alist, 20, Method -> "While"] == goal

True

Timing Comparison

ListLogPlot[
 Transpose@Table[
   {RepeatedTiming[runSumSplit[#, 20, Method -> "While"]][[1]], 
      RepeatedTiming[runSumSplit[#, 20, Method -> "Fold"]][[1]]} &@RandomInteger[{1, 10}, n],
   {n, 2, 1200, 50}
 ],
 Joined -> True,
 Filling -> None,
 Frame -> True,
 BaseStyle -> {FontSize -> 11},
 FrameLabel -> {"List Length", "Time [s]"},
 PlotLabel -> "i5-6500 CPU @ 3.20GHz\n",
 PlotLegends -> {"While", "Fold"},
 DataRange -> {2, 1200},
 ImageSize -> 72*6
]

enter image description here

NonDairyNeutrino
  • 7,810
  • 1
  • 14
  • 29
8

Just a simple sicp practise. https://en.wikipedia.org/wiki/Tail_call

arr = {2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7};
len = arr // Length;
fun[cur_List,i_Integer,result_List]= Which[
i &gt; len,
Append[result, cur],

Total[cur]+arr[[i]] &gt; 20,
fun[{arr[[i]]}, i+1, Append[result, cur]],

Total[cur] + arr[[i]] &lt;= 20,
fun[Append[cur, arr[[i]]], i+1, result]

]

fun[{},1,{}]

{{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}}

This function is very slow, so we speed up it.

1

notice that for each recursion, we Total it,

add a curSum to store it.

fun1[cur_List, curSum_Integer,i_Integer,result_List]= Which[
i &gt; len,
Append[result, cur],

curSum + arr[[i]] &gt; 20,
fun1[{arr[[i]]}, arr[[i]], i+1, Append[result, cur]],

curSum + arr[[i]] &lt;= 20,
fun1[Append[cur, arr[[i]]], curSum + arr[[i]] , i+1, result]

] fun1[{},0,1,{}]

2

In FP language, there is LinkedList, it's fast to append. https://stackoverflow.com/a/39658004/13040423

So we could try it.

fun2[cur_List, curSum_Integer,i_Integer,result_List]= Which[
i &gt; len,
Append[result, cur],

curSum+arr[[i]] &gt; 20,
fun2[{arr[[i]]}, arr[[i]], i+1, Append[result, Flatten@cur]],

curSum + arr[[i]] &lt;= 20,
fun2[{cur, arr[[i]]}, curSum + arr[[i]] , i+1, result]

] fun2[{},0,1,{}]

3

Each Append is to add the answer to the answer list, so use Reap and Sow

fun3[cur_List, curSum_Integer,i_Integer] = Which[
i &gt; len,
Sow[Flatten@cur],

curSum+arr[[i]] &gt; 20,
Sow[Flatten@cur];
fun3[{arr[[i]]}, arr[[i]], i+1],

curSum + arr[[i]] &lt;= 20,
fun3[{cur, arr[[i]]}, curSum + arr[[i]] , i+1]

] fun3[{},0,1] // Reap // Last // First

it becomes @Nasser's answer. (But no var has been changed.)

AsukaMinato
  • 9,758
  • 1
  • 14
  • 40
8

Answers are still coming in. Please accept my thanks for spending time and participating. I am going to post my humble and long answer and it is clearly not going to break records. You can go directly to the Usage section.


My solution called cSplit3 follows. I would like to emphasize the idea that keeping the function applied to sublists (f) separate keeps the interface user friendly.

The idea is to take one element from the list at a time and to apply f to the list accumulated so far. If the function application fails, then Sow the previously accumulated list and drop the sown elements. Indeed this theme is similar to some solutions presented here.

cSplit3[inList_List, f_Function] := Module[{rem = inList, i},
  Reap[
    While[rem != {},
     i = 1;
     While[i <= Length@rem && f@Take[rem, i],
      i++;
      ];
     i--;
     Sow@rem[[1 ;; i]];
     rem = Drop[rem, i];
     ]
    ][[2, 1]]
  ]

To test initially:

Manipulate[
 t = cSplit3[alist, Total[#] < i &]
 , {{i, 10}, 10, 30, 1}
 ]

Using the testbench by Chris Degnen (Big thanks):

enter image description here

If you are curious as to why earlier attempts were so horrible, I was using the following statement in the loop (instead of While as the main extraction mechanism:

Last@FoldWhileList[Join[#1, {#2}] &, {}, rem, f@# &]

For larger thresholds, the performance improves since a threshold of 10 means that lists are being sliced on average after every two elements. For a threshold of 60, it means that the average slice length is 12.

enter image description here

So I know that fine slicing is incurring the most burden, but it could be the function application and I will focus on that later on. This is just one example function. The performance will change based on the function complexity used to split sublists, but it should stay reasonable.


Usage

Since I am clearly not in the running for performance, I will add some usage examples:

Example 1

Assume that a list of weighted data is available where the first element is an integer and the last element is the weight.

SeedRandom[1];
alist = RandomInteger[{1, 10}, 100];
blist = RandomReal[{0, 3}, 100];
wtdata = Transpose[{alist, blist}];

This extracts sublists such that the total weight remains less than 30.0.

tex1 = cSplit3[wtdata, Total@(Times @@@ #) < 30 &]

Total@(Times @@@ #) & /@ tex1

{26.1933, 26.05, 26.2313, 25.3104, 15.182, 21.183, 29.1472, 29.8455,
16.3524, 17.7294, 20.103, 29.4523, 29.9613, 28.5688, 25.4966,
18.9773, 20.3104, 29.7772, 28.4467, 14.5844, 29.6726, 20.8483,
24.1094, 29.5638, 20.3955, 29.4831, 15.8998}

Example 2

This would separate sublists such that there are at most four primes in each sublist and the total of the sublist remains less than 50.

r1 = cSplit3[alist, (Count[PrimeQ@#, True] < 5 && Total[#] < 50) &]

{{2, 5, 1, 8, 1, 1, 9, 7, 1, 5}, {2, 9, 6, 2, 2, 2, 4}, {3, 2, 7, 1,
3}, {7, 5, 6, 5, 4, 1, 2, 4, 6, 4, 1, 4}, {3, 4, 10, 6, 2, 6, 3, 4,
10}, {2, 1, 5, 5, 2, 6}, {3, 8, 10, 10, 9, 1, 8}, {5, 10, 3, 7, 4,
3}, {2, 2, 7, 2}, {2, 7, 9, 7, 6, 7, 1, 8}, {10, 2, 5, 5, 4, 6, 3,
4}, {2, 3, 6, 9, 4, 7, 4, 4, 4, 5}, {8, 1, 8, 8, 4, 6, 6, 1}}

Final comment

For me, there is much to learn that is not documented. One can see that I am using the most basic language constructs and don't know for sure what else to modify in there; but more importantly, why any of it should be modified/tuned by the programmer in the first place. Mathematica, being a domain specific high-level environment, must allow users to retain focus on the task and not on tuning low-level constructs based on individual experience.

Syed
  • 52,495
  • 4
  • 30
  • 85
7

A solution that uses Sow and Reap

(a)

Generate a list of the number of values in each required sublist, which can be used by TakeList:

Update

Some slight tweaks of the original posted method (g[list_,limit_]) gives a significantly faster method (but still not as fast as other methods)

takeValues[list_, limit_]:= Module[{i=0,j=Nothing},
(list[[1]]&gt;=limit) || (j=0); (* Prevent '0' at output[[1]] *)

Reap[
    Scan[
        If[(i+=#)&lt;limit, j+=1, i=#; Sow[j]; j=1]&amp;,
        list
    ],
    _,
    Join[#2,{j}]&amp;
][[2]]//Catenate

]

For alist:

takeValues[alist,20]

(* {6, 3, 3, 6, 2} *)

TakeList[#,takeValues[#,20]]&@alist

(* {{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}} *)

TakeList[#,takeValues[#,10]]&@alist

(* {{2, 5, 1}, {8, 1}, {1}, {9}, {7, 1}, {5, 2}, {9}, {6, 2}, {2, 2, 4}, {3, 2}, {7}} *)

(b)

The same as above, but with TakeList as part of the final argument to Reap:

splitLessThan[list_, limit_]:= Module[{i=0,j=Nothing},
(list[[1]]&gt;=limit) || (j=0); (*Prevent '0' at #2[[1]]*)

Reap[
    Scan[
        If[(i+=#)&lt;limit, j+=1, i=#; Sow[j]; j=1]&amp;,
        list
    ],
    _,
    TakeList[list,Join[#2,{j}]]&amp;
][[2]]//Catenate

]

Example

splitLessThan[alist,20]

(* {{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}} *)

Timing
testlist=RandomInteger[{1, 10}, 1000000];

TakeList[testlist, takeValues[testlist,20]];//AbsoluteTiming

(* {2.46277, Null} *)

Takelist adds very little 'overhead':

takeValues[testlist,20];//AbsoluteTiming

(* {2.25093, Null} *)

Combined function:

splitLessThan[testlist,20];//AbsoluteTiming//OutputForm

(* {2.37236, Null} *)

A comparison with the neat method given by Ben Izd

(Module[{i = 0},
    Split[testlist, ((i += #1) + #2 < 20) || (i = 0) &]
  ]);//AbsoluteTiming//OutputForm

(* {2.36251, Null} *)

However, using FoldPairList with TakeDrop is much slower.

Compare:

testlistSmall=RandomInteger[{1, 10}, 100000];

FoldPairList[TakeDrop, testlistSmall, takeValues[testlistSmall,23]];//AbsoluteTiming

(* {1.01953, Null} *)

TakeList[testlistSmall, takeValues[testlistSmall,21]];//AbsoluteTiming

(* {0.275854, Null} *)

A comparison with the originally posted method:

TakeList[testlist, g[testlist,20]];//AbsoluteTiming

(* {2.9718, Null} *)

Originally posted function:

g[list_, limit_]:=Module[{i=0,j=0},
                    Reap[
                      Scan[If[i+#<limit, i+=#;j+=1, i=#;Sow[j];j=1]&,list],
                      _,
                      DeleteCases[Join[#2,{j}],0]&
                    ][[2]]//Catenate
                  ]
user1066
  • 17,923
  • 3
  • 31
  • 49
  • 2
    +1 for clever usage of Reap. It is strange that Scan in this case works slower than Do or Table. Probably the reason is that Function adds significant overhead. – Alexey Popkov May 23 '22 at 11:46
  • (+1) Sorry to respond with a delay, nice suggestion thanks. – Ben Izd May 27 '22 at 07:54
7

This should be the fastest so far

findLimitedTotalPartStartIndices =
    Compile[
        {{list, _Integer, 1}, {max, _Integer}},
        Module[{sum, bag, el},
        sum = 0;
        bag = Internal`Bag[Most[{0}]];
        Do[
            el = list[[ii]];
            If[el + sum < max
               ,
               sum += el
               ,
               sum = el;
               Internal`StuffBag[bag, ii]
              ]
            ,
            {ii, 1, Length@ list} 
        ];
        Internal`BagPart[bag, All]
    ]];
partitionByLimitedTotal[list_, max_]:=
    Module[{indices, partInd},
        indices = findLimitedTotalPartStartIndices[list, max];
        partInd = Partition[Join[{1}, indices, {Length@list + 1}], 2, 1];
        list[[#;;#2-1]]&@@@partInd
    ];

Comparison

SeedRandom[1];
alist = RandomInteger[{1, 10}, 10000];
accumLess5[arr_List, thr_] := Module[{acc = 0, lst, i = 0},
   lst = Reap[Do[If[(acc += v) < thr, i++, acc = v; Sow[i++]], {v, arr}]][[2, 1]];
   lst = If[lst[[1]] == 0, Differences[Append[lst, i]],
     Prepend[Differences[Append[lst, i]], lst[[1]]]];
   TakeList[arr, lst]];
(bigRes = partitionByLimitedTotal[alist, 20])//RepeatedTiming//First
(bigRes5 = accumLess5[alist, 20])//RepeatedTiming//First
bigRes === bigRes5

0.00567637
0.0120843
True

Jacob Akkerboom
  • 12,215
  • 45
  • 79
4

Solution based on the Accumulated list

a = {2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7}

max = 20; t = Accumulate[a]; AppendTo[t, Last[t] + 2 max]; tLength = Length[t]; runningTotal = 0; movingIndex = 1;

indices = Join[{1}, Reap[ While[ movingIndex < tLength, Sow[ movingIndex = Position[t, SelectFirst[ t[[movingIndex ;;]], # > runningTotal + max &] ] [[1, 1]] ]; runningTotal = t[[movingIndex - 1]] ; ] ][[2, 1]]]; out = Table[a[[indices[[i]] ;; indices[[i + 1]] - 1]], {i,Length[indices] - 1}];

It takes longer than the other solutions (0.2 s for a list with 3000 integers), about 0.2 in a notebook with i7-10510.

Vito Vanin
  • 568
  • 2
  • 8
3
list = {5, 10, 3, 8, 9, 8, 8, 4, 1, 7};

A variant of rhermans SequenceSplit using SequenceCases

SequenceCases[list, a_ /; Total[a] <= 20]

{{5, 10, 3}, {8, 9}, {8, 8, 4}, {1, 7}}

Not a speed contender for long lists.

eldo
  • 67,911
  • 5
  • 60
  • 168
  • How do we know this will return the longest solution? – rhermans Mar 01 '24 at 08:10
  • The answer gives the expected result (try it with the OP's {2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7}) The question doesn't mention "a longest solution". – eldo Mar 01 '24 at 08:19
  • Well, I think it's implied, otherwise List/@ alist would be trivially a valid answer. My question is: How do we know if this will return always one of the extreme non-trivial solutions? Hopefully @syed can comment. – rhermans Mar 01 '24 at 08:36
  • 1
    Ahhh, now I understand what you mean. Because of the many answers I did not notice your SequenceSplit. Eliminating the Longest from it, the result doesn't change. Please excuse that I didn't mention your answer . I do it now. – eldo Mar 01 '24 at 08:51
  • Haha, thanks, but I am NOT after the compliment/reference. I'm honestly after understanding if and why your answer is enough. – rhermans Mar 01 '24 at 09:02
2
list = {5, 10, 3, 8, 9, 8, 8, 4, 1, 7};

A variant of rhermans SequenceSplit using SequenceReplace:

SequenceReplace[list, {a__} /; Total[{a}] <= 20 :> {a}]

{{5, 10, 3}, {8, 9}, {8, 8, 4}, {1, 7}}

The sister version of SequenceCases, which is also not fast for large lists, but is short and clear.

E. Chan-López
  • 23,117
  • 3
  • 21
  • 44