Partitioning a list based on a criterion for sublists

Question

SeedRandom[1];
alist = RandomInteger[{1, 10}, 20]

{2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7}

I would like to divide this list into sublists (starting at the left) such that Total of each sublist remains less than 20. The output would look like:

{{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}}

What are some of the idiomatic ways of doing so in Mathematica? Thanks for your help.

It would be nice if you could compare all the answer's performances on the same computer. Aside from the time benchmark, I value readability and tidiness (keep a clean kernel), and full use of available resources. — rhermans, May 21 '22 at 11:22
What is most interesting about your question is that all the available tools that run tests over a list do element by element comparisons, and we have no (AFAIK) efficient built-in functions that performed the test over the whole sublist. I hope to see other answers that do direct sequential tests over sublists which may scale well over long lists. Something like a TakeDropWhile that where the test is non on the element but on the Take output. — rhermans, May 21 '22 at 12:47
I am studying various solutions and there is quite a variety of them; please allow me some time. The variance between performance of various commands achieving the same task within this software is remarkable. Whereas your answer is the most natural way of phrasing this task, it turns out to be much slower showing that a fundamentally different algorithm is at work behind the scenes. — Syed, May 21 '22 at 12:56
Please forgive me, since I will not be accepting an answer for this post. There are many excellent answers and I can't decide. But if I did, it would be rherman's answer for clarity of thinking or Alexey Popkov's answers for speed. I have learned much from all of you. — Syed, May 23 '22 at 12:30
Thanks Syed, it is difficult, I would add 268494 using Split by @BenIzd for being a good balance between fast, short and simple. — rhermans, May 23 '22 at 13:05
Related (but not the same): Partitioning a list when the cumulative sum exceeds 1 and Getting lengths of sublists that sum to more than one. In particular, see this answer by Mr Wizard — user1066, May 24 '22 at 07:10
I'm still getting reputation points from this Q&A, which is embarrassing given that my answer is the slowest solution by far. This Q&A is a fine example of why this site is so useful. 12 answers of curated expert knowledge! I love it when many people take on the challenge. — rhermans, Aug 09 '23 at 08:27

rhermans · Answer 1 · 2023-01-06T12:00:16.390

SequenceSplit, Longest, Plus

My approach is to take advantage of the built-in SequenceSplit and ask for the Longest pattern with the Condition(/;) that it adds up (Plus) to less than 20 (<20).

SequenceSplit[alist, {Longest[a__]} /; Plus[a] < 20 :> {a}]

Analysis

This solution is reasonably short and idiomatic and doesn't create any lingering definitions. Good enough for short lists, but as it's based on pattern matching will not perform well for huge lists. See performance analysis below. The evaluation is not element by element, but on the whole sub-list (Plus[a] < 20).

What is most interesting about your question is that all the available tools like Split and SplitBy make element by element comparisons, and we have no (AFAIK) efficient built-in functions that split based on a test that is over the whole sublist. I hope to see other answers that do direct sequential tests over sublists which may scale well over long lists.

I am looking forward to seeing various other answers. Nice question!

Performance

This is extremely slow for long lists, compare with the best alternatives

268494 split by @BenIzd (fast, short and simple)
268537 accumLess5 by @AlexeyPopkov (impressive best performance, moderate length, complicated)

Measuring with AbsoluteTiming at various lengths.

ListPlot[
 Table[
  {l, First@AbsoluteTiming[
     SequenceSplit[
       RandomInteger[{1, 10}, l]
       , {Longest[a__]} /; Plus[a] < 20 :> {a}
       ];
     ]
   }
  , {l, 10, 1200, 50}
  ]
 , PlotTheme -> "Scientific"
 , FrameLabel -> {"List length", "Time [s]"}
 ]

This is in a rather old i7-4770 CPU @ 3.40GHz @BenIzd shows both a better performing algorithm and a newer computer.

I need to learn the Sequence* commands more. Thanks! – Syed May 21 '22 at 10:06 — Syed, May 21 '22 at 10:06

Ben Izd · Answer 2 · 2022-05-23T15:27:41.273

Split[ls, fn] calls fn with items of Partition[ls,2,1], so if we create a temporary variable (here as i) that in every comparison, the first value gets added to the i and if i + second element is under 20, it returns True meaning keeping them together, otherwise, i=0 and return False, meaning adding a breakpoint.

Block[{i = 0},
 Split[alist, (i += #1; If[i + #2 < 20, True, i = 0; False]) &]
 ]
(* Out: {{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}} *)

Using @rhermans benchmark code on Ryzen 1700 (AbsoluteTiming):

Update 1

I never thought it would be this competitive. If you're looking to squeeze as much as possible, we could have:

Block[{i = 0}, Split[#, If[(i += #1) + #2 < 20, True, i = 0] &]] &

It doesn't return False probably thanks to a TrueQ in its internal.

Update 2

Reading @Syed answer, inspired me to add a general function just like his cSplit3 but faster ;)

It starts from i=1 to j and applies the function until it returns False, which then the current position will be Sowed and i=j. After that Differences is applied and the rest is in TakeList hands.

ClearAll[customSplit];
customSplit[data_, fn_] := Block[{i = 1},
  TakeList[data, 
   Differences@
    Reap[Sow[1]; 
      Do[If[fn[data[[i ;; j]]], Null, Sow[i = j];], {j, Length@data}];
       Sow[Length@data + 1];][[2, 1]]]
  ]

It doesn't score from a readability point of view but it's fast.

Test

Assuming:

SeedRandom[1];
alist = RandomInteger[{1, 10}, 10000];
blist = RandomReal[{0, 3}, 10000];
wtdata = Transpose[{alist, blist}];

tex1 = cSplit3[wtdata, Total@(Times @@@ #) < 30 &]; // MaxMemoryUsed // RepeatedTiming
(* Out: {0.163039, 282136} *)
tex2 = customSplit[wtdata, Total@(Times @@@ #) < 30 &]; // MaxMemoryUsed // RepeatedTiming
(* Out: {0.062228, 278920} *)
tex2 == tex1
(* Out: True *)

And Finally, the most delicious part (courtesy of @AlexeyPopkov and @ChrisDegnen):

On the Update 1. It is really surprizing how much overhead adds innocent looking little complication: i = 0; False instead of just i = 0. — Alexey Popkov, May 22 '22 at 09:03
@AlexeyPopkov Yes, and you'll be more surprised if you compare i=0, i=0; and i = 0; False. — Ben Izd, May 22 '22 at 18:43

Chris Degnen · Answer 3 · 2022-05-21T13:32:24.317

alist = {2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7};

fun[alist_List] := Module[{sub = alist, a, len = 1},
  Most[Reap[While[len > 0,
      len = LengthWhile[Accumulate[sub], # < 20 &];
      {a, sub} = TakeDrop[sub, len];
      Sow[a]]][[2, 1]]]]

fun[alist]

{{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}}

Slightly slower than Ben Izd's Split method.

ben[alist_List] := Block[{i = 0}, Split[alist, (i += #1;
     If[i + #2 < 20, True, i = 0; False]) &]]
Catch[ListPlot[
  Transpose[Table[alist = RandomInteger[{1, 10}, l];
    a = {l, First@AbsoluteTiming[c = fun[alist];]}; 
    b = {l, First@AbsoluteTiming[d = ben[alist];]};
    If[c != d, Throw["mismatch"]];
    {a, b}, {l, 10, 1200, 50}]], PlotTheme -> "Scientific", 
  FrameLabel -> {"List length", "Time [s]"},
  PlotLegends -> {"fun", "Split"}]]

Nasser · Answer 4 · 2022-05-21T12:47:25.353

13

Update

A more functional style might be

alist = {2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7};
accum = {};
theMax = 20;
First@Last@Reap[
   Scan[If[Total[accum] + # <= theMax,
      AppendTo[accum, #]
      ,
      Sow[accum];
      accum = {};
      AppendTo[accum, #]] &,
    alist];
    If[Length@accum != 0, Sow[accum]]
   ]

gives

{{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}}

This assumes all elements of lists are <= theMax to start with.

edited May 21 '22 at 12:47

answered May 21 '22 at 09:11

Nasser

143,286
11
154
359

Thank you for your reply and time. Similar tasks have been bothering me for some time and I will wait for more solutions. – Syed May 21 '22 at 09:13

kglr · Answer 5 · 2022-05-22T11:00:50.200

13

An alternative way to use Split:

ClearAll[f]
f = Module[{s = First @ #, t = #2}, Split[#, Or[(s += #2) < t, s = #2] &]]&

Examples:

alist ~ f ~ 20

{{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}}

alist ~ f ~ 10

{{2, 5, 1}, {8, 1}, {1}, {9}, {7,1}, {5, 2}, {9}, {6, 2}, {2, 2, 4}, {3, 2}, {7}}

edited May 22 '22 at 11:00

answered May 21 '22 at 23:00

kglr

394,356
18
477
896

2

(+1) The condition in your solution should be (s += #2) < t (strictly less). – Alexey Popkov May 22 '22 at 07:37
Thank you @AlexeyPopkov; fixed it. – kglr May 22 '22 at 11:01

Alexey Popkov · Answer 6 · 2022-05-23T11:57:30.493

Here are five solutions:

accumLess[arr_List, thr_] := Module[{acc = 0, lst},
   lst = Table[
      If[(acc += arr[[i]]) < thr, Nothing, acc = arr[[i]]; i], {i, Length[arr]}] - 1;
   lst = If[lst[[1]] == 0, Differences[Append[lst, Length[arr]]],
     Prepend[Differences[Append[lst, Length[arr]]], lst[[1]]]];
   TakeList[arr, lst]];
accumLess2[arr_List, thr_] := Module[{acc = 0, lst, n = 1},
   lst = Table[If[(acc += v) < thr, {n, v}, acc = v; ++n; {n, v}], {v, arr}];
   SplitBy[lst, First][[All, All, 2]]];
accumLess3 = Function[{arr, thr}, SplitBy[FoldList[
      If[#1[[1]] + #2 < thr, {#1[[1]] + #2, #1[[2]], #2}, {#2, #1[[2]] + 1, #2}] &,
      {arr[[1]], 1, arr[[1]]}, Rest[arr]], #[[2]] &][[All, All, 3]]];
accumLess4[arr_List, thr_] := Module[{acc = 0, lst},
   lst = Reap[Do[If[(acc += arr[[i]]) >= thr,
         Sow[acc = arr[[i]]; i]], {i, Length[arr]}]][[2, 1]] - 1;
   lst = If[lst[[1]] == 0, Differences[Append[lst, Length[arr]]],
     Prepend[Differences[Append[lst, Length[arr]]], lst[[1]]]];
   TakeList[arr, lst]];
accumLess5[arr_List, thr_] := Module[{acc = 0, lst, i = 0},
   lst = Reap[Do[If[(acc += v) < thr, i++, acc = v; Sow[i++]], {v, arr}]][[2, 1]];
   lst = If[lst[[1]] == 0, Differences[Append[lst, i]],
     Prepend[Differences[Append[lst, i]], lst[[1]]]];
   TakeList[arr, lst]];

Comparison with solutions by Ben Izd (both), Chris Degnen, kglr (corrected), and user1066:

ben[alist_List] := Block[{i = 0}, Split[alist, (i += #1;
      If[i + #2 < 20, True, i = 0; False]) &]];
ben2 = Block[{i = 0}, Split[#, If[(i += #1) + #2 < 20, True, i = 0] &]] &;
fun[alist_List] := 
 Module[{sub = alist, a, len = 1}, 
  Most[Reap[While[len > 0, len = LengthWhile[Accumulate[sub], # < 20 &];
      {a, sub} = TakeDrop[sub, len];
      Sow[a]]][[2, 1]]]]
f = Module[{s = First@#, t = #2}, Split[#, Or[(s += #2) < t, s = #2] &]] &;
user1066[list_, limit_] := 
  Module[{i = 0, j = 0}, 
   TakeList[list, 
    Catenate[Reap[
       Scan[If[i + # < limit, i += #; j += 1, i = #; Sow[j]; j = 1] &, list], _, 
       DeleteCases[Join[#2, {j}], 0] &][[2]]]]];
Catch[ListLinePlot[Transpose[Table[alist = RandomInteger[{0, 10}, l];
    a = {l, First@RepeatedTiming[aa = fun[alist];, 0.2]};
    b = {l, First@RepeatedTiming[bb = ben[alist];, 0.2]};
    c = {l, First@RepeatedTiming[cc = accumLess[alist, 20];, 0.2]};
    d = {l, First@RepeatedTiming[dd = accumLess2[alist, 20];, 0.2]};
    e = {l, First@RepeatedTiming[ee = f[alist, 20];, 0.2]};
    ff = {l, First@RepeatedTiming[fff = ben2[alist];, 0.2]};
    g = {l, First@RepeatedTiming[gg = accumLess3[alist, 20];, 0.2]};
    i = {l, First@RepeatedTiming[ii = accumLess4[alist, 20];, 0.2]};
    j = {l, First@RepeatedTiming[jj = accumLess5[alist, 20];, 0.2]};
    k = {l, First@RepeatedTiming[kk = user1066[alist, 20];, 0.2]};
    If[Not[aa == bb == cc == dd == ee == fff == gg == ii == jj == kk], Throw["mismatch"]];
    {a, b, c, d, e, ff, g, i, j, k}, {l, 10, 2010, 200}]], ImageSize -> 700, 
  PlotMarkers -> All, PlotTheme -> "Scientific", 
  FrameLabel -> {"List length", "Time [s]"}, 
  PlotLabels -> {"fun", "ben", "accumLess", "accumLess2", "f", "ben2", "accumLess3", 
    "accumLess4", "accumLess5", "user1066"}]]

P.S. Note on the implementations

All five implementations assume that the values in the input array are non-negative Real or Integer numbers, the threshold must be a Real or Integer number (may be negative). The maximum values are not limited.

Excellent to see the comparative benchmark. +1. . I know my solution is too slow to compare with such long lists. — rhermans, May 22 '22 at 10:50
@Alexey Can I please use your benchmarking code? I will be writing an answer tomorrow. — Syed, May 22 '22 at 19:03
@Syed Of course, you can. My code is just an extension of the code from the answer by Chris Degnen. — Alexey Popkov, May 22 '22 at 23:28
You write: If[(acc += v)... in the fifth version, and it works, but where did you define v. I am not familiar with this usage. Could you please say a few words about it. — Syed, May 23 '22 at 09:37
@Syed v is a local variable of Do, it is defined in its second argument: Do[<...>, {v, arr}]. Try: Do[Print@v, {v, {1, 2, 3}}]. This syntax is document under "Generalizations&Extensions" on the Docs page for Do. — Alexey Popkov, May 23 '22 at 09:39

NonDairyNeutrino · Answer 7 · 2022-05-23T01:59:32.597

SeedRandom[1];
alist = RandomInteger[{1, 10}, 20];
goal = {{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}};

`Fold` and Tagged `Sow`

Since Accumulate is a special case of FoldList we can tweak the inner workings to better suit our needs (and do without the list). Specifically, once the accumulation reaches a value greater than or equal to our threshold (in this case 20), instead of continuing the accumulation, we instead get the value that would have been added by itself. In either case, we Sow each value into a tag to keep them together, and increment the tag/"run number" when the accumulation reaches our threshold.

runSumSplit[list : {__?NumericQ}, threshold_?NumericQ, Method -> "Fold"] := Block[
  {tag = 1},
  Fold[
    If[Plus@## < threshold, Sow[#2, tag]; Plus@##, Sow[#2, (*PreIncrement*) ++tag]] &,
    0,
    list
  ]
] // Reap // Last
runSumSplit[alist, 20, Method -> "Fold"] == goal

True

This method also has the advantage (as seen below in the timing comparison) over the below because it doesn't do the accumulation each time.

While

Using Accumulate and Pick, we can get the first sublist that totals less than our threshold, in this case 20. Then we can simply drop those first elements from the list and repeat.

runSumSplit[list : {__?NumericQ}, threshold_?NumericQ, Method -> "While"] := Block[
 {tempList = list, sublist},
 While[
  tempList != {},
  sublist   = Sow@Pick[tempList, Accumulate@tempList, x_ /; x < threshold];
  tempList  = tempList[[Length@sublist + 1 ;;]]
  ]
 ] // Reap // #[[2, 1]] &
runSumSplit[alist, 20, Method -> "While"] == goal

True

Timing Comparison

ListLogPlot[
 Transpose@Table[
   {RepeatedTiming[runSumSplit[#, 20, Method -> "While"]][[1]], 
      RepeatedTiming[runSumSplit[#, 20, Method -> "Fold"]][[1]]} &@RandomInteger[{1, 10}, n],
   {n, 2, 1200, 50}
 ],
 Joined -> True,
 Filling -> None,
 Frame -> True,
 BaseStyle -> {FontSize -> 11},
 FrameLabel -> {"List Length", "Time [s]"},
 PlotLabel -> "i5-6500 CPU @ 3.20GHz\n",
 PlotLegends -> {"While", "Fold"},
 DataRange -> {2, 1200},
 ImageSize -> 72*6
]

AsukaMinato · Answer 8 · 2022-06-17T06:10:44.210

Just a simple sicp practise. https://en.wikipedia.org/wiki/Tail_call

arr = {2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7};
len = arr // Length;
fun[cur_List,i_Integer,result_List]= Which[
i &gt; len,
Append[result, cur],

Total[cur]+arr[[i]] &gt; 20,
fun[{arr[[i]]}, i+1, Append[result, cur]],

Total[cur] + arr[[i]] &lt;= 20,
fun[Append[cur, arr[[i]]], i+1, result]


]
fun[{},1,{}]

{{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}}

This function is very slow, so we speed up it.

1

notice that for each recursion, we Total it,

add a curSum to store it.

fun1[cur_List, curSum_Integer,i_Integer,result_List]= Which[
i &gt; len,
Append[result, cur],

curSum + arr[[i]] &gt; 20,
fun1[{arr[[i]]}, arr[[i]], i+1, Append[result, cur]],

curSum + arr[[i]] &lt;= 20,
fun1[Append[cur, arr[[i]]], curSum + arr[[i]] , i+1, result]


]
fun1[{},0,1,{}]

2

In FP language, there is LinkedList, it's fast to append. https://stackoverflow.com/a/39658004/13040423

So we could try it.

fun2[cur_List, curSum_Integer,i_Integer,result_List]= Which[
i &gt; len,
Append[result, cur],

curSum+arr[[i]] &gt; 20,
fun2[{arr[[i]]}, arr[[i]], i+1, Append[result, Flatten@cur]],

curSum + arr[[i]] &lt;= 20,
fun2[{cur, arr[[i]]}, curSum + arr[[i]] , i+1, result]


]
fun2[{},0,1,{}]

3

Each Append is to add the answer to the answer list, so use Reap and Sow

fun3[cur_List, curSum_Integer,i_Integer] = Which[
i &gt; len,
Sow[Flatten@cur],

curSum+arr[[i]] &gt; 20,
Sow[Flatten@cur];
fun3[{arr[[i]]}, arr[[i]], i+1],

curSum + arr[[i]] &lt;= 20,
fun3[{cur, arr[[i]]}, curSum + arr[[i]] , i+1]


]
fun3[{},0,1] // Reap // Last // First

it becomes @Nasser's answer. (But no var has been changed.)

score 8 · Answer 9 · answered May 23 '22 at 12:10

Answers are still coming in. Please accept my thanks for spending time and participating. I am going to post my humble and long answer and it is clearly not going to break records. You can go directly to the Usage section.

My solution called cSplit3 follows. I would like to emphasize the idea that keeping the function applied to sublists (f) separate keeps the interface user friendly.

The idea is to take one element from the list at a time and to apply f to the list accumulated so far. If the function application fails, then Sow the previously accumulated list and drop the sown elements. Indeed this theme is similar to some solutions presented here.

cSplit3[inList_List, f_Function] := Module[{rem = inList, i},
  Reap[
    While[rem != {},
     i = 1;
     While[i <= Length@rem && f@Take[rem, i],
      i++;
      ];
     i--;
     Sow@rem[[1 ;; i]];
     rem = Drop[rem, i];
     ]
    ][[2, 1]]
  ]

To test initially:

Manipulate[
 t = cSplit3[alist, Total[#] < i &]
 , {{i, 10}, 10, 30, 1}
 ]

Using the testbench by Chris Degnen (Big thanks):

If you are curious as to why earlier attempts were so horrible, I was using the following statement in the loop (instead of While as the main extraction mechanism:

Last@FoldWhileList[Join[#1, {#2}] &, {}, rem, f@# &]

For larger thresholds, the performance improves since a threshold of 10 means that lists are being sliced on average after every two elements. For a threshold of 60, it means that the average slice length is 12.

So I know that fine slicing is incurring the most burden, but it could be the function application and I will focus on that later on. This is just one example function. The performance will change based on the function complexity used to split sublists, but it should stay reasonable.

Usage

Since I am clearly not in the running for performance, I will add some usage examples:

Example 1

Assume that a list of weighted data is available where the first element is an integer and the last element is the weight.

SeedRandom[1];
alist = RandomInteger[{1, 10}, 100];
blist = RandomReal[{0, 3}, 100];
wtdata = Transpose[{alist, blist}];

This extracts sublists such that the total weight remains less than 30.0.

tex1 = cSplit3[wtdata, Total@(Times @@@ #) < 30 &]
Total@(Times @@@ #) & /@ tex1

{26.1933, 26.05, 26.2313, 25.3104, 15.182, 21.183, 29.1472, 29.8455,
16.3524, 17.7294, 20.103, 29.4523, 29.9613, 28.5688, 25.4966,
18.9773, 20.3104, 29.7772, 28.4467, 14.5844, 29.6726, 20.8483,
24.1094, 29.5638, 20.3955, 29.4831, 15.8998}

Example 2

This would separate sublists such that there are at most four primes in each sublist and the total of the sublist remains less than 50.

r1 = cSplit3[alist, (Count[PrimeQ@#, True] < 5 && Total[#] < 50) &]

{{2, 5, 1, 8, 1, 1, 9, 7, 1, 5}, {2, 9, 6, 2, 2, 2, 4}, {3, 2, 7, 1,
3}, {7, 5, 6, 5, 4, 1, 2, 4, 6, 4, 1, 4}, {3, 4, 10, 6, 2, 6, 3, 4,
10}, {2, 1, 5, 5, 2, 6}, {3, 8, 10, 10, 9, 1, 8}, {5, 10, 3, 7, 4,
3}, {2, 2, 7, 2}, {2, 7, 9, 7, 6, 7, 1, 8}, {10, 2, 5, 5, 4, 6, 3,
4}, {2, 3, 6, 9, 4, 7, 4, 4, 4, 5}, {8, 1, 8, 8, 4, 6, 6, 1}}

Final comment

For me, there is much to learn that is not documented. One can see that I am using the most basic language constructs and don't know for sure what else to modify in there; but more importantly, why any of it should be modified/tuned by the programmer in the first place. Mathematica, being a domain specific high-level environment, must allow users to retain focus on the task and not on tuning low-level constructs based on individual experience.

user1066 · Answer 10 · 2022-05-26T22:51:26.867

A solution that uses Sow and Reap

(a)

Generate a list of the number of values in each required sublist, which can be used by TakeList:

Update

Some slight tweaks of the original posted method (g[list_,limit_]) gives a significantly faster method (but still not as fast as other methods)

takeValues[list_, limit_]:= Module[{i=0,j=Nothing},
(list[[1]]&gt;=limit) || (j=0); (* Prevent '0' at output[[1]] *)

Reap[
    Scan[
        If[(i+=#)&lt;limit, j+=1, i=#; Sow[j]; j=1]&amp;,
        list
    ],
    _,
    Join[#2,{j}]&amp;
][[2]]//Catenate

]

For alist:

takeValues[alist,20]
(* {6, 3, 3, 6, 2} *)
TakeList[#,takeValues[#,20]]&@alist
(* {{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}} *)
TakeList[#,takeValues[#,10]]&@alist
(* {{2, 5, 1}, {8, 1}, {1}, {9}, {7, 1}, {5, 2}, {9}, {6, 2}, {2, 2, 4}, {3, 2}, {7}} *)

(b)

The same as above, but with TakeList as part of the final argument to Reap:

splitLessThan[list_, limit_]:= Module[{i=0,j=Nothing},
(list[[1]]&gt;=limit) || (j=0); (*Prevent '0' at #2[[1]]*)

Reap[
    Scan[
        If[(i+=#)&lt;limit, j+=1, i=#; Sow[j]; j=1]&amp;,
        list
    ],
    _,
    TakeList[list,Join[#2,{j}]]&amp;
][[2]]//Catenate

]

Example

splitLessThan[alist,20]
(* {{2, 5, 1, 8, 1, 1}, {9, 7, 1}, {5, 2, 9}, {6, 2, 2, 2, 4, 3}, {2, 7}} *)

Timing

testlist=RandomInteger[{1, 10}, 1000000];
TakeList[testlist, takeValues[testlist,20]];//AbsoluteTiming
(* {2.46277, Null} *)

Takelist adds very little 'overhead':

takeValues[testlist,20];//AbsoluteTiming
(* {2.25093, Null} *)

Combined function:

splitLessThan[testlist,20];//AbsoluteTiming//OutputForm
(* {2.37236, Null} *)

A comparison with the neat method given by Ben Izd

(Module[{i = 0},
    Split[testlist, ((i += #1) + #2 < 20) || (i = 0) &]
  ]);//AbsoluteTiming//OutputForm
(* {2.36251, Null} *)

However, using FoldPairList with TakeDrop is much slower.

Compare:

testlistSmall=RandomInteger[{1, 10}, 100000];
FoldPairList[TakeDrop, testlistSmall, takeValues[testlistSmall,23]];//AbsoluteTiming
(* {1.01953, Null} *)
TakeList[testlistSmall, takeValues[testlistSmall,21]];//AbsoluteTiming
(* {0.275854, Null} *)

A comparison with the originally posted method:

TakeList[testlist, g[testlist,20]];//AbsoluteTiming
(* {2.9718, Null} *)

Originally posted function:

g[list_, limit_]:=Module[{i=0,j=0},
                    Reap[
                      Scan[If[i+#<limit, i+=#;j+=1, i=#;Sow[j];j=1]&,list],
                      _,
                      DeleteCases[Join[#2,{j}],0]&
                    ][[2]]//Catenate
                  ]

+1 for clever usage of Reap. It is strange that Scan in this case works slower than Do or Table. Probably the reason is that Function adds significant overhead. — Alexey Popkov, May 23 '22 at 11:46

score 7 · Answer 11 · answered Jun 06 '22 at 20:12

This should be the fastest so far

findLimitedTotalPartStartIndices =
    Compile[
        {{list, _Integer, 1}, {max, _Integer}},
        Module[{sum, bag, el},
        sum = 0;
        bag = Internal`Bag[Most[{0}]];
        Do[
            el = list[[ii]];
            If[el + sum < max
               ,
               sum += el
               ,
               sum = el;
               Internal`StuffBag[bag, ii]
              ]
            ,
            {ii, 1, Length@ list} 
        ];
        Internal`BagPart[bag, All]
    ]];
partitionByLimitedTotal[list_, max_]:=
    Module[{indices, partInd},
        indices = findLimitedTotalPartStartIndices[list, max];
        partInd = Partition[Join[{1}, indices, {Length@list + 1}], 2, 1];
        list[[#;;#2-1]]&@@@partInd
    ];

Comparison

SeedRandom[1];
alist = RandomInteger[{1, 10}, 10000];
accumLess5[arr_List, thr_] := Module[{acc = 0, lst, i = 0},
   lst = Reap[Do[If[(acc += v) < thr, i++, acc = v; Sow[i++]], {v, arr}]][[2, 1]];
   lst = If[lst[[1]] == 0, Differences[Append[lst, i]],
     Prepend[Differences[Append[lst, i]], lst[[1]]]];
   TakeList[arr, lst]];
(bigRes = partitionByLimitedTotal[alist, 20])//RepeatedTiming//First
(bigRes5 = accumLess5[alist, 20])//RepeatedTiming//First
bigRes === bigRes5

0.00567637
0.0120843
True

score 4 · Answer 12 · answered May 22 '22 at 14:51

Solution based on the Accumulated list

a = {2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7}
max = 20;
t = Accumulate[a];
AppendTo[t, Last[t] + 2 max];
tLength = Length[t];
runningTotal = 0;
movingIndex = 1;
indices = Join[{1}, Reap[
While[ movingIndex < tLength,
  Sow[ 
  movingIndex = 
   Position[t, 
     SelectFirst[
      t[[movingIndex ;;]], # > runningTotal + max &] ] [[1, 1]] ];
 runningTotal = t[[movingIndex - 1]] ;
 ]
][[2, 1]]];
out = Table[a[[indices[[i]] ;; indices[[i + 1]] - 1]], {i,Length[indices] - 1}];

It takes longer than the other solutions (0.2 s for a list with 3000 integers), about 0.2 in a notebook with i7-10510.

eldo · Answer 13 · 2024-03-01T08:51:37.813

3

list = {5, 10, 3, 8, 9, 8, 8, 4, 1, 7};

A variant of rhermans SequenceSplit using SequenceCases

SequenceCases[list, a_ /; Total[a] <= 20]

{{5, 10, 3}, {8, 9}, {8, 8, 4}, {1, 7}}

Not a speed contender for long lists.

edited Mar 01 '24 at 08:51

answered Mar 01 '24 at 00:14

eldo

67,911
5
60
168

How do we know this will return the longest solution? – rhermans Mar 01 '24 at 08:10
The answer gives the expected result (try it with the OP's {2, 5, 1, 8, 1, 1, 9, 7, 1, 5, 2, 9, 6, 2, 2, 2, 4, 3, 2, 7}) The question doesn't mention "a longest solution". – eldo Mar 01 '24 at 08:19
Well, I think it's implied, otherwise List/@ alist would be trivially a valid answer. My question is: How do we know if this will return always one of the extreme non-trivial solutions? Hopefully @syed can comment. – rhermans Mar 01 '24 at 08:36
1

Ahhh, now I understand what you mean. Because of the many answers I did not notice your SequenceSplit. Eliminating the Longest from it, the result doesn't change. Please excuse that I didn't mention your answer . I do it now. – eldo Mar 01 '24 at 08:51
Haha, thanks, but I am NOT after the compliment/reference. I'm honestly after understanding if and why your answer is enough. – rhermans Mar 01 '24 at 09:02

E. Chan-López · Answer 14 · 2024-03-01T09:20:30.563

2

list = {5, 10, 3, 8, 9, 8, 8, 4, 1, 7};

A variant of rhermans SequenceSplit using SequenceReplace:

SequenceReplace[list, {a__} /; Total[{a}] <= 20 :> {a}]

{{5, 10, 3}, {8, 9}, {8, 8, 4}, {1, 7}}

The sister version of SequenceCases, which is also not fast for large lists, but is short and clear.

edited Mar 01 '24 at 09:20

answered Mar 01 '24 at 03:56

E. Chan-López

23,117
3
21
44

1

Please see the comments on @eldo 's answer. – rhermans Mar 01 '24 at 08:37
Oops! Please excuse me for that, because I didn't mention your answer too. I do it now. – E. Chan-López Mar 01 '24 at 09:25