7

Let’s say, we have a list of lists:

Table[0, {30}, RandomInteger[{1, 5}]]

which might look like this

{{0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0}, {0, 0}, {0, 0, 0, 0}, {0}, {0, 0}, {0, 0, 0, 0}, {0}, {0}, {0, 0, 0, 0}, {0}, {0, 0, 0}, {0, 0, 0}, {0, 0}, {0, 0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0}, {0, 0, 0, 0}, {0, 0, 0}, {0, 0, 0, 0}, {0}, {0}, {0, 0, 0}, {0}, {0, 0, 0, 0}, {0}, {0, 0, 0}}

My function now groups the lists so that each group contains not more than 10 elements if the group gets flattened once. In this case it results in:

{{{0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0}}, {{0, 0}, {0, 0, 0, 0}, {0}, {0, 0}}, {{0, 0, 0, 0}, {0}, {0}, {0, 0, 0, 0}}, {{0}, {0, 0, 0}, {0, 0, 0}, {0, 0}}, {{0, 0, 0, 0}, {0, 0, 0}, {0, 0, 0}}, {{0, 0, 0}, {0}, {0, 0, 0, 0}}, {{0, 0, 0}, {0, 0, 0, 0}, {0}, {0}}, {{0, 0, 0}, {0}, {0, 0, 0, 0}, {0}}, {{0, 0, 0}}}

With group lengths of {10, 9, 10, 9, 10, 8, 9, 9, 3}.

I am asking myself, if there is a better way to implement it.

RegroupWithListLength[list_, limit_: 10] := Block[
  {result = {{}}},
  Do[If[
    Length[Flatten[result[[-1]], 1]] + Length[list[[i]]] <= limit,
    AppendTo[result[[-1]], list[[i]]],
    AppendTo[result, {list[[i]]}]],
   {i, Length[list]}];
  result
  ]

$\tiny\textit{For the code golfers around ... in my small world, this might be a nice challenge, no?}$

4 Answers4

6

A simpler solution using FoldPairList

f[n_, x_, size_] := 
  If[n - Length[x] < 0, {x, size - Length[x]}, {x, n - Length[x]}];
groupElementsBySize[lst_, size_] := Split[FoldPairList[f[#1, #2, size] &, size, lst, Identity], 
   Last[#1] - Last[#2] > 0 &] // Apply[#1 &, #, {2}] &

Test:

groupElementsBySize[list, 10]

{{{0}, {0, 0, 0, 0, 0}, {0}, {0, 0}}, {{0, 0, 0, 0}, {0, 0, 0}, {0, 0}}, {{0, 0}, {0}, {0, 0, 0}, {0}, {0}, {0, 0}}, {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}}, {{0, 0, 0}, {0, 0, 0, 0, 0}}, {{0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0}, {0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0}}, {{0}, {0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0}}}

Timing comparison:

SeedRandom[42];
tab = Table[0, 10^5, RandomInteger[{1, 5}]];
res1 = RegroupWithListLength2[tab, 10];// RepeatedTiming
res2 = groupElementsBySize[tab, 10];// RepeatedTiming
res1==res2

{0.46, Null}

{0.880, Null}

True

Anjan Kumar
  • 4,979
  • 1
  • 15
  • 28
5

This is just using Reap/Sow instead of AppendTo - with a very mild gain in performance

RegroupWithListLength[list_List, limit_: 10] := 
 Block[{i = 0, currentTotal = 0, len},
  Reap[
    Scan[
     (len = Length@#;
       If[currentTotal + len <= limit,
        currentTotal += len;
        Sow[#, i],
        currentTotal = len;
        Sow[#, ++i]
        ]) &
     ,
     list
     ]
    ][[2]]
  ]
Jason B.
  • 68,381
  • 3
  • 139
  • 286
5

I just had the same Idea as Jason B, but didn't see his post until I finished testing my version. I post it anyway, since it is slightly different:

ClearAll[RegroupWithListLength2]

RegroupWithListLength2::bdlen = 
  "The list `1` contains a list which is longer than `2`.";

RegroupWithListLength2[list_, limit_: 10] := 
 Module[{count = 0, bin = 0},
   Last@Reap[
     (
         If[(count += Length[#]) > limit, bin++; count = Length[#]];
         Sow[#, bin];
     ) & /@ list;
    ]
   ] /; If[Max[Length /@ list] > limit, 
      Message[RegroupWithListLength2::bdlen, Short[list], limit]; False, 
      True
     ]

For long lists the performance increase is significant:

SeedRandom[42];
With[{
  tab = Table[0, 10^5, RandomInteger[{1, 5}]],
  n = 10
  },
 First@RepeatedTiming[RegroupWithListLength[tab, n]]
 ]
(* 8.5 *)

SeedRandom[42];
With[{
  tab = Table[0, 10^5, RandomInteger[{1, 5}]],
  n = 10
  },
 First@RepeatedTiming[RegroupWithListLength2[tab, n]]
 ]
(* 0.942 *)

I also added the /; test at the end of my function definition so it does not evaluate if one of the lists is shorter than the limit.

JEM_Mosig
  • 3,003
  • 15
  • 28
2

I'm posting this, not because it's fast (it isn't), but because I like recursion.

Test case data

data = {{0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0}, {0, 0}, {0, 0, 0, 0}, {0}, {0, 0}, {0, 0, 0, 0}, {0}, {0}, {0, 0, 0, 0}, {0}, {0, 0,0}, {0, 0, 0}, {0, 0}, {0, 0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0,0}, {0}, {0, 0, 0, 0}, {0, 0, 0}, {0, 0, 0, 0}, {0}, {0}, {0, 0, 0}, {0}, {0, 0, 0, 0}, {0}, {0, 0, 0}};

Test case results

grouped = {{{0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0}}, {{0, 0}, {0, 0, 0, 0}, {0}, {0, 0}}, {{0, 0, 0, 0}, {0}, {0}, {0, 0, 0, 0}}, {{0}, {0, 0, 0}, {0, 0, 0}, {0, 0}}, {{0, 0, 0, 0}, {0, 0, 0}, {0, 0, 0}}, {{0, 0, 0}, {0}, {0, 0, 0, 0}}, {{0, 0, 0}, {0, 0, 0, 0}, {0}, {0}}, {{0, 0, 0}, {0}, {0, 0, 0, 0}, {0}}, {{0, 0,0}}};

Recursive function solution based on the principal-helper design pattern.

grouper[items:{{__} ..}, Optional[limit_Integer?Positive, 10]] := 
  helper[items, limit, {}, {}]

helper[{}, , grps, grp_] := Join[new, {grp}] helper[{item_, rest___}, limit_, new_, grp_] := If[Total[Length /@ grp] + Length @ item <= limit, helper[{rest}, limit, grps_, Join[grp, {item}]], helper[{rest}, limit, Join[grps_, {grp}], {item}]]

grouper[data] == grouped

True

Update

One very good reason for using the principal-helper design pattern when writing recursive code is that it allows the principal function to do heavy duty argument checking before calling the helper function the carries out the recursion. Since the principal function is only called once, the time-cost of argument checking is not too great. If no helper was used, the argument checking could become very expensive because it would done over and over again.

Here is grouper upgraded to near industrial grade argument checking.

Clear[grouper]
grouper::toolong = "`1` is longer than the length limit `2`";
grouper[items : {{__} ..}, Optional[limit_Integer?Positive, 10]] := 
  Check[
    Do[If[Length[i] > limit, Message[grouper::toolong, i, limit]], {i, items}];
    helper[items, limit, {}, {}],
    $Failed,
    {grouper::toolong}]
m_goldberg
  • 107,779
  • 16
  • 103
  • 257