
I'm looking for a way to bin

list1={{"1A",1},{"2A",2},{"170A",170},{"3A",3},{"90A",90},{"80A",80},{"2A",2},{"110A",110},{"222A",222},{"200A",200},{"215A",215},{"30A",30}}

into

bins={{0,20,100,∞}}

using the second element of each sublist as the bin criterion.

Mr.Wizard
Dragutin

3 Answers


This is an alternative implementation of the association version:

binBy[dat_, bins_] := With[{int = Partition[First[bins], 2, 1]},
                        GroupBy[dat, FirstCase[int, {a_, b_} /; a <= #[[2]] < b] &]]

binBy[list1, bins]
<|{0, 20} -> {{"1A", 1}, {"2A", 2}, {"3A", 3}, {"2A", 2}},
  {100, ∞} -> {{"170A", 170}, {"110A", 110}, {"222A", 222}, {"200A", 200}, {"215A", 215}},
  {20, 100} -> {{"90A", 90}, {"80A", 80}, {"30A", 30}}|>
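One property of the association version that may matter with real data: because FirstCase returns Missing["NotFound"] when no interval matches, a value outside every bin is not dropped but collected under that key. A minimal check, with sample values made up for illustration (the definition is repeated so the snippet runs on its own):

```mathematica
(* association version from above, repeated so this runs on its own *)
binBy[dat_, bins_] := With[{int = Partition[First[bins], 2, 1]},
  GroupBy[dat, FirstCase[int, {a_, b_} /; a <= #[[2]] < b] &]]

(* -5 lies below every bin, so FirstCase finds no matching interval *)
binBy[{{"neg", -5}, {"1A", 1}, {"big", 500}}, {{0, 20, 100, ∞}}]
(* <|Missing["NotFound"] -> {{"neg", -5}}, {0, 20} -> {{"1A", 1}}, {100, ∞} -> {{"big", 500}}|> *)
```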

List version:

binBy[dat_, bins_] := With[{sort = SortBy[dat, Last]},
  Internal`PartitionRagged[sort, Length /@ BinLists[sort[[All, 2]], bins]]]
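A minimal sketch of the same list version using the documented TakeList (assumes Mathematica 11.0 or later) in place of the undocumented Internal`PartitionRagged; binByList is a hypothetical name chosen so it does not clash with binBy:

```mathematica
list1 = {{"1A", 1}, {"2A", 2}, {"170A", 170}, {"3A", 3}, {"90A", 90}, {"80A", 80},
   {"2A", 2}, {"110A", 110}, {"222A", 222}, {"200A", 200}, {"215A", 215}, {"30A", 30}};

(* hypothetical name; TakeList requires Mathematica 11.0+ *)
binByList[dat_, bins_] := With[{sort = SortBy[dat, Last]},
  (* cut the sorted rows into consecutive runs whose lengths are the bin counts *)
  TakeList[sort, Length /@ BinLists[sort[[All, 2]], bins]]]

binByList[list1, {{0, 20, 100, ∞}}]
(* {{{"1A", 1}, {"2A", 2}, {"2A", 2}, {"3A", 3}},
    {{"30A", 30}, {"80A", 80}, {"90A", 90}},
    {{"110A", 110}, {"170A", 170}, {"200A", 200}, {"215A", 215}, {"222A", 222}}} *)
```

Like the original, this assumes every value falls inside some bin: a value below the first edge is skipped by BinLists but still sits at the front of the sorted list, shifting every run, and a value above the last edge is silently dropped.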
Coolwater

I think this should work for you:

binBy1[dat_, bins_, fn_] :=
  With[{intv = Interval /@ Partition[bins, 2, 1]}, 
    dat //
      GroupBy[IntervalMemberQ[intv, fn@#] &] //
      KeyMap[Pick[intv, #][[1, 1]] & ] // 
      KeySort
  ]

Use:

binBy1[list1, {0, 20, 100, ∞}, Last]
<|{0, 20} -> {{"1A", 1}, {"2A", 2}, {"3A", 3}, {"2A", 2}},
  {20, 100} -> {{"90A", 90}, {"80A", 80}, {"30A", 30}},
  {100, ∞} -> {{"170A", 170}, {"110A", 110}, {"222A", 222},
   {"200A", 200}, {"215A", 215}}|>
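Since Interval is closed at both ends, a value lying exactly on a bin edge matches two intervals, and Pick[intv, #][[1, 1]] then keeps the lower one. A small check with edge values chosen for illustration (the definition is repeated so the snippet runs on its own):

```mathematica
binBy1[dat_, bins_, fn_] :=
  With[{intv = Interval /@ Partition[bins, 2, 1]},
    dat //
      GroupBy[IntervalMemberQ[intv, fn@#] &] //
      KeyMap[Pick[intv, #][[1, 1]] &] //
      KeySort
  ]

(* 20 and 100 sit exactly on bin edges; each lands in the lower of its two intervals *)
binBy1[{{"20A", 20}, {"100A", 100}}, {0, 20, 100, ∞}, Last]
(* <|{0, 20} -> {{"20A", 20}}, {20, 100} -> {{"100A", 100}}|> *)
```

Coolwater's a <= x < b test would instead put these two values in {20, 100} and {100, ∞}. Per the comments this does not matter for the question, but the two approaches are not interchangeable at the edges.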

If you just want the values:

binBy2[dat_, bins_, fn_] :=
  With[{intv = Interval /@ Partition[bins, 2, 1]}, 
    dat //
      GroupBy[IntervalMemberQ[intv, fn@#] &] //
      KeyMap[Pick[intv, #][[1]] & ] //
      Lookup[#, intv, {}] &
  ]

binBy2[{ {"90A", 90}, {"3A", 3}}, {-50, 0, 20, 100, ∞}, Last]
{{}, {{"3A", 3}}, {{"90A", 90}}, {}}

Performance

This ends up less clean than the code above, which you already feel is complicated, but for performance Interpolation can be far superior to IntervalMemberQ as I used it above.

binsToIFn[bins_List] :=
 Interpolation[{Join[{$MinMachineNumber}, bins, {$MaxMachineNumber}], 
    Range[0, Length@bins + 1]}\[Transpose], InterpolationOrder -> 0]

binBy3[dat_, bins_, fn_] := 
  With[{IFn = binsToIFn @ bins}, 
    dat //
      GroupBy[IFn @* fn] //
      KeyMap[Round] // 
      Lookup[#, Range[Length@bins + 1], {}] &
  ]

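Note that with this function $MinMachineNumber and $MaxMachineNumber are automatically used as the outer bounds, so they may be omitted from the list. Bin edges must be numeric here (Infinity cannot be used with Interpolation). Also, $MinMachineNumber is the smallest positive machine number (about 2.2*10^-308), not the most negative one, so this automatic lower bound only suits positive data; if values can be zero or negative, replace it in binsToIFn with a bound below every data value and bin edge.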

Timings compared to my first two functions on a large problem:

bins = Union @ RandomInteger[999, 300];
bins = Join[{-10}, bins, {1200}];

big = RandomReal[999, {50000, 2}];

binBy1[big, bins, Last] // Length // Timing
binBy2[big, bins, Last] // Length // Timing
binBy3[big, bins, Last] // Length // Timing
{5.63164, 269}

{5.60044, 269}

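{0.109201, 271}

The count differs for binBy3 because it returns Length@bins + 1 groups, including the two open-ended outer bins, whereas binBy1 and binBy2 return at most one group per interval (Length@bins - 1).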

Coolwater's function on my machine:

binBy[big, {bins}] // Length // Timing
{9.36006, 269}
Mr.Wizard
  • It seems 'complicated' for such a simple task; it is only one 'dimension' beyond what BinLists already handles very well. I had the idea of solving this with Gather, adding an extra list for classification in the test function, but I think it would be slow, because my lists have around 5000 elements, so all possible pairings come into play. If I'm wrong, please correct me. – Dragutin Jul 14 '16 at 12:12
  • @Dragutin I'll see if I can improve this. I posted the first solution that came to mind and I am sure it is not optimal. Why don't you try it out and see if it is fast enough in the meantime? Incidentally, does it matter which end of the interval is open? – Mr.Wizard Jul 14 '16 at 12:16
  • Thank you. I will try your solution now and 'my idea' later on, because right now I have to run a whole bunch of code to check that things work. It does not matter which side of the interval is open. – Dragutin Jul 14 '16 at 12:22
  • @Dragutin Do you have any need for actual Infinity in your bins or will $MinMachineNumber to $MaxMachineNumber suffice? I think using Interpolation is as fast a method as I can find but it only works over the range of machine numbers. On my machine $MaxMachineNumber is ~1.8*10^308 so this should not be any real limitation unless Infinity appears in your data. – Mr.Wizard Jul 14 '16 at 12:48
  • Infinity is not necessary; any number greater than 500 will suffice. I used Infinity only for clarity of meaning. :) – Dragutin Jul 14 '16 at 12:53
  • @Dragutin Please see my update. – Mr.Wizard Jul 14 '16 at 13:28
  • Please, what does @* mean? – Dragutin Nov 01 '16 at 15:07
  • @Dragutin It is an operator for Composition, introduced in version 10.0. There is a reference post for these here: http://mathematica.stackexchange.com/questions/18393/what-are-the-most-common-pitfalls-awaiting-new-users/25616#25616 -- or see the operator precedence table in the official documentation: http://reference.wolfram.com/language/tutorial/OperatorInputForms.html – Mr.Wizard Nov 01 '16 at 15:10
ClearAll[binF]
(* bf maps a row to the index of the BinLists bin that contains its col-th element *)
binF[lst_, bspec_, col_: 2] :=
 With[{bf = #[[col]] /.
      MapIndexed[Alternatives @@ # -> First[#2] &, BinLists[lst[[All, col]], bspec], 1] &},
  GatherBy[lst, bf]]

Examples:

binF[list1, 100]

{{{"1A", 1}, {"2A", 2}, {"3A", 3}, {"90A", 90}, {"80A", 80}, {"2A", 2}, {"30A", 30}},
{{"170A", 170}, {"110A", 110}},
{{"222A", 222}, {"200A", 200}, {"215A", 215}}}

binF[list1, {{0, 20, 100, ∞}}]

{{{"1A", 1}, {"2A", 2}, {"3A", 3}, {"2A", 2}},
{{"170A", 170}, {"110A", 110}, {"222A", 222}, {"200A", 200}, {"215A", 215}},
{{"90A", 90}, {"80A", 80}, {"30A", 30}}}

Alternatively,

ClearAll[binF2]
binF2[lst_, bspec_, col_: 2] :=
 With[{bf = Function[{x},
     MapIndexed[If[MemberQ[#, x[[col]]], First[#2]] &, BinLists[lst[[All, col]], bspec], 1]]},
  GatherBy[lst, bf]]
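A usage example for binF2 with the bins from the question. It yields the same grouping as binF, since both gather the rows by bin membership and keep the original row order (definitions repeated so the snippet runs on its own):

```mathematica
list1 = {{"1A", 1}, {"2A", 2}, {"170A", 170}, {"3A", 3}, {"90A", 90}, {"80A", 80},
   {"2A", 2}, {"110A", 110}, {"222A", 222}, {"200A", 200}, {"215A", 215}, {"30A", 30}};

ClearAll[binF2]
binF2[lst_, bspec_, col_: 2] :=
 With[{bf = Function[{x},
     (* e.g. {1, Null, Null} for a row whose col-th element is in the first bin *)
     MapIndexed[If[MemberQ[#, x[[col]]], First[#2]] &, BinLists[lst[[All, col]], bspec], 1]]},
  GatherBy[lst, bf]]

binF2[list1, {{0, 20, 100, ∞}}]
(* {{{"1A", 1}, {"2A", 2}, {"3A", 3}, {"2A", 2}},
    {{"170A", 170}, {"110A", 110}, {"222A", 222}, {"200A", 200}, {"215A", 215}},
    {{"90A", 90}, {"80A", 80}, {"30A", 30}}} *)
```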
kglr