Aggregate function problem

Question

I use a function to do something similar to a SQL sum. I've asked about simplifying it in another post. I'm still using this function and it works well for aggregates like Total or Mean. It doesn't work for Count though. The function is:

mySum[x_, keys1_, keys2_, values_] := Module[{a1, a2, a3, a4, a5},
  a1 = Dataset[x];
  a2 = a1[GroupBy[keys1], Total, values];
  a3 = Normal[Values[a2]];
  a4 = Normal[Keys[a2]];
  a5 = AssociationThread[keys2 -> #] & /@ a4;
  MapThread[Join, {a5, a3}]
];

I tried Count directly in a Dataset, where it also failed. I really don't know what to try next. The Dataset example in Help gets into other sorts of details and I couldn't see anything applicable to this situation.

The only other thing I could think of, which I haven't tried, is using Total and Mean, joining the results and calculating the count. That seems pretty awkward.

Does anyone have any suggestions?

using the example a2 from your other question, does the following modification of WReach's answer give what you need: a2[GroupBy[KeyTake[{"L1", "L2"}] -> KeyTake[{"V1", "V2"}]] /* KeyValueMap[Join], Merge[Total@*Counts]] ? — kglr, Nov 20 '18 at 22:09
maybe Length, or Total@*Unitize or Total@*Counts instead of Count? — kglr, Nov 20 '18 at 23:27

Edmund · Answer 1 · 2018-11-22T22:46:03.663

From the OP referenced post I suggest a change to the group by aggregate function to facilitate functions that do not return scalars.

You may use Query and a bit of functional syntax sugar to create a more robust group by aggregate.

ClearAll[groupByAggregate]
groupByAggregate[
   grouping_?VectorQ,
   aggregator_?VectorQ,
   columns_,
   columnNames_?VectorQ,
   dat_?(VectorQ[#, AssociationQ] &)
   ] /; Equal @@ Map[Length, {aggregator, columns, columnNames}] :=
 Query[
   GroupBy[# /@ grouping &] /*
    KeyValueMap[
     Association[
       Thread[grouping -> #1],
       Thread[columnNames -> #2]
       ] &],
   Function[{d}, MapIndexed[#1@d[[All, First@#2]] &, aggregator]],
   columns]@dat

Then with a1 as in the reference post

groupByAggregate[
  {"L1"},
  {Total, Max},
  {"V1", "V2"},
  {"V1Total", "V2Max"},
  a1
  ] // Dataset

and even repeated columns and multi-column aggregates

groupByAggregate[
  {"L1", "L2"},
  {Total, Total, Max, Total@*Map[#V1 + 2 #V2 &]},
  {"V1", "V2", "V2", KeyTake[{"V1", "V2"}]},
  {"V1Total", "V2Total", "V2Max", "Calc"},
  a1
  ] // Dataset

Now for Count

groupByAggregate[
  {"L3"},
  {Count[n_ /; n > 1000], Count[n_ /; n < 100]},
  {"V1", "V2"},
  {"V1Count", "V2Count"},
  a1
  ] // Dataset

and Counts

groupByAggregate[
  {"L1", "L2"},
  {Counts, Counts},
  {"V1", "V2"},
  {"V1Count", "V2Count"},
  a1
  ] // Dataset

Hope this helps

Aggregate function problem

1 Answers1