0

I use a function to do something similar to a SQL sum. I've asked about simplifying it in another post. I'm still using this function and it works well for aggregates like Total or Mean. It doesn't work for Count though. The function is:

mySum[x_, keys1_, keys2_, values_] := Module[{a1, a2, a3, a4, a5},
  a1 = Dataset[x];
  a2 = a1[GroupBy[keys1], Total, values];
  a3 = Normal[Values[a2]];
  a4 = Normal[Keys[a2]];
  a5 = AssociationThread[keys2 -> #] & /@ a4;
  MapThread[Join, {a5, a3}]
];

I tried Count directly in a Dataset, where it also failed. I really don't know what to try next. The Dataset example in Help gets into other sorts of details and I couldn't see anything applicable to this situation.

The only other thing I could think of, which I haven't tried, is using Total and Mean, joining the results and calculating the count. That seems pretty awkward.

Does anyone have any suggestions?

Edmund
  • 42,267
  • 3
  • 51
  • 143
Mitchell Kaplan
  • 3,696
  • 22
  • 34
  • 5
    any examples to work with? – Kuba Nov 20 '18 at 21:57
  • using the example a2 from your other question, does the following modification of WReach's answer give what you need: a2[GroupBy[KeyTake[{"L1", "L2"}] -> KeyTake[{"V1", "V2"}]] /* KeyValueMap[Join], Merge[Total@*Counts]] ? – kglr Nov 20 '18 at 22:09
  • 1
    maybe Length, or Total@*Unitize or Total@*Counts instead of Count? – kglr Nov 20 '18 at 23:27

1 Answers1

2

From the OP referenced post I suggest a change to the group by aggregate function to facilitate functions that do not return scalars.

You may use Query and a bit of functional syntax sugar to create a more robust group by aggregate.

ClearAll[groupByAggregate]
groupByAggregate[
   grouping_?VectorQ,
   aggregator_?VectorQ,
   columns_,
   columnNames_?VectorQ,
   dat_?(VectorQ[#, AssociationQ] &)
   ] /; Equal @@ Map[Length, {aggregator, columns, columnNames}] :=
 Query[
   GroupBy[# /@ grouping &] /*
    KeyValueMap[
     Association[
       Thread[grouping -> #1],
       Thread[columnNames -> #2]
       ] &],
   Function[{d}, MapIndexed[#1@d[[All, First@#2]] &, aggregator]],
   columns]@dat

Then with a1 as in the reference post

groupByAggregate[
  {"L1"},
  {Total, Max},
  {"V1", "V2"},
  {"V1Total", "V2Max"},
  a1
  ] // Dataset

Mathematica graphics

and even repeated columns and multi-column aggregates

groupByAggregate[
  {"L1", "L2"},
  {Total, Total, Max, Total@*Map[#V1 + 2 #V2 &]},
  {"V1", "V2", "V2", KeyTake[{"V1", "V2"}]},
  {"V1Total", "V2Total", "V2Max", "Calc"},
  a1
  ] // Dataset

Mathematica graphics

Now for Count

groupByAggregate[
  {"L3"},
  {Count[n_ /; n > 1000], Count[n_ /; n < 100]},
  {"V1", "V2"},
  {"V1Count", "V2Count"},
  a1
  ] // Dataset

Mathematica graphics

and Counts

groupByAggregate[
  {"L1", "L2"},
  {Counts, Counts},
  {"V1", "V2"},
  {"V1Count", "V2Count"},
  a1
  ] // Dataset

Mathematica graphics

Hope this helps

Edmund
  • 42,267
  • 3
  • 51
  • 143