6

I often need to sum values in a list of Associations according to some columns in that Association. I have a way of doing it, which will be the main part of this question, but it seems awkward. I'm hoping that there's a simpler way to do this. Perhaps even one that would allow me to create a generic function I can use in a number of projects.

Can anyone suggest a less awkward way of doing this?

a1 = {
<|"L1" -> "abc", "L2" -> "wxy", "L3" -> "lmn", "V1" -> 0123, "V2" -> 0023, "V3" -> 0111|>,
<|"L1" -> "abc", "L2" -> "xyz", "L3" -> "lmn", "V1" -> 1023, "V2" -> 0043, "V3" -> 0411|>,
<|"L1" -> "abc", "L2" -> "www", "L3" -> "lmn", "V1" -> 1200, "V2" -> 0993, "V3" -> 1111|>,
<|"L1" -> "def", "L2" -> "wxy", "L3" -> "lmn", "V1" -> 1020, "V2" -> 0231, "V3" -> 2111|>,
<|"L1" -> "def", "L2" -> "wxy", "L3" -> "mno", "V1" -> 0120, "V2" -> 0123, "V3" -> 0011|>,
<|"L1" -> "ghi", "L2" -> "www", "L3" -> "lmn", "V1" -> 0088, "V2" -> 0523, "V3" -> 0001|>,
<|"L1" -> "jkl", "L2" -> "xyz", "L3" -> "lmn", "V1" -> 0808, "V2" -> 2713, "V3" -> 1001|>,
<|"L1" -> "jkl", "L2" -> "wxy", "L3" -> "lmn", "V1" -> 8080, "V2" -> 2123, "V3" -> 2002|>,
<|"L1" -> "abc", "L2" -> "wxy", "L3" -> "lmn", "V1" -> 0099, "V2" -> 0233, "V3" -> 0077|>,
<|"L1" -> "def", "L2" -> "www", "L3" -> "lmn", "V1" -> 0757, "V2" -> 0001, "V3" -> 9011|>,
<|"L1" -> "abc", "L2" -> "wxy", "L3" -> "mno", "V1" -> 1212, "V2" -> 0009, "V3" -> 0991|>,
<|"L1" -> "ghi", "L2" -> "www", "L3" -> "mno", "V1" -> 5512, "V2" -> 0111, "V3" -> 9091|>};

a2 = Dataset[a1]

dataset of input data

a3 = a2[GroupBy[{#L1, #L2} &], Total, {"V1", "V2"}];
a4 = Normal[Values[a3]];
a5 = Normal[Keys[a3]];
a6 = AssociationThread[{"L1", "L2"} -> #] & /@ a5;
a7 = MapThread[Join, {a6, a4}];
Dataset[a7]

result of 1st sum

a3 = a2[GroupBy[{#L1} &], Total, {"V2"}];
a4 = Normal[Values[a3]];
a5 = Normal[Keys[a3]];
a6 = AssociationThread[{"L1"} -> #] & /@ a5;
a7 = MapThread[Join, {a6, a4}];
Dataset[a7]

second summation

Mitchell Kaplan
  • 3,696
  • 22
  • 34

1 Answers1

3

When the same aggregation function (e.g. Total) is to be applied to all columns, then we can use:

a2[
  GroupBy[KeyTake[{"L1", "L2"}] -> KeyTake[{"V1", "V2"}]] /* KeyValueMap[Join]
, Merge[Total]
]

dataset screenshot

a2[GroupBy[KeyTake["L1"] -> KeyTake["V2"]] /* KeyValueMap[Join], Merge[Total]]

dataset screenshot

If we want to apply a different aggregation function to each column, we can apply a more general technique (taken from Dataset collapsing/reducing):

a2[GroupBy[{#L1, #L2}&] /* Values
, <| "L1" -> Query[First, "L1"]
   , "L2" -> Query[First, "L2"]
   , "V1" -> Query[Total, "V1"]
   , "V2" -> Query[Min, "V2"]
   , "V3" -> Query[Max, "V3"]
   |>
]

dataset screenshot

WReach
  • 68,832
  • 4
  • 164
  • 269
  • I like the 2nd one better. Of course there's the additional flexibility, but it's also a little easier to follow. The only real downside is typing in the keys more times. – Mitchell Kaplan Jun 29 '17 at 16:02