
Suppose I have three functions, Total, Mean and Max, and the following Dataset:

SeedRandom[0]
dataSet = Dataset[AssociationThread[{"a", "b", "c", "d"} -> #] & /@ RandomReal[4, {10, 4}]]


How can I apply Total to column "b", Mean to column "a" and Max to column "d", in that order, without leaving the Dataset (i.e., without using Normal)? The result should be

{22.09, 2.383, 3.765}

smayhem
  • Please confirm that your answer is what you expect. Maybe try restarting your session. I initially got the same answer as you, but that's obviously wrong. – RunnyKine Oct 09 '14 at 20:09
  • @RunnyKine. You're right, restarting Mathematica gave me a different result. The convoluted method I had used gave me that answer, and it seemed to work when I tested it, so I don't know what happened there. I will update it now. Am I right to assume there are hidden bugs in Dataset as it currently stands? – smayhem Oct 09 '14 at 20:17
  • @RunnyKine Is it possible to localize this bug? – ybeltukov Oct 09 '14 at 20:35
  • @ybeltukov. I don't know, but this is not the first time I've experienced this sort of weird behavior from using Dataset. Every time, restarting the kernel solves the problem. – RunnyKine Oct 09 '14 at 21:14

3 Answers


The following should work and can be easily extended:

dataSet[Transpose /* ({Total@#[[1]], Mean@#[[2]], Max@#[[3]]} &), {#b, #a, #d} &]
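To see what the query above does step by step, here is a minimal sketch on a plain list of associations. It assumes the Dataset query applies the second operator to every row and the first operator to the resulting list; the names rows, triples and cols are hypothetical and exist only for illustration.

```mathematica
(* Hypothetical stand-in for the Dataset: a plain list of associations *)
SeedRandom[0];
rows = AssociationThread[{"a", "b", "c", "d"} -> #] & /@ RandomReal[4, {10, 4}];

(* Step 1: the row-level operator picks b, a, d from every row *)
triples = {#b, #a, #d} & /@ rows;

(* Step 2: Transpose turns the 10 triples into 3 columns *)
cols = Transpose[triples];

(* Step 3: one aggregate per column, in the requested order *)
{Total@cols[[1]], Mean@cols[[2]], Max@cols[[3]]}
```

Extending the query to more columns means adding a slot to the row-level function and a matching aggregate to the column-level function.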


Timings:

dataSet = Dataset[AssociationThread[{"a", "b", "c", "d"} -> #] & /@ RandomReal[4, {1000000, 4}]];

(* RunnyKine *)

dataSet[Transpose /* ({Total@#[[1]], Mean@#[[2]], 
      Max@#[[3]]} &), {#b, #a, #d} &]; // AbsoluteTiming

(* {0.968942, Null} *)


(* ybeltukov, faster version *)

dataSet @@@ {{Total, #b &}, {Mean, #a &}, {Max, #d &}}; // AbsoluteTiming

(* {1.218837, Null} *)

(* alancalvitti *)

dataSet[{Query[Total, "b"], Query[Mean, "a"], Query[Max, "d"]}]; // AbsoluteTiming

(* {11.037016, Null} *)
RunnyKine
  • Hm... Do you know why your and OP's results differ from mine and @alancalvitti's? – ybeltukov Oct 09 '14 at 19:43
  • I have the same table as OP! But summation over column "b" gives me 22.09, even with manual summation. I use version 10.0.1 under Linux. – ybeltukov Oct 09 '14 at 20:01
  • @ybeltukov. You're right. I just re-computed it after restarting my kernel and got the same value as you. Hmm. – RunnyKine Oct 09 '14 at 20:07
dataSet[{Query[Total, "b"], Query[Mean, "a"], Query[Max, "d"]}]


This can be adapted to return an association:

dataSet[<|"total b" -> Query[Total, "b"], 
  "mean a" -> Query[Mean, "a"], "max d" -> Query[Max, "d"]|>]


alancalvitti

From the documentation:

dataSet[Total, "b"]

22.0943

Therefore

dataSet @@@ {{Total, "b"}, {Mean, "a"}, {Max, "d"}}

{22.0943, 2.38258, 3.76476}

A faster version:

dataSet @@@ {{Total, #b &}, {Mean, #a &}, {Max, #d &}}

{22.0943, 2.38258, 3.76476}
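For readers unfamiliar with @@@: it applies the head at level 1, so each sublist becomes the argument list of one call. A minimal sketch of that expansion, using a hypothetical symbol f in place of dataSet so that it runs without any data:

```mathematica
(* f is a hypothetical placeholder standing in for dataSet *)
f @@@ {{Total, "b"}, {Mean, "a"}, {Max, "d"}}
(* evaluates to {f[Total, "b"], f[Mean, "a"], f[Max, "d"]} *)
```

The one-liner is therefore shorthand for {dataSet[Total, "b"], dataSet[Mean, "a"], dataSet[Max, "d"]}, and each aggregate runs as its own query on the Dataset.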

ybeltukov