Percentage of keys that match a criterion in a dataset

Question

I have a dataset designed around a whiskey tasting. I have 25 tasters who have submitted scores on about 20 whiskeys. Here is a sample of how I structured the data in Mathematica:

tasting = Dataset[{
  <|"sample" -> "KC120", "score" -> 70, "taster" -> "MattW"|>,
  <|"sample" -> "SazRye", "score" -> 70, "taster" -> "Sky"|>,
  <|"sample" -> "OF", "score" -> 72, "taster" -> "EdB"|>,
  <|"sample" -> "BT1", "score" -> 73, "taster" -> "Leif"|>,
  <|"sample" -> "4RSB1", "score" -> 74, "taster" -> "MattW"|>}]

I want to run some analysis on the different samples to see how well they were received. I am particularly interested in seeing how many of the scores for each sample were over 85 points. It is a bit tricky because not every taster submitted results for each sample.

I can count how many scores were over 85 with the following code:

over85 = tasting[Select[#score > 85 &], "sample"][Counts]

And I can figure out how many scores were submitted with this:

totalscores = tasting[Counts, "sample"]

Here is where I am running into problems. I want to divide the number of scores over 85 by the total number of scores submitted for each sample and have it return a percentage for each sample.

I tried simply running:

over85/totalscores

But all I got was one table displayed over another, rather than the calculation being carried out for each sample.

Any ideas on how to run the calcs and return the results grouped by sample?

Thanks in advance!

similar: Allocate amount A according to the distribution of amount B and Normalizing the value of columns in Dataset — WReach, Dec 21 '17 at 15:28
@Sektor I initially closed this as a duplicate of those other questions that you and I linked, but upon reflection I retracted my close vote. I think that there is sufficient complication to merit a separate question. — WReach, Dec 21 '17 at 16:02

score 4 · Accepted Answer · answered Dec 21 '17 at 16:10

Let's use a dataset with some scores that are greater than 85:

SeedRandom[0]
tasting = 
  { {"KC120", "SazRye", "OF", "BT1", "4RSB1"}
  , {"MattW", "Sky", "EdB", "Leif"}
  } //
  Tuples //
  ReplaceAll[{s_, t_} :> {s, 85+RandomInteger[{-20, 10}], t}] //
  Query[Dataset, AssociationThread[{"sample", "score", "taster"}, #]&]

Using this dataset, we can query for totalscores and over85 as before:

totalscores = tasting[Counts, "sample"]

over85 = tasting[Select[#score > 85 &], "sample"][Counts]

Note that the sample BT1 has no scores over 85.

We can perform the desired "division" thus:

AssociationMap[#[[1]] -> Lookup[Normal@over85, #[[1]], 0] / #[[2]] &, totalscores]

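We use Lookup[Normal@over85, ..., 0] instead of over85[...] to accommodate samples that have no scores over 85: such samples are absent from over85, and Lookup falls back to the default value 0 for them.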
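
If percentages rather than fractions are wanted, one possible variation is to convert both count results to plain associations with Normal first and then scale by 100. This is only a sketch: the names totalAssoc, overAssoc and pctOver85 are introduced here for illustration and do not appear in the code above.

```mathematica
(* Convert the Dataset results to ordinary associations: <|"sample" -> count, ...|> *)
totalAssoc = Normal[totalscores];
overAssoc = Normal[over85];

(* For each sample, the percentage of its submitted scores that exceed 85.
   Lookup supplies 0 for samples that are missing from overAssoc. *)
pctOver85 = Association @ KeyValueMap[
   #1 -> N[100 Lookup[overAssoc, #1, 0] / #2] &,
   totalAssoc]
```

Wrapping the result in Dataset[pctOver85] should display it in the same tabular form as the other results.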

Alternatively, we can obtain these results in a single purpose-built query:

tasting[GroupBy[Key["sample"] -> Key["score"]], Count[#, n_ /; n > 85] / Length[#] &]
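
Since the question asks for a percentage, the same query can be adapted by scaling inside the aggregation function. The following is a sketch of that variation, not a different method; it assumes a machine-precision number is an acceptable output format.

```mathematica
(* Group scores by sample, then compute the percentage of scores above 85 per group *)
tasting[
  GroupBy[Key["sample"] -> Key["score"]],
  N[100 Count[#, n_ /; n > 85] / Length[#]] &
]
```

Every group produced by GroupBy contains at least one score, so Length[#] is never zero, and a sample with no scores over 85 simply yields 0 without any special handling.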

Thanks! While I can see how the other questions are addressing similar issues, it wasn't until I read your reply that I could fully grasp the concept. — kickert, Dec 21 '17 at 16:32
