I have been working with a dataset of predictor functions, mainly to get more insight into what the functions are doing and how they behave.

Where I am struggling is using PredictorMeasurements, ideally as another new column in the dataset. The end result I am looking for is:

<|pFunction -> somePFunction, pMeasure -> PredictorMeasurements[somePFunction, someTestData]|>

This is where I have got to so far - what am I missing?

header = {"Input", "Output"};
trainingset = {{1, 2}, {3, 4.5}, {5, 6}, {7, 8.5}};
testset = {{1.5, 2}, {4, 5}, {6, 5.5}};
methods = {"NearestNeighbors", "LinearRegression", "NeuralNetwork", "RandomForest"};
samps = {2, 4};
headers = {"sampsize", "method", "pFunction"};

(* Create Datasets of test and training sets *)

trainingsetDS = 
Dataset[Flatten[AssociationThread[header -> #] & /@ trainingset]];

testsetDS = 
Dataset[Flatten[AssociationThread[header -> #] & /@ testset]];

(* Create an association of inputs of form <|samplesize, method|> *)

predictorInputs = 
  Flatten[AssociationThread[headers[[1 ;; 2]] -> #] & /@ Tuples[{samps, methods}]];

(* Create a list of predictor functions *)

p = 
  Predict[
    RandomSample[trainingsetDS, #"sampsize"] -> "Output", 
    Method -> #"method"] & /@ predictorInputs;

(* Create a dataset of predictor functions with some useful keys for querying*)

pDS = Dataset[
  Flatten[AssociationThread[{"samplesize", "method", "pFunction"} -> #] & /@ 
    Partition[Riffle[Flatten[Tuples[{samps, methods}]], p, 3], 3]]];

(* I can select results based on the predictor function information rather 
than using the keys! *)

result = 
  pDS[
    Select[
      PredictorInformation[#pFunction, "Method"] == "NearestNeighbors" || 
      PredictorInformation[#pFunction, "ExampleNumber"] == 2 &], 
    {"samplesize", "pFunction"}]

(* I've figured out how to get predictor measurement for 1 result using DS key "Output" *)

PredictorMeasurements[result[[1, 2]], testsetDS -> "Output", "StandardDeviation"]

Ideally, I would like to be able to filter the dataset of functions on one or more properties of PredictorMeasurements, just as I can with PredictorInformation.

J. M.'s missing motivation
Gordon Coale

1 Answer

Answering my own question: this gives a PredictorMeasurements object for each predictor function in the dataset.

PredictorMeasurements[#, testsetDS -> "Output"] & /@ 
pDS[[All, "pFunction"]]

Using Append to add the measurements under a new "pMeasure" key gives either

Append[#, 
  "pMeasure" -> 
    PredictorMeasurements[#pFunction, testsetDS -> "Output"]] & /@ pDS

or, in Dataset query form, which is what I wanted:

pDS[All, 
  Append[#, 
    "pMeasure" -> 
      PredictorMeasurements[#pFunction, testsetDS -> "Output"]] &]
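
Once the measurements sit in their own column, filtering on a measurement property should work much like the PredictorInformation filter in the question. The following is only a minimal, self-contained sketch of that idea: the names training, test, rows and measured are made up for illustration, the data is given as plain rules rather than as the datasets used above, and the threshold of 1.0 is arbitrary.

```wolfram
(* Toy data reusing the values from the question, written as input -> output rules *)
training = {1 -> 2, 3 -> 4.5, 5 -> 6, 7 -> 8.5};
test = {1.5 -> 2, 4 -> 5, 6 -> 5.5};

(* One row per method, each holding its trained PredictorFunction *)
rows = <|"method" -> #, "pFunction" -> Predict[training, Method -> #]|> & /@ 
   {"NearestNeighbors", "LinearRegression"};

(* Add the PredictorMeasurements object as a new "pMeasure" column *)
measured = Dataset[rows][All, 
   Append[#, "pMeasure" -> PredictorMeasurements[#pFunction, test]] &];

(* Keep only rows whose measured standard deviation is below the
   (arbitrary) threshold, then show just the method name *)
measured[Select[#pMeasure["StandardDeviation"] < 1.0 &], {"method"}]
```

The same Select should apply to the pDS-based result above, since each "pMeasure" entry is a measurements object that can be queried for a property by name. Storing the object rather than a single number keeps every property available for later queries.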
Gordon Coale