Does anyone know the asymptotic (big-O) time and space complexities of the various model types supported by Classify[] and Predict[], during both training and evaluation?

Here is a list of supported model types:

models = {"LogisticRegression", "Markov", "RandomForest", 
   "SupportVectorMachine", "NearestNeighbors", "NeuralNetwork", 
   "NaiveBayes"};

I tried using BenchmarkPlot[] but it is currently broken:

trainingData = ExampleData[{"MachineLearning", "MNIST"}, "TrainingData"];

Needs["GeneralUtilities`"]; Clear[g, f];
g[n_] := Classify[RandomSample[trainingData, n], Method -> "RandomForest"]
f[g_] := ClassifierInformation[g, "TrainingTime"]

BenchmarkPlot[f, g, {10, 100, 200, 400, 800, 1600, 3200, 6400}, "IncludeFits" -> True]

Update:

Even though BenchmarkPlot doesn't always work, it does work on some inputs:

<<GeneralUtilities`
trainingData = ExampleData[{"MachineLearning", "MNIST"}, "TrainingData"];
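(* Train method m on growing samples; plot classifier size and training time against sample size *)
plotComplexity[m_, trainingData_] := Module[{sizes, c, time, space, tdata, sdata},
  sizes = Table[100*k, {k, 1, 50, 2}]; (* sample sizes 100, 300, ..., 4900 *)
  c = Classify[RandomSample[trainingData, #], Method -> m] & /@ sizes;
  time = QuantityMagnitude[ClassifierInformation[#, "TrainingTime"]] & /@ c;
  space = ByteCount /@ c; (* size of the trained classifier, not peak memory during training *)
  tdata = Thread[{sizes, time}]; sdata = Thread[{sizes, space}];
  TextGrid @ {{BenchmarkPlot[sdata, "IncludeFits" -> True, PlotLabel -> m <> " Training Space"],
    BenchmarkPlot[tdata, "IncludeFits" -> True, PlotLabel -> m <> " Training Time"]}}
]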

Here's what the above code gives for numeric features:

trainingData = ExampleData[{"MachineLearning", "UCILetter"}, "TrainingData"];
models = {"LogisticRegression", "Markov", "RandomForest", 
   "SupportVectorMachine", "NearestNeighbors", "NeuralNetwork", "NaiveBayes"};
Column[plotComplexity[#, trainingData] & /@ models]

M.R.
  • What is holding you back from finding this out yourself? BenchmarkPlot in the GeneralUtilities package may be helpful here. – Sjoerd C. de Vries Feb 01 '16 at 18:50
  • Good idea, will try tonight... I've always found complexity hard to get a handle on in mma because there's a lot going on under the hood. I thought maybe a dev would know offhand which libraries mma uses for each type of classifier. – M.R. Feb 02 '16 at 01:32
  • @SjoerdC.deVries seems that BenchmarkPlot doesn't like me, can you see what's going wrong here? – M.R. Feb 02 '16 at 04:17
  • BenchmarkPlot is broken. – Karsten7 Feb 02 '16 at 06:13
  • Yeah, I forgot about that. I actually tested it myself. But of course, what BenchmarkPlot does (timing executions with various amounts of data) is not that hard to do yourself. Just a Table and AbsoluteTiming should do the trick. – Sjoerd C. de Vries Feb 02 '16 at 15:14
  • For now I'm doing an analysis without BenchmarkPlot, but I'm not sure how best to gauge the space complexity. For training time there is a property in the classifier, but no "TrainingSpace". Just using ByteCount on the classifier might miss allocations, no? – M.R. Feb 02 '16 at 18:20
  • It seems that the plot images you added are cropped. We can see the title of the NaiveBayes, but there's no plot underneath (by the way: +1). – P. Fonseca Feb 03 '16 at 08:30
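
The manual approach suggested in the comments (a Table or Map over sample sizes with AbsoluteTiming) might be sketched as below. This is only a rough sketch under stated assumptions: measureTraining is a hypothetical helper name, trainingData is assumed to be defined as in the question, and MaxMemoryUsed[expr] is used as a stand-in for the missing "TrainingSpace" property. It reports peak kernel memory during the evaluation, so it may not capture allocations made outside the kernel, and the numbers are empirical scaling estimates rather than documented complexities.

```mathematica
(* measureTraining is a hypothetical helper, not a built-in function *)
measureTraining[data_, method_, n_] := Module[{sample, time, mem},
  sample = RandomSample[data, n];
  (* MaxMemoryUsed[expr] gives the peak number of bytes used while evaluating expr *)
  mem = MaxMemoryUsed[
    time = First @ AbsoluteTiming[Classify[sample, Method -> method];]
  ];
  {n, time, mem}
]

sizes = {100, 200, 400, 800, 1600};
results = measureTraining[trainingData, "RandomForest", #] & /@ sizes;

(* On log-log axes a power law t ~ n^k appears as a straight line of slope k *)
ListLogLogPlot[{results[[All, {1, 2}]], results[[All, {1, 3}]]},
  Joined -> True, PlotLegends -> {"time (s)", "peak memory (bytes)"}]

(* The coefficient of x in this linear fit estimates the exponent k for training time *)
Fit[Log @ results[[All, {1, 2}]], {1, x}, x]
```

Unlike ByteCount on the finished classifier, which measures only the stored model, this measures memory used while training runs, which is closer to what the last comment by M.R. asks about.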

0 Answers