What are the time and space complexities of Classify/Predict?

Question

Does anyone know the asymptotic (big-O) time and space complexities of the various model types supported by Classify[] and Predict[], for both training and evaluation?

Here is a list of the supported model types:

models = {"LogisticRegression", "Markov", "RandomForest", 
   "SupportVectorMachine", "NearestNeighbors", "NeuralNetwork", 
   "NaiveBayes"};

I tried using BenchmarkPlot[] but it is currently broken:

trainingData = ExampleData[{"MachineLearning", "MNIST"}, "TrainingData"];

Needs["GeneralUtilities`"]; Clear[g, f];
g[n_] := Classify[RandomSample[trainingData, n], Method -> "RandomForest"]
f[g_] := ClassifierInformation[g, "TrainingTime"]

BenchmarkPlot[f, g, {10, 100, 200, 400, 800, 1600, 3200, 6400}, "IncludeFits" -> True]

Update:

Although BenchmarkPlot doesn't always work, it does work on some inputs:

<<GeneralUtilities`
trainingData = ExampleData[{"MachineLearning", "MNIST"}, "TrainingData"];
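plotComplexity[m_, trainingData_] := Module[{sizes, c, time, space, tdata, sdata},
  (* training-set sizes: 100, 300, ..., 4900 *)
  sizes = Table[100 k, {k, 1, 50, 2}];
  (* train one classifier per size on a random subsample *)
  c = Classify[RandomSample[trainingData, #], Method -> m] & /@ sizes;
  time = QuantityMagnitude[ClassifierInformation[#, "TrainingTime"]] & /@ c;
  (* ByteCount of the trained classifier is only a proxy for memory use *)
  space = ByteCount /@ c;
  tdata = Thread[{sizes, time}]; sdata = Thread[{sizes, space}];
  TextGrid@{{
    BenchmarkPlot[sdata, "IncludeFits" -> True, PlotLabel -> m <> " Training Space"],
    BenchmarkPlot[tdata, "IncludeFits" -> True, PlotLabel -> m <> " Training Time"]}}
]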

Here's what the above code gives for numeric features:

trainingData = ExampleData[{"MachineLearning", "UCILetter"}, "TrainingData"];
models = {"LogisticRegression", "Markov", "RandomForest", 
   "SupportVectorMachine", "NearestNeighbors", "NeuralNetwork", "NaiveBayes"};
Column[plotComplexity[#, trainingData] & /@ models]

What is holding you back from finding this out yourself? BenchmarkPlot in the GeneralUtilities package may be helpful here. — Sjoerd C. de Vries, Feb 01 '16 at 18:50
Good idea, will try tonight... I've always found complexity hard to get a handle on in mma because there's a lot going on under the hood. I thought maybe a dev would know offhand which libraries mma uses for each type of classifier. — M.R., Feb 02 '16 at 01:32
@SjoerdC.deVries seems that BenchmarkPlot doesn't like me, can you see what's going wrong here? — M.R., Feb 02 '16 at 04:17
Yeah, I forgot about that. I actually tested it myself. But of course, what BenchmarkPlot does (timing executions with various amounts of data) is not that hard to do yourself. Just a Table and AbsoluteTiming should do the trick. — Sjoerd C. de Vries, Feb 02 '16 at 15:14
For now I'm doing an analysis without BenchmarkPlot, but I'm not sure how best to gauge the space complexity. For training time there is a property in the classifier, but there is no "TrainingSpace". Just using ByteCount on the classifier might miss allocations, no? — M.R., Feb 02 '16 at 18:20
It seems that the plot images you added are cropped. We can see the title of the NaiveBayes, but there's no plot underneath (by the way: +1). — P. Fonseca, Feb 03 '16 at 08:30
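
The Table-and-AbsoluteTiming approach suggested in the comments could be sketched roughly as follows. This is a minimal sketch, not a definitive benchmark: measureTraining and logLogSlope are hypothetical helper names that do not appear in the original post, and the sample sizes are arbitrary. It assumes that MaxMemoryUsed[expr] is an acceptable stand-in for the missing "TrainingSpace" property, although it reports memory tracked by the kernel and may not see allocations made inside external libraries. The slope of a straight-line fit in log-log space gives only an empirical exponent over the sizes tested, not a proven complexity bound.

```wolfram
(* Hypothetical helper: train on n random examples and return
   {n, wall-clock seconds, peak bytes during training, ByteCount of the classifier} *)
measureTraining[data_, method_, n_] := Module[{sample, cl, t, mem},
  sample = RandomSample[data, n];
  (* MaxMemoryUsed wraps the timed call, so t and cl are set as a side effect *)
  mem = MaxMemoryUsed[{t, cl} = AbsoluteTiming[Classify[sample, Method -> method]]];
  {n, t, mem, ByteCount[cl]}
]

(* Hypothetical helper: least-squares slope of log(y) against log(x).
   If y grows like n^k over the measured range, the slope is close to k. *)
logLogSlope[pts_] := Module[{x},
  Coefficient[Fit[N[Log[pts]], {1, x}, x], x]
]

(* Example use; the sizes here are an arbitrary choice *)
trainingData = ExampleData[{"MachineLearning", "UCILetter"}, "TrainingData"];
results = measureTraining[trainingData, "RandomForest", #] & /@ {250, 500, 1000, 2000, 4000};
logLogSlope[results[[All, {1, 2}]]]  (* empirical exponent for training time *)
logLogSlope[results[[All, {1, 3}]]]  (* empirical exponent for peak memory *)
```

Evaluation cost can be timed the same way by wrapping the call to a trained classifier on a list of inputs in AbsoluteTiming. Small sizes are usually dominated by fixed overhead, so the fitted exponent tends to be more informative when only the larger sizes are included.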
