
Assume I have a ClassifierFunction object learned like this:

classifier = Classify[trainSet, Method -> "DecisionTree"]

Then how can I extract the actual decision tree structure from the result?

Here is an example:

fn[n_] := Which[n < 1, 1, 1 <= n < 2, 2, 2 <= n, 3];
data = Table[(r = RandomReal[{0, 4}]; r -> fn[r]), {i, 1, 1000}];
c = Classify[data, Method -> "DecisionTree"];
Henrik Schumacher
user1747134
  • Hard to answer without a concrete minimal example for trainSet. – Henrik Schumacher Aug 27 '18 at 08:33
  • How about this step function with a branch in the decision? fn[n_] := Which[n < 1, 1, 1 <= n < 2, 2, 2 <= n, 3];. Data generated as data = Table[(r = RandomReal[{0, 4}]; r -> fn[r]), {i, 1, 1000}]; and classified using c = Classify[data, Method -> "DecisionTree"]; then trying to visualize using c[[1, "Model", "Tree"]] – my account_ram Sep 12 '18 at 12:36
  • @HenrikSchumacher - For the question, classifier[[1, "Model", "Tree"]] gives an output like this: <|FeatureIndices->RawArray[Integer16,<2>],NumericalThresholds->RawArray[Real32,<2>],NominalSplits->{},Children->RawArray[Integer16,<2,2>],LeafValues->RawArray[UnsignedInteger16,<3,3>],RootIndex->2,NominalDimension->0|>. Can this be improved further? – my account_ram Sep 12 '18 at 12:42
  • Are you interested in visualizing the Decision Trees made by Classify, or do you just want to visualize a Decision Tree over some data? – Anton Antonov Sep 12 '18 at 13:22
  • @AntonAntonov - I was curious about visualizing the DecisionTrees made by Classify... Your answer in this thread was what I was looking for – my account_ram Sep 14 '18 at 15:18
  • @myaccount_ram You can take a look at this MSE answer of "Creating Identification/Classification trees". And here is another Decision Trees visualization application. – Anton Antonov Sep 14 '18 at 16:00
  • @AntonAntonov Is it possible to visualize the output of Classify[]? – my account_ram Sep 15 '18 at 06:43

2 Answers


Here is one way to visualize and interpret the tree structure that Henrik Schumacher's answer extracts from Classify.

SeedRandom[432]
fn[n_] := Which[n < 1, 1, 1 <= n < 2, 2, 2 <= n, 3];
data = Table[(r = RandomReal[{0, 4}]; r -> fn[r]), {i, 1, 1000}];
c = Classify[data, Method -> "DecisionTree"];

tree = c[[1, "Model", "Tree"]];

fromRawArray[a_RawArray] := Developer`FromRawArray[a];
fromRawArray[a_] := a;
Map[Normal, fromRawArray /@ tree[[1]]]

(* <|"FeatureIndices" -> {1, 1}, 
 "NumericalThresholds" -> {-0.894867, -0.0252423}, 
 "NominalSplits" -> {}, "Children" -> {{-2, -3}, {1, -1}}, 
 "LeafValues" -> {{1, 1, 508}, {246, 1, 1}, {1, 249, 1}}, 
 "RootIndex" -> 2, "NominalDimension" -> 0|> *)

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/AVCDecisionTreeForest.m"]

dtree = BuildDecisionTree[List @@@ data]

(* {{0.374931, 1.9972, 1, Number, 
  1000}, {{0.499981, 1.0175, 1, Number, 
   493}, {{{245, 1}}}, {{{248, 2}}}}, {{{507, 3}}}} *)

LayeredGraphPlot[DecisionTreeToRules[dtree], 
 VertexLabeling -> True] 

[Image: LayeredGraphPlot of the rules from DecisionTreeToRules[dtree]]

There is a discrepancy of 1 in the obtained leaf counts (508, 246, 249 in Classify's tree versus 507, 245, 248 in the second tree), but otherwise the second tree seems to approximate Classify's one well. The splitting thresholds of Classify's tree are most likely obtained over the data after it has been transformed with some embedding/hashing/normalization.
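As a rough plausibility check of the normalization guess, here is a minimal sketch. It assumes, without any confirmation from the documentation, that Classify standardized the single numeric feature to zero mean and unit standard deviation before splitting, and it uses the theoretical moments of the generating distribution rather than those of the actual sample. The names dist, mu, sigma, thresholds and rawThresholds are introduced only for this illustration.

```wolfram
(* ASSUMPTION: the feature was standardized as (x - mean)/sd before splitting *)
dist = UniformDistribution[{0, 4}];   (* distribution used to generate the data *)
mu = Mean[dist];                      (* 2 *)
sigma = StandardDeviation[dist];      (* 2/Sqrt[3] *)

(* thresholds reported by Classify's tree, copied from the output above *)
thresholds = {-0.894867, -0.0252423};

(* map the thresholds back to the original feature scale *)
rawThresholds = N[mu + sigma*thresholds]
(* approximately {0.967, 1.971} *)
```

Both values land near the true class boundaries at 1 and 2 and near the thresholds 1.0175 and 1.9972 found by BuildDecisionTree. That is consistent with the standardization guess but does not prove it; the sample mean and standard deviation of the actual data differ slightly from the theoretical values used here.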

Anton Antonov

In general, many objects generated by Mathematica can be inspected by trying things of the form InputForm[classifier] or classifier[string] with string being one of the elements of classifier["Properties"]. Recently implemented objects such as ClassifierFunction are mere wrappers for well-structured Associations (thumbs up for this approach!), so classifier[[1]] can be very revealing.

Inspecting c[[1]] in this way shows that

tree = c[[1, "Model", "Tree"]]

is another such object (with head MachineLearning`DecisionTree).

Inspecting

tree[[1]]

reveals that MachineLearning`DecisionTree objects are partially composed of RawArrays. These can be converted to ordinary lists as follows:

fromRawArray[a_RawArray] := Developer`FromRawArray[a];
fromRawArray[a_] := a;
fromRawArray /@ tree[[1]]

<|"FeatureIndices" -> {1, 1}, "NumericalThresholds" -> {0.402823, 0.82983}, "NominalSplits" -> {}, "Children" -> {{-1, 2}, {-2, -3}}, "LeafValues" -> {{619, 1, 1}, {1, 130, 1}, {1, 1, 254}}, "RootIndex" -> 1, "NominalDimension" -> 0|>

But I cannot tell you how to interpret this data. I would have to learn first what a decision tree is and how it is constructed...
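One possible reading of these arrays, inferred only from the printed values and not from any documentation, can be sketched as a tree walk. The assumptions are: a positive entry in "Children" is the index of another internal node, a negative entry -k points to row k of "LeafValues", the first child is taken when the feature value is below the node's threshold (whether the comparison is strict is a guess), and each "LeafValues" row holds per-class counts. The helper names treeData, leafIndex and leafClass are hypothetical, and the input must already be on the scale the thresholds use.

```wolfram
(* plain-list form of the tree, copied from the output above *)
treeData = <|"FeatureIndices" -> {1, 1},
   "NumericalThresholds" -> {0.402823, 0.82983},
   "NominalSplits" -> {},
   "Children" -> {{-1, 2}, {-2, -3}},
   "LeafValues" -> {{619, 1, 1}, {1, 130, 1}, {1, 1, 254}},
   "RootIndex" -> 1, "NominalDimension" -> 0|>;

(* ASSUMPTION: positive child = internal node, negative child -k = leaf k,
   first child when the feature value is below the threshold *)
leafIndex[t_Association, x_List] := Module[{node = t["RootIndex"]},
  While[node > 0,
   node = t["Children"][[node,
      If[x[[t["FeatureIndices"][[node]]]] < t["NumericalThresholds"][[node]], 1, 2]]]];
  -node]

(* ASSUMPTION: each row of "LeafValues" holds per-class counts;
   return the position of the largest count *)
leafClass[t_Association, x_List] :=
 First@Ordering[t["LeafValues"][[leafIndex[t, x]]], -1]

(* walk the tree for three values given on the tree's own feature scale *)
leafClass[treeData, {#}] & /@ {0., 0.5, 1.}
(* {1, 2, 3} *)
```

Under this reading the two thresholds split the feature axis into three intervals, one per class, which matches the three-step function used to generate the data.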

Henrik Schumacher