
I am working on a problem where the dependent variable consists of ordered classes, such as bad, good, very good.

How could I declare this problem in XGBoost, instead of treating it as ordinary classification or regression?

Thanks

mommomonthewind

2 Answers


You can train two XGBoost binary classifiers:

  • classifier 1 predicts whether a sample is at least good (i.e. good or very good)
  • classifier 2 predicts whether a sample is very good

Then, on unseen data:

  • if both are true, classify as very good
  • if only the first is true and the second is false, classify as good
  • if both are false, classify as bad
alexprice
  • What to do if first false but second true? – Ben Reiniger Oct 29 '19 at 15:50
  • If both classifiers are trained well, that should happen only rarely, and such samples should be classified as bad. If more tuning is needed, you can output probabilities and compare probabilities instead of labels. – alexprice Oct 30 '19 at 12:45
  • Indeed, this is probably a better situation than the regression setup in the other answer in the case of conflicting uncertainty. You could just output "I don't know," or if a decision is required, make sure the classifiers are probabilistic and well-calibrated. – Ben Reiniger Oct 30 '19 at 15:05
  • You can also use the prediction/probabilities of earlier labels as features for the higher labels. For example, the classifier 2 can be given the probability that classifier 1 already indicated it was at least 'good' as a feature – DrewH Feb 14 '20 at 21:42

I think you can use a regression setup, e.g. bad = 0, good = 0.5, very good = 1 as labels, and then postprocess the XGBoost output: pred_value < 0.25 ⇒ bad, 0.25 ≤ pred_value < 0.75 ⇒ good, and so on.