4

Suppose I have a binary classification problem with 10 features and about 1000 samples. In the training set, most of my data is unlabeled (75%). The rest of the data is labeled but contains only positive labels.

In the test set, I have both negative and positive labels. How should I approach this classification problem?

emmy

2 Answers

2

I would use a novelty detection approach: fit a one-class SVM to learn a decision boundary around the existing positive samples. Alternatively, you could fit a GMM, whose components act as hyper-ellipsoids enclosing the positive examples. Then, given a test sample, for the SVM you check whether it falls inside the learned boundary; for the GMM you check whether it is enclosed by the hyper-ellipsoids, i.e. whether its likelihood under the model is high enough. Both approaches are widely used in practice.
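As a rough illustration of both options, here is a minimal sketch using scikit-learn. The synthetic data, the `nu=0.05` setting, the two mixture components, and the 5th-percentile likelihood cutoff are all assumptions made for the example, not values taken from the question.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-in for the labeled positives: 250 samples, 10 features.
X_pos = rng.normal(loc=0.0, scale=1.0, size=(250, 10))
# Synthetic test points: 20 near the positives, 20 far away.
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 10)),
    rng.normal(6.0, 1.0, size=(20, 10)),
])

# One-class SVM: nu upper-bounds the fraction of training points
# that may end up outside the learned boundary.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_pos)
svm_pred = ocsvm.predict(X_test)  # +1 = inside the boundary, -1 = outside

# GMM: accept a point if its log-likelihood reaches a low percentile
# of the log-likelihoods seen on the training positives.
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(X_pos)
cutoff = np.percentile(gmm.score_samples(X_pos), 5)
gmm_pred = np.where(gmm.score_samples(X_test) >= cutoff, 1, -1)
```

In both cases the strictness of the boundary (`nu` or the percentile) is a tuning choice; with only positive labels available for training, it would have to be set from held-out positives or from prior knowledge.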

If you also have some unlabeled data in your training set, I would certainly adopt a variant of transfer learning. You may be able to label the unlabeled data automatically, based on what has already been learned from the labeled samples.
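One possible way to label the unlabeled data automatically is a two-step pseudo-labeling heuristic: score the unlabeled samples with a model fitted on the positives, treat the lowest-scoring ones as likely negatives, and then train an ordinary binary classifier. This is only a sketch of that idea and not necessarily the variant meant above; the synthetic data, the 30% fraction, and the choice of logistic regression are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-ins: 250 labeled positives and 750 unlabeled samples
# (a hidden mix of positives and negatives), 10 features each.
X_pos = rng.normal(0.0, 1.0, size=(250, 10))
X_unl = np.vstack([
    rng.normal(0.0, 1.0, size=(250, 10)),  # hidden positives
    rng.normal(3.0, 1.0, size=(500, 10)),  # hidden negatives
])

# Step 1: score the unlabeled samples with a model fitted on positives only.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_pos)
scores = ocsvm.decision_function(X_unl)

# Step 2: treat the lowest-scoring unlabeled samples as likely negatives.
# The 30% fraction is an arbitrary assumption that would need tuning.
n_neg = int(0.3 * len(X_unl))
neg_idx = np.argsort(scores)[:n_neg]
X_neg = X_unl[neg_idx]

# Step 3: train a binary classifier on positives vs. pseudo-negatives.
X_train = np.vstack([X_pos, X_neg])
y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
```

The resulting `clf` can then be evaluated on the test set, which contains both classes. The quality of the pseudo-negatives depends heavily on how well separated the classes are.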

Tolga Birdal
1

I usually train on the positive samples only, find the minimum score threshold that still accepts all of them as positive, and then treat every sample scoring below this threshold as negative.

This method should work only if your data set is large enough.
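The thresholding idea can be sketched in plain Python. The scoring function here (negative Euclidean distance to the centroid of the positives) is a hypothetical stand-in for whatever model is actually trained; only the threshold logic is the point of the example.

```python
import math
import random

def fit_centroid(samples):
    """Return the per-feature mean of the positive training samples."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def score(x, centroid):
    """Higher means more similar to the positives (negative distance)."""
    return -math.dist(x, centroid)

def min_accepting_threshold(samples, centroid):
    """Lowest score among the positives, so every one of them is accepted."""
    return min(score(x, centroid) for x in samples)

def predict(x, centroid, threshold):
    """Return 1 (positive) if the score reaches the threshold, else 0."""
    return 1 if score(x, centroid) >= threshold else 0

# Synthetic stand-in for the labeled positives: 250 samples, 10 features.
rng = random.Random(0)
positives = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(250)]

centroid = fit_centroid(positives)
threshold = min_accepting_threshold(positives, centroid)

print(predict([0.0] * 10, centroid, threshold))    # close to the positives
print(predict([100.0] * 10, centroid, threshold))  # far from the positives
```

Because the threshold is set by the single lowest-scoring positive, one outlier among the positives loosens it considerably, which is one reason a larger data set helps. Using a low quantile of the scores instead of the minimum is a common, more robust variation.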

Peter K.
Humam Helfawi