Machine learning with only positive labels

Question

Suppose I have a binary classification problem with 10 features and about 1000 samples. In the training set, most of my data is unlabeled (75%). The rest of the data is labeled but contains only positive labels.

In the test set, I have both negative and positive labels. How should I approach this classification problem?

I think what you are trying to achieve is anomaly detection. This is different from a classification problem. — applesoup, May 10 '16 at 16:57

score 2 · Answer 1 · answered Mar 11 '16 at 03:03

I would use a novelty detection approach: fit a one-class SVM to learn a decision boundary around the existing positive samples. Alternatively, you could fit a GMM, whose components act as hyper-ellipsoids enclosing the positive examples. Then, given a test sample, for the SVM you check whether it falls inside the boundary; for the GMM, you check whether it is enclosed by the hyper-ellipsoids. Both approaches are widely used in practice.
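A minimal sketch of both options, assuming scikit-learn is available. The synthetic data, the `nu` value, the number of mixture components, and the 5th-percentile cutoff are all illustrative assumptions, not values from the question:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-in for the labeled positives: 250 samples, 10 features.
X_pos = rng.normal(0.0, 1.0, size=(250, 10))

# Two test points: one at the centre of the positives, one far away.
X_test = np.vstack([np.zeros(10), np.full(10, 10.0)])

# Option 1: one-class SVM. predict() returns +1 for inliers, -1 for outliers.
# nu is an upper bound on the fraction of training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_pos)
svm_pred = ocsvm.predict(X_test)

# Option 2: Gaussian mixture. score_samples() returns the per-sample
# log-likelihood; a sample is accepted if it is at least as likely as the
# 5th percentile of the training positives (the cutoff is an assumption).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_pos)
cutoff = np.percentile(gmm.score_samples(X_pos), 5)
gmm_pred = gmm.score_samples(X_test) >= cutoff

print("one-class SVM:", svm_pred)
print("GMM accepted: ", gmm_pred)
```

In both cases the model only ever sees positives, so the labeled test set is what you would use to tune `nu` or the likelihood cutoff.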

If you also have some unlabeled data in your training set, I would certainly adopt a variant of transfer learning. You may be able to label the unlabeled data automatically, based on what has already been learned from the positive samples.
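One possible reading of "automatically label the unlabeled data" is a two-step scheme: score the unlabeled samples with a model fit on the positives, treat the lowest-scoring ones as likely negatives, and then train an ordinary binary classifier. This is an interpretation, not something the answer spells out, and the synthetic data, the 30% fraction, and the choice of classifiers below are assumptions:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: 250 labeled positives, 750 unlabeled samples that are
# a hidden mix of positives (around 0) and negatives (around 4).
X_pos = rng.normal(0.0, 1.0, size=(250, 10))
X_unl = np.vstack([
    rng.normal(0.0, 1.0, size=(375, 10)),
    rng.normal(4.0, 1.0, size=(375, 10)),
])

# Step 1: score the unlabeled samples with a model fit on positives only.
occ = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_pos)
scores = occ.decision_function(X_unl)

# Step 2: take the lowest-scoring 30% as "reliable negatives".
# The fraction is an assumption and would need tuning on real data.
n_neg = int(0.3 * len(X_unl))
X_neg = X_unl[np.argsort(scores)[:n_neg]]

# Step 3: train a standard binary classifier on positives vs. pseudo-negatives.
X_train = np.vstack([X_pos, X_neg])
y_train = np.concatenate([
    np.ones(len(X_pos), dtype=int),
    np.zeros(len(X_neg), dtype=int),
])
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(clf.predict(np.vstack([np.zeros(10), np.full(10, 4.0)])))
```

The pseudo-labels are only as good as the first-stage scores, so checking the result against the labeled test set is important before trusting it.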

score 1 · Answer 2 · edited Oct 12 '15 at 21:08

I usually train on the positive samples only, find the lowest score threshold at which all of them are still accepted as positive, and then treat every sample that scores below this threshold as negative.

This method should work only if you have enough data.
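A minimal sketch of the thresholding idea, using NumPy only. The answer does not say which model produces the score, so the Gaussian (Mahalanobis) score and the synthetic data here are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labeled positives: 250 samples, 10 features.
X_pos = rng.normal(0.0, 1.0, size=(250, 10))

# "Train" on the positives: estimate their mean and covariance.
mean = X_pos.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X_pos, rowvar=False))


def score(X):
    """Negative squared Mahalanobis distance; higher means more positive-like."""
    diff = np.atleast_2d(X) - mean
    return -np.einsum("ij,jk,ik->i", diff, cov_inv, diff)


# The minimum score over the training positives is the threshold, so every
# training positive is accepted by construction.
threshold = score(X_pos).min()


def predict(X):
    """Return 1 for samples at or above the threshold, 0 otherwise."""
    return (score(X) >= threshold).astype(int)


print(predict(np.vstack([np.zeros(10), np.full(10, 10.0)])))
```

Because the threshold is set by the single least typical positive, one unusual training sample can loosen it considerably, which is one reason the method needs a reasonably large sample.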

edited Oct 12 '15 at 21:08

Peter K.

answered Oct 12 '15 at 20:50

Humam Helfawi
