
I am using K-Fold cross-validation from sklearn.model_selection to evaluate the performance of my model. K = 10, and the K-fold cross-validator is set up as:

from sklearn.model_selection import KFold

kfcv = KFold(n_splits=10, random_state=0, shuffle=True)

The result of the first fold is 70%, while the remaining 9 folds are all 100%. I have also set the random state to another value (such as 50), with the same problem.

Why is the high discrepancy only in the first fold? I have also used 5 folds, and the same problem occurs with the first fold. I would expect the other folds to also reflect a decrease, since the division is random and I set shuffle to True.

Is there anything I am doing wrong? If not, what would be the likely explanation for this?
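
For reference, a minimal, self-contained sketch of one check I can run (data and target here are random placeholder arrays, not my real dataset, so the snippet runs on its own); printing the class counts of each test fold shows whether the first fold happens to be composed differently from the rest:

import numpy as np
from sklearn.model_selection import KFold

# Random placeholder arrays; shapes mirror a 2420-example dataset.
data = np.random.rand(2420, 10)
target = np.random.randint(0, 2, size=2420)

kfcv = KFold(n_splits=10, random_state=0, shuffle=True)
for i, (trn_idx, tst_idx) in enumerate(kfcv.split(data)):
    # Count how many examples of each class land in this test fold.
    classes, counts = np.unique(target[tst_idx], return_counts=True)
    print("fold %d: test size=%d, class counts=%s"
          % (i, len(tst_idx), dict(zip(classes, counts))))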

Thanks.

  • How many "examples" are there that the K-Fold operates upon? – A_A Mar 14 '19 at 13:17
  • Thanks for your response. The model is fine-tuned on a small dataset with 2420 examples. – I.O Animasahun Mar 15 '19 at 03:49
  • And this discrepancy is consistent across more than one run? – A_A Mar 15 '19 at 07:30
  • Not consistent across folds, only with the first fold (70%) while the others gave me 100%. Why is the discrepancy only with the first fold while the other folds give good results? Is this strange, and if not, what could be the likely explanation for this? – I.O Animasahun Mar 15 '19 at 09:39
  • If you run K-fold once, it would run the computation partitioning the data in one way. If you run it again, the data will be partitioned in a different way. Is this behaviour that you report consistent between runs? – A_A Mar 15 '19 at 10:55
  • Yes, after every run the accuracy of only the first fold is low (70%) while the other 9 folds are 100%. The code for the division in a loop is like this (a runnable version appears after these comments):
    kfcv = KFold(n_splits=10, random_state=0, shuffle=True)
    for trn_idx, tst_idx in kfcv.split(data):
        x_train = data[trn_idx]
        y_train1 = target[trn_idx]
        x_test = data[tst_idx]
        y_test1 = target[tst_idx]
    – I.O Animasahun Mar 15 '19 at 12:20
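
For reference, a runnable version of the loop from the last comment. The logistic-regression classifier and the placeholder arrays are assumptions (the question does not say which model is being fine-tuned). Note that the model is re-created inside the loop: reusing one fitted model across folds lets later folds be evaluated on data the model has effectively already trained on, which can produce exactly this first-fold-only pattern:

import numpy as np
from sklearn.linear_model import LogisticRegression  # placeholder model
from sklearn.model_selection import KFold

# Random placeholder arrays standing in for the real data and target.
data = np.random.rand(2420, 10)
target = np.random.randint(0, 2, size=2420)

kfcv = KFold(n_splits=10, random_state=0, shuffle=True)
scores = []
for trn_idx, tst_idx in kfcv.split(data):
    x_train, y_train1 = data[trn_idx], target[trn_idx]
    x_test, y_test1 = data[tst_idx], target[tst_idx]
    # Re-create the model each fold so no fitted state from an
    # earlier fold carries over into the next one.
    model = LogisticRegression(max_iter=1000)
    model.fit(x_train, y_train1)
    scores.append(model.score(x_test, y_test1))

print(scores)  # one accuracy per fold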

0 Answers