
I am working on a project where I classify a patient's coughs as either positive or negative for a certain pulmonary illness.

At the moment I have multiple cough events, segmented from longer recordings. I have extracted various spectral features for each cough event and want to concatenate all of them into one feature vector to train a Logistic Regression (LR) classifier.

The problem is that each cough event has a different duration, so the Mel-Frequency Cepstral Coefficient (MFCC) matrices I extract also differ in length (number of frames). This is an issue when training the LR classifier, which expects every training vector to have the same length.

So, I want to know if anyone has a fix for dealing with MFCC feature vectors of different sizes, and how to get them into the correct shape to use as training vectors. I feel really stupid, but I can't find anything about this online. Surely this issue has been encountered before?
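A common way to sidestep the length mismatch is to summarise each MFCC matrix over time, for example with the per-coefficient mean and standard deviation, which yields the same number of features for every cough regardless of its duration. The sketch below assumes a `(n_mfcc, n_frames)` layout (the layout `librosa.feature.mfcc` returns) and uses random arrays in place of real coughs; the function name `pool_mfcc` is hypothetical.

```python
import numpy as np

def pool_mfcc(mfcc: np.ndarray) -> np.ndarray:
    """Collapse a (n_mfcc, n_frames) MFCC matrix into a fixed-length vector.

    Concatenates the per-coefficient mean and standard deviation over time,
    so the output length is 2 * n_mfcc no matter how many frames the cough has.
    """
    if mfcc.ndim != 2 or mfcc.shape[1] == 0:
        raise ValueError("expected a non-empty (n_mfcc, n_frames) array")
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical data: three coughs with 13 MFCCs and different frame counts.
rng = np.random.default_rng(0)
coughs = [rng.standard_normal((13, n)) for n in (40, 87, 25)]

# One fixed-length row per cough, ready to stack with other spectral features.
X = np.vstack([pool_mfcc(m) for m in coughs])
print(X.shape)  # (3, 26)
```

Other statistics (min, max, percentiles, or the same pooling applied to delta MFCCs) can be appended in the same way. The trade-off is that pooling discards the temporal order of the frames.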

Renier Botha
  • Perhaps you could interpolate the low length features to make them longer. – sudosuroot Feb 17 '16 at 15:07
  • Are you referring to zero padding? Yes, that is obviously an option, but if the zeros outnumber the non-zero values in the feature vector, will that not nullify (or at least bias) the training result for that vector? – Renier Botha Feb 17 '16 at 15:12
  • Use a recurrent neural network with the varying lengths, or consider shingled (overlapping) windows of fixed length then find the consensus of the classifier outputs. – Emre May 17 '16 at 17:32
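sudosuroot's interpolation suggestion can be sketched by resampling every coefficient's trajectory onto a fixed number of frames and then flattening the matrix. This is a minimal sketch assuming a `(n_mfcc, n_frames)` layout and linear interpolation via `np.interp`; `resample_frames` and the target of 50 frames are hypothetical choices, not tuned values.

```python
import numpy as np

def resample_frames(mfcc: np.ndarray, n_frames: int) -> np.ndarray:
    """Linearly interpolate a (n_mfcc, T) MFCC matrix to (n_mfcc, n_frames)."""
    # Map the original and target frames onto a common [0, 1] time axis.
    t_old = np.linspace(0.0, 1.0, mfcc.shape[1])
    t_new = np.linspace(0.0, 1.0, n_frames)
    # Interpolate each coefficient trajectory independently.
    return np.vstack([np.interp(t_new, t_old, row) for row in mfcc])

# Hypothetical data: three coughs with 13 MFCCs and different frame counts.
rng = np.random.default_rng(0)
coughs = [rng.standard_normal((13, n)) for n in (40, 87, 25)]

N_FRAMES = 50  # hypothetical target length; tune for your data
resampled = [resample_frames(m, N_FRAMES) for m in coughs]
# Flatten each (13, 50) matrix into a 650-element row for the classifier.
X = np.vstack([r.ravel() for r in resampled])
print(X.shape)  # (3, 650)
```

Interpolation stretches short coughs and compresses long ones, so absolute duration is lost; if duration matters for the diagnosis, it could be appended as a separate feature.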
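The zero-padding concern raised in the comments is reasonable: an all-zero MFCC frame does not mean "no sound", it is an ordinary point in feature space, so heavily padded coughs may look systematically different to the classifier. A minimal sketch of pad-or-truncate to a fixed frame count follows, assuming a `(n_mfcc, n_frames)` layout; `pad_or_truncate` is a hypothetical helper, and repeating the last real frame (`mode="edge"` in `np.pad`) is shown as one alternative to zeros.

```python
import numpy as np

def pad_or_truncate(mfcc: np.ndarray, n_frames: int, mode: str = "constant") -> np.ndarray:
    """Force a (n_mfcc, T) MFCC matrix to exactly n_frames columns.

    Longer events are truncated at the end; shorter ones are padded at the
    end of the time axis. `mode` is passed straight to np.pad: "constant"
    pads with zeros, "edge" repeats the last real frame.
    """
    T = mfcc.shape[1]
    if T >= n_frames:
        return mfcc[:, :n_frames]
    return np.pad(mfcc, ((0, 0), (0, n_frames - T)), mode=mode)

rng = np.random.default_rng(0)
cough = rng.standard_normal((13, 40))  # hypothetical 40-frame cough

zero_padded = pad_or_truncate(cough, 60)               # frames 40..59 are all zeros
edge_padded = pad_or_truncate(cough, 60, mode="edge")  # frames 40..59 copy frame 39
truncated = pad_or_truncate(cough, 30)                 # keeps frames 0..29
```

Choosing the target length near the longest typical cough limits truncation, at the cost of more padding for short events.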
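Emre's shingling suggestion can be sketched as: cut each cough into overlapping fixed-width windows, train the classifier on the windows (each inheriting its cough's label), then average the per-window probabilities to label a whole cough. This assumes a `(n_mfcc, n_frames)` layout; `shingle`, `consensus`, the width of 20 frames, the hop of 10 frames and the 0.5 threshold are all hypothetical choices, and the per-window probabilities would come from something like scikit-learn's `LogisticRegression.predict_proba`.

```python
import numpy as np

def shingle(mfcc: np.ndarray, width: int, hop: int) -> np.ndarray:
    """Cut a (n_mfcc, T) MFCC matrix into overlapping fixed-width windows.

    Returns shape (n_windows, n_mfcc * width), one flattened window per row.
    Events shorter than `width` are edge-padded to a single window; trailing
    frames that do not fill a whole window are dropped.
    """
    T = mfcc.shape[1]
    if T < width:
        mfcc = np.pad(mfcc, ((0, 0), (0, width - T)), mode="edge")
        T = width
    starts = range(0, T - width + 1, hop)
    return np.stack([mfcc[:, s:s + width].ravel() for s in starts])

def consensus(window_probs: np.ndarray, threshold: float = 0.5) -> int:
    """Turn per-window P(positive) into one cough-level label (mean vote)."""
    return int(np.mean(window_probs) >= threshold)

rng = np.random.default_rng(0)
coughs = [rng.standard_normal((13, n)) for n in (40, 87, 25)]
labels = [1, 0, 1]  # hypothetical cough-level labels

# Training set: every window inherits the label of the cough it came from.
windows = [shingle(m, width=20, hop=10) for m in coughs]
X_train = np.vstack(windows)
y_train = np.concatenate([[y] * len(w) for y, w in zip(labels, windows)])

# At prediction time, score each window of a new cough with the trained
# classifier and combine the scores into a single label:
print(consensus(np.array([0.2, 0.7, 0.9])))  # 1
```

When evaluating, keep all windows from one cough (ideally from one patient) in the same train/test fold; otherwise overlapping windows leak information across the split.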
