
I have a bunch of brief (~1 second each) WAV files, some of which contain a frequency-modulated sound of interest between about 3900 and 4500 Hz. I thought I might compute MFCCs for these files and use the results to pursue clustering/separation between files that contain the sound of interest and files that do not.

Since I'm not interested in the other frequency bands, I limited the calculation to that range. However, the resulting MFCC matrix is full of NaNs, presumably because the range is so narrow. Either MFCCs are inappropriate in general for such a narrow frequency range, or I have yet to take advantage of the many other parameters involved in the calculation (e.g. window size, DCT type). What might be going on here? Perhaps I should be investigating acoustic features other than MFCCs for this purpose, but I am a total audio rookie and don't know where to start.

(I'm only looking for a general, big-picture answer to this question. In case it helps with context, I'm using R's melfcc() function to calculate these. I believe it is ported from the equivalent Matlab code.)
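One plausible mechanism for the NaNs matches the "too few frequency bins" suspicion raised in the comments. If a triangular mel filter between 3900 and 4500 Hz contains no FFT bin, its band energy is zero. The log of zero is -Inf, and a DCT row with mixed-sign weights then adds -Inf to +Inf, which gives NaN. The Python sketch below (not R) counts the empty filters. It rests on several assumptions, not confirmed melfcc() settings: 40 bands, a 25 ms window rounded up to a power-of-two FFT, and an HTK-style mel formula. The function names are hypothetical.

```python
import math


def hz_to_mel(f):
    # HTK-style mel formula (assumption; melfcc() may default to another mel variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def next_pow2(n):
    # smallest power of two >= n
    return 1 << (n - 1).bit_length()


def empty_mel_filters(sr, nfft, nbands, fmin, fmax):
    """Return indices of triangular mel filters that contain no FFT bin."""
    mlo, mhi = hz_to_mel(fmin), hz_to_mel(fmax)
    # nbands triangles need nbands + 2 edge frequencies, equally spaced in mel
    edges = [mel_to_hz(mlo + i * (mhi - mlo) / (nbands + 1)) for i in range(nbands + 2)]
    # centre frequencies of the non-negative FFT bins
    bin_freqs = [k * sr / nfft for k in range(nfft // 2 + 1)]
    empty = []
    for b in range(nbands):
        lo, hi = edges[b], edges[b + 2]
        # a bin gets nonzero triangle weight only if it lies strictly inside (lo, hi)
        if not any(lo < f < hi for f in bin_freqs):
            empty.append(b)
    return empty


for sr in (16000, 44100):
    nfft = next_pow2(int(0.025 * sr))  # assumed 25 ms analysis window
    empty = empty_mel_filters(sr, nfft, 40, 3900, 4500)
    print(f"sr={sr} nfft={nfft} spacing={sr / nfft:.2f} Hz -> {len(empty)} empty filters")

# An empty band's log-energy is -inf; mixed-sign DCT weights then produce inf - inf.
print(float("-inf") * 1.0 + float("-inf") * -1.0)  # nan
```

Under these assumptions, the 16 kHz case (31.25 Hz bin spacing) leaves some filters with no bins at all. At 44.1 kHz (about 21.5 Hz spacing), every filter catches at least one bin. Fewer bands, a longer window, or a small floor on band energies before the log are the obvious knobs. Check melfcc()'s documentation to see which of these it exposes and under what names.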

EDIT: For more background.

I have a bunch of "detections" of an animal sound event from real field recordings, found with a spectrogram cross-correlation template. Some of these detections are true positives and some are false positives. I want to find acoustic features that might help me predict which are which. (Besides the acoustic features, I can also use other information, such as the time of day of each detection.) Ultimately, I'm pursuing a Bayesian system in which the model learns over time as new detection data are incorporated. I'm not attached to any acoustic features in particular, but I'm not sure what to try, and MFCCs seemed like a reasonable place to start.
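One simple reading of "a model that learns over time as new detection data are incorporated" is a conjugate Beta-Bernoulli update. For detections that share some feature value, keep a Beta(alpha, beta) belief about the probability that a detection is a true positive, and add one pseudo-count per verified detection. This is a swapped-in illustration, not the asker's method. The feature groups and labels below are made up, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about P(true positive) for one group of detections."""
    alpha: float = 1.0  # uniform Beta(1, 1) prior (an assumption)
    beta: float = 1.0

    def update(self, is_true_positive: bool) -> None:
        # conjugate update: each verified detection adds one pseudo-count
        if is_true_positive:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self) -> float:
        # posterior mean of P(true positive)
        return self.alpha / (self.alpha + self.beta)


# Hypothetical groups, e.g. in-band centroid above / below some threshold.
beliefs = {"centroid_high": BetaBelief(), "centroid_low": BetaBelief()}
# Made-up verification labels, for illustration only.
labelled = [("centroid_high", True), ("centroid_high", True), ("centroid_low", False),
            ("centroid_high", False), ("centroid_low", False), ("centroid_high", True)]
for group, verdict in labelled:
    beliefs[group].update(verdict)

for group, b in beliefs.items():
    print(group, round(b.mean(), 3))
```

Each new batch of verified detections simply continues the same updates. A feature whose groups drift apart in posterior mean is one that appears to help separate true from false positives.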

  • I'm not 100% convinced MFCCs are the right thing if you need a frequency analysis of only a very small part of your Nyquist bandwidth; that sounds like you don't actually make use of the features of the MFCC. Can you explain what you need this for? Maybe there are other ways. – Marcus Müller Sep 09 '16 at 06:49
  • I have a bunch of "detections" of an animal sound event from real field recordings, found with a spectrogram cross-correlation template. Some of these detections are true positives and some are false positives. I want to find acoustic features that might help me predict which are which. (Besides the acoustic features, I can also use other information, such as the time of day of each detection.) Ultimately, I'm pursuing a Bayesian system in which the model learns over time as new detection data are incorporated. I'm not attached to any acoustic features in particular, but I'm not sure what to try. – m0ckingbird Sep 09 '16 at 12:14
  • I'd strongly recommend editing your question and including all that info, so others will see that you've added more background! – Marcus Müller Sep 09 '16 at 12:28
  • I don't know anything about MFCCs, but you should look into why you are getting NaNs. If it's due to an input error, fix the input. If it's generated by the program's internal processing, that's a lousy error "message" and should be changed. Personally, I think you have a long road ahead; have you consulted the literature on doing what you want? Perhaps Bioacoustics, https://en.wikipedia.org/wiki/Bioacoustics, or http://songbirdscience.com/resources/behavior/matlab-functions might help. – rrogers Sep 13 '16 at 17:52
  • Thanks! I've consulted the literature pretty heavily. Lots of people have good ideas and workarounds, but large-scale accurate detection of animals from field recordings is still a fairly intractable problem, and there are going to be false positives regardless of the features or clustering/classification method used. I think I'm getting NaNs because that range contains too few frequency bins for the mel filterbanks to work with. I could address this by changing the frequency resolution (FFT size), but the issue could pop up again depending on the frequency spread of any given animal call of interest. (1/2) – m0ckingbird Sep 14 '16 at 14:48
  • That's why I'm simply looking for a bunch of general-purpose acoustic features to try. I don't care much whether the features give good clustering/classification immediately. Ultimately, my vision is a Bayesian update of my belief in the ability of certain features to predict whether an event is a true or false positive as more data rolls in. So I'm seeking general-purpose acoustic features to calculate. Right now I'm investigating statistical moments of the time and frequency bins instead. Obviously those aren't independent, which could cause issues, but that's the kind of thing I'm going for. (2/2) – m0ckingbird Sep 14 '16 at 14:54
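The "statistical moments of frequency bins" idea can be made concrete. Treat the in-band power spectrum as a probability distribution over frequency and take its centroid, spread, skewness and kurtosis. Below is a minimal Python sketch, assuming a mono signal already loaded as a list of floats. The function names are hypothetical. The bin-by-bin DFT is there only to keep the example dependency-free; in practice an FFT library would be used. Kurtosis here is the non-excess form.

```python
import cmath
import math


def band_power_spectrum(x, sr, fmin, fmax):
    """Hann-windowed DFT power at the bins whose centre lies in [fmin, fmax]."""
    n = len(x)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]  # periodic Hann
    xw = [a * b for a, b in zip(x, w)]
    k_lo, k_hi = math.ceil(fmin * n / sr), math.floor(fmax * n / sr)
    freqs, power = [], []
    for k in range(k_lo, k_hi + 1):
        # direct DFT of a single bin (slow, but needs no third-party library)
        X = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(xw))
        freqs.append(k * sr / n)
        power.append(abs(X) ** 2)
    return freqs, power


def spectral_moments(freqs, power):
    """Centroid, spread, skewness and kurtosis of the normalised power spectrum."""
    total = sum(power)
    p = [v / total for v in power]
    centroid = sum(f * q for f, q in zip(freqs, p))
    var = sum((f - centroid) ** 2 * q for f, q in zip(freqs, p))
    spread = math.sqrt(var)
    skew = sum((f - centroid) ** 3 * q for f, q in zip(freqs, p)) / spread ** 3
    kurt = sum((f - centroid) ** 4 * q for f, q in zip(freqs, p)) / var ** 2
    return centroid, spread, skew, kurt


# Test signal: a bin-aligned 4187.5 Hz tone (bin 134 of a 512-point DFT at 16 kHz).
sr, n = 16000, 512
tone = [math.cos(2 * math.pi * 4187.5 * i / sr) for i in range(n)]
print(spectral_moments(*band_power_spectrum(tone, sr, 3900, 4500)))
```

For this pure tone, the Hann window spreads energy over three bins in a 1 : 0.25 : 0.25 power ratio. The centroid therefore sits at 4187.5 Hz, skewness is essentially zero and kurtosis is 3. A frequency-modulated call or broadband noise would be expected to widen the spread. Computing the same moments per short frame and then across frames would give the time-wise counterpart.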
