I've noticed that in some of the literature, the authors use the mean and standard deviation of the extracted MFCC features, for example:
"ANALYSIS AND VOICE RECOGNITION IN INDONESIAN LANGUAGE USING MFCC AND SVM METHOD", Harvianto
Can you offer some insight into why this step is applied after MFCC extraction?
I have seen the mean and standard deviation stacked horizontally like this:
mfccs = np.hstack((np.mean(mfccs, axis=1), np.std(mfccs, axis=1)))
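To make the shapes concrete, here is a minimal sketch of what I understand that line to do. It assumes `mfccs` is laid out as `(n_mfcc, n_frames)`, so `axis=1` runs over time. The helper name `summarize_mfcc` and the random arrays standing in for real MFCCs are my own placeholders, not taken from the paper.

```python
import numpy as np


def summarize_mfcc(mfccs):
    """Collapse an (n_mfcc, n_frames) matrix into one fixed-length vector.

    Assumption: rows are coefficients and columns are time frames.
    """
    mfccs = np.asarray(mfccs, dtype=float)
    mean = np.mean(mfccs, axis=1)  # one mean per coefficient, taken over time
    std = np.std(mfccs, axis=1)    # one (population) std per coefficient
    return np.hstack((mean, std))  # length is 2 * n_mfcc


# Placeholder data: two "clips" with 13 coefficients but different durations.
rng = np.random.default_rng(0)
short_clip = rng.normal(size=(13, 40))
long_clip = rng.normal(size=(13, 250))

print(summarize_mfcc(short_clip).shape)  # (26,)
print(summarize_mfcc(long_clip).shape)   # (26,)
```

As far as I can tell, the result has the same length no matter how many frames the clip has, but the order of the frames is discarded.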
Or are you asking the REASONING behind taking the mean and std? As in what's the physical meaning of taking the mean and std?
– Jdip Sep 02 '22 at 17:14