I have been working on a text-dependent speaker verification project, using mel-frequency cepstral coefficients (MFCCs) as features. As far as I know, MFCCs depend on the vocal characteristics of the speaker, not on what he/she says. What features can be extracted from speech that rely both on the speaker's vocal characteristics and on what he/she says?
- Basically everything that you would reasonably call a speech feature changes depending on what is spoken. Otherwise, it would be of little use to call it a "speech feature", i.e. a distinguishing property within the class of speech. – Marcus Müller Aug 04 '22 at 13:46
- Related; also see this paper. – OverLordGoldDragon Aug 04 '22 at 14:15
- @MarcusMüller: You are correct, but I think what M. Fahmin means is a combination of voice features and text-dependent speech features as a function of those voice features (i.e. "the speaker"). I will try and put an answer together. – Max Aug 05 '22 at 08:14
1 Answer
MFCCs do contain plenty of information about what is being spoken, assuming that the time resolution is on the order of the length of phones (10–100 ms) and that the number of coefficients is reasonable (13–40). In fact, this feature representation is one of the most common for keyword spotting, automatic speech recognition, etc.
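To make the framing concrete, here is a minimal NumPy sketch of the standard MFCC pipeline (pre-emphasis, windowed frames, power spectrum, mel filterbank, log, DCT). This is an illustrative implementation written for this answer, not code from the question or any particular library; the default parameters (25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients) are the conventional choices referred to above.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10,
         n_filters=26, n_coeffs=13, nfft=512):
    """Return an (n_frames, n_coeffs) MFCC matrix for a 1-D signal."""
    frame_len = int(sr * frame_ms / 1000)   # 25 ms -> 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)           # 10 ms -> 160 samples

    # Pre-emphasis to boost high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # Slice into overlapping Hamming-windowed frames
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)

    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft

    # Triangular mel filterbank, equally spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[i - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)

    # Log mel energies, then DCT-II to decorrelate; keep the first coefficients
    feats = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1)
                 / (2 * n_filters))
    return feats @ dct.T
```

Because each row covers only ~25 ms, the coefficient trajectory over time tracks the phone sequence (what is said), while the overall spectral envelope still reflects the speaker's vocal tract — which is why the same representation serves both ASR and speaker verification.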
Jon Nordby