I am interested in learning more about audio processing and thought it would be helpful to learn by doing a project. The project involves a kind of pseudo speech recognition: I have a corpus of audio files, each a single recording of one word (say apple.wav, bat.wav, cream.wav, dog.wav), all generated by a text-to-speech program.
I also have an 'unknown' audio file, a recording of a person speaking one word, which is either 'apple', 'bat', 'cream', or 'dog'.
Suppose the 'unknown' audio file is a recording of the word 'dog'. How do I go about matching that recording to the closest text-to-speech recording (i.e. dog.wav)?
(Context: I want to build a system in our chemistry lab to log our waste disposal through speech. For example, the user says "ethanol" to a computer, the computer records that audio and finds the closest recording in a library of chemical-name recordings (ethanol.wav). The file name is then recorded in a database as a log of the disposal of ethanol.)
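To make the intended workflow concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the names closest_recording and placeholder_distance are hypothetical, the feature vectors are made-up numbers rather than anything extracted from real audio, and the Euclidean distance is only a stand-in. How to turn each recording into comparable features, and which distance to use, is exactly the part I don't know.

```python
import math


def placeholder_distance(a, b):
    """Euclidean distance between two equal-length feature vectors.

    Stand-in only: a real comparison of spoken words would likely need
    something that tolerates different speaking speeds and voices.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_recording(unknown, library, distance=placeholder_distance):
    """Return the file name in `library` whose features are nearest to `unknown`."""
    return min(library, key=lambda name: distance(unknown, library[name]))


# Made-up feature vectors, one per text-to-speech recording (hypothetical values).
library = {
    "apple.wav": [0.9, 0.1, 0.3],
    "bat.wav": [0.2, 0.8, 0.5],
    "cream.wav": [0.4, 0.4, 0.9],
    "dog.wav": [0.1, 0.2, 0.1],
}

# Made-up features for the person's recording of the unknown word.
unknown = [0.15, 0.25, 0.05]

# A plain list stands in for the database log.
disposal_log = []
disposal_log.append(closest_recording(unknown, library))
print(disposal_log)
```

With these invented numbers the nearest entry is dog.wav, which is the file name that would be written to the log.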
This might be a duplicate, but I have searched and could not find anything similar.