
I am trying to use voice activity detection (VAD), not implement a VAD algorithm myself, to get timestamps of speech in a given audio file, but I am having a hard time doing so.


What am I trying to achieve?

Find an offline library for voice activity detection.


What do I have?
An audio file (mp3).


What am I trying to get?
Given an audio file (mp3), the script/library should return the indexes/timestamps of the segments that contain human voice.


What have I tried?

I used https://github.com/kdavis-mozilla/vad.js and tested VAD with an audio stream. It only reports whether or not it detects voice, and I found no way to get timestamps of when the script detects human voice.


Example

The script should take a 1-minute audio file in which human voice occurs from 00:00:05 to 00:00:20, from 00:00:39 to 00:00:45, and from 00:00:50 to 00:01:00, and return the timestamps as output like

[[00:00:05,00:00:20],[00:00:39,00:00:45],[00:00:50,00:01:00]]

Can you suggest a library or Git repository for my scenario?

I have seen a similar question that was asked 10 years ago, but I'm not sure whether its answers are still relevant.

1 Answer


I once did a project that used machine learning to detect the tone of single-syllable words in Mandarin (there are 4 or 5 distinct tones, which change the meaning of the word). For this I had a dataset of recorded audio files in which I needed to find the sections where people were talking and the sections that were quiet. I used the librosa library in Python. If you are able to use Python, the librosa.effects.split function appears to be exactly what you are looking for. Its description is "Split an audio signal into non-silent intervals."
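As a rough sketch of how that could look for the format asked for in the question: the helper names below (format_timestamp, intervals_to_timestamps, non_silent_timestamps) are made up for illustration, and top_db=30 is an assumed threshold that would need tuning per recording. Only librosa.load and librosa.effects.split are actual librosa calls. Whether librosa.load can read mp3 depends on the audio backend installed alongside it.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (rounded to the nearest second)."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def intervals_to_timestamps(intervals, sr):
    """Convert (start_sample, end_sample) pairs into [start, end] timestamp strings."""
    return [
        [format_timestamp(int(start) / sr), format_timestamp(int(end) / sr)]
        for start, end in intervals
    ]


def non_silent_timestamps(path, top_db=30):
    """Return timestamps of the non-silent intervals in an audio file.

    top_db is an assumed value: anything more than top_db decibels below
    the loudest point of the signal is treated as silence.
    """
    import librosa  # third-party; imported here so the helpers above work without it

    # sr=None keeps the file's native sample rate instead of resampling.
    y, sr = librosa.load(path, sr=None, mono=True)
    # Each row of the result is (start_sample, end_sample) of a non-silent interval.
    intervals = librosa.effects.split(y, top_db=top_db)
    return intervals_to_timestamps(intervals, sr)
```

Calling non_silent_timestamps("audio.mp3") (the file name is a placeholder) would return a list of [start, end] pairs. Note that this marks anything that is not silent, so it only approximates voice detection when the recording has a quiet background.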

CMH12
  • VAD algorithms are usually designed to work in many different environments: voice in ambient noise, voice in music, etc. Voice in silence is rarely useful, and doesn't require any complicated algorithm (simple power levels are sufficient). – Jdip Jan 12 '23 at 21:12
  • Maybe try following this guide. It looks like Librosa has the functionality you are looking for

    I also found this repository

    – CMH12 Jan 13 '23 at 13:23
  • Thank you for the answer – Gangadhar Jannu Jan 15 '23 at 20:43
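The "simple power levels" idea from the first comment can be sketched without any library. This is only an illustration under assumptions: the function name active_intervals, the 1024-sample frame length, and the -40 dB threshold are all hypothetical choices, and the samples are assumed to be already decoded to floats in the range -1.0 to 1.0. It separates loud frames from quiet ones; it does not distinguish voice from other sounds.

```python
import math


def active_intervals(samples, sr, frame_len=1024, threshold_db=-40.0):
    """Return (start_sec, end_sec) pairs where the frame RMS level exceeds threshold_db.

    samples: sequence of floats in [-1.0, 1.0].
    threshold_db: level relative to full scale (assumed value, tune per recording).
    """
    intervals = []
    start = None  # sample offset where the current loud interval began
    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db > threshold_db:
            if start is None:
                start = offset  # first loud frame of a new interval
        elif start is not None:
            # First quiet frame after a loud run closes the interval.
            intervals.append((start / sr, offset / sr))
            start = None
    if start is not None:  # signal ended while still loud
        intervals.append((start / sr, len(samples) / sr))
    return intervals
```

The resolution of the returned boundaries is one frame, so a shorter frame_len gives finer timestamps at the cost of a noisier level estimate.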