VAD #96
-
I'm looking to use Whisper for voice activity detection (VAD) only. Can anyone point me in the right direction on how to detect the presence or absence of speech in an audio clip using this model?
-
I don't think VAD is built into the codebase. One way to do it (what I'd do) is to check whether the transcription output is an empty string. For this purpose you can turn off all the beam search parameters so decoding is greedy, which speeds things up.
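A minimal sketch of the empty-output check, assuming the `openai-whisper` package (`whisper.load_model`, `model.transcribe`). The model size `"base"` and the helper names `text_indicates_speech` and `detect_speech_by_text` are hypothetical choices for illustration. Passing a single `temperature=0.0` disables the temperature fallback, and `beam_size=None` keeps decoding greedy.

```python
def text_indicates_speech(text: str) -> bool:
    """Treat any non-whitespace transcription as evidence of speech."""
    return text.strip() != ""


def detect_speech_by_text(audio_path: str, model_name: str = "base") -> bool:
    """Crude VAD: transcribe with greedy decoding and check for empty output."""
    # openai-whisper, imported lazily so the helper above has no dependency
    import whisper

    model = whisper.load_model(model_name)  # hypothetical default model size
    result = model.transcribe(
        audio_path,
        temperature=0.0,  # single temperature: no sampling fallback
        beam_size=None,   # no beam search, i.e. greedy decoding
        fp16=False,       # avoids the FP16-on-CPU warning; can be dropped on GPU
    )
    return text_indicates_speech(result["text"])
```

Call it as `detect_speech_by_text("clip.wav")` (hypothetical file name). Whisper can occasionally hallucinate text on silence or noise, so a non-empty result is a hint rather than proof of speech.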
-
In the `["segments"]` field of the dictionary returned by `transcribe()`, each item holds segment-level details, including `no_speech_prob`, the probability of the `<|nospeech|>` token. Combined with the log probability threshold and the compression ratio threshold, this gives `transcribe()` a crude built-in VAD, but you may get better results by pairing Whisper with a separate, more accurate VAD tool.
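A minimal sketch of using these per-segment fields as a post-filter, assuming the `openai-whisper` package. The keys `start`, `end`, `no_speech_prob`, `avg_logprob` and `compression_ratio` are ones `transcribe()` returns for each segment, and the default thresholds (0.6, -1.0, 2.4) match the defaults of its `no_speech_threshold`, `logprob_threshold` and `compression_ratio_threshold` parameters. The helper names and the OR rule below (any single signal marks a segment as non-speech) are assumptions for illustration, and the thresholds will likely need tuning for your audio.

```python
from typing import Dict, List, Tuple


def speech_segments(
    segments: List[Dict],
    no_speech_threshold: float = 0.6,
    logprob_threshold: float = -1.0,
    compression_ratio_threshold: float = 2.4,
) -> List[Tuple[float, float]]:
    """Return (start, end) times of segments that look like speech.

    Assumed rule: a segment counts as non-speech if ANY of these holds:
    high no_speech_prob, low avg_logprob, or a high compression ratio
    (often a sign of repetitive hallucinated text).
    """
    spans = []
    for seg in segments:
        non_speech = (
            seg["no_speech_prob"] > no_speech_threshold
            or seg["avg_logprob"] < logprob_threshold
            or seg["compression_ratio"] > compression_ratio_threshold
        )
        if not non_speech:
            spans.append((seg["start"], seg["end"]))
    return spans


def detect_speech_segments(audio_path: str, model_name: str = "base"):
    """Run Whisper, then keep only the segments that pass the filter above."""
    # openai-whisper, imported lazily so speech_segments has no dependency
    import whisper

    model = whisper.load_model(model_name)  # hypothetical default model size
    result = model.transcribe(audio_path, fp16=False)
    return speech_segments(result["segments"])
```

As far as I can tell, `transcribe()` itself only discards a window when `no_speech_prob` is high *and* `avg_logprob` is low, so re-applying that same AND rule afterwards would change little. The stricter OR rule here trades some missed quiet speech for fewer hallucinated segments.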