The ["segments"] field of the dictionary returned by transcribe() holds one item per segment with segment-level details, including no_speech_prob, which is the probability of the <|nospeech|> token. Combined with the log probability threshold and the compression ratio threshold, this gives transcribe() a crude form of VAD, but you may get better results by pairing it with a separate, more accurate VAD tool.
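As a rough illustration of the above, here is a minimal sketch of post-filtering the segments by no_speech_prob and avg_logprob. The helper name drop_silent_segments and the default thresholds (0.6 and -1.0) are assumptions chosen for the example, and the rule it applies (drop a segment only when it looks like silence and the decoder was also not confident) is one reasonable heuristic, not necessarily what you want for your audio.

```python
from typing import Any, Dict, List


def drop_silent_segments(
    result: Dict[str, Any],
    no_speech_threshold: float = 0.6,   # assumed default, tune for your data
    logprob_threshold: float = -1.0,    # assumed default, tune for your data
) -> List[Dict[str, Any]]:
    """Return the segments of a transcribe() result that look like real speech.

    Hypothetical helper: a segment is dropped only when its no_speech_prob is
    above no_speech_threshold AND its avg_logprob is not above
    logprob_threshold.
    """
    kept = []
    for seg in result.get("segments", []):
        likely_silence = seg["no_speech_prob"] > no_speech_threshold
        # A confident decode overrides the silence flag.
        if likely_silence and seg["avg_logprob"] > logprob_threshold:
            likely_silence = False
        if not likely_silence:
            kept.append(seg)
    return kept


# With Whisper installed, the input would come from something like:
#   import whisper
#   result = whisper.load_model("base").transcribe("audio.wav")
# Here a hand-made dictionary with the same keys stands in for it.
sample = {
    "segments": [
        {"text": " ...", "no_speech_prob": 0.9, "avg_logprob": -1.5},
        {"text": " Hello there.", "no_speech_prob": 0.1, "avg_logprob": -0.3},
    ]
}

for seg in drop_silent_segments(sample):
    print(seg["text"])
```

A dedicated VAD tool run before transcription would replace this kind of filtering rather than complement it, since it removes the silent audio before the model ever sees it.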

Answer selected by jongwook