How to use Whisper only to detect no_speech probability? #1935

rusty-ai · 2024-01-02T16:08:55Z

Hi, I am only using Whisper to detect the language, with the following code:

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

However, this only returns the probability of each language for the given 30-second chunk of audio; it does not return the no_speech probability. Is there any way to get only the no_speech probability without using .transcribe, which is computationally expensive?

glangford · 2024-01-02T20:01:19Z

It would be better to use a dedicated voice activity detector, such as Silero VAD. The no_speech probability in Whisper is not very accurate. If you want to use just Whisper anyway, see this discussion:

VAD #96
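
For readers who want to try the Whisper-only route, here is a minimal sketch of one way to read the no_speech probability without calling .transcribe. It is not taken from the linked discussion. It assumes a recent openai-whisper release (one where the model exposes `num_languages`) and a `mel` computed from audio that was already passed through `whisper.pad_or_trim`, so it covers exactly one 30-second segment. The function names `no_speech_probability` and `is_silent` are made up for illustration. The idea is to run the encoder once, feed the decoder a single start-of-transcript token, and take the softmax probability of the no-speech token at that position.

```python
def no_speech_probability(model, mel):
    """Return the probability of the no-speech token for one 30-second segment.

    `model` is a loaded Whisper model and `mel` is a log-Mel spectrogram of
    shape (n_mels, 3000) on the same device as the model.
    """
    # Imported lazily so that is_silent() below can be used without torch/whisper.
    import torch
    from whisper.tokenizer import get_tokenizer

    # Assumption: a recent openai-whisper version where model.num_languages exists.
    tokenizer = get_tokenizer(model.is_multilingual, num_languages=model.num_languages)

    with torch.no_grad():
        # One encoder pass over the segment.
        audio_features = model.embed_audio(mel.unsqueeze(0))
        # One decoder pass over a single start-of-transcript token.
        sot = torch.tensor([[tokenizer.sot]], device=mel.device)
        logits = model.logits(sot, audio_features)

    # Softmax over the vocabulary at the start-of-transcript position.
    probs = logits[0, 0].float().softmax(dim=-1)
    return probs[tokenizer.no_speech].item()


def is_silent(no_speech_prob, threshold=0.6):
    """Hypothetical helper: treat a segment as silent above a threshold.

    0.6 matches the default no_speech_threshold of whisper's transcribe().
    """
    return no_speech_prob > threshold


# Usage, assuming `model` and `mel` from the snippet in the question:
# p = no_speech_probability(model, mel)
# print(f"no_speech probability: {p:.3f}, silent: {is_silent(p)}")
```

This still pays for a full encoder pass, so it is cheaper than .transcribe mainly because it skips the token-by-token decoding loop. As noted above, the resulting value is not very reliable, so a dedicated VAD is likely the better choice when accuracy matters.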
