Option to skip silence for `whisper.pad_or_trim` #2569

rickif · 2025-04-06T08:48:40Z

```python
import whisper

model = whisper.load_model("turbo")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
```

I am using the example above to detect the language of an audio file. If the audio begins with more than 30 seconds of silence, the 30-second window that `whisper.pad_or_trim` keeps contains no speech, so detection performs poorly and the confidence is low. Would it help to add an offset option to `whisper.pad_or_trim`? That way, when detection confidence is low, we could skip the leading silence and try again.
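A minimal sketch of the proposed behaviour, assuming NumPy and Whisper's 16 kHz mono float32 audio. `pad_or_trim_with_offset`, `first_voiced_sample` and `detect_language_skipping_silence` are hypothetical helpers, not part of Whisper's API. The silence check is a simple RMS energy gate (not a real voice-activity detector), and the -40 dBFS threshold, 20 ms frames and 0.5 confidence cutoff are arbitrary illustrative choices.

```python
from typing import Callable, Dict, Tuple

import numpy as np

# Assumed to match Whisper's constants (whisper.audio.SAMPLE_RATE / N_SAMPLES).
SAMPLE_RATE = 16000
N_SAMPLES = 30 * SAMPLE_RATE  # one 30-second window


def pad_or_trim_with_offset(audio: np.ndarray, offset: int = 0,
                            length: int = N_SAMPLES) -> np.ndarray:
    """Hypothetical pad_or_trim variant for 1-D audio: skip the first
    `offset` samples, then trim or zero-pad the rest to `length` samples."""
    window = audio[offset:offset + length]
    if window.shape[0] < length:
        window = np.pad(window, (0, length - window.shape[0]))
    return window


def first_voiced_sample(audio: np.ndarray, frame_len: int = 320,
                        threshold_db: float = -40.0) -> int:
    """Sample index where the first frame (320 samples = 20 ms at 16 kHz)
    with an RMS level above `threshold_db` dBFS starts; 0 if none does."""
    n_frames = len(audio) // frame_len
    if n_frames == 0:
        return 0
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-10))  # avoid log(0)
    voiced = np.flatnonzero(level_db > threshold_db)
    return int(voiced[0]) * frame_len if voiced.size else 0


def detect_language_skipping_silence(
    audio: np.ndarray,
    detect: Callable[[np.ndarray], Dict[str, float]],
    min_confidence: float = 0.5,
) -> Tuple[str, float]:
    """Hypothetical helper: detect on the first 30 s; if the top probability
    is below `min_confidence`, retry on the window starting at the first
    voiced sample and keep whichever result is more confident.
    `detect` maps a 30-second window to a {language: probability} dict."""
    probs = detect(pad_or_trim_with_offset(audio))
    lang = max(probs, key=probs.get)
    if probs[lang] < min_confidence:
        offset = first_voiced_sample(audio)
        if offset > 0:
            retry = detect(pad_or_trim_with_offset(audio, offset))
            retry_lang = max(retry, key=retry.get)
            if retry[retry_lang] > probs[lang]:
                return retry_lang, retry[retry_lang]
    return lang, probs[lang]
```

With Whisper, `detect` could wrap the calls from the original example, e.g. `lambda w: model.detect_language(whisper.log_mel_spectrogram(w, n_mels=model.dims.n_mels).to(model.device))[1]`; this wiring is untested. Retrying only when confidence is low keeps the common case at a single forward pass.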

Replies: 0 comments
