Significant difference in speed between .transcribe() and .decode()
#1896
Unanswered
haixuanTao asked this question in Q&A
Replies: 1 comment 2 replies
-
Look at what the
import whisper

# This method is more accurate on shorter clips earlier in the runtime.
def transcribe(model: whisper.Whisper, audio_filepath: str, prompt=None) -> str:
    # Note: Whisper's own default rate is whisper.audio.SAMPLE_RATE (16000 Hz); 22050 is the poster's choice.
    audio = whisper.load_audio(audio_filepath, 22050)  # This sample rate is required!
    audio = whisper.pad_or_trim(audio)  # pad or cut to a single 30-second window
    mel = whisper.log_mel_spectrogram(audio, device=model.device)  # .to(model.device)
    opts = whisper.DecodingOptions(language='en', prompt=prompt)
    res = whisper.decode(model, mel, opts)
    return res.text

I've noticed that on longer audio the method above has improved accuracy but can occasionally glitch out on unique prompt terms. The |
-
When comparing:
time: ~1.5 sec
and
time: ~0.7 sec
The second method is about 2x faster, and I was wondering whether that is normal. Is there any tradeoff?
2x faster seems like a lot.
@camilhamani
Hardware: NVIDIA 4090
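For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not the original poster's benchmark: the helper names `time_call` and `compare_whisper`, the `"base"` model name, and the warmup/repeat counts are all assumptions for illustration. It uses a warmup run and synchronizes the GPU before stopping the clock, since otherwise the first call and queued CUDA work can distort numbers of this size.

```python
import time
from statistics import median
from typing import Callable, List


def time_call(fn: Callable[[], object], warmup: int = 1, repeats: int = 5,
              sync: Callable[[], None] = lambda: None) -> float:
    """Return the median wall-clock seconds of fn() over `repeats` timed runs."""
    for _ in range(warmup):
        fn()  # warmup runs are not timed (model load, CUDA init, caches)
    sync()
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        sync()  # on GPU, wait for queued kernels before stopping the clock
        samples.append(time.perf_counter() - start)
    return median(samples)


def compare_whisper(audio_path: str, model_name: str = "base") -> dict:
    """Hypothetical harness: time model.transcribe() against whisper.decode()."""
    # Imported lazily so time_call() above has no third-party dependencies.
    import torch
    import whisper

    use_cuda = torch.cuda.is_available()
    model = whisper.load_model(model_name)
    sync = torch.cuda.synchronize if use_cuda else (lambda: None)

    def run_transcribe():
        # Full pipeline: handles audio of any length in 30-second windows.
        return model.transcribe(audio_path, language="en", fp16=use_cuda)

    def run_decode():
        # Single 30-second window only; anything longer is cut off.
        audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
        mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
        opts = whisper.DecodingOptions(language="en", fp16=use_cuda)
        return whisper.decode(model, mel, opts)

    return {
        "transcribe": time_call(run_transcribe, sync=sync),
        "decode": time_call(run_decode, sync=sync),
    }
```

Calling `compare_whisper("clip.wav")` (a placeholder path) returns the median seconds for each method. A gap between the two would not be surprising, since `transcribe()` does more per call than a single `decode()`: it walks the audio in 30-second windows, predicts timestamps, and can retry a segment at higher temperatures when the output looks unreliable. Whether that accounts for the full 2x here is not something this sketch establishes.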