Significant difference in speed between `.transcribe()` and `.decode()` #1896

haixuanTao · 2023-12-14T08:39:13Z


When comparing:

import time

now = time.time()

result = model.transcribe("audio.mp3")

elapsed = time.time() - now  # end minus start, so the value is positive
print("Elapsed time: " + str(elapsed) + " seconds")

Time: ~1.5 sec

and

now = time.time()

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

elapsed = time.time() - now  # end minus start, so the value is positive
print("Elapsed time: " + str(elapsed) + " seconds")

Time: ~0.7 sec

The second method is about 2x faster, and I was wondering whether that is normal. Is there any tradeoff?

2x faster seems like a lot.

@camilhamani

Hardware: NVIDIA RTX 4090
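
A single wall-clock measurement on a GPU can be dominated by one-time warm-up costs and by asynchronous kernel launches, so the two numbers above may not be directly comparable. The following is a minimal sketch of a fairer timing harness, not anything taken from Whisper itself: `benchmark` and its parameters are hypothetical names, and the optional `sync` hook is assumed to be something like `torch.cuda.synchronize` when timing CUDA work.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 1,
    repeats: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock time of fn() in seconds.

    `sync` is an optional hook called before and after each timed run,
    e.g. torch.cuda.synchronize, so queued GPU work is included.
    """
    # Untimed warm-up runs absorb one-time costs (caches, lazy init).
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)

    # The median is less sensitive to a single slow outlier than the mean.
    return statistics.median(samples)


# Hypothetical usage, assuming `model` is an already loaded Whisper model:
# t = benchmark(lambda: model.transcribe("audio.mp3"), sync=torch.cuda.synchronize)
```

Running both code paths through the same harness would show whether the 2x gap persists after warm-up.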

T145 · 2023-12-15T21:38:56Z


Look at what the model.transcribe method is actually doing. It's a lot more than the default example! Here's the method I like to use:

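import whisper

# This method is more accurate on shorter clips earlier in the runtime.
def transcribe(model: whisper.Whisper, audio_filepath: str, prompt=None) -> str:
    # Load and resample to 22050 Hz. Note that load_audio defaults to 16000 Hz,
    # which is the rate Whisper's feature extraction assumes.
    audio = whisper.load_audio(audio_filepath, 22050)
    # Pad or trim to the fixed-length window the model expects.
    audio = whisper.pad_or_trim(audio)
    # Passing device= builds the spectrogram on the model's device,
    # replacing a separate .to(model.device) call.
    mel = whisper.log_mel_spectrogram(audio, device=model.device)
    opts = whisper.DecodingOptions(language='en', prompt=prompt)
    res = whisper.decode(model, mel, opts)
    return res.text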

I've noticed that on longer audio the method above has improved accuracy but can occasionally glitch on unique prompt terms. The model.transcribe method is very consistent with a custom vocabulary but can glitch on punctuation and casing, which should be easy to fix with an external program or another Python library.
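
One concrete tradeoff behind the speed gap: `whisper.pad_or_trim` cuts the audio to a single 30-second window, so `whisper.decode` only ever sees the start of a longer file, while `model.transcribe` walks over the whole file window by window (and may retry a window at a higher temperature when its quality checks fail). The sketch below is only arithmetic illustrating that difference; the helper names are hypothetical, and the window count is a lower bound because `transcribe` can advance by less than a full window.

```python
import math

SAMPLE_RATE = 16_000    # Hz, the input rate Whisper expects
CHUNK_SECONDS = 30      # length of one model input window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # samples kept by pad_or_trim


def decode_coverage_seconds(duration_s: float) -> float:
    """Seconds of audio a single pad_or_trim + decode call can cover."""
    return min(duration_s, CHUNK_SECONDS)


def min_transcribe_windows(duration_s: float) -> int:
    """Lower bound on the number of decoding passes transcribe needs."""
    return max(1, math.ceil(duration_s / CHUNK_SECONDS))


duration = 95.0  # hypothetical clip length in seconds
print(f"decode covers {decode_coverage_seconds(duration):.0f}s of {duration:.0f}s")
print(f"transcribe needs at least {min_transcribe_windows(duration)} passes")
```

For a clip shorter than 30 seconds both paths cover the same audio, so any remaining gap would come from the extra work `transcribe` does per window rather than from the number of windows.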

2 replies

ejentos Dec 17, 2023

@T145, what are the libraries for 'punctuation and casing'? Thanks in advance.

T145 Dec 18, 2023

I use LanguageTool, but there are many options: spaCy, NLTK, pyspellchecker, etc.
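
For very simple cases, a few regular expressions may be enough before reaching for one of those libraries. This is a minimal standard-library sketch, not what LanguageTool or spaCy do: `tidy_transcript` is a hypothetical helper that only normalizes spacing and capitalizes sentence starts, and it will wrongly capitalize after abbreviations such as "e.g.".

```python
import re


def tidy_transcript(text: str) -> str:
    """Normalize whitespace and capitalize the first letter of each sentence."""
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove stray spaces before punctuation marks.
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)
    # Uppercase a lowercase letter at the start of the text or after . ! ?
    text = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    return text


print(tidy_transcript("hello  world . how are you ?"))
```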
