using the large model within python #1354
I installed Whisper and everything works from the command line and within a Python script. However, when I run the transcription from the command line, I get much better results (as expected): there are words in the audio that are only transcribed correctly that way. What I haven't had any luck with is using the large model from within a Python script. I have tried a few variations of loading the model ("large" and "large-v2") and calling transcribe() on the file.

I'm able to get the file transcribed, but not with the same accuracy as the CLI results, and it runs much faster, so for those two reasons I don't believe it's actually using the large (or large-v2) model. Any suggestions would be appreciated, as this has been a wall I haven't been able to get past for about a week now. And, thank you. Brad
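In simplified form, what I'm trying amounts to roughly the following (the actual scripts vary the model name and options a bit, and the exact CLI flags aren't shown here):

```python
import whisper

# Command line run that gives the good results (roughly):
#   whisper 20230428.mp3 --model large

# Simplified version of what I am doing in Python:
model = whisper.load_model("large")        # also tried "large-v2"
result = model.transcribe("20230428.mp3")
print(result["text"])
```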
Replies: 1 comment 1 reply
This is likely because the command line invocation uses different default decoding settings for the transcription: the CLI defaults to beam search (`--beam_size 5 --best_of 5`), while `transcribe()` falls back to greedy decoding when no options are passed. You can just load the "large" model; it is the same as "large-v2". Try changing the transcription line in your 3rd code block to: `model.transcribe("20230428.mp3", beam_size=5, best_of=5)`
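Put together, a minimal version of the script would look roughly like this:

```python
import whisper

model = whisper.load_model("large")   # same checkpoint as "large-v2"

# beam_size=5 and best_of=5 mirror the CLI defaults; beam search is slower
# than the greedy decoding used when these aren't passed, which matches the
# "more accurate but slower" behaviour seen on the command line.
result = model.transcribe("20230428.mp3", beam_size=5, best_of=5)
print(result["text"])
```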