word_timestamps parameter results in out-of-sequence output with large-v3 #2024

AutomationAdam · 2024-02-16T11:59:55Z

I'm running the simple code below, which produces good results with large-v2 and better results with large-v3.
But when I set word_timestamps=True with large-v3, words and sentence fragments start to appear out of sequence, usually beginning about halfway through the transcript. I've tested with several two-minute spoken mp3 files, all with clean audio.

Do I need to do something differently for large-v3, or could this be a bug?

```python
import whisper

audio = './test_1.mp3'
model = whisper.load_model("large-v2")  # or "large-v3"

result = model.transcribe(
    audio=audio,
    language='en',
    word_timestamps=True,
    task="transcribe"
)

print(result["text"])
```
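To narrow down where the ordering breaks, here is a rough sketch of a check I could run on the output. It assumes `result` has the `segments` list with per-segment `words` entries (each carrying `word`, `start`, and `end`) that Whisper returns when word_timestamps=True. The helper name `find_out_of_order_words` is just something made up for illustration, not part of the library.

```python
def find_out_of_order_words(result):
    """Return (previous, current) word pairs whose start time goes backwards.

    Assumes `result` follows the layout returned by transcribe() with
    word_timestamps=True: result["segments"][i]["words"][j]["start"].
    """
    # Flatten all word entries in the order they appear in the output.
    words = [
        word
        for segment in result.get("segments", [])
        for word in segment.get("words", [])
    ]
    # Compare each word with the one before it and keep the pairs
    # where the later word claims an earlier start time.
    return [
        (prev, cur)
        for prev, cur in zip(words, words[1:])
        if cur["start"] < prev["start"]
    ]
```

Calling `find_out_of_order_words(result)` on the `result` from the code above would return an empty list if every word starts no earlier than the one before it. Any pairs it returns would show where the timestamps jump backwards, which might help tell whether the text order and the timestamps disagree.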

Replies: 0 comments
