-
These options are not specific to Japanese, but consider trying the following (large-v3 may be more accurate, but it tends to hallucinate more).
And FYI, in case these past discussions are helpful:
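The suggested options themselves aren't preserved above, but for reference, here is a sketch of decoding flags the openai-whisper CLI exposes that are commonly tuned to curb hallucination. The flag names are real CLI options; the specific values and the file name are illustrative assumptions, not recommendations quoted from this thread:

```python
# Build an openai-whisper CLI invocation with decoding options commonly
# tuned to reduce hallucinated segments. Flag names match the openai-whisper
# CLI; the values chosen here are illustrative starting points only.
def build_whisper_cmd(audio_path, model="large-v3", language="ja"):
    return [
        "whisper", audio_path,
        "--model", model,
        "--language", language,
        "--temperature", "0",                     # greedy decoding; less drift
        "--condition_on_previous_text", "False",  # stop hallucinations propagating across segments
        "--no_speech_threshold", "0.6",           # drop segments judged to be silence
        "--logprob_threshold", "-1.0",            # retry segments decoded with low confidence
    ]

# "episode01.mka" is a placeholder file name; pass the list to subprocess.run()
cmd = build_whisper_cmd("episode01.mka")
```

Turning off `condition_on_previous_text` in particular is a frequently cited mitigation: once large-v3 hallucinates one segment, conditioning lets the error feed forward into the next.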
-
I've been using Whisper almost since the beginning, but I've never quite found a good recipe to generate Japanese subs that have accurate transcription and accurate syncing.
I've tried so many combinations of prompts but nothing ever seems to save a large amount of cleanup afterwards.
The Medium model seems to get the most accurate timing and Large/Large-V3 seems to transcribe better.
I'm working locally on Win11 with plenty of GPU. I also try on Colaboratory, but the results aren't much different. Here are a couple of my sample commands:
whisper "filename" --model medium --language ja
whisper "filename" --model large-v3 --language ja --task translate
whisper "filename" --model large --language ja --task translate --word_timestamps True --temperature 0
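One option when timing and transcription quality disagree between models is to run transcription once and post-process the timestamps yourself before writing subtitles. This sketch assumes the segment shape openai-whisper returns from `model.transcribe()` (a list of dicts with `start`/`end` in seconds and `text`); the SRT serialization is standard:

```python
# Convert whisper-style segments (each with 'start'/'end' in seconds and
# 'text') into SRT. This matches the segment shape openai-whisper's
# model.transcribe() returns, and lets you adjust timings (e.g. clamp
# overlaps, pad short cues) before writing the subtitle file.
def fmt_timestamp(seconds):
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"  # SRT uses comma for milliseconds

def segments_to_srt(segments):
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{fmt_timestamp(seg['start'])} --> {fmt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

This is only a sketch of the idea; the `whisper` CLI already writes `.srt` output itself, so a helper like this matters only if you want to correct the sync programmatically in between.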
I've searched the discussions here and couldn't quite find what I was looking for. aadnk's post about VAD and using his online interface (from 2022) didn't really improve things.
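For context on what that VAD preprocessing is doing: the audio is split at silence so Whisper only sees voiced spans and has nothing to hallucinate over during long pauses. Real tools use a trained model (e.g. Silero VAD), but the pipeline idea can be sketched with a toy energy threshold; this is an illustration of the concept, not aadnk's implementation:

```python
# Toy energy-based voice-activity detection: split a sample stream into
# (start, end) index ranges whose per-frame RMS energy exceeds a threshold.
# Production VADs (e.g. Silero) use a trained model instead, but the
# downstream use is the same: transcribe only the voiced ranges.
def voiced_ranges(samples, frame_len=400, threshold=0.02):
    ranges, start = [], None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms >= threshold and start is None:
            start = i                      # voiced region begins
        elif rms < threshold and start is not None:
            ranges.append((start, i))      # voiced region ends at silence
            start = None
    if start is not None:
        ranges.append((start, len(samples)))  # audio ended mid-speech
    return ranges
```

Whether this helps depends heavily on the material; for audio with little true silence (music beds, crowd noise) an energy gate like this does nothing, which may be consistent with the "didn't really improve things" experience above.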
I'm not a coding or prompt genius. I barely scrape by, so maybe I'm missing something where someone could say, "Hey, ya just need to add these bits to your command and use this model size." Any help, please?