-
These options are not specific to Japanese, but consider trying the following (large-v3 may be more accurate, but it tends to hallucinate more).
And FYI, in case these past discussions are helpful:
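The suggested options themselves aren't preserved above, but for reference, here is a sketch of decoding flags the openai-whisper CLI exposes that are commonly tuned to curb hallucination. The flag names are real CLI options; the specific values and the file name are illustrative assumptions, not recommendations quoted from this thread:

```python
# Build an openai-whisper CLI invocation with decoding options commonly
# tuned to reduce hallucinated segments. Flag names match the openai-whisper
# CLI; the values chosen here are illustrative starting points only.
def build_whisper_cmd(audio_path, model="large-v3", language="ja"):
    return [
        "whisper", audio_path,
        "--model", model,
        "--language", language,
        "--temperature", "0",                     # greedy decoding; less drift
        "--condition_on_previous_text", "False",  # stop hallucinations propagating across segments
        "--no_speech_threshold", "0.6",           # drop segments judged to be silence
        "--logprob_threshold", "-1.0",            # retry segments decoded with low confidence
    ]

# "episode01.mka" is a placeholder file name; pass the list to subprocess.run()
cmd = build_whisper_cmd("episode01.mka")
```

Turning off `condition_on_previous_text` in particular is a frequently cited mitigation: once large-v3 hallucinates one segment, conditioning lets the error feed forward into the next.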
-
I've been using Whisper almost since the beginning, but I've never quite found a good recipe to generate Japanese subs that have accurate transcription and accurate syncing.
I've tried so many combinations of prompts but nothing ever seems to save a large amount of cleanup afterwards.
The Medium model seems to get the most accurate timing and Large/Large-V3 seems to transcribe better.
I'm working locally on Win11 with plenty of GPU. I also try on Colaboratory, but the results aren't much different. Here are a couple of my sample commands:
whisper "filename" --model medium --language ja
whisper "filename" --model large-v3 --language ja --task translate
whisper "filename" --model large --language ja --task translate --word_timestamps True --temperature 0
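One option when timing and transcription quality disagree between models is to run transcription once and post-process the timestamps yourself before writing subtitles. This sketch assumes the segment shape openai-whisper returns from `model.transcribe()` (a list of dicts with `start`/`end` in seconds and `text`); the SRT serialization is standard:

```python
# Convert whisper-style segments (each with 'start'/'end' in seconds and
# 'text') into SRT. This matches the segment shape openai-whisper's
# model.transcribe() returns, and lets you adjust timings (e.g. clamp
# overlaps, pad short cues) before writing the subtitle file.
def fmt_timestamp(seconds):
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"  # SRT uses comma for milliseconds

def segments_to_srt(segments):
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{fmt_timestamp(seg['start'])} --> {fmt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

This is only a sketch of the idea; the `whisper` CLI already writes `.srt` output itself, so a helper like this matters only if you want to correct the sync programmatically in between.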
I've searched the discussions here and couldn't quite find what I was looking for. aadnk's post about VAD and using his online interface (from 2022) didn't really improve things.
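For context on what that VAD preprocessing is doing: the audio is split at silence so Whisper only sees voiced spans and has nothing to hallucinate over during long pauses. Real tools use a trained model (e.g. Silero VAD), but the pipeline idea can be sketched with a toy energy threshold; this is an illustration of the concept, not aadnk's implementation:

```python
# Toy energy-based voice-activity detection: split a sample stream into
# (start, end) index ranges whose per-frame RMS energy exceeds a threshold.
# Production VADs (e.g. Silero) use a trained model instead, but the
# downstream use is the same: transcribe only the voiced ranges.
def voiced_ranges(samples, frame_len=400, threshold=0.02):
    ranges, start = [], None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms >= threshold and start is None:
            start = i                      # voiced region begins
        elif rms < threshold and start is not None:
            ranges.append((start, i))      # voiced region ends at silence
            start = None
    if start is not None:
        ranges.append((start, len(samples)))  # audio ended mid-speech
    return ranges
```

Whether this helps depends heavily on the material; for audio with little true silence (music beds, crowd noise) an energy gate like this does nothing, which may be consistent with the "didn't really improve things" experience above.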
I'm not a coding or prompt genius. I barely scrape by, so maybe I'm missing something where someone could say, "Hey, ya just need to add these bits to your command and use this model size." Any help, please?