Replies: 1 comment
In general, the text should represent the speech that is actually in the audio, not what the speaker meant to say. For punctuation and filler words, it is your call, but try to be consistent (either all of your transcriptions have punctuation, or none of them do).
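One way to keep that consistency is to pass every reference transcript through the same normalization step before fine-tuning. The snippet below is only a minimal sketch using the Python standard library; it is not part of Whisper, and the function name `normalize_reference`, its two flags, and the `FILLERS` list are assumptions made for illustration.

```python
import re
import string

# Hypothetical filler list; adjust it to what actually occurs in your data.
FILLERS = ("um", "uh", "er", "hmm")

# Keep apostrophes so contractions such as "don't" survive punctuation removal.
PUNCT_TO_STRIP = string.punctuation.replace("'", "")


def normalize_reference(text, keep_punctuation=True, keep_fillers=True):
    """Apply one convention to a reference transcript.

    Choose the flag values once and use them for the whole dataset.
    """
    if not keep_fillers:
        # Remove fillers as whole words, plus an optional trailing comma.
        pattern = r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*"
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    if not keep_punctuation:
        text = text.translate(str.maketrans("", "", PUNCT_TO_STRIP))
    # Collapse any repeated whitespace left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    sample = "Oh I am a study um student."
    print(normalize_reference(sample))
    print(normalize_reference(sample, keep_punctuation=False, keep_fillers=False))
```

Note that the false start ("study") is left in place in both modes, since it is part of what was actually spoken; only the punctuation and filler conventions are switched.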
I want to fine-tune Whisper on my data, but I do not know the best format for the references.
For example, the speaker says "Oh I am a study um student. My phone number is five twenty.... It is five two zero...".
What is the best reference transcript for that speech?
Whisper generates a formal transcript, and I do not want to change that behavior when I fine-tune it on my data.