Removing automatic grammar correction in Whisper #1631
-
There is a built-in normalizer that you can't do much about without probably forking the project. It is described in the paper and is fairly readable in the code. In short, it removes "ums" and standardizes contractions, numbers, etc. Additionally, I suspect a significant portion of the training data is "cleaned": it's standard practice for human transcribers to edit their output, and vendors charge extra for verbatim transcripts. Whisper will probably follow those patterns as well.

As with every other problem, someone will suggest trying the initial_prompt argument, but your mileage may vary. You might also look at setting the temperature. Finally, as with any ASR system, the outputs aren't perfect. Spelling and interpretation errors are going to happen, so use the large model if you can. Otherwise, review the outputs and post-process FTW.
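A minimal sketch of the prompt/temperature suggestion plus a review pass, assuming the openai-whisper Python package, whose model objects expose transcribe() with temperature, initial_prompt and language options and return a "segments" list carrying avg_logprob. The filler-word prompt, the vocabulary hint, pinning temperature to 0.0, and the helper names build_prompt, transcribe_verbatim and flag_for_review are all illustrative assumptions, not a documented way to switch off the cleanup. The -1.0 cutoff simply mirrors transcribe()'s default logprob_threshold.

```python
from typing import Any, Dict, List, Optional, Sequence

# Assumption: an example prompt full of fillers. Whisper tends to imitate the
# style of its prompt, so this *may* make it keep "um"/"hmm"; results vary.
FILLER_PROMPT = "Umm, let me think like, hmm... Okay, here's what I'm, like, thinking."


def build_prompt(vocabulary: Optional[Sequence[str]] = None) -> str:
    """Hypothetical helper: append expected words (e.g. "chemist") to bias short clips."""
    if not vocabulary:
        return FILLER_PROMPT
    return FILLER_PROMPT + " " + ", ".join(vocabulary) + "."


def transcribe_verbatim(model: Any, audio_path: str,
                        vocabulary: Optional[Sequence[str]] = None,
                        language: str = "en") -> Dict[str, Any]:
    """Hypothetical wrapper around an openai-whisper model's transcribe()."""
    return model.transcribe(
        audio_path,
        temperature=0.0,  # assumption: one deterministic pass, no sampling fallback
        initial_prompt=build_prompt(vocabulary),
        language=language,  # skip auto-detection, which can be shaky on very short clips
    )


def flag_for_review(result: Dict[str, Any],
                    min_avg_logprob: float = -1.0) -> List[Dict[str, Any]]:
    """Hypothetical helper: return low-confidence segments for manual post-processing."""
    return [seg for seg in result.get("segments", [])
            if seg["avg_logprob"] < min_avg_logprob]
```

With openai-whisper installed, usage would be along the lines of `model = whisper.load_model("large")` followed by `transcribe_verbatim(model, "clip.wav", vocabulary=["chemist"])`, where "clip.wav" is a placeholder path. The segments returned by flag_for_review are the ones worth checking by hand.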
-
Whisper is correcting grammar in the transcription. How can we get raw output from the transcription? Also, because of this, it does not transcribe one-word recordings correctly. For example, it converts 'chemist' to 'i missed.'