Guillemets inside result text #503
apwhitelaw started this conversation in General.
Thanks for reporting. This happened because the model (wrongly) predicted that part of the sentence was a quotation. That is plausible, because the training data must have contained many examples of French text that used guillemets. We tried to prevent this by suppressing the tokens corresponding to these characters (see lines 248 to 256 in 8cf36f3). However, « and » were unintentionally left out of the default set of suppressed tokens. You can add them to the list in your code to avoid this, and I will also consider adding them to the default.
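The suppression mechanism described above can be sketched in a few lines: before the next token is picked, every suppressed token's score is set to negative infinity, so it can never win. This is a minimal toy sketch, not Whisper's code. The vocabulary, scores, and helper names (`VOCAB`, `suppress_tokens`, `greedy_pick`, `ids_for`) are hypothetical. Whisper's real tokenizer is a byte-level BPE, so a character like « may map to one or more token ids, and its real filter operates on tensors rather than Python lists.

```python
import math

# Hypothetical toy vocabulary (index -> token text); not Whisper's real vocabulary.
VOCAB = ["Je", " suis", " «", "»", " content", "."]


def suppress_tokens(logits, suppressed_ids):
    """Return a copy of `logits` with every suppressed id set to -inf,
    so greedy decoding can never select it."""
    filtered = list(logits)
    for token_id in suppressed_ids:
        filtered[token_id] = -math.inf
    return filtered


def greedy_pick(logits):
    """Return the index of the highest-scoring token (argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def ids_for(symbols, vocab):
    """Hypothetical helper: ids of tokens whose text, ignoring a leading
    space, is one of `symbols`."""
    return [i for i, tok in enumerate(vocab) if tok.strip() in symbols]


# Made-up scores in which the opening guillemet is the model's top choice.
logits = [0.1, 0.3, 2.5, 0.2, 1.7, 0.4]

print(repr(VOCAB[greedy_pick(logits)]))  # prints ' «'

guillemet_ids = ids_for({"«", "»"}, VOCAB)
print(repr(VOCAB[greedy_pick(suppress_tokens(logits, guillemet_ids))]))  # prints ' content'
```

In Whisper itself, the list the reply refers to corresponds to the `suppress_tokens` decoding option. Its default value, `"-1"`, expands to the built-in set of symbols, so extra ids such as those for « and » would be passed alongside `-1`.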
I have been playing around with Whisper, and since I know some French, I tried saying a sentence in it. On one attempt, the result looked like this:
but what I actually said was:
For those who don't know, the characters « and » are called guillemets; they are essentially the quotation marks used in French (and, I believe, in some other languages).
Is Whisper adding them because it thought the sentence was a quotation, or do those characters mean something specific to Whisper?
It hasn't happened since, so I wasn't sure whether it used those characters only because the text was French, or whether they have a special meaning and could appear in any language.