"Субтитры подогнал «Симон»" - dirty datasets for Russian subtitles. #2131
LocalVoidPictures started this conversation in General
large-v2 hallucinates much less than the original large model, so please try it. If that does not help, you will need to read up on hallucinations; there are several posts on the subject. P.S. Do the 7 minutes in the video include any speech, or just silence or music?
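If switching models is not enough, one workaround is to filter the output after transcription. Below is a rough sketch, not an official Whisper feature: it assumes the segments are dicts with `text` and `no_speech_prob` keys, as in the `segments` list returned by openai-whisper's `transcribe()`. The function name `drop_suspect_segments`, the pattern list, and the 0.6 threshold are all made up for illustration and would need tuning; the only phrase taken from this thread is the one in the title.

```python
import re

# Hypothetical blocklist of subtitle-credit phrases. Only the first
# pattern comes from this thread; extend it with whatever you observe.
CREDIT_PATTERNS = [
    re.compile(r"субтитры\s+подогнал", re.IGNORECASE),
]


def drop_suspect_segments(segments, patterns=CREDIT_PATTERNS, max_no_speech_prob=0.6):
    """Return only the segments that look like real speech.

    A segment is dropped if its text matches a known credit phrase, or if
    the model itself rated it as likely non-speech (assumed threshold).
    """
    kept = []
    for seg in segments:
        text = seg.get("text", "")
        if any(p.search(text) for p in patterns):
            continue  # known hallucinated credit line
        if seg.get("no_speech_prob", 0.0) > max_no_speech_prob:
            continue  # probably silence or music
        kept.append(seg)
    return kept


# Made-up segments, only to show the expected input shape.
example = [
    {"text": " Привет, как дела?", "no_speech_prob": 0.02},
    {"text": " Субтитры подогнал «Симон»", "no_speech_prob": 0.10},
    {"text": " ...", "no_speech_prob": 0.95},
]
cleaned = drop_suspect_segments(example)
print([seg["text"] for seg in cleaned])
```

This only hides the symptom: the model still produces the line, and a blocklist can also remove genuine speech that happens to contain the same words, so keep the patterns narrow.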
On multiple occasions I have received nonsensical results, such as the one in the title (read literally: "Subtitles by Simon"). A simple online search turns up this line frequently. The phrase has nothing to do with actual transcription; it looks like a subtitle credit, and including such lines in the dataset leads to wildly incorrect output.
Funnily enough, I only get this with the large model, not with medium or tiny.
Where do we report this?
Here's a sample of what I'm getting sometimes: