Replies: 1 comment
-
Honestly, Large-V3 seems to suck in my experience. I tried it on my latest video and it hallucinated in multiple places, which V2 almost never does. Sometimes V2 mishears a word or something, but it is very rare that it completely hallucinates.
-
Hi! I've been using Whisper for a while now, it's wonderful :D
I've been testing the new large-v3 model for a couple of days and it's not working as expected.
Using the same audio source with v2 and v3 results in drastically different outputs.
The most notable difference is that when v3 doesn't recognize a phrase, it proceeds to repeat it many times.
This sometimes happened with v2 too, but with v3 the timestamps are identical across multiple lines and the repetitions last longer.
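To put a number on the repetition, here is a minimal sketch that scans a list of segments for runs of identical consecutive text. It assumes segments shaped like the `openai-whisper` output (dicts with `start`, `end`, and `text`); the helper name `find_repeated_runs` and the `min_repeats` threshold are hypothetical, not part of any library.

```python
from itertools import groupby


def find_repeated_runs(segments, min_repeats=3):
    """Return (text, count, start, end) for each run of consecutive
    segments whose text is identical after trimming and lowercasing.

    `segments` is assumed to be a list of dicts with "start", "end"
    and "text" keys; that shape is an assumption, adjust as needed.
    """
    runs = []
    # groupby only merges *adjacent* items, which is what we want here:
    # a phrase repeated back to back, not one that recurs later on.
    for text, group in groupby(segments, key=lambda s: s["text"].strip().lower()):
        group = list(group)
        if text and len(group) >= min_repeats:
            runs.append((text, len(group), group[0]["start"], group[-1]["end"]))
    return runs
```

Running this on the v2 and v3 output for the same file should make it easy to compare how many runs each model produces and how long they last.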
Another thing is that it works better with automatic language detection than with the language of the video set explicitly. That is, even though the video is in Spanish, I get better results if I leave the language on auto.
If anyone has an explanation for why this is happening and needs more specific information, I'll be glad to provide it. I haven't included any so far because the only change I made to the code was switching from large-v2 to large-v3, so it shouldn't be a code problem.
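For anyone who wants to reproduce the comparison, this is a rough sketch of the four runs described above (v2 and v3, each with Spanish forced and with auto-detection). It assumes the `openai-whisper` Python package, where passing `language=None` to `transcribe` triggers auto-detection; other Whisper front ends may differ. The names `compare_runs` and `whisper_transcribe` are hypothetical helpers, and the audio path is a placeholder.

```python
def compare_runs(transcribe, audio_path,
                 models=("large-v2", "large-v3"),
                 languages=("es", None)):
    """Run every model/language combination on the same audio file.

    `transcribe` is any callable (model_name, audio_path, language) that
    returns a list of segments. None stands for automatic detection.
    """
    results = {}
    for model_name in models:
        for language in languages:
            results[(model_name, language)] = transcribe(
                model_name, audio_path, language
            )
    return results


def whisper_transcribe(model_name, audio_path, language):
    """Transcribe with openai-whisper (assumed installed).

    Reloads the model on every call, which is slow but keeps the
    sketch simple.
    """
    import whisper  # imported lazily so compare_runs works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language=language)
    return result["segments"]


# Example usage (placeholder path):
# results = compare_runs(whisper_transcribe, "video_audio.wav")
# for (model_name, language), segments in results.items():
#     print(model_name, language or "auto", len(segments))
```

Keeping everything else fixed and only varying the model name and language makes it clearer whether the difference comes from the model itself.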