Whisper-v3 works better on long audio than several short audios #1913
-
Dear all, I am new to ASR and have only tried Whisper-v3 a couple of times. I have found that it performs well on long speech, but the results become noticeably worse if I split the long speech into multiple short segments. Is that a common issue? Thanks a lot.
Replies: 4 comments 3 replies
-
It may also depend on the language.
-
That's expected, because when you split the audio you lose the context of the previous segments.
-
Has anyone else encountered the same problem?
-
I have experienced this, and it can be greatly improved if you provide the previous text in the prompt. When calling In addition to using the prompt, I can offer two other pieces of advice regarding short segments of audio:
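To make the prompt idea concrete, here is a minimal sketch of carrying the previous text forward from one segment to the next. It is only an illustration under a few assumptions, not necessarily what the reply above had in mind: the helper names `transcribe_with_context` and `make_whisper_fn` and the `max_prompt_chars` cutoff are hypothetical, and the Whisper wrapper assumes the `openai-whisper` package, whose `transcribe()` accepts an `initial_prompt` argument.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical callable type: (audio_path, prompt_or_None) -> transcribed text.
TranscribeFn = Callable[[str, Optional[str]], str]


def transcribe_with_context(
    segment_paths: Iterable[str],
    transcribe_fn: TranscribeFn,
    max_prompt_chars: int = 500,  # assumed cutoff, tune for your data
) -> List[str]:
    """Transcribe segments in order, prompting each one with the preceding text."""
    texts: List[str] = []
    context = ""
    for path in segment_paths:
        # Keep only the tail of the history; the first segment gets no prompt.
        tail = context[-max_prompt_chars:] if max_prompt_chars > 0 else ""
        prompt = tail or None
        text = transcribe_fn(path, prompt).strip()
        texts.append(text)
        context = (context + " " + text).strip()
    return texts


def make_whisper_fn(model_name: str = "large-v3") -> TranscribeFn:
    """Wrap openai-whisper; imported lazily so the helper above has no hard dependency."""
    import whisper  # assumes: pip install openai-whisper

    model = whisper.load_model(model_name)

    def fn(path: str, prompt: Optional[str]) -> str:
        # initial_prompt gives the decoder text to condition on for this file.
        return model.transcribe(path, initial_prompt=prompt)["text"]

    return fn
```

Usage would look something like `transcribe_with_context(["part1.wav", "part2.wav"], make_whisper_fn())`, where the file names are placeholders. Passing the transcriber in as a function keeps the context-carrying logic independent of the ASR backend, so the same loop should work with other Whisper wrappers that accept a prompt.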