-
Try the large or medium model? whisper.cpp, for example, outputs the following (see ggml-org/whisper.cpp#1240; source: https://www.bbc.co.uk/programmes/m001sdb3):
I like this one the best, though:
The above is with whisper.cpp and large-v2, which often produces more "fun" transcriptions of music than regular Whisper. Regular Whisper (or rather faster-whisper in this case) doesn't seem to output as much "fun" stuff, but it's useful nonetheless: https://rumble.com/v2nvr1w-trainspodder-helping-decode-reggae-lyrics.html If anybody can explain why different Whisper variants produce different levels of "fun", please chime in!
-
I did see some fun output with the large model in English. What about languages other than English? Do those sites fine-tune the model themselves?
-
Whisper explicitly filters out such characters/tokens; see line 247 in commit e58f288. You can change the list of suppressed characters/tokens with the `suppress_tokens` option (line 413 in e58f288).
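A minimal sketch of why suppressed symbols such as ♪ never appear in the output: token suppression works by setting the scores (logits) of the suppressed token ids to minus infinity before a token is picked, so they can never win. The toy vocabulary, token ids, and logits below are hypothetical and do not match Whisper's real tokenizer. In openai-whisper, `suppress_tokens` defaults to `"-1"`, which expands to a built-in set of non-speech symbols that includes ♪; passing an empty list through the Python API (for example `model.transcribe(path, suppress_tokens=[])`) should disable that set, but check this against your installed version.

```python
import math

# Hypothetical toy vocabulary; real Whisper token ids are different.
VOCAB = ["Jingle", " bells", "♪", "."]


def greedy_pick(logits, suppress_tokens):
    """Return the token chosen by greedy decoding after suppression.

    Illustrates the idea behind token suppression: the logits of the
    suppressed ids are set to -inf, so they can never be selected.
    """
    masked = list(logits)
    for token_id in suppress_tokens:
        masked[token_id] = -math.inf
    # max() returns the first index with the highest score
    best = max(range(len(masked)), key=lambda i: masked[i])
    return VOCAB[best]


# Hypothetical logits in which "♪" is the model's top choice.
logits = [1.0, 0.5, 3.0, 0.2]

print(greedy_pick(logits, suppress_tokens=[2]))  # "♪" suppressed -> Jingle
print(greedy_pick(logits, suppress_tokens=[]))   # nothing suppressed -> ♪
```

With the music symbol suppressed, the decoder falls back to the next-best token, which is why the lyrics still come out as plain text.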
-
@EtienneAb3d I already tested the Whisper CLI without suppress_tokens, and it does transcribe some ♪ in English. But it doesn't work with Chinese audio that mixes singing and speech: everything is transcribed as ordinary characters. So I'd like to know whether the sites that provide Chinese lyric subtitles fine-tuned the model themselves.
-
I see that video sites generate lyrics with ♪ around them, for example "♪Jingle bells♪, ♪jingle bells♪, ♪jingle all the way♪".
Speech-only lines, such as "Merry Christmas", get no ♪.
How can I achieve this? Or is there a way to achieve it by combining Whisper with another model? Thank you.
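One way to combine Whisper with another model is to post-process the transcript: take the timestamped segments from Whisper, mark each one as sung or spoken using a separate music/singing detector, and wrap only the sung segments in ♪. The sketch below covers just the wrapping step. The `Segment` class and its `is_music` flag are hypothetical; how you compute that flag (for example by overlapping each segment's time span with the output of an audio classifier) is an assumption left open here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float     # segment start time in seconds
    end: float       # segment end time in seconds
    text: str        # transcribed text for this segment
    is_music: bool   # hypothetical flag from a separate music/singing detector


def add_music_notes(segments):
    """Wrap the text of sung segments in ♪ and leave speech untouched."""
    out = []
    for seg in segments:
        text = seg.text.strip()
        # Only non-empty segments flagged as music get the ♪ markers
        if seg.is_music and text:
            text = f"♪{text}♪"
        out.append(text)
    return out


segments = [
    Segment(0.0, 1.2, "Jingle bells", True),
    Segment(1.2, 2.4, " jingle bells", True),
    Segment(2.4, 4.0, " jingle all the way", True),
    Segment(4.5, 5.5, " Merry Christmas", False),
]
print(", ".join(add_music_notes(segments)))
# ♪Jingle bells♪, ♪jingle bells♪, ♪jingle all the way♪, Merry Christmas
```

Keeping the music detection separate from transcription means this does not depend on Whisper itself emitting ♪, which, as noted above for Chinese audio, it may not do.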