I'm using Coqui TTS to generate audio files from French text, but with the voice cloning feature the output contains audio glitches and sometimes ghostly, hallucinated voices. Because generation is random (the prosody changes from run to run), the number of glitches varies as well. To work around this, I fix the random seed so that I can reproduce the best result I find.
I haven't managed to get clean audio in any of my tests, so (here comes the idea) I'm thinking about using Whisper to identify these glitches, since they aren't really 'words', and then remove them.
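To make the idea concrete, here is a rough sketch of how glitchy regions might be flagged. It assumes the transcription result is a dict shaped like the one returned by openai-whisper's `transcribe()`, where each segment carries `start`, `end`, `avg_logprob`, `no_speech_prob` and `compression_ratio`. The function name `flag_suspect_segments` and the thresholds are my own assumptions (the values mirror Whisper's default decoding thresholds), so they would need tuning on real output.

```python
def flag_suspect_segments(result,
                          min_avg_logprob=-1.0,
                          max_no_speech_prob=0.6,
                          max_compression_ratio=2.4):
    """Return (start, end, reasons) for segments that look like glitches.

    `result` is assumed to be a Whisper-style dict with a "segments" list.
    The thresholds are illustrative starting points, not validated values.
    """
    suspects = []
    for seg in result.get("segments", []):
        reasons = []
        # Low average token log-probability: the model was unsure of the text.
        if seg.get("avg_logprob", 0.0) < min_avg_logprob:
            reasons.append("low_confidence")
        # High no-speech probability: the audio may be noise rather than speech.
        if seg.get("no_speech_prob", 0.0) > max_no_speech_prob:
            reasons.append("probably_not_speech")
        # High compression ratio: repetitive text, a common hallucination sign.
        if seg.get("compression_ratio", 0.0) > max_compression_ratio:
            reasons.append("repetitive_text")
        if reasons:
            suspects.append((seg["start"], seg["end"], reasons))
    return suspects


# Hand-written stand-in for a real transcription result (hypothetical values).
example = {
    "segments": [
        {"start": 0.0, "end": 2.1, "avg_logprob": -0.2,
         "no_speech_prob": 0.01, "compression_ratio": 1.3},
        {"start": 2.1, "end": 3.0, "avg_logprob": -1.6,
         "no_speech_prob": 0.85, "compression_ratio": 1.1},
    ]
}
suspects = flag_suspect_segments(example)
```

The returned time ranges could then be used to cut or regenerate the affected parts of the audio; whether these three signals actually line up with the audible glitches is something I still need to check.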
I've started writing a script that compares the WER and CER, using the jiwer library, to get some kind of 'confidence value' I could use. I would appreciate any guidance or suggestions.
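For reference, this is roughly what the score computes. The sketch below is a dependency-free version, assuming the comparison is between the text sent to the TTS engine and the Whisper transcript of the generated audio. The helpers `normalize`, `edit_distance`, `wer` and `cer` are written here for illustration only; in the real script, jiwer's `wer(reference, hypothesis)` and `cer(reference, hypothesis)` would take their place.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation (except apostrophes) with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # \w is Unicode-aware, so accents survive
    return " ".join(text.split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    return edit_distance(ref, hyp) / max(len(ref), 1)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    return edit_distance(ref, hyp) / max(len(ref), 1)


# Input text vs. a transcript with one spurious extra word (made-up example).
source_text = "Bonjour tout le monde."
transcript = "bonjour tout le monde euh"
score_wer = wer(source_text, transcript)
score_cer = cer(source_text, transcript)
```

A low WER/CER would suggest the take is clean, while extra words that Whisper inserts would push the score up. One caveat with this approach: Whisper may simply skip non-speech noise, so a glitch that is not transcribed at all would not change the score.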
Thanks