How to deal with wrong homonym vocabulary in ASR results #1595
Unanswered
LeeHaha314 asked this question in Q&A
Replies: 1 comment
@LeeHaha314, have you considered WhisperBiasing as a possible solution?
Hi all! I'm currently using Whisper for Chinese ASR, and I find that the results contain many homonym errors, i.e. words transcribed with the right pronunciation but the wrong characters.
At first, I tried the official suggestion of using initial_prompt to deal with this. I collected the special nouns and collocations in my dataset as a vocabulary list, joined them into a string, and passed it via the initial_prompt parameter, but it didn't work as expected. Every time I changed the prompt after an update to my vocabulary list, the result for the same audio changed significantly. It also turned out that transcriptions with such an initial prompt were more likely to contain hallucinated text and fall into repetition loops... I struggled with this for a while and still have no idea how to fix it.
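For reference, a minimal sketch of this setup with the openai-whisper API (the vocabulary and audio path are placeholders, and `condition_on_previous_text=False` is only a knob that sometimes reduces repetition, not something I've verified):

```python
import whisper

# Hypothetical domain vocabulary joined into a single prompt string.
domain_vocab = ["公式", "供应链", "区块链"]
prompt = "，".join(domain_vocab)

model = whisper.load_model("medium")
result = model.transcribe(
    "audio.wav",                       # placeholder audio path
    language="zh",
    initial_prompt=prompt,
    condition_on_previous_text=False,  # sometimes reduces repetition loops, at the cost of context
)
print(result["text"])
```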
Then I planned to do some post-processing. For Chinese, it is unreasonable to apply methods like Levenshtein distance on characters, since words with similar pronunciation can consist of completely different characters (see the pronunciation example below). Phonetic matching might be a solution, but figuring out how to combine it with a local personal vocabulary also takes time.
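A quick illustration with pypinyin: two words that share exactly the same pinyin can have no characters in common, so character-level edit distance says nothing useful about homonym errors:

```python
from pypinyin import lazy_pinyin

# 公式 and 攻势 share no characters, but their pinyin is identical (gong shi),
# so a character-level Levenshtein distance of 2 is misleading here.
print(lazy_pinyin("公式"))  # ['gong', 'shi']
print(lazy_pinyin("攻势"))  # ['gong', 'shi']
```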
Then I found some phonetic algorithms for indexing Chinese characters by sound, and I checked Whisper's results with word_timestamps to analyze the word segments, thinking they might serve as the base input for further processing. However, most of the word segments for Chinese are really just character segments (e.g. 公司 is split into two separate segments, 公 and 司), which means I can't match words from my personal vocabulary list against the word segments directly. I still need an additional matching strategy for correction.
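For illustration, a minimal sketch of what the word-timestamp output looks like and how the characters could be re-joined and re-segmented (assuming jieba as the tokenizer; the audio path is a placeholder):

```python
import jieba
import whisper

model = whisper.load_model("medium")
result = model.transcribe("audio.wav", language="zh", word_timestamps=True)

# For Chinese, the "words" entries are mostly single characters (e.g. 公 / 司),
# so join them back together and re-segment with a Chinese word tokenizer instead.
chars = [w["word"] for seg in result["segments"] for w in seg["words"]]
text = "".join(chars).strip()
words = list(jieba.cut(text))
print(words)
```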
Are there any good ideas or established practices for something like this?
UPDATE: I now use another tokenizer with better performance on Chinese text, plus a tool that computes the phonetic distance between two words based on pinyin, to match the source text against my vocabulary list. I wrote the post-processing logic myself, and to some extent it works, with some limitations. I also have to maintain a whitelist to manually skip words that are merely similar to entries in my vocabulary list.
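Roughly, the post-processing looks like this (a simplified sketch, with jieba and pypinyin as stand-ins for the tokenizer and pinyin tool; the vocabulary, whitelist, and threshold are placeholders):

```python
import jieba
from pypinyin import lazy_pinyin

VOCAB = ["公式", "供应链", "区块链"]   # hypothetical personal vocabulary
WHITELIST = {"攻势"}                    # similar-sounding words that must NOT be rewritten
MAX_PINYIN_DIST = 1                     # placeholder threshold

def pinyin_dist(a: str, b: str) -> int:
    """Levenshtein distance between the pinyin syllable sequences of two words."""
    pa, pb = lazy_pinyin(a), lazy_pinyin(b)
    dp = [[0] * (len(pb) + 1) for _ in range(len(pa) + 1)]
    for i in range(len(pa) + 1):
        dp[i][0] = i
    for j in range(len(pb) + 1):
        dp[0][j] = j
    for i in range(1, len(pa) + 1):
        for j in range(1, len(pb) + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,
                           dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + (pa[i - 1] != pb[j - 1]))
    return dp[len(pa)][len(pb)]

def correct(text: str) -> str:
    out = []
    for word in jieba.cut(text):
        if word in WHITELIST or word in VOCAB:
            out.append(word)
            continue
        # Rewrite the word to the closest vocabulary entry if the pinyin distance is small enough.
        best = min(VOCAB, key=lambda v: pinyin_dist(word, v))
        out.append(best if pinyin_dist(word, best) <= MAX_PINYIN_DIST else word)
    return "".join(out)

print(correct("这个攻势很重要"))  # 攻势 is whitelisted, so it is not rewritten to 公式
```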
Looking forward to other inspired ideas lol