Is there a way to work with a limited vocabulary? #843
-
EDIT: Improving this question. After further research, I found that what I'm looking for is called ASR with a limited vocabulary. Is that possible with Whisper, and if so, how might I go about it? DeepSpeech has something like this: https://github.com/mozilla/DeepSpeech/blob/master/data/lm/generate_lm.py
Replies: 3 comments 5 replies
-
It's not trivial, but it would be simpler if every word in your list can be represented in a single token, like:

```python
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=False)
tokenizer.encode(" cat")    # [3797]
tokenizer.encode(" dog")    # [3290]
tokenizer.encode(" mouse")  # [10211]
```

At this point it becomes just a multi-class classification. Otherwise, during decoding, you could add another instance of `LogitFilter` which would block all token sequences except the ones you allow: whisper/whisper/decoding.py, lines 367 to 380 in 0f39c89.
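To make the sequence-blocking idea concrete, here is a minimal sketch written against plain Python lists so it runs without Whisper or PyTorch installed. It is not Whisper's code: the class name `AllowedSequencesFilter`, the `sample_begin` argument, the `EOT` value, and the toy token ids are all assumptions for illustration. In Whisper itself a filter's `apply(logits, tokens)` receives tensors and edits the logits in place, and the real ids would come from `tokenizer.encode` and `tokenizer.eot`.

```python
import math

EOT = 0  # hypothetical end-of-text id; Whisper exposes the real one as tokenizer.eot


class AllowedSequencesFilter:
    """Block every next token that would leave the set of allowed sequences."""

    def __init__(self, allowed_sequences, sample_begin=0, eot=EOT):
        self.sample_begin = sample_begin  # index where generated tokens start
        self.eot = eot
        # Build a trie: each node maps a token id to its child node.
        self.root = {}
        for seq in allowed_sequences:
            node = self.root
            for tok in list(seq) + [eot]:
                node = node.setdefault(tok, {})

    def allowed_next(self, generated):
        """Return the token ids that may follow the tokens generated so far."""
        node = self.root
        for tok in generated:
            node = node.get(tok)
            if node is None:
                return {self.eot}  # off the trie: only stopping is allowed
        # A leaf (reached after EOT) has no children, so fall back to EOT.
        return set(node) or {self.eot}

    def apply(self, logits, tokens):
        """Set the logit of every disallowed token to -inf, in place."""
        for row, seq in zip(logits, tokens):
            allowed = self.allowed_next(seq[self.sample_begin:])
            for tok in range(len(row)):
                if tok not in allowed:
                    row[tok] = -math.inf


# Toy vocabulary of 6 token ids: "cat" = [3], "dog" = [4], "mouse" = [5, 2].
filt = AllowedSequencesFilter([[3], [4], [5, 2]])
logits = [[0.1, 0.9, 0.3, 0.2, 0.5, 0.4]]
filt.apply(logits, [[]])  # nothing generated yet
best = max(range(6), key=lambda t: logits[0][t])  # chosen among ids 3, 4, 5
```

When every allowed word is a single token, the trie has depth one and the filter reduces to the multi-class classification described above: the argmax is taken over the allowed ids only.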
-
IMO, a solid way to attack the problem would be to (1) run recognition, (2) calculate the distance between the recognized text and each vocabulary entry, both morphologically (e.g., vanilla Levenshtein) and phonetically (e.g., double metaphone), attaching weights to each such extracted "feature" by batch (supervised) learning, and finally (3) use a suitable RL technique (e.g., Q-learning) to learn weights corresponding to the idiosyncratic hyperarticulations of individuals or groups of individuals. Further, a token-based approach may not adapt to a vocabulary that changes dynamically, which limits its applicability.
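As a rough illustration of the distance step only, the sketch below snaps a recognized word to the nearest entry of a fixed vocabulary using plain Levenshtein distance. The function names and the `max_distance` threshold are assumptions made for this example; the phonetic distance, the learned feature weights, and the RL stage are not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


def snap_to_vocabulary(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word, or None if nothing is close enough."""
    word = word.lower().strip()
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else None


vocabulary = ["cat", "dog", "mouse"]
print(snap_to_vocabulary("mouze", vocabulary))     # one substitution from "mouse"
print(snap_to_vocabulary("elephant", vocabulary))  # too far from every entry
```

Because this runs on the transcript after recognition, the vocabulary can be swapped at any time without touching the decoder, at the cost of not steering the model while it decodes.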
-
Just in case somebody is struggling to implement the solution suggested by @jongwook.
Usage