Whisper for keyword spotting #1249
-
Hi, if I have a list of keywords, how can I use Whisper to get, for each word in the list, the probability that the audio is that keyword? Using Whisper directly can transcribe a similar-sounding word instead of the keyword, whereas I am only interested in the keywords in the list.
Replies: 4 comments
-
If the keywords are very common words, so that each of them is represented as a single token by the tokenizer, you can compare the probabilities directly by taking the softmax of the logits predicted by the model. If some keywords span multiple tokens, the comparison becomes trickier: you can use the sum of the log probabilities, which makes more sense mathematically; the average log probability, which tends to work better in practice; or something in between, like the length penalty used in equation 14 of this paper.
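To make the three scoring options concrete, here is a minimal sketch. It assumes you have already extracted per-token log-probabilities for each keyword from the model (the helper names and example numbers below are illustrative, not part of Whisper's API):

```python
import math

def softmax(logits):
    """Convert a vector of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sum_logprob(token_logprobs):
    # Mathematically principled: log P(keyword) is the sum of per-token log-probs.
    return sum(token_logprobs)

def avg_logprob(token_logprobs):
    # Length-normalised score that often works better in practice.
    return sum(token_logprobs) / len(token_logprobs)

def length_penalized(token_logprobs, alpha=0.6):
    # In-between option: GNMT-style length penalty (equation 14 of the
    # paper referenced above): score = log P / ((5 + |Y|) / 6) ** alpha.
    return sum(token_logprobs) / (((5 + len(token_logprobs)) / 6) ** alpha)
```

With `alpha=0` the length-penalized score reduces to the plain sum; larger `alpha` moves it toward the average, so you can tune how strongly longer keywords are favoured.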
-
Hi, I know this might be a stupid question, but how exactly do we get the logits predicted by the model and apply the softmax?
-
Can you please tell me how you did this? I am fairly new to LLMs, and I would like to use Whisper for keyword spotting in a personal project.
-
Does putting the keywords directly into the prompt work? I noticed that it works with a single keyword, but not properly with more than one. The single keyword also interferes with other similar-sounding words: for example, after I added "Artem" to the prompt, it now gets confused with "startup".
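For reference, prompt biasing as described above can be done through the `initial_prompt` parameter of the openai-whisper `transcribe` function. A minimal sketch, assuming that package; the model size, audio file name, and keyword list are placeholders:

```python
import whisper

# Bias the decoder toward the keyword(s) via the initial prompt.
# As noted above, more than one keyword may degrade results.
model = whisper.load_model("base")
keywords = ["Artem"]
result = model.transcribe("audio.wav", initial_prompt=" ".join(keywords))
print(result["text"])
```

This only nudges the decoder's prior toward the prompted text; it does not give per-keyword probabilities like the logit-comparison approach in the first reply.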