fix: suppress common whisper hallucinations during silence #3884
Open
devin-ai-integration[bot] wants to merge 1 commit into main
Co-Authored-By: john@hyprnote.com <john@hyprnote.com>
Summary
Expands the Whisper hallucination filter in `crates/whisper-local/src/model/actual.rs` to suppress more known hallucination phrases that Whisper produces when a channel (mic or speaker) is silent.
The previous filter only did exact-match checks against a handful of strings ("you", "thank you", "you.", "thank you.", "♪"). The new `is_hallucination` method:
- Strips punctuation before matching, so variants like "Thank you!" or "Thank you," are now caught.
- Adds exact matches for "the", "thanks", "bye", "goodbye", "bye bye", "so", "oh", "uh", "hmm", "ah", "music", and empty strings.
- Uses `starts_with` to catch common YouTube-training-data hallucinations like "thank you for watching", "thanks for listening", "please subscribe", "subtitles by", etc.
Hallucination list informed by the sachaarbonel/whisper-hallucinations dataset of Whisper outputs on noise-only audio.
Review & Testing Checklist for Human
- `starts_with("thank you")`: This will filter any segment beginning with "thank you", including legitimate speech like "Thank you, John, for joining us". Verify this is acceptable given that segments are typically short chunks, or consider tightening the match.
- Exact matches such as "so", "oh", "the" apply to the entire segment text after stripping punctuation. Confirm that real speech segments are unlikely to consist of only these words (this should be fine given VAD chunking, but is worth verifying).
- The `is_hallucination` function has no test coverage. Consider adding tests for edge cases (e.g., "Thank you, Sarah" should NOT be filtered, "Thank you for watching" should).
Notes
Requested by: @ComputelessComputer
Link to Devin run