Telegram bot that detects "abusive language" in voice and video notes in chat rooms #638
dokzlo13 started this conversation in Show and tell
I've made a Telegram bot that can detect "abusive language" in voice and video notes in Telegram chat rooms.
You can check it out here: https://github.com/dokzlo13/blya_bot/
This bot has 3 main parts:
1. Previously, the bot used only the Vosk Speech Recognition Toolkit, but after I discovered Whisper, I added it as a second (and now main ❤️) recognition engine.
2. You can specify a dictionary file of abusive words (example: Russian). The dictionary is expanded by adding multiple morphological variants of each word and then converted into an Aho-Corasick automaton, which is used for pattern matching.
3. Pattern matching is applied to the text that the recognition engine transcribes from video and audio notes.
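The dictionary-to-matching pipeline described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the bot's actual code: `expand_dictionary` is a toy stand-in that appends a hypothetical list of endings instead of doing real morphological analysis, and the automaton is written from scratch rather than taken from a library. `find_whole_words` is also a hypothetical helper; it drops hits that sit inside a longer word, because Aho-Corasick on its own reports every substring occurrence.

```python
from collections import deque


def expand_dictionary(words, endings):
    """Toy stand-in for morphological expansion (assumption, not the bot's method):
    keep each base form and append every ending from a hypothetical list."""
    variants = set()
    for word in words:
        base = word.strip().lower()
        if not base:  # an empty pattern would match at every position
            continue
        variants.add(base)
        variants.update(base + ending for ending in endings)
    return variants


class AhoCorasick:
    """Minimal Aho-Corasick automaton: a trie plus failure links and output sets."""

    def __init__(self, patterns):
        self.goto = [{}]    # goto[state][char] -> child state (trie edges)
        self.fail = [0]     # fail[state] -> state for the longest proper suffix in the trie
        self.out = [set()]  # out[state] -> patterns that end at this state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append(set())
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].add(pattern)

    def _build_failure_links(self):
        # Breadth-first order guarantees a parent's failure link is known first.
        queue = deque(self.goto[0].values())  # depth-1 states fail back to the root
        while queue:
            state = queue.popleft()
            for ch, child in self.goto[state].items():
                queue.append(child)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # A state also reports every pattern that ends at its suffix state.
                self.out[child] |= self.out[self.fail[child]]

    def search(self, text):
        """Yield (start_index, pattern) for every occurrence, in one pass over text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                yield i - len(pattern) + 1, pattern


def find_whole_words(automaton, text):
    """Hypothetical filter: keep only hits that are not part of a longer word."""
    text = text.lower()
    hits = []
    for start, pattern in automaton.search(text):
        end = start + len(pattern)
        starts_word = start == 0 or not text[start - 1].isalnum()
        ends_word = end == len(text) or not text[end].isalnum()
        if starts_word and ends_word:
            hits.append((start, pattern))
    return sorted(hits)


# Usage with a neutral placeholder word ("кот", "cat") instead of a real dictionary.
variants = expand_dictionary(["кот"], endings=["а", "у", "ом", "ы"])
automaton = AhoCorasick(variants)
print(find_whole_words(automaton, "Я видел кота, но не котлету."))  # prints [(8, 'кота')]
```

The automaton scans a transcript once, so matching cost grows with the transcript length plus the number of matches, not with the number of dictionary entries. That matters once every word has been expanded into many variants.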
How to use it?
First, you need to run your own copy of blya-bot. Unfortunately, no public instance is available right now; you can follow the instructions in the repo to run your own copy of the app. Once your bot is up and running, you can send it a voice or video note, and it will perform the following steps:
You can also add your bot to a group, and it will automatically reply with a summary to any voice or video note that contains "abusive language".
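For the group-chat reply, a summary could be as simple as counting how often each flagged word was heard. The message format and the `build_summary` helper below are hypothetical, not taken from blya-bot; the input is assumed to be the list of dictionary words matched in one transcript.

```python
from collections import Counter


def build_summary(matched_words, max_items=5):
    """Hypothetical reply format: total count plus the most frequent flagged words.

    Returns None when nothing matched, so the bot can stay silent in the group.
    """
    if not matched_words:
        return None
    counts = Counter(word.lower() for word in matched_words)
    total = sum(counts.values())
    lines = [f"Detected {total} flagged word(s):"]
    # most_common keeps first-seen order for words with equal counts.
    for word, n in counts.most_common(max_items):
        lines.append(f"  {word}: {n}")
    hidden = len(counts) - max_items
    if hidden > 0:
        lines.append(f"  ...and {hidden} more")
    return "\n".join(lines)


print(build_summary(["кота", "кот", "кота"]))
# Detected 3 flagged word(s):
#   кота: 2
#   кот: 1
```

Returning `None` for an empty match list keeps the "respond only to notes with abusive language" behavior in one place.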
Limitations
Since it started as a simple joke, this project was written specifically for the Russian language.
Although the speech recognition model supports many languages, the bot's source code has some internal limitations. Currently, only Russian has been tested.
If anyone is interested in this project, I will do my best to add support for other languages. Any contributions are welcome.