Telegram bot that detects "abusive language" in voice and video notes in chat rooms #638

dokzlo13 · 2022-12-05T16:33:00Z


I've made a Telegram bot that detects "abusive language" in voice and video notes in Telegram chat rooms.

You can check it here: https://github.com/dokzlo13/blya_bot/

This bot has 3 main parts:

Speech recognition engine (of course)
Dictionary generator with a custom DSL and morphological expansion (pymorphy2), which is used to generate all variations of the "abusive language" words
Pattern search based on the Aho-Corasick algorithm (ahocorapy), which is used to find all "abusive language" words in the transcribed text

Previously, it used only the Vosk Speech Recognition Toolkit, but after I met Whisper, I also added it as a second (now main ❤️) recognition core.
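For readers who have not used Whisper from Python, a minimal transcription sketch might look like the one below. It assumes the `openai-whisper` package and ffmpeg are installed. The function name `transcribe_note`, the `base` model size and the `ru` language default are illustrative choices, not taken from the bot's source.

```python
def transcribe_note(path, model_name="base", language="ru"):
    """Transcribe one audio file with Whisper and return the plain text."""
    # Imported lazily so this module still loads where Whisper is not installed.
    import whisper

    model = whisper.load_model(model_name)  # weights are downloaded on first use
    result = model.transcribe(path, language=language)
    return result["text"].strip()

# Usage (the path is a placeholder, not a file from the project):
# text = transcribe_note("voice_note.ogg")
```

Loading the model on every call is slow; a long-running bot would more likely load it once at startup and reuse it.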

You can specify a dictionary file with abusive words (example: Russian), which will be expanded by adding multiple morphological variants of each word.
Then the dictionary is converted into an Aho-Corasick automaton, which is used for pattern matching.
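The expansion step can be pictured with a toy example. The sketch below is not the bot's DSL and does not call pymorphy2. It uses a hypothetical hand-written suffix table (`TOY_SUFFIXES`) and English placeholder words, purely to show how a short seed list grows into a larger set of surface forms.

```python
# Hypothetical stand-in for a real morphological analyzer.
TOY_SUFFIXES = ("", "s", "ed", "ing")

def expand_dictionary(seed_words, suffixes=TOY_SUFFIXES):
    """Return a sorted list of every seed word combined with every suffix."""
    forms = set()
    for word in seed_words:
        word = word.strip().lower()
        if not word:
            continue  # skip blank entries, e.g. empty lines in a dictionary file
        for suffix in suffixes:
            forms.add(word + suffix)
    return sorted(forms)

expanded = expand_dictionary(["darn", "heck"])
print(expanded)
```

A real analyzer produces forms from the language's inflection rules rather than by blind concatenation, so the resulting list is both larger and more accurate than this toy version.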

Pattern matching is applied to the text that the recognition engine transcribes from video or audio notes.
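To show what the automaton does, here is a small pure-Python sketch of Aho-Corasick. It is written from scratch for illustration and is not the ahocorapy API; the names `build_automaton` and `search` are made up for this example.

```python
from collections import deque

def build_automaton(words):
    """Build the trie, failure links and output lists for the given words."""
    goto, fail, out = [{}], [0], [[]]
    for word in words:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)
    # Breadth-first pass: depth-1 nodes keep fail = 0, deeper nodes are linked
    # to the longest proper suffix that is also a prefix in the trie.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            queue.append(nxt)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the words that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def search(automaton, text):
    """Return (word, start_index) for every occurrence, including overlaps."""
    goto, fail, out = automaton
    node, hits = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            hits.append((word, i - len(word) + 1))
    return hits

automaton = build_automaton(["darn", "darned", "heck"])
print(search(automaton, "he darned it, darn"))
```

The text is scanned once regardless of how many words the dictionary holds, which is why this approach suits a dictionary that has been expanded into many morphological variants. Note that this sketch matches substrings, so a dictionary word inside a longer word is also reported.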

How to use it?

First, you need to run your own copy of blya-bot. Unfortunately, no public instance is available right now. You can follow the instructions in the repo to run your own copy of the app.

Once your bot comes to life, you can send it a voice or video note, and it will do the following steps:

Transcribe the voice into text
Search for all "abusive language" words in the transcribed text
Generate a summary of the "abusive language" words used
Reply with the summary of "abusive language" words
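The four steps above can be sketched end to end as follows. This is an illustration under assumptions, not the bot's code: `handle_note`, `find_words` and `summarize` are hypothetical names, the transcriber is passed in as a plain callable, and a regex alternation stands in for the Aho-Corasick automaton (so overlapping matches are not reported).

```python
import re
from collections import Counter

def find_words(text, dictionary):
    """Return every dictionary word found in text, in order of appearance."""
    if not dictionary:
        return []
    # Longest words first, so "darned" wins over its prefix "darn".
    ordered = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)
    return [m.group(0).lower() for m in pattern.finditer(text)]

def summarize(words):
    """Format one 'word: count' line per distinct word, most frequent first."""
    counts = Counter(words)
    return "\n".join(f"{word}: {count}" for word, count in counts.most_common())

def handle_note(audio, transcribe, dictionary):
    """Run the pipeline; return the reply text, or None if nothing was found."""
    text = transcribe(audio)               # step 1: voice to text
    found = find_words(text, dictionary)   # step 2: search
    if not found:
        return None                        # stay silent on clean notes
    return summarize(found)                # steps 3 and 4: summary to reply with

# A fake transcriber stands in for the speech recognition engine.
fake_transcribe = lambda audio: "Darn, that darned heck of a darn day"
reply = handle_note(b"", fake_transcribe, {"darn", "darned", "heck"})
print(reply)
```

Passing the transcriber as an argument keeps the sketch independent of any particular recognition engine.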

You can also add your bot to a group, and it will automatically respond with a summary to any voice or video note that contains "abusive language".

Limitations

Started as a simple joke, this project was written specifically for the Russian language.

Although the speech recognition model supports many languages, the bot's source code has some internal limitations. In its current state, only Russian has been tested.

If anyone is interested in this project, I will try my best to add support for other languages. Any contributions are welcome.
