A study project on predicting the toxicity of Russian-language Twitter comments. It covers tokenization, lemmatization, word clouds, bag-of-words, TF-IDF, and fastText. Python libraries used for NLP: re, pymorphy2, transliterate, wordcloud, nltk, razdel, fastText.
The comments in the test data are provided without labels, so predictions were submitted to a closed Kaggle competition. The best score (accuracy) achieved was 0.89286, which ranked 4th out of 20 on the leaderboard.
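
The tokenization, bag-of-words, and TF-IDF steps listed above can be illustrated with a minimal sketch. It makes several assumptions: it uses only the Python standard library instead of razdel, nltk, or pymorphy2; it skips lemmatization entirely; the sample comments are invented for illustration; and the helper names (`tokenize`, `bag_of_words`, `tf_idf`) are hypothetical and not taken from the project. The weighting is the plain `tf * log(N / df)` variant, and library implementations often add smoothing, so their numbers may differ.

```python
import math
import re
from collections import Counter

# Assumption: a token is a run of Cyrillic/Latin letters or digits.
# The project lists razdel and nltk for tokenization; this regex is a stand-in.
TOKEN_RE = re.compile(r"[а-яёa-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def bag_of_words(tokens):
    """Count how many times each token occurs in one comment."""
    return Counter(tokens)


def tf_idf(docs):
    """Compute tf * log(N / df) for every term of every tokenized comment.

    docs is a list of token lists; the result is a list of {term: weight} dicts.
    """
    n_docs = len(docs)
    # Document frequency: in how many comments each term appears at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = bag_of_words(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample comments, not taken from the competition data.
comments = [
    "Отличный день, спасибо!",
    "Ужасный день",
    "Спасибо за помощь",
]
docs = [tokenize(c) for c in comments]
weights = tf_idf(docs)

for comment, w in zip(comments, weights):
    print(comment, w)
```

In this sketch a term that appears in fewer comments receives a higher weight than one shared by several comments, which is the property that lets TF-IDF features feed a downstream classifier more informative signals than raw counts.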