NLP - Predicting toxicity for Twitter comments

A study task on predicting the toxicity of Russian-language Twitter comments. It covers tokenization, lemmatization, word clouds, bag of words, TF-IDF, and fastText. Python libraries used: re, pymorphy2, transliterate, wordcloud, nltk, razdel, fastText.
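
To make the bag-of-words and TF-IDF steps concrete, here is a minimal sketch using only the standard library. It is not the code from the notebooks: the toy comments are invented for illustration, the function names `tokenize`, `bag_of_words` and `tf_idf` are hypothetical, lemmatization is left out to avoid third-party dependencies, and the weighting (tf = count / length, idf = ln(N / df)) is just one common TF-IDF variant.

```python
import math
import re
from collections import Counter

# Toy comments invented for illustration; they are not taken from train_data.csv.
comments = [
    "Ты просто дурак!",
    "Спасибо, отличный пост",
    "Какой же ты дурак, просто ужас",
]


def tokenize(text):
    """Lowercase the text and extract runs of Cyrillic or Latin letters."""
    return re.findall(r"[а-яёa-z]+", text.lower())


def bag_of_words(tokens):
    """Count how many times each token occurs in one comment."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """Return one {token: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs_tokens)
    # Document frequency: in how many comments each token appears.
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))

    weights = []
    for tokens in docs_tokens:
        counts = bag_of_words(tokens)
        total = len(tokens)
        weights.append({
            tok: (cnt / total) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        })
    return weights


tokenized = [tokenize(c) for c in comments]
weights = tf_idf(tokenized)
```

Tokens that occur in fewer comments get a larger weight, so in this toy data "спасибо" (one comment) outweighs "дурак" (two comments). In a real pipeline the resulting vectors would be fed to a classifier.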

Comments in the test data are given without labels, so results were submitted to a closed Kaggle competition for scoring. The highest score achieved (accuracy) was 0.89286, which placed 4th out of 20 on the leaderboard.
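
For reference, accuracy is the share of comments whose predicted label matches the true label. The sketch below shows the calculation on hypothetical labels; the function name `accuracy` and the label values are assumptions for illustration, not the competition's scoring code or data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 1 = toxic, 0 = not toxic.
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0]
score = accuracy(y_true, y_pred)  # 4 of 5 predictions are correct
```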

Files in the repository:

HW1_Toxic_comments_Mironenko_part1.ipynb
HW1_Toxic_comments_Mironenko_part2.ipynb
README.md
clean_text.py
stopwords-ru.txt
test_data.csv
train_data.csv

ejeej/NLP_Toxic_Comments

Folders and files

Latest commit

History

Repository files navigation

NLP - Predicting toxicity for Twitter comments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages