This project addresses hate speech by building a multimodal classifier on the Facebook Hateful Memes dataset.
- Pre-processing notebook
- Data sourcing & EDA notebook
- Model building & Evaluation notebooks
- A Google API was used to extract the text from the images, and the extracted text was stored in a row for each image
- Various pre-processing techniques were used to correct OCR errors, including word segmentation, expansion of internet slang contractions, and spelling correction using language models.
- Conducted exploratory data analysis to understand the data (topic modelling, word frequency plots, Named Entity Recognition using a spaCy language model, and bigrams and trigrams to understand the context of a meme)
- Used a pre-trained fastText model with Urban Dictionary embeddings to get better representations of internet slang words
- Built four models to compare which type of model outperforms the others and could be used to improve current algorithms
- Insights from our current work and future work
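
The OCR clean-up step described above can be illustrated with a minimal sketch. It is not the notebooks' actual code: the `SLANG` map, the `VOCAB` set, and the function names `segment` and `clean_ocr_text` are all hypothetical, and a tiny hand-written vocabulary stands in for the language models the project used. It shows slang expansion by dictionary lookup and word segmentation by dynamic programming.

```python
import re

# Hypothetical slang map; the project's real mapping is not shown here.
SLANG = {"u": "you", "ur": "your", "idk": "i do not know"}

# Hypothetical toy vocabulary standing in for a real language model.
VOCAB = {"you", "your", "i", "do", "not", "know", "serious", "be", "a"}


def segment(token, vocab=VOCAB):
    """Split a run-together token into vocabulary words.

    Returns the split with the fewest words, or [token] if no full split exists.
    """
    n = len(token)
    # best[i] is the shortest word list covering token[:i], or None if impossible.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and token[j:i] in vocab:
                candidate = best[j] + [token[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n] if best[n] is not None else [token]


def clean_ocr_text(text):
    """Lowercase, tokenize, expand slang, and segment unknown tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    out = []
    for tok in tokens:
        if tok in SLANG:
            out.extend(SLANG[tok].split())
        elif tok in vocab_or_default():
            out.append(tok)
        else:
            out.extend(segment(tok))
    return " ".join(out)


def vocab_or_default():
    """Return the vocabulary in use (kept separate so it is easy to swap)."""
    return VOCAB


if __name__ == "__main__":
    print(clean_ocr_text("IDK, ur notserious"))
```

Preferring the split with the fewest words is a simple heuristic; a frequency-weighted model would likely rank candidate splits better on real meme text.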