Multinomial Naive Bayes + TF-IDF
This project builds a baseline sentiment classification system to automatically label Amazon product reviews as negative (0) or positive (1).
The primary objective is business-driven: to help teams (Customer Service, Product, Operations) identify negative reviews faster, reduce manual triage time, and prioritize customer issues more effectively.
The solution uses TF-IDF features and a Multinomial Naive Bayes model, chosen for its speed, interpretability, and ease of deployment.
- Dataset: Amazon Reviews (Kaggle)
- Task: Binary sentiment classification
  - Negative: ratings 1–2
  - Positive: ratings 4–5
  - Neutral reviews (rating 3) are excluded
- Text Representation: TF-IDF (unigrams + bigrams)
- Model: Multinomial Naive Bayes
- Evaluation Focus: performance on negative reviews
- Threshold-based decisioning for operational flexibility
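The configuration above maps directly onto a scikit-learn pipeline. A minimal sketch, assuming toy reviews for illustration (the real notebook trains on the sampled dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy reviews for illustration only
texts = [
    "great product, works really well",
    "terrible item, broke after one day",
    "love it, highly recommend to everyone",
    "awful experience, do not buy this",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

model = Pipeline([
    # unigrams + bigrams, matching the project's text representation
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    # alpha=2.0 matches the best configuration reported below
    ("nb", MultinomialNB(alpha=2.0)),
])
model.fit(texts, labels)
print(model.predict(["broke after one day, terrible"]))
```

The pipeline keeps vectorizer and classifier together, so the same object can be saved to `artifacts_nb/` and reloaded for scoring.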
Using the best configuration (alpha = 2.0) and the default threshold thr_pos = 0.50:
- Accuracy: ~0.86
- Negative class:
  - Precision_neg ≈ 0.88
  - Recall_neg ≈ 0.85
  - F1_neg ≈ 0.86
- ROC AUC: ≈ 0.94
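These negative-class metrics come from standard scikit-learn calls; a sketch with illustrative arrays (not the project's actual outputs):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

# Illustrative ground truth and predicted P(positive)
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_prob = np.array([0.2, 0.1, 0.6, 0.8, 0.9, 0.7, 0.4, 0.3])
y_pred = (y_prob >= 0.5).astype(int)

# pos_label=0 makes precision/recall/F1 refer to the NEGATIVE class
prec_neg, rec_neg, f1_neg, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label=0, average="binary"
)
auc = roc_auc_score(y_true, y_prob)  # threshold-independent ranking quality
print(prec_neg, rec_neg, f1_neg, auc)
```

Setting `pos_label=0` is the key detail: by default scikit-learn reports metrics for class 1, while this project's evaluation centers on the negative class.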
Threshold tuning allows the model to trade off between:
- catching more customer complaints (higher recall), or
- keeping review flags cleaner (higher precision).
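The decisioning itself is a one-line comparison against the model's probability output. A sketch with illustrative scores:

```python
import numpy as np

# Illustrative P(positive) scores from the model
y_prob = np.array([0.10, 0.48, 0.52, 0.60, 0.95])

# Reviews below thr_pos are routed to the negative-review queue.
# Raising thr_pos above 0.50 flags more reviews (higher recall_neg);
# lowering it keeps the flagged queue cleaner (higher precision_neg).
thr_pos = 0.55
flag_negative = y_prob < thr_pos
print(flag_negative.sum(), "reviews flagged for triage")
```

Because the threshold is applied after scoring, operations teams can retune it without retraining the model.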
Assuming 20 seconds to manually review one customer comment, at thr_pos = 0.55:
- ~1,100 reviews automatically flagged as negative
- Recall_neg ≈ 0.91
- Estimated ~6 hours of manual work saved per 2,000 reviews
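The savings estimate is simple arithmetic and can be checked directly:

```python
# Back-of-the-envelope check of the time-savings estimate
flagged_reviews = 1_100       # reviews auto-flagged at thr_pos = 0.55
seconds_per_review = 20       # assumed manual triage time per review
hours_saved = flagged_reviews * seconds_per_review / 3600
print(round(hours_saved, 1))  # ~6.1 hours per 2,000 reviews
```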
This demonstrates that even a simple model can deliver immediate operational value when paired with threshold-based decisioning.
The same source dataset (Reviews.csv) is used in two distinct ways:
- Exploratory Data Analysis (EDA): conducted on the full dataset to understand
  - rating distribution,
  - review volume trends,
  - the real-world imbalance toward positive reviews.
- Modeling & Evaluation: performed on a balanced subset sampled from the same dataset to
  - ensure fair evaluation between the negative and positive classes,
  - speed up experimentation,
  - build a clean and interpretable baseline.
This separation is intentional and documented in the notebook.
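A minimal sketch of the labeling and balanced-sampling step, assuming Reviews.csv-style column names `Score` (1–5 star rating) and `Text`:

```python
import pandas as pd

# Toy stand-in for Reviews.csv; "Score" and "Text" column names are assumptions
df = pd.DataFrame({
    "Score": [1, 2, 4, 5, 5, 1, 3, 4, 2, 5, 1, 3],
    "Text":  ["review text"] * 12,
})

df = df[df["Score"] != 3].copy()               # drop neutral ratings
df["label"] = (df["Score"] >= 4).astype(int)   # 1-2 -> 0 (negative), 4-5 -> 1 (positive)

# Downsample the majority class so both labels are equally represented
n = df["label"].value_counts().min()
balanced = df.groupby("label", group_keys=False).sample(n=n, random_state=42)
print(balanced["label"].value_counts().to_dict())
```

Fixing `random_state` keeps the sampled subset reproducible across notebook runs.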
- `nlp_nb_amazon_reviews.ipynb`: end-to-end analysis and modeling notebook
- `dataset/`: raw and sampled review data
- `artifacts_nb/`: saved TF-IDF vectorizer and trained model
- `README.md`: project summary and usage notes
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/nlp-sentiment-analysis.git
  cd nlp-sentiment-analysis
  ```

- Install dependencies (Python ≥ 3.9 recommended):
- pandas
- numpy
- scikit-learn
- matplotlib
- nltk
- wordcloud
- Open and run the notebook:

  ```bash
  jupyter notebook nlp_nb_amazon_reviews.ipynb
  ```
- The model struggles with negation and sarcasm (e.g., “not bad at all”, “great… if you like disappointment”).
- Sensitive to language drift and new product terminology.
- Interpretability is word-level (per-feature weights), not instance-level (e.g., SHAP/LIME explanations).
- Enhanced preprocessing (negation handling, trigram features)
- Model comparison with Logistic Regression and Linear SVM
- Monitoring dashboard for Precision/Recall of negative reviews
- Evaluation of Transformer-based models (DistilBERT / BERT)
- Integration with ticketing or CRM systems for automated triage
This project was developed as part of the Portfolio Build at Purwadhika Digital Technology School — Data Science Bootcamp.
Special thanks to my teammate Ardinata Tambun for collaborating on this project during our first NLP modeling experience.