Multinomial Naive Bayes + TF-IDF
This project builds a baseline sentiment classification system to automatically label Amazon product reviews as negative (0) or positive (1).
The primary objective is business-driven: to help teams (Customer Service, Product, Operations) identify negative reviews faster, reduce manual triage time, and prioritize customer issues more effectively.
The solution uses TF-IDF features and a Multinomial Naive Bayes model, chosen for its speed, interpretability, and ease of deployment.
- Dataset: Amazon Reviews (Kaggle)
- Task: Binary sentiment classification
  - Negative: ratings 1–2
  - Positive: ratings 4–5
  - Neutral reviews (rating 3) are excluded
- Text Representation: TF-IDF (unigrams + bigrams)
- Model: Multinomial Naive Bayes
- Evaluation Focus: performance on negative reviews
- Threshold-based decisioning for operational flexibility
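The configuration above maps directly onto a scikit-learn pipeline. A minimal sketch, assuming toy reviews for illustration (the real notebook trains on the sampled dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy reviews for illustration only
texts = [
    "great product, works really well",
    "terrible item, broke after one day",
    "love it, highly recommend to everyone",
    "awful experience, do not buy this",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

model = Pipeline([
    # unigrams + bigrams, matching the project's text representation
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    # alpha=2.0 matches the best configuration reported below
    ("nb", MultinomialNB(alpha=2.0)),
])
model.fit(texts, labels)
print(model.predict(["broke after one day, terrible"]))
```

The pipeline keeps vectorizer and classifier together, so the same object can be saved to `artifacts_nb/` and reloaded for scoring.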
Using the best configuration (alpha = 2.0) and the default threshold thr_pos = 0.50:
- Accuracy: ~0.86
- Negative class:
  - Precision_neg ≈ 0.88
  - Recall_neg ≈ 0.85
  - F1_neg ≈ 0.86
- ROC AUC: ≈ 0.94
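These negative-class metrics come from standard scikit-learn calls; a sketch with illustrative arrays (not the project's actual outputs):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

# Illustrative ground truth and predicted P(positive)
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_prob = np.array([0.2, 0.1, 0.6, 0.8, 0.9, 0.7, 0.4, 0.3])
y_pred = (y_prob >= 0.5).astype(int)

# pos_label=0 makes precision/recall/F1 refer to the NEGATIVE class
prec_neg, rec_neg, f1_neg, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label=0, average="binary"
)
auc = roc_auc_score(y_true, y_prob)  # threshold-independent ranking quality
print(prec_neg, rec_neg, f1_neg, auc)
```

Setting `pos_label=0` is the key detail: by default scikit-learn reports metrics for class 1, while this project's evaluation centers on the negative class.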
Threshold tuning allows the model to trade off between:
- catching more customer complaints (higher recall), or
- keeping review flags cleaner (higher precision).
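The decisioning itself is a one-line comparison against the model's probability output. A sketch with illustrative scores:

```python
import numpy as np

# Illustrative P(positive) scores from the model
y_prob = np.array([0.10, 0.48, 0.52, 0.60, 0.95])

# Reviews below thr_pos are routed to the negative-review queue.
# Raising thr_pos above 0.50 flags more reviews (higher recall_neg);
# lowering it keeps the flagged queue cleaner (higher precision_neg).
thr_pos = 0.55
flag_negative = y_prob < thr_pos
print(flag_negative.sum(), "reviews flagged for triage")
```

Because the threshold is applied after scoring, operations teams can retune it without retraining the model.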
Assuming 20 seconds to manually review one customer comment, at thr_pos = 0.55:
- ~1,100 reviews automatically flagged as negative
- Recall_neg ≈ 0.91
- Estimated ~6 hours of manual work saved per 2,000 reviews
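The savings estimate is simple arithmetic and can be checked directly:

```python
# Back-of-the-envelope check of the time-savings estimate
flagged_reviews = 1_100       # reviews auto-flagged at thr_pos = 0.55
seconds_per_review = 20       # assumed manual triage time per review
hours_saved = flagged_reviews * seconds_per_review / 3600
print(round(hours_saved, 1))  # ~6.1 hours per 2,000 reviews
```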
This demonstrates that even a simple model can deliver immediate operational value when paired with threshold-based decisioning.
The same source dataset (Reviews.csv) is used in two distinct ways:
- Exploratory Data Analysis (EDA): conducted on the full dataset to understand
  - rating distribution,
  - review volume trends,
  - the real-world imbalance toward positive reviews.
- Modeling & Evaluation: performed on a balanced subset sampled from the same dataset to
  - ensure fair evaluation between the negative and positive classes,
  - speed up experimentation,
  - build a clean and interpretable baseline.
This separation is intentional and documented in the notebook.
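A minimal sketch of the labeling and balanced-sampling step, assuming Reviews.csv-style column names `Score` (1–5 star rating) and `Text`:

```python
import pandas as pd

# Toy stand-in for Reviews.csv; "Score" and "Text" column names are assumptions
df = pd.DataFrame({
    "Score": [1, 2, 4, 5, 5, 1, 3, 4, 2, 5, 1, 3],
    "Text":  ["review text"] * 12,
})

df = df[df["Score"] != 3].copy()               # drop neutral ratings
df["label"] = (df["Score"] >= 4).astype(int)   # 1-2 -> 0 (negative), 4-5 -> 1 (positive)

# Downsample the majority class so both labels are equally represented
n = df["label"].value_counts().min()
balanced = df.groupby("label", group_keys=False).sample(n=n, random_state=42)
print(balanced["label"].value_counts().to_dict())
```

Fixing `random_state` keeps the sampled subset reproducible across notebook runs.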
- `nlp_nb_amazon_reviews.ipynb`: end-to-end analysis and modeling notebook
- `dataset/`: raw and sampled review data
- `artifacts_nb/`: saved TF-IDF vectorizer and trained model
- `README.md`: project summary and usage notes
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/nlp-sentiment-analysis.git
  cd nlp-sentiment-analysis
  ```

- Install dependencies (Python ≥ 3.9 recommended):
- pandas
- numpy
- scikit-learn
- matplotlib
- nltk
- wordcloud
- Open and run the notebook:

  ```bash
  jupyter notebook nlp_nb_amazon_reviews.ipynb
  ```
- The model struggles with negation and sarcasm (e.g., “not bad at all”, “great… if you like disappointment”).
- Sensitive to language drift and new product terminology.
- Interpretability is word-level (per-feature weights), not instance-level (e.g., SHAP/LIME explanations).
- Enhanced preprocessing (negation handling, trigram features)
- Model comparison with Logistic Regression and Linear SVM
- Monitoring dashboard for Precision/Recall of negative reviews
- Evaluation of Transformer-based models (DistilBERT / BERT)
- Integration with ticketing or CRM systems for automated triage
This project was developed as part of the Portfolio Build at Purwadhika Digital Technology School — Data Science Bootcamp.
Special thanks to my teammate Ardinata Tambun for collaborating on this project during our first NLP modeling experience.