Sentiment-analysis-with-hyper-parameter-tunning-and-model-evaluation-with-Xai

This project performs Sentiment Analysis on movie reviews using IMDb's large-scale dataset. It applies machine learning and NLP techniques to classify reviews as positive or negative, along with explainability (XAI) tools to interpret model decisions.

📌 Objective

Convert IMDb ratings (1–10 scale) into binary sentiment classes:
- 1–4 → Negative
- 7–10 → Positive
Build models that classify text reviews as positive or negative
Address challenges such as sarcasm, negation, and mixed sentiment
Use Explainable AI (XAI) to understand and interpret the model’s predictions
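The rating-to-label mapping above can be sketched as a small function. The function name `rating_to_sentiment` is hypothetical, and the treatment of ratings 5 and 6 is an assumption: the mapping above does not cover them, so this sketch returns `None` for those values and leaves it to the caller to drop or handle them.

```python
from typing import Optional


def rating_to_sentiment(rating: int) -> Optional[str]:
    """Map a 1-10 rating to a binary sentiment label.

    1-4 -> "negative", 7-10 -> "positive".
    Ratings 5-6 fall outside the stated mapping; returning None for
    them is an assumption of this sketch, not something the project states.
    """
    if 1 <= rating <= 4:
        return "negative"
    if 7 <= rating <= 10:
        return "positive"
    return None


ratings = [1, 4, 5, 6, 7, 10]
labels = [rating_to_sentiment(r) for r in ratings]
print(labels)
```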

📊 Dataset Overview

📁 Source: Kaggle IMDb Dataset
🔢 Size: 149,780 reviews
🎬 Columns:
- Review: Text review
- Rating: Numerical score (1–10)
- Movie: Name of the movie
- Resenhas: Portuguese review (dropped)
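As a rough illustration of the column layout, the sketch below builds a two-row stand-in frame with the four columns listed above and drops `Resenhas`. The rows are made-up placeholders, not data from the Kaggle file, and the use of pandas is an assumption about the tooling.

```python
import pandas as pd

# Two placeholder rows that mimic the column layout described above.
df = pd.DataFrame({
    "Review": ["Loved every minute.", "Dull and far too long."],
    "Rating": [9, 2],
    "Movie": ["Example Movie A", "Example Movie B"],
    "Resenhas": ["Adorei cada minuto.", "Chato e longo demais."],
})

# The Portuguese column is not used downstream, so it is dropped.
df = df.drop(columns=["Resenhas"])

print(list(df.columns))
# Sanity check: every rating lies on the 1-10 scale.
print(df["Rating"].between(1, 10).all())
```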

🧹 Data Preprocessing

Expanded contractions (e.g., can't → cannot)
Removed URLs, special characters, and irrelevant stopwords
Retained negation words like “not” to preserve sentiment context
Created a Review_clean column for processed text
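A minimal sketch of these cleaning steps is shown below. The contraction table, stopword set, and negation set are tiny illustrative stand-ins (assumptions), and `clean_review` is a hypothetical name; the project's actual lists and function may differ.

```python
import re

# Small illustrative tables; a real pipeline would use fuller lists.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "isn't": "is not",
    "don't": "do not",
    "it's": "it is",
}
STOPWORDS = {"the", "a", "an", "is", "it", "this", "was", "and", "of", "to", "do"}
NEGATIONS = {"not", "no", "never", "cannot"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_review(text: str) -> str:
    """Lowercase, strip URLs, expand contractions, drop special characters
    and stopwords, while always keeping negation words."""
    text = text.lower().replace("’", "'")
    text = URL_RE.sub(" ", text)
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = [t for t in text.split() if t not in STOPWORDS or t in NEGATIONS]
    return " ".join(tokens)


print(clean_review("This isn't a bad movie, it's great! See https://example.com"))
# prints: not bad movie great see
```

Applied to a DataFrame, the same function could populate the `Review_clean` column with something like `df["Review_clean"] = df["Review"].apply(clean_review)`.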

📈 Exploratory Data Analysis (EDA)

Verified class balance (an equal number of reviews per rating)
Plotted word clouds by sentiment
Analyzed review length, word density, and average word length
N-gram analysis (uni-, bi-, and trigrams) showed key sentiment patterns
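The n-gram counts and length statistics can be approximated with the standard library alone. This is a sketch on three made-up cleaned reviews; the helper names `ngrams`, `top_ngrams`, and `length_stats` are hypothetical, and whitespace tokenization is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-token windows."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(reviews: Iterable[str], n: int, k: int = 3):
    """Count n-grams across reviews and return the k most frequent."""
    counts = Counter()
    for review in reviews:
        counts.update(ngrams(review.split(), n))
    return counts.most_common(k)


def length_stats(review: str) -> dict:
    """Character count, word count, and average word length of one review."""
    words = review.split()
    avg = sum(len(w) for w in words) / len(words) if words else 0.0
    return {"n_chars": len(review), "n_words": len(words), "avg_word_len": avg}


sample = ["not bad movie", "not bad acting", "waste of time"]
print(top_ngrams(sample, 2, k=1))   # [(('not', 'bad'), 2)]
print(length_stats("not bad movie"))
```

Running `top_ngrams` separately on the positive and negative subsets is one way to surface the sentiment-specific patterns mentioned above.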

🛠️ Feature Engineering

Used CountVectorizer and TF-IDF for text-to-vector conversion
Focused on unigrams, bigrams, and trigrams
Applied chi-squared test for feature selection
Found TF-IDF more effective than raw counts at separating the sentiment classes

🤖 Model Building & Evaluation

Models Used:

Logistic Regression ✅ (Best performer)
Decision Tree (Overfit)
Random Forest (Some overfitting)

Metrics:

Precision
F1-score
AUC-ROC
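For reference, the three metrics can be computed from their definitions as in the sketch below. It assumes binary labels with 1 as the positive class and a 0.5 decision threshold; whether the project reports positive-class or averaged scores is not stated, so treat this as one possible reading. In practice scikit-learn's `precision_score`, `f1_score`, and `roc_auc_score` do the same job.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as half. Needs at least one of each class."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]        # predicted P(positive)
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, round(f1, 4), round(roc_auc(y_true, scores), 4))
```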

Best Model (Logistic Regression):

Precision: 0.8955 (test)
F1 Score: 0.8955
AUC: 0.9606
Fine-tuned using GridSearchCV
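A minimal sketch of tuning Logistic Regression with `GridSearchCV` is shown below. The toy reviews, the grid over `C`, the `f1` scoring choice, and `cv=2` are all assumptions for illustration; the project's actual search space and cross-validation setup are not given here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy cleaned reviews (1 = positive, 0 = negative).
reviews = [
    "not bad really enjoyed story",
    "great acting great story",
    "loved every minute wonderful film",
    "brilliant cast wonderful acting",
    "boring plot terrible acting",
    "waste time truly terrible",
    "dull boring film never again",
    "awful script waste money",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorizer and classifier in one pipeline so each CV fold refits both.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid over the inverse regularization strength.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=2)
search.fit(reviews, labels)

print(search.best_params_)
print(search.predict(["terrible boring film"]))
```

Keeping the vectorizer inside the pipeline means the vocabulary is learned only from each training fold, which avoids leaking validation text into the features.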

🚧 Challenges Faced

Sarcasm: e.g., “Sure, best movie ever... I slept halfway.”
Mixed polarity: Some reviews contained both praise and criticism.
Negation handling: “not bad” does not mean “bad”.

📌 Future Work

Introduce deep learning models like LSTM or BERT
Implement sarcasm detection modules
Apply domain adaptation for different datasets (e.g., product reviews)

👥 Contributors

Nikhil Gupta

Repository files:
- Phase 1 Pre-processing and eda.ipynb
- Phase 2Feature Engineering and Feature Selection.ipynb
- Phase 3a Model Selection for Sentiment Analyzer.ipynb
- Phase 3b Hyper_tune on random forest.ipynb
- Phase 4 Model Evaluation and XAI for Sentiment Analyzer .ipynb.txt
- README.md
