Commit fd3f025

💡[Feature]: Add Sentiment Analysis of Movie Reviews Project
Fixes #1510
1 parent ead412f commit fd3f025

3 files changed: +207 -0 lines changed
Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
# **Sentiment Analysis of Movie Reviews**

This project implements **sentiment analysis** using **NLP (Natural Language Processing)** techniques to classify movie reviews as **positive, negative, or neutral**. The goal is to process unstructured text data, extract meaningful features, and train classification models to predict the sentiment of each review.

---

## **Table of Contents**
- [Tech Stack](#tech-stack)
- [Features](#features)
- [Benefits](#benefits)
- [Project Structure](#project-structure)
- [Installation](#installation)
- [Usage](#usage)
- [Evaluation Metrics](#evaluation-metrics)
- [Contributing](#contributing)
- [License](#license)
- [Contact](#contact)

---

## **Tech Stack**

The following technologies and libraries are used in this project:

- **Python 3.8+**: Core programming language for building the project.
- **pandas**: For data manipulation and analysis.
- **scikit-learn**: Machine learning library used for feature extraction and building classifiers.
- **nltk (Natural Language Toolkit)**: For text preprocessing (tokenization, stopword removal, etc.).
- **TF-IDF Vectorizer**: Converts text data into numerical features.
- **Naive Bayes and SVM**: Machine learning algorithms used for classification.
- **GitHub Codespaces**: Cloud-based development environment for coding and collaboration.

---

## **Features**

- **Text Preprocessing:**
  - Removes punctuation and digits, converts text to lowercase, tokenizes, and removes stopwords.
- **Feature Extraction:**
  - Uses **TF-IDF vectorization** to convert reviews into numerical form for model input.
- **Classification Models:**
  - Trains a **Multinomial Naive Bayes** classifier. A **Support Vector Machine (SVM)** classifier (`SVC`) is imported in the script but not trained.
- **Evaluation Metrics:**
  - Outputs **accuracy score** and a **classification report** with precision, recall, and F1-score.

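The feature-extraction and classification steps listed above can be chained into a single scikit-learn pipeline. The following is a minimal sketch, not the project's script: the four reviews are made-up illustrative data, and it assumes scikit-learn's built-in English stopword list (`stop_words="english"`) in place of the NLTK list, so it runs without any downloads.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative dataset (hypothetical, not part of the project).
reviews = [
    "a fantastic and moving film",
    "fantastic acting and a great story",
    "boring plot and terrible pacing",
    "a terrible waste of time",
]
labels = ["positive", "positive", "negative", "negative"]

# TfidfVectorizer lowercases and tokenizes the text itself;
# stop_words="english" uses scikit-learn's own list rather than NLTK's.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    MultinomialNB(),
)
model.fit(reviews, labels)

# Classify an unseen review made of words seen only in positive examples.
prediction = model.predict(["fantastic story"])[0]
print(prediction)
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is always fitted on the training text only, which avoids leaking test data into the features.
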
---

## **Benefits**

1. **Scalable and Automated Analysis:**
   Automates the sentiment analysis of large volumes of reviews, eliminating the need for manual reading.
2. **Improved Decision-Making:**
   Helps platforms like Netflix or Amazon identify audience reactions and improve content recommendations.
3. **Proactive Issue Detection:**
   Identifies negative sentiment early to help brands respond to user concerns in real time.
4. **Versatile Solution:**
   Can be extended to other text-based sentiment use cases, such as product reviews or social media posts.
5. **Brand Reputation Management:**
   Helps brands track public opinion trends and address customer feedback proactively.

---

## **Project Structure**

```plaintext
sentiment-analysis-movie-reviews/
├── sentiment_analysis.py   # Main script with model code.
├── requirements.txt        # List of dependencies.
└── README.md               # Project documentation.
```

---

## **Installation**

1. **Clone the Repository:**
   ```bash
   git clone https://github.com/<your-username>/machine-learning-repos.git
   cd machine-learning-repos/sentiment-analysis-movie-reviews
   ```

2. **Set Up Virtual Environment (Optional):**
   ```bash
   python -m venv venv
   source venv/bin/activate   # For Linux/macOS
   .\venv\Scripts\activate    # For Windows
   ```

3. **Install Dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

---

## **Usage**

1. **Prepare the Dataset:**
   The script uses a small sample dataset defined directly in `sentiment_analysis.py`, so no data files are required. Optionally, you can store additional datasets inside a `data/` folder and adapt the script to load them.

2. **Run the Script:**
   ```bash
   python sentiment_analysis.py
   ```
   On the first run, the script downloads the NLTK `punkt` and `stopwords` resources, which requires an internet connection.

3. **Expected Output:**
   The console will display:
   - A **classification report** (precision, recall, and F1-score) for each class.
   - The **overall accuracy** of the classifier.

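If you do add your own dataset, one way to load it is with pandas. This is only a sketch under stated assumptions: the committed script does not read any files, the `load_reviews` helper is hypothetical, and the file is assumed to be a CSV with `review` and `sentiment` columns (the same column names the script's in-memory DataFrame uses). An in-memory string stands in for a file such as `data/reviews.csv` so the example runs as written.

```python
import io

import pandas as pd

# Stand-in for a file such as data/reviews.csv (hypothetical name and layout).
csv_text = io.StringIO(
    "review,sentiment\n"
    '"Loved every minute of it.",positive\n'
    '"Dull and far too long.",negative\n'
)


def load_reviews(source):
    """Read a CSV with `review` and `sentiment` columns and drop incomplete rows."""
    df = pd.read_csv(source)
    missing = {"review", "sentiment"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df.dropna(subset=["review", "sentiment"]).reset_index(drop=True)


df = load_reviews(csv_text)
print(df.shape)
```

Passing a real path (for example `load_reviews("data/reviews.csv")`) works the same way, since `pd.read_csv` accepts both paths and file-like objects.
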
---

## **Evaluation Metrics**

- **Accuracy:** The percentage of all predictions that are correct.
- **Precision:** Of the reviews predicted as a given class, the fraction that actually belong to that class.
- **Recall:** Of the reviews that actually belong to a given class, the fraction the model identifies.
- **F1-Score:** The harmonic mean of precision and recall, which is more informative than accuracy on imbalanced datasets.

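To make these definitions concrete, the sketch below computes them by hand in plain Python for one class. The true labels match the five sample reviews in the script; the predictions are made up for illustration, and `per_class_metrics` is a hypothetical helper, not part of the project.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as label
    fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
    fn = sum(t == label and p != label for t, p in pairs)  # label that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


y_true = ["positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "negative", "neutral"]  # hypothetical

# Accuracy: share of predictions that match the true label (3 of 5 here).
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = per_class_metrics(y_true, y_pred, "negative")

print(f"accuracy={accuracy:.2f}")
print(f"negative: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For the `negative` class there are two true positives, one false positive, and no false negatives, so precision is 2/3, recall is 1, and F1 is 0.8.
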
---

## **Contributing**

We welcome contributions to this project! To contribute:

1. **Fork** this repository.
2. Create a **new branch** for your feature or bug fix:
   ```bash
   git checkout -b feature-branch
   ```
3. Commit your changes:
   ```bash
   git commit -m "Add feature/bug fix"
   ```
4. **Push** the changes to your forked repository:
   ```bash
   git push origin feature-branch
   ```
5. Open a **Pull Request** to the original repository and provide a detailed description of your changes.

---

## **License**

This project is licensed under the **MIT License**. See the [LICENSE](../LICENSE) file for more details.

---

## **Contact**

- **Author:** Sanchit Chauhan
- **Email:** <[email protected]>
- **GitHub:** [sanchitc05](https://github.com/sanchitc05)
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
pandas
scikit-learn
nltk
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC  # imported, but no SVM is trained in this script
from sklearn.metrics import classification_report, accuracy_score
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import re

# Download the tokenizer models and the stopword list used below.
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases look up this resource for word_tokenize
nltk.download('stopwords')

stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    """Strip non-letter characters, lowercase, tokenize, and drop English stopwords."""
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    tokens = word_tokenize(text.lower())
    filtered_tokens = [word for word in tokens if word not in stop_words]
    return ' '.join(filtered_tokens)

# Small in-memory sample dataset.
data = {
    'review': [
        "The movie was absolutely fantastic!",
        "I hated the plot. It was so boring.",
        "It was an average movie, not great but not terrible either.",
        "What a waste of time! Completely disappointed.",
        "Loved the acting and cinematography!"
    ],
    'sentiment': ['positive', 'negative', 'neutral', 'negative', 'positive']
}
df = pd.DataFrame(data)
df['cleaned_review'] = df['review'].apply(preprocess_text)

X = df['cleaned_review']
y = df['sentiment']
# With five reviews, test_size=0.2 holds out a single review for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit TF-IDF on the training reviews only, then reuse its vocabulary for the test set.
tfidf = TfidfVectorizer(max_features=5000)
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)

# Train and evaluate the Multinomial Naive Bayes classifier.
nb_model = MultinomialNB()
nb_model.fit(X_train_tfidf, y_train)
nb_predictions = nb_model.predict(X_test_tfidf)

print("Naive Bayes Classifier Report:")
# zero_division=0 reports 0 (without a warning) for classes that have no predictions.
print(classification_report(y_test, nb_predictions, zero_division=0))
print("Accuracy:", accuracy_score(y_test, nb_predictions))
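
The script imports `SVC` but never trains it, while the README lists an SVM classifier. A minimal sketch of how an SVM could be trained on TF-IDF features is shown below. It is an illustration rather than the author's implementation: the reviews are made-up data, the linear kernel is an assumption, and the variable names mirror the Naive Bayes section only for readability.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Tiny illustrative dataset (hypothetical, not the one in the script).
train_reviews = [
    "fantastic wonderful film",
    "wonderful fantastic acting",
    "boring terrible plot",
    "terrible boring pacing",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_reviews = ["fantastic wonderful", "boring terrible"]
test_labels = ["positive", "negative"]

# Fit the vectorizer on the training text only, then transform the test text.
tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(train_reviews)
X_test_tfidf = tfidf.transform(test_reviews)

# A linear kernel is a common choice for sparse, high-dimensional text features.
svm_model = SVC(kernel="linear")
svm_model.fit(X_train_tfidf, train_labels)
svm_predictions = svm_model.predict(X_test_tfidf)

print("SVM Accuracy:", accuracy_score(test_labels, svm_predictions))
```

In the actual script, the same two `SVC` lines could reuse the existing `X_train_tfidf` and `y_train` variables in place of the sample data above.
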
