Commit 141342d

Create README.md
1 parent d7f3013 commit 141342d

1 file changed

+157
-0
lines changed

README.md

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
# 💬 Twitter Sentiment Analysis App

A complete **end-to-end NLP project** that classifies tweets as **Positive** or **Negative** using the [Sentiment140 Dataset](https://www.kaggle.com/datasets/kazanova/sentiment140).
This project combines **Natural Language Processing**, **Machine Learning**, and **MLOps**, covering every stage from data cleaning to model deployment.

---

## 🚀 Features

- ✅ Text preprocessing (cleaning, tokenization, lemmatization)
- ✅ TF-IDF vectorization for feature extraction
- ✅ Multiple ML models with cross-validation & hyperparameter tuning
- ✅ Streamlit web app for real-time sentiment prediction
- ✅ Model saving & reusability with `joblib`
- ✅ Fully Dockerized for consistent deployment
- ✅ GitHub Actions CI workflow for automated testing & build
- ✅ Kubernetes manifest ready for cloud deployment *(optional)*

---

## 🧩 Project Structure
```
├── .github
│   └── workflows
│       └── sentimentlsis.yml
├── Docker-compose.yml
├── Dockerfile
├── dashboard.py
├── manifest.yml
├── requirements.txt
└── src
    ├── preprep.ipynb
    ├── sentiment_model.pkl
    └── tfidf_vectorizer.pkl
```

---

## 🧠 Tech Stack

| Category | Tools / Libraries |
|-----------|------------------|
| **Language** | Python |
| **Data Handling** | Pandas, NumPy |
| **NLP** | NLTK, Regex, Emoji |
| **Feature Extraction** | TF-IDF (scikit-learn) |
| **Modeling** | Logistic Regression, SVM, Random Forest |
| **App Framework** | Streamlit |
| **Model Persistence** | Joblib |
| **Containerization** | Docker |
| **Automation** | GitHub Actions |
| **Deployment** | Streamlit Cloud / Render / Kubernetes |

---

## 🧹 Data Preprocessing

- Lowercasing text
- Removing URLs, mentions, hashtags, and punctuation
- Tokenization using **nltk**
- Stopword removal
- Lemmatization (`WordNetLemmatizer`)
- Emoji handling (`emoji.demojize`)

These steps strip noise such as links and user handles, so the model trains on the words that carry sentiment.
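
The cleaning steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function name `clean_tweet` and the tiny `STOPWORDS` set are assumptions, and lemmatization and emoji handling are left out so the snippet runs without downloading NLTK data. In the real pipeline, NLTK's tokenizer, stopword list, and `WordNetLemmatizer` would take the place of the simplified parts.

```python
import re

# Tiny stand-in stopword set (assumption); the project uses NLTK's English list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "i", "it", "this", "so"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_HASHTAG_RE = re.compile(r"[@#]\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> str:
    """Lowercase, strip URLs/mentions/hashtags/punctuation, drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # removes punctuation and digits
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_tweet("@bob I LOVE this phone!!! http://t.co/xyz #happy"))
```

Keeping all cleaning in one function makes it easy to call the same code at training time and inside the app, which is what keeps the two consistent.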
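
---

## 🧮 Feature Engineering: TF-IDF

**Why TF-IDF?**
It represents each tweet as a numerical vector in which each term is weighted by its **importance**: frequent within the tweet, rare across the corpus.

\[
\mathrm{TFIDF}(w) = \mathrm{TF}(w) \times \log\left(\frac{N}{\mathrm{df}(w)}\right)
\]

Here TF(w) is the frequency of term w in the tweet, N is the total number of tweets, and df(w) is the number of tweets that contain w.

The project uses `TfidfVectorizer(max_features=5000, ngram_range=(1,2))`, which keeps the 5,000 most frequent unigrams and bigrams as a balance between accuracy and speed.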
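
The formula can be checked by hand on a toy corpus. The sketch below is a pure-Python version that assumes TF is a term's relative frequency within a tweet (the README does not define TF); the function name `tfidf` and the sample tweets are illustrative only. scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its values will differ from these.

```python
import math
from collections import Counter


def tfidf(tokenized_docs):
    """Return one {term: weight} dict per document using TF(w) * log(N / df(w))."""
    n_docs = len(tokenized_docs)
    # df(w): number of documents that contain w at least once
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["love", "phone"], ["hate", "phone"], ["love", "love", "day"]]
scores = tfidf(docs)
print(scores[2])  # "day" appears in one tweet only, so it gets the largest IDF
```

A term that appears in every tweet gets a weight of zero under this formula, because log(N / N) = 0.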

---

## 🤖 Model Training

| Model | Description | Relative CV accuracy |
|--------|--------------|---------------|
| Logistic Regression | Simple & effective for text data | ✅ Best |
| SVM | Handles high-dimensional data | Good |
| Random Forest | Captures non-linear patterns | Moderate |

Training and evaluation used:
- **5-Fold Cross-Validation**
- **GridSearchCV** for hyperparameter tuning
- **Evaluation Metrics:** Accuracy, Precision, Recall, F1-score
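
One way to combine 5-fold cross-validation with `GridSearchCV` is to wrap the vectorizer and classifier in a scikit-learn `Pipeline`. The snippet below is a sketch under stated assumptions: the toy tweets, the 0/1 label encoding, and the `C` grid are made up for illustration and are not the project's data or search space.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data (assumption): 1 = positive, 0 = negative, repeated to allow 5 folds.
texts = [
    "love this phone", "great day today", "so happy now", "awesome service", "best movie ever",
    "hate this phone", "terrible day today", "so sad now", "awful service", "worst movie ever",
] * 2
labels = ([1] * 5 + [0] * 5) * 2

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},  # hypothetical grid
    cv=5,                                      # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```

Putting the vectorizer inside the pipeline means the TF-IDF vocabulary is refit on each training fold, so no information from the held-out fold leaks into the features.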

---

## 💾 Model Saving

`joblib` persists the trained model and the fitted TF-IDF vectorizer so the app can reuse them without retraining:
```python
import joblib

joblib.dump(model, 'sentiment_model.pkl')
joblib.dump(tfidf, 'tfidf_vectorizer.pkl')
```

---

## Streamlit Web App

A simple, interactive web app for real-time predictions.

**Run locally:**

```bash
streamlit run dashboard.py
```

**App flow:**

1. Input tweet text 📝
2. Clean and preprocess the text
3. Convert the text to a TF-IDF vector
4. Predict sentiment using the model
5. Display the result (😊 Positive / 😠 Negative)
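
The app flow can be condensed into a single helper. This is a hedged sketch rather than the code in `dashboard.py`: the function name `predict_sentiment`, its parameters, and the label encoding (1 = positive) are assumptions. It accepts any objects that expose scikit-learn style `transform` and `predict` methods.

```python
def predict_sentiment(text, model, vectorizer, clean=str.lower):
    """Run one tweet through clean -> vectorize -> predict and return a display label."""
    cleaned = clean(text)
    features = vectorizer.transform([cleaned])  # transform only; never refit at inference
    label = model.predict(features)[0]
    # Assumption: the model was trained with 1 = positive, 0 = negative.
    return "😊 Positive" if label == 1 else "😠 Negative"
```

In the Streamlit app, `model` and `vectorizer` would typically come from `joblib.load('sentiment_model.pkl')` and `joblib.load('tfidf_vectorizer.pkl')`, and `clean` would be the same preprocessing function used during training.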

---

## 🐳 Docker Integration

Build the image and run it, mapping Streamlit's default port 8501:

<pre>
docker build -t sentiment-app .
docker run -p 8501:8501 sentiment-app
</pre>

---

## 📊 Results

- Logistic Regression achieved ~85% accuracy on validation data
- Clean UI for sentiment prediction
- Fully automated CI/CD pipeline with Docker integration

---

## Key Takeaways

- Built a complete ML workflow, from preprocessing through training to deployment
- Learned to keep preprocessing consistent between training and inference
- Containerized the app for reproducibility
- Automated CI/CD with GitHub Actions
- Gained experience with MLOps fundamentals

---

## Setup Instructions

<pre>
# Clone repo
git clone https://github.com/&lt;your-username&gt;/sentiment-analysis.git
cd sentiment-analysis
# Install dependencies
pip install -r requirements.txt
# Run Streamlit app
streamlit run dashboard.py
</pre>

Or run in Docker:

<pre>docker-compose -f Docker-compose.yml up --build</pre>

---

## Author

**Sameer Chauhan**
MLOps & Machine Learning Engineer

💼 Passionate about bridging ML with real-world deployment through Docker, CI/CD, and automation.