This project demonstrates a machine learning pipeline for sentiment analysis using real-world Twitter data related to airline customer feedback. It is designed as a professional AI portfolio piece to showcase proficiency in natural language processing (NLP), data preprocessing, model training, and evaluation.
The ultimate goal is to develop a scalable sentiment analysis tool that can be extended to platforms such as WhatsApp messages, live chat systems, or customer support tools.
- Build a production-ready sentiment classifier using classical ML algorithms.
- Demonstrate applied knowledge of NLP and text preprocessing techniques.
- Use clean, real-world data from Kaggle to train and evaluate the model.
- Offer a visual explanation of model performance using classification metrics and confusion matrices.
- Prepare for future expansion with deep learning models or integration into web interfaces.
- Source: Kaggle - Airline Tweets Dataset
- Number of tweets: ~15,000
- Sentiment labels: `positive`, `neutral`, `negative`
- Data Loading & Cleaning: Load CSV data, remove noise, lowercase text, strip special characters.
- Text Vectorization: Apply `CountVectorizer` to convert text into numerical features.
- Model Training: Train a Multinomial Naive Bayes model using `scikit-learn`.
- Model Evaluation: Generate a classification report and confusion matrix to visualize model accuracy and class-wise performance.
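The four steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `clean_text` and `train_and_evaluate`, the cleaning regexes, the 80/20 stratified split, and the toy tweets are all assumptions made for the example. Only `CountVectorizer`, `MultinomialNB`, the classification report, and the confusion matrix come from the description above.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

LABELS = ["negative", "neutral", "positive"]


def clean_text(text):
    """Lowercase, then drop URLs, @mentions, and non-letter characters (assumed rules)."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def train_and_evaluate(texts, labels, test_size=0.2, seed=42):
    """Clean, split, vectorize, train Naive Bayes, and evaluate on the held-out part."""
    cleaned = [clean_text(t) for t in texts]
    x_train, x_test, y_train, y_test = train_test_split(
        cleaned, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    vectorizer = CountVectorizer()
    model = MultinomialNB()
    # Fit the vocabulary on training text only, then reuse it for the test text.
    model.fit(vectorizer.fit_transform(x_train), y_train)
    y_pred = model.predict(vectorizer.transform(x_test))
    report = classification_report(y_test, y_pred, labels=LABELS, zero_division=0)
    matrix = confusion_matrix(y_test, y_pred, labels=LABELS)
    return vectorizer, model, report, matrix


# Toy stand-in for the real dataset (hypothetical tweets, repeated so a split is possible).
toy_texts = [
    "I love this airline, great crew",
    "Amazing flight and friendly staff",
    "Worst flight ever, lost my bag",
    "Terrible delay and rude service",
    "Flight departs at noon",
    "Checking in at the gate now",
] * 5
toy_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"] * 5

vectorizer, model, report, matrix = train_and_evaluate(toy_texts, toy_labels)
print(report)
print(matrix)
```

With the real data, `texts` and `labels` would come from the tweet text and sentiment columns of `Tweets.csv`; the column names are not stated here, so they are left out of the sketch.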
- Overall Accuracy: 78%
- Strong performance on the `negative` class (Precision: 0.78, Recall: 0.96)
- Moderate performance on the `positive` class (Precision: 0.82, Recall: 0.55)
- Lower performance on the `neutral` class (Precision: 0.72, Recall: 0.35), affected by class imbalance
Classification Report Sample:
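| | precision | recall | f1-score | support |
|---|---|---|---|---|
| negative | 0.78 | 0.96 | 0.86 | 1889 |
| neutral | 0.72 | 0.35 | 0.47 | 580 |
| positive | 0.82 | 0.55 | 0.66 | 459 |
| accuracy | | | 0.78 | 2928 |
| macro avg | 0.77 | 0.62 | 0.66 | 2928 |
| weighted avg | 0.77 | 0.78 | 0.75 | 2928 |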
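To make the report easier to read, the sketch below recomputes the F1 column and its macro and weighted averages from the per-class precision, recall, and support values listed above. It uses only the standard library. The inputs are the rounded figures from the report, so results are expected to match only to two decimals, and the helper name `f1_from` is made up for this example.

```python
# (precision, recall, support) per class, copied from the report above.
REPORT = {
    "negative": (0.78, 0.96, 1889),
    "neutral": (0.72, 0.35, 580),
    "positive": (0.82, 0.55, 459),
}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = {label: f1_from(p, r) for label, (p, r, _) in REPORT.items()}
total_support = sum(support for _, _, support in REPORT.values())

# Macro average: every class counts equally, regardless of size.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1[label] * REPORT[label][2] for label in REPORT) / total_support

for label, score in f1.items():
    print(f"{label:<10}{score:.2f}")
print(f"{'macro':<10}{macro_f1:.2f}")
print(f"{'weighted':<10}{weighted_f1:.2f}")
```

The gap between the macro average (0.66) and the weighted average (0.75) reflects how much the large `negative` class dominates the weighted figure.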
model.predict(vectorizer.transform(["I love this airline!"]))
# Output: ['positive']
- Replace `CountVectorizer` with `TF-IDF` or deep embeddings like `Word2Vec` or `BERT`
- Try stronger models such as Logistic Regression, `SVM`, or deep learning classifiers
- Address class imbalance using `SMOTE` or class weighting
- Deploy the model as a simple web app using `Streamlit` or `Gradio`
- Extend to multilingual sentiment analysis (Arabic/English)
- Create a real-time API for processing WhatsApp or chat messages
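As a rough illustration of the first three items, the sketch below combines TF-IDF features with a Logistic Regression classifier that uses class weighting. It is one possible direction under stated assumptions, not something the project already implements: the function name `build_weighted_classifier`, the `max_iter` value, and the toy tweets are hypothetical. `class_weight="balanced"` is scikit-learn's built-in option that weights each class inversely to its frequency; SMOTE, which lives in the separate imbalanced-learn package, is not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_weighted_classifier():
    """TF-IDF features feeding a class-weighted Logistic Regression."""
    return make_pipeline(
        TfidfVectorizer(lowercase=True),
        # "balanced" gives rarer classes a larger weight in the loss.
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )


# Deliberately imbalanced toy data (hypothetical tweets): 4 negative, 2 neutral, 2 positive.
toy_texts = [
    "Worst flight ever, lost my bag",
    "Terrible delay and rude service",
    "Cancelled again with no refund",
    "Stuck on the tarmac for hours",
    "Flight departs at noon",
    "Checking in at the gate now",
    "I love this airline, great crew",
    "Amazing flight and friendly staff",
]
toy_labels = ["negative"] * 4 + ["neutral"] * 2 + ["positive"] * 2

clf = build_weighted_classifier()
clf.fit(toy_texts, toy_labels)
prediction = clf.predict(["great crew and friendly staff"])
print(prediction[0])
```

Because the pipeline bundles vectorizer and model, the same `clf` object can be passed raw strings at prediction time, which would also simplify a later Streamlit, Gradio, or API wrapper.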
- Install requirements: `pip install -r requirements.txt`
- Run the notebook: open `Sentiment_Analysis_Project.ipynb` in Jupyter Notebook or VS Code.
📁 Sentiment_Analysis
│
├── Sentiment_Analysis_Project.ipynb # Main notebook
├── README.md # Project documentation
├── requirements.txt # Python dependencies
│
└── 📁 Data_Explorer
    ├── Tweets.csv # Raw dataset
    └── database.sqlite # SQLite version of dataset
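Since the repository ships a SQLite copy of the dataset, a quick way to see what it contains is to list its tables before querying anything. The sketch below uses only the standard library; the helper name `list_tables` is made up for this example, and no table or column names are assumed because the README does not state them.

```python
import sqlite3
from pathlib import Path

# Path taken from the project layout above.
DB_PATH = Path("Data_Explorer") / "database.sqlite"


def list_tables(db_path):
    """Return the names of all tables in a SQLite file, sorted alphabetically."""
    path = Path(db_path)
    if not path.exists():
        # sqlite3.connect would silently create an empty file, so fail early instead.
        raise FileNotFoundError(path)
    conn = sqlite3.connect(str(path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if DB_PATH.exists():
    print(list_tables(DB_PATH))
```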
AI & Robotics Enthusiast | Python Developer
- 💼 LinkedIn | 💻 GitHub
This project is licensed under the Apache 2.0 License.