An end-to-end Machine Learning web application that classifies emails as Spam or Legitimate (Ham) using a Support Vector Machine (SVM) model trained on real-world email data.
Deployed via Flask, this project demonstrates the complete ML lifecycle — from text preprocessing and feature engineering to web-based deployment — delivering an interactive, explainable, and production-ready application.
- ✅ Achieved over 98% accuracy in classifying spam and legitimate emails.
- ✅ Deployed an interactive Flask web app enabling real-time email spam detection.
- ✅ Implemented TF-IDF feature extraction and SVM optimization for high-precision classification.
- ✅ Designed an explainable interface showing influential words and weights driving predictions.
- ✅ Enhanced user experience with a “Get Detail” feature and search filter to explore model insights.
This project reflects skills in data preprocessing, feature engineering, model evaluation, and Flask deployment, aligning directly with Data Science and Machine Learning engineering roles.
- Programming Language: Python
- Frameworks: Flask, Scikit-learn
- ML Algorithm: Support Vector Machine (SVM)
- Feature Extraction: TF-IDF Vectorizer
- Libraries: NumPy, Pandas, Joblib
- Frontend: HTML, CSS, JavaScript
- Data Preprocessing
  - Cleaned and normalized email text (lowercasing, punctuation and digit removal).
  - Converted textual data into numerical vectors using TF-IDF.
- Model Building
  - Trained an SVM classifier for binary text classification.
  - Tuned hyperparameters using GridSearchCV for optimal accuracy.
- Model Evaluation
  - Evaluated using confusion matrix, precision, recall, F1-score, and accuracy metrics.
  - Checked for balanced performance across both spam and ham categories.
- Deployment
  - Integrated the trained model with a Flask web interface.
  - Enabled real-time predictions and model interpretability features.
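
The cleaning and tuning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the helper name `clean_text`, the toy messages, and the parameter grid (`C`, `kernel`) are all assumptions chosen for the example, and a real run would use the full dataset with more cross-validation folds.

```python
import re
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Translation table that deletes ASCII punctuation and digits.
_PUNCT_DIGITS = str.maketrans("", "", string.punctuation + string.digits)


def clean_text(text: str) -> str:
    """Hypothetical helper: lowercase, drop punctuation/digits, collapse spaces."""
    text = text.lower().translate(_PUNCT_DIGITS)
    return re.sub(r"\s+", " ", text).strip()


# Toy data for illustration only (assumption, not the real dataset).
texts = [
    "Win a FREE prize now!!!",
    "Claim your $1000 cash reward today",
    "Limited offer: 50% off, click here",
    "You have won a free iPhone, claim now",
    "Team meeting scheduled at 10 AM tomorrow",
    "Please find the invoice attached",
    "Can we move lunch to Friday?",
    "Here are the notes from today's call",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# TF-IDF features feed an SVM; clean_text replaces the default preprocessing.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=clean_text)),
    ("svm", SVC()),
])

# Assumed search space; the source does not say which parameters were tuned.
param_grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}

# cv=2 only because the toy set is tiny.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
print("Prediction:", search.predict(["Free cash prize, click now"])[0])
```

Wrapping the vectorizer and classifier in one `Pipeline` keeps the TF-IDF statistics inside each cross-validation fold, which avoids leaking validation text into the fitted vocabulary.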
This solution can be extended to:
- 📬 Enterprise email security systems
- 🔎 Phishing or fraud detection platforms
- 💬 Chat moderation and text classification tools
- Data Importing → Load the dataset (spam.csv) and drop unnecessary columns
- Preprocessing → Apply label encoding, remove duplicates, and perform basic text cleaning (lowercasing, punctuation removal)
- EDA → Visualize the spam vs. ham distribution and analyze message length patterns
- Vectorization → Convert text data into numerical features using CountVectorizer
- TF-IDF Transformation → Reweight word counts by their importance and frequency
- SVM Model → Train a Support Vector Machine classifier for spam detection
- Evaluation → Measure model performance using accuracy, confusion matrix, and classification report
- Prediction → Test the model on new email examples through an interactive Flask web app
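
As a rough sketch of the vectorization, TF-IDF, training, and evaluation steps listed above, the snippet below fits a `CountVectorizer`, a `TfidfTransformer`, and an `SVC` as three separate objects and saves them under the file names `vector.pkl`, `tf.pkl`, and `model.pkl`. The toy messages, the linear kernel, the split size, and the temporary output directory are assumptions made for illustration; the real project loads `spam.csv` instead.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Toy stand-in for the dataset (assumption for illustration).
messages = [
    "Win a free prize now", "Claim your cash reward today",
    "Limited time offer click here", "You won a free phone",
    "Cheap loans approved instantly", "Get rich quick with this offer",
    "Team meeting at ten tomorrow", "Invoice for last month attached",
    "Can we move lunch to Friday", "Notes from the project call",
    "Please review the attached report", "See you at the office",
]
raw_labels = ["spam"] * 6 + ["ham"] * 6

# Label encoding: classes are sorted, so ham -> 0 and spam -> 1.
encoder = LabelEncoder()
y = encoder.fit_transform(raw_labels)

X_train, X_test, y_train, y_test = train_test_split(
    messages, y, test_size=4, stratify=y, random_state=42
)

# Step 1: raw counts. Step 2: TF-IDF reweighting. Step 3: SVM.
vectorizer = CountVectorizer(lowercase=True)
tfidf = TfidfTransformer()
model = SVC(kernel="linear")  # kernel choice is an assumption

X_train_tfidf = tfidf.fit_transform(vectorizer.fit_transform(X_train))
model.fit(X_train_tfidf, y_train)

# Only transform (never fit) on the held-out messages.
X_test_tfidf = tfidf.transform(vectorizer.transform(X_test))
y_pred = model.predict(X_test_tfidf)

cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("Accuracy:", accuracy_score(y_test, y_pred))
print(cm)
print(classification_report(
    y_test, y_pred, labels=[0, 1],
    target_names=list(encoder.classes_), zero_division=0,
))

# Persist the three artifacts; a temp directory avoids overwriting real files.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    joblib.dump(vectorizer, out / "vector.pkl")
    joblib.dump(tfidf, out / "tf.pkl")
    joblib.dump(model, out / "model.pkl")
    reloaded_pred = joblib.load(out / "model.pkl").predict(X_test_tfidf)
```

Scores on a dozen hand-written messages say nothing about real performance; the point is only the order of the `fit_transform`, `transform`, and `predict` calls.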
Email_Classifier_SVM/
│
├── app.py
├── model.pkl
├── vector.pkl
├── tf.pkl
├── index.html
├── style.css
└── requirements.txt
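
For orientation, `app.py` in a layout like this might look roughly like the sketch below. It is not the project's actual code: the `/predict` route, the JSON field name `email`, and the `create_app` factory are hypothetical, and the three model objects are trained inline on toy data so the example runs on its own. In the real application they would presumably be loaded with `joblib.load` from `vector.pkl`, `tf.pkl`, and `model.pkl`.

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.svm import SVC


def create_app(vectorizer, tfidf, model):
    """Hypothetical factory: wires fitted artifacts into a Flask app."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # route name is an assumption
    def predict():
        payload = request.get_json(silent=True) or {}
        text = str(payload.get("email", "")).strip()
        if not text:
            return jsonify(error="Field 'email' is required."), 400
        # Same transform chain as training: counts -> TF-IDF -> SVM.
        features = tfidf.transform(vectorizer.transform([text]))
        label = int(model.predict(features)[0])
        return jsonify(prediction="Spam" if label == 1 else "Legitimate (Ham)")

    return app


# Stand-in artifacts trained on toy data (assumption). The real app would use
# joblib.load("vector.pkl"), joblib.load("tf.pkl"), joblib.load("model.pkl").
train_texts = [
    "win a free prize now", "claim your cash reward",
    "limited offer click here", "free phone winner",
    "team meeting tomorrow", "invoice attached for last month",
    "lunch on friday", "notes from the call",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = ham

vectorizer = CountVectorizer()
tfidf = TfidfTransformer()
model = SVC(kernel="linear")
model.fit(tfidf.fit_transform(vectorizer.fit_transform(train_texts)), train_labels)

app = create_app(vectorizer, tfidf, model)

# Exercise the endpoint without starting a server.
client = app.test_client()
response = client.post("/predict", json={"email": "Win a free cash prize now"})
print(response.status_code, response.get_json())
```

Calling `app.run()` instead of using the test client would serve the app on Flask's default address, http://127.0.0.1:5000.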
git clone https://github.com/<your-username>/Email-Classifier-using-SVM.git
cd Email-Classifier-using-SVM
python -m venv venv
venv\Scripts\activate # Windows
# OR
source venv/bin/activate # macOS/Linux
pip install -r requirements.txt
python app.py
Then open your browser at 👉 http://127.0.0.1:5000
| 📧 Input Email | 🧠 Predicted Output |
|---|---|
| "Congratulations! You’ve won a free iPhone. Click here to claim now!" | 🚫 Spam Email |
| "Team meeting scheduled at 10 AM tomorrow." | ✅ Legitimate (Ham) |
| "Get 50% off on all products! Limited time offer." | 🚫 Spam Email |
| "Your invoice for the last month is attached below." | ✅ Legitimate (Ham) |
| "Win cash rewards by completing this short survey!" | 🚫 Spam Email |
- End-to-End ML Pipeline: Data preprocessing → model training → deployment
- Text Analytics: Text preprocessing and feature extraction via CountVectorizer and TF-IDF
- Model Optimization: Hyperparameter tuning with GridSearchCV
- Web Deployment: Flask integration and UI development
- Explainable AI (XAI): Display of top influential words for transparency
- Full-Stack ML Project Execution: From dataset to live application
Kazi Umar
LinkedIn profile: https://www.linkedin.com/in/umar-kazi18
💼 Data Analyst | ML Engineer | Data Science & AI Enthusiast | Power BI | Python | SQL