# Spam SMS Detection using Machine Learning

This project classifies SMS messages as Spam or Not Spam using Naive Bayes algorithms.

## Dataset

- SMS Spam Collection dataset
- Stored in data/spam.csv
- Encoded in UTF-8
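
As a quick sanity check on the file described above, the following is a minimal sketch that reads a UTF-8 CSV and counts messages per label using only the standard library. The column names `label` and `text` and both helper functions are assumptions for illustration; the actual header of data/spam.csv may differ, and the notebook itself may load the file with Pandas instead.

```python
import csv
from collections import Counter


def load_messages(path, label_col="label", text_col="text"):
    """Read (label, text) pairs from a UTF-8 CSV file that has a header row."""
    # label_col and text_col are hypothetical names; check the real header first.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [(row[label_col], row[text_col]) for row in reader]


def label_counts(messages):
    """Count how many messages carry each label."""
    return Counter(label for label, _ in messages)
```

Calling `label_counts(load_messages("data/spam.csv"))` would show how balanced the spam and non-spam classes are, provided the column names are adjusted to match the file.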

## 📊 Dataset Update & Model Improvement

The project originally trained on a standard Kaggle SMS spam dataset. The resulting model handled promotional spam well, but recall was poor on modern scam patterns such as:

- account security alerts
- fake delivery messages
- job and investment scams
- invoice and refund phishing

To address this, a custom dataset containing modern spam and scam patterns was generated.

## Models Used

- Gaussian Naive Bayes
- Multinomial Naive Bayes (selected for deployment)
- Bernoulli Naive Bayes
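
To make the deployed choice more concrete, here is a minimal pure-Python sketch of how a multinomial Naive Bayes classifier scores a message from word counts with Laplace smoothing. It illustrates the technique only and is not the project's code, which relies on Scikit-learn. The class name `TinyMultinomialNB`, the whitespace tokenization, and the toy training messages are all assumptions.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Illustrative multinomial Naive Bayes over whitespace tokens (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data for illustration only, not taken from the project's dataset.
texts = [
    "win a free prize now",
    "your account is locked verify now",
    "are we still on for lunch",
    "see you at the meeting tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

model = TinyMultinomialNB().fit(texts, labels)
print(model.predict("verify your account now"))  # spam
print(model.predict("lunch tomorrow"))           # ham
```

Multinomial Naive Bayes models how often each word occurs, which suits count-based text features. The Bernoulli variant models only word presence or absence, and the Gaussian variant assumes continuous features.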

## Tech Stack

- Python
- Pandas
- NumPy
- Scikit-learn
- Joblib

## Project Structure

spam-detection/
├── spam_sms_detection.ipynb
├── data/
│   └── spam.csv
├── models/
│   ├── mnb.pkl
│   └── vectorizer.pkl
└── requirements.txt
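
The two files under models/ can be reused outside the notebook. The sketch below assumes they were saved with Joblib, that vectorizer.pkl holds a fitted Scikit-learn style vectorizer, and that mnb.pkl holds the fitted classifier. The function names `load_artifacts` and `classify` are hypothetical, and the meaning of the returned label (for example 1 for spam) depends on how the notebook encoded the labels.

```python
from pathlib import Path


def load_artifacts(model_dir="models"):
    """Load the fitted classifier and vectorizer saved under models/."""
    import joblib  # imported here so classify() below works without joblib installed

    model_dir = Path(model_dir)
    model = joblib.load(model_dir / "mnb.pkl")
    vectorizer = joblib.load(model_dir / "vectorizer.pkl")
    return model, vectorizer


def classify(message, model, vectorizer):
    """Vectorize one raw message and return the model's predicted label."""
    # Assumption: the vectorizer accepts raw text. If the notebook cleans or
    # tokenizes messages first, apply the same preprocessing before this call.
    features = vectorizer.transform([message])
    return model.predict(features)[0]
```

From the repository root, `model, vectorizer = load_artifacts()` followed by `classify("Your parcel is waiting, confirm your address", model, vectorizer)` would return the predicted label for that message.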

## How to Run

1. Install dependencies: pip install -r requirements.txt
2. Open the notebook: spam_sms_detection.ipynb
3. Run all cells

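## NLTK Setup

This project uses NLTK for tokenization. If tokenization fails with a missing-resource error (a LookupError), download the Punkt tokenizer data as shown here. Newer NLTK releases may ask for punkt_tab instead; if the error message names that resource, download it the same way.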

import nltk
nltk.download('punkt')


### Results
- Recall improved from ~74% to ~99.6%
- Precision remains ~99%
- Model now generalizes better to real-world scam messages
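
For reference, precision and recall for the spam class can be computed from true and predicted labels as in the minimal sketch below. The helper name `precision_recall`, the label string `"spam"`, and the example labels are hypothetical and are not the project's evaluation data.

```python
def precision_recall(true_labels, predicted_labels, positive="spam"):
    """Return (precision, recall) for the positive class."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive and truth == positive:
            tp += 1  # spam correctly flagged
        elif pred == positive:
            fp += 1  # legitimate message flagged as spam
        elif truth == positive:
            fn += 1  # spam that slipped through
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: every flagged message is spam, but one spam is missed.
y_true = ["spam", "spam", "spam", "spam", "ham", "ham"]
y_pred = ["spam", "spam", "spam", "ham", "ham", "ham"]
print(precision_recall(y_true, y_pred))  # (1.0, 0.75)
```

High precision with lower recall, as in this toy case, means the model rarely mislabels legitimate messages but still misses some spam, which is the pattern the dataset update was meant to fix.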

### Note on Limitations
This is a text-only model. Certain messages, such as neutral security
alerts or charity requests, may still be ambiguous without metadata
(sender, headers, links).

## Future Improvements
- Web app deployment
- API support

Somsubhra-Nandi/spam-detection
