A machine learning-powered email and SMS spam classification app built using Python and Streamlit.
This project classifies messages as either "Spam" or "Not Spam" based on their content. It utilizes Natural Language Processing (NLP) techniques to preprocess text before making predictions using a pre-trained model.
The project consists of the following key components:
- Streamlit Application: A user-friendly interface for entering and classifying messages.
- Text Preprocessing: Cleans and processes input text using NLP techniques.
- Machine Learning Model: A trained model that predicts whether a message is spam or not.
- Vectorizer: Converts text data into a numerical format (Bag of Words) for processing.
- Input: The user enters a message (SMS or email) into the provided text box in the Streamlit app.
- Preprocessing: The text undergoes:
  - Lowercasing
  - Tokenization
  - Removal of non-alphanumeric characters and stopwords
  - Stemming using the Porter Stemmer
- Vectorization: The cleaned text is transformed into a numerical format using a pre-trained CountVectorizer.
- Prediction: The vectorized text is fed into the model, which classifies it as either:
  - Spam: The message is likely spam.
  - Not Spam: The message is not spam.
- Output: The result is displayed in the Streamlit app.
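The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `transform_text` is hypothetical, and to keep the snippet runnable without downloaded NLTK data it swaps in a regular-expression tokenizer and a tiny hand-picked stop-word set. The pipeline described above would more likely use `nltk.word_tokenize` and `stopwords.words("english")`.

```python
import re

from nltk.stem import PorterStemmer

# Assumption: a tiny stand-in for NLTK's English stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "for", "on", "we", "you", "of"}

stemmer = PorterStemmer()


def transform_text(text: str) -> str:
    """Lowercase, tokenize, drop stop words, and stem a message."""
    # Lowercasing
    text = text.lower()
    # Tokenization (stand-in): keep runs of letters and digits, which also
    # discards non-alphanumeric characters such as punctuation.
    tokens = re.findall(r"[a-z0-9]+", text)
    # Stop-word removal
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Stemming with the Porter stemmer
    return " ".join(stemmer.stem(t) for t in tokens)
```

The returned string is what would be passed to the vectorizer. Whatever cleaning function is used at prediction time should match the one used when the model was trained.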
- Run the Streamlit app with `streamlit run app.py`.
- Enter a message in the provided text area.
- Click the "Predict" button to check if the message is spam or not.
Example messages and their expected labels:
- Not Spam: "Hey, are we still on for dinner tonight?"
- Spam: "Congratulations! You've won a free ticket to the Bahamas. Call now!"
To install the required dependencies, run `pip install streamlit scikit-learn nltk`.
Additionally, the NLTK data packages `punkt` and `stopwords` need to be downloaded.
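One way to fetch those data packages is from Python with `nltk.download`. The helper name `download_nltk_data` below is hypothetical. As a hedge, some recent NLTK releases may additionally ask for a `punkt_tab` package when tokenizing; if a `LookupError` mentions it, add it to the tuple.

```python
import nltk

# Data packages named in this README.
REQUIRED_PACKAGES = ("punkt", "stopwords")


def download_nltk_data(packages=REQUIRED_PACKAGES):
    """Download each NLTK data package and return {name: succeeded}."""
    # nltk.download returns True when the download succeeds.
    return {name: bool(nltk.download(name, quiet=True)) for name in packages}
```

Calling `download_nltk_data()` once before starting the app should be enough. It needs network access the first time it runs.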
- `app.py`: The main script that runs the Streamlit app.
- `model.pkl`: The pre-trained machine learning model for spam classification.
- `vectorizer.pkl`: The pre-trained CountVectorizer for text transformation.
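As a rough sketch of how these files might be used together, the snippet below loads both artifacts and classifies one message. It assumes, without confirmation from the source, that the `.pkl` files were written with Python's `pickle` module, that the model exposes a scikit-learn style `predict` method, and that label `1` means spam. The names `load_artifacts` and `classify` are hypothetical.

```python
import pickle
from pathlib import Path


def load_artifacts(directory="."):
    """Load the fitted vectorizer and the trained model from pickle files."""
    base = Path(directory)
    # Only unpickle files you trust: pickle can execute arbitrary code.
    with open(base / "vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    with open(base / "model.pkl", "rb") as f:
        model = pickle.load(f)
    return vectorizer, model


def classify(message, vectorizer, model, preprocess=str.lower):
    """Return "Spam" or "Not Spam" for a single message."""
    cleaned = preprocess(message)
    # transform expects an iterable of documents, hence the one-item list.
    features = vectorizer.transform([cleaned])
    # Assumption: the model was trained with 1 = spam and 0 = not spam.
    label = model.predict(features)[0]
    return "Spam" if label == 1 else "Not Spam"
```

Here `str.lower` is only a placeholder default. In practice `preprocess` would be the same cleaning function that was applied to the training data.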
The model was trained on a labeled dataset of SMS messages using common text classification techniques, including:
- Text Preprocessing: Cleaning, tokenization, and stemming.
- Vectorization: Converting text into a numerical format using Bag of Words.
- Model Selection: A machine learning classifier was trained and optimized for accurate predictions.
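The source does not name the classifier or the dataset, so the following is only a sketch under assumptions. It uses scikit-learn's `CountVectorizer` for the Bag of Words step and `MultinomialNB` as the classifier, which is a common choice for word-count features but an assumption here. The four messages are toy placeholders, not the real training data, and the preprocessing step is omitted for brevity.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy placeholder data (1 = spam, 0 = not spam); not the real dataset.
messages = [
    "win free prize now",
    "free ticket call now",
    "are we still on for dinner",
    "see you at lunch tomorrow",
]
labels = [1, 1, 0, 0]

# Bag of Words: learn a vocabulary and count word occurrences per message.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Assumed classifier: multinomial Naive Bayes over the word counts.
model = MultinomialNB()
model.fit(X, labels)


def predict_label(text):
    """Vectorize one message with the fitted vocabulary and classify it."""
    return int(model.predict(vectorizer.transform([text]))[0])


# Serialized bytes like these could be written to vectorizer.pkl and model.pkl.
vectorizer_bytes = pickle.dumps(vectorizer)
model_bytes = pickle.dumps(model)
```

The vectorizer is fitted once on the training messages and then reused unchanged at prediction time. This is why the fitted vectorizer has to be saved alongside the model.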
This project showcases the power of NLP and machine learning in identifying spam messages. The Streamlit app provides a simple interface for testing the classifier with real-world examples.
Feel free to explore, contribute, or extend this project. Happy coding!
This project is licensed under the MIT License. See the LICENSE file for details.