This project provides a fully reproducible pipeline for text-based emotion detection using BERT and modern NLP preprocessing. It includes:
- Data Pipeline: Automated merging, advanced preprocessing, and negation-based augmentation.
- BERT Model: Fine-tuned `distilbert-base-uncased` for multi-class emotion classification.
- Interfaces: CLI interactive mode and a modern Streamlit web application.
## Features

- Unified data preparation script: `prepare_data.py` combines, preprocesses, and augments emotion datasets automatically.
- Automated NLTK setup: No manual downloads required; the script handles `punkt`, `stopwords`, etc.
- Advanced Preprocessing: Lemmatization, proper stopword handling, and explicit negation marking (e.g., "not happy" -> "NOT_happy"); see the sketch after this list.
- Augmentation: Adds robust negated phrase samples for each emotion class to improve "neutral" detection.
- Flexible Inference: Choose between a Command Line Interface (CLI) or a Web UI.
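As an illustration of the negation-marking step above, here is a minimal sketch assuming a simple token-level pass. The function name `mark_negations` and the cue list are illustrative, not the actual code in `prepare_data.py`:

```python
# Common negation cues; the actual list in prepare_data.py may differ.
NEGATION_CUES = {"not", "no", "never", "cannot"}

def mark_negations(text: str) -> str:
    """Prefix the token following a negation cue with NOT_,
    e.g. "not happy" -> "NOT_happy"."""
    out, negate = [], False
    for tok in text.split():
        low = tok.lower()
        if low in NEGATION_CUES or low.endswith("n't"):
            negate = True          # drop the cue, mark the next token instead
        elif negate:
            out.append("NOT_" + tok)
            negate = False
        else:
            out.append(tok)
    return " ".join(out)

print(mark_negations("I am not happy today"))  # -> I am NOT_happy today
```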
## Project Structure

```
├── data/
│   ├── train.csv
│   ├── test.csv
│   ├── val.csv
│   └── final_data_aug.csv   # Generated training data
├── prepare_data.py          # Data pipeline: load, preprocess, augment
├── bert_emotion.py          # Model training & CLI inference
├── app.py                   # Streamlit Web Application
├── requirements.txt         # Dependencies
├── README.md                # Documentation
└── bert_emotion_model/      # Saved model artifact (after training)
```
## Installation

```bash
pip install -r requirements.txt
```

NLTK data is automatically downloaded when you run the scripts for the first time.
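The first-run setup typically follows the standard NLTK pattern shown below. This is a sketch; the exact resource list in the scripts may differ (`wordnet` is assumed here because the pipeline lemmatizes):

```python
import nltk

# Fetch required corpora on first run; quiet no-op if already present.
for resource in ("punkt", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)
```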
## Data Preparation

Run the unified script to preprocess and augment data:

```bash
python prepare_data.py
```

Output: `data/final_data_aug.csv`
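To sanity-check the generated file, a quick look with pandas works. No column names are assumed here; they are whatever `prepare_data.py` writes:

```python
import pandas as pd

# Load the augmented training data and print its shape and first rows.
df = pd.read_csv("data/final_data_aug.csv")
print(df.shape)
print(df.head())
```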
## Training

Train the BERT emotion classifier:

```bash
python bert_emotion.py
```

- Choose Option 1 for training.
- The model will be saved in `bert_emotion_model/`.
## CLI Inference

```bash
python bert_emotion.py
```

- Choose Option 2 for interactive detection.
- Type sentences and press Enter for results.
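To use the saved model programmatically instead of through the CLI, here is a minimal sketch with the Hugging Face `pipeline` API. It assumes `bert_emotion_model/` is a standard `save_pretrained` directory containing both model and tokenizer; label names depend on your training setup:

```python
from transformers import pipeline

# Load the fine-tuned model and tokenizer from the saved directory.
classifier = pipeline("text-classification", model="bert_emotion_model")

print(classifier("I can't stop smiling today!"))
# e.g. [{'label': 'joy', 'score': 0.98}] -- labels depend on training config
```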
## Web Application

For a more visual experience, run the web app (a sketch of what `app.py` might look like appears after the requirements list below):

```bash
streamlit run app.py
```

## Requirements

The pipeline requires the following Python libraries:

- pandas
- nltk
- torch
- transformers
- scikit-learn
- accelerate
- streamlit (for the web app)
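As a rough sketch of the shape such an `app.py` might take (not the project's actual code; the model path and caching choices are assumptions):

```python
import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once per session
def load_classifier():
    return pipeline("text-classification", model="bert_emotion_model")

st.title("Emotion Detection")
text = st.text_input("Enter a sentence:")
if text:
    result = load_classifier()(text)[0]
    st.write(f"**{result['label']}** (score: {result['score']:.2f})")
```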
## Notes

- GPU Acceleration: The scripts automatically detect CUDA-enabled GPUs for faster training and inference.
- Custom Labels: You can tune emotion labels in `prepare_data.py`.
- Model Tuning: Adjust epochs and batch sizes in `bert_emotion.py` within `TrainingArguments`; see the sketch after this list.
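For orientation, the relevant knobs live in `TrainingArguments`. Below is a hedged sketch with illustrative values; the actual settings in `bert_emotion.py` may differ:

```python
import torch
from transformers import TrainingArguments

# The Trainer uses a CUDA GPU automatically when one is available.
print("CUDA available:", torch.cuda.is_available())

training_args = TrainingArguments(
    output_dir="bert_emotion_model",  # where checkpoints and the final model go
    num_train_epochs=3,               # more epochs = better fit, longer training
    per_device_train_batch_size=16,   # lower this if you hit GPU out-of-memory
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
)
```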
## License

MIT License
Developed with automation and ease of use in mind.