
BERT-based Emotion Recognition Pipeline

Overview

This project provides a fully reproducible pipeline for text-based emotion detection (joy, sadness, anger, fear, love, surprise) using BERT and modern NLP preprocessing. It includes:

  • Data Pipeline: Automated merging, advanced preprocessing, and negation-based augmentation.
  • BERT Model: Fine-tuned distilbert-base-uncased for multi-class emotion classification.
  • Interfaces: CLI interactive mode and a modern Streamlit web application.

Features

  • Unified data preparation script: prepare_data.py combines, preprocesses, and augments emotion datasets automatically.
  • Automated NLTK setup: No manual downloads required; the script handles punkt, stopwords, etc.
  • Advanced Preprocessing: Lemmatization, proper stopword handling, and explicit negation marking (e.g., "not happy" -> "NOT_happy"); see the sketch after this list.
  • Augmentation: Adds robust negated phrase samples for each emotion class to improve "neutral" detection.
  • Flexible Inference: Choose between a Command Line Interface (CLI) or a Web UI.
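
For reference, here is a minimal sketch of the preprocessing steps described above. It assumes the NLTK data from the Setup section is installed; the function name and the exact negation rules are illustrative only, and prepare_data.py remains the authoritative implementation.

import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Illustrative preprocessing sketch; prepare_data.py may differ in details.
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess_text(text: str) -> str:
    """Lowercase, drop stopwords, lemmatize, and prefix negated words with NOT_."""
    tokens = re.findall(r"[a-z']+", text.lower())
    out, negate = [], False
    for tok in tokens:
        if tok in {"not", "no", "never"}:
            negate = True               # the next content word gets a NOT_ prefix
            continue
        if tok in stop_words:
            continue
        tok = lemmatizer.lemmatize(tok)
        out.append(f"NOT_{tok}" if negate else tok)
        negate = False
    return " ".join(out)

print(preprocess_text("I am not happy today"))   # -> "NOT_happy today"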

Repository Structure

├── data/
│   ├── train.csv
│   ├── test.csv
│   ├── val.csv
│   └── final_data_aug.csv   # Generated training data
├── prepare_data.py         # Data pipeline: load, preprocess, augment
├── bert_emotion.py         # Model training & CLI inference
├── app.py                  # Streamlit Web Application
├── requirements.txt        # Dependencies
├── README.md               # Documentation
└── bert_emotion_model/     # Saved model artifact (after training)

Setup

1. Install requirements

pip install -r requirements.txt

2. NLTK Data

NLTK data is downloaded automatically the first time you run the scripts; no manual setup is needed.
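
A minimal version of that automatic setup could look like the sketch below; the exact package list used by the scripts may differ.

import nltk

# Sketch of the automatic NLTK setup performed by the scripts.
def ensure_nltk_data() -> None:
    for pkg in ("punkt", "stopwords", "wordnet"):
        nltk.download(pkg, quiet=True)   # no-op if the package is already present

ensure_nltk_data()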


Usage

1. Data Preparation

Run the unified script to preprocess and augment data:

python prepare_data.py
  • Output: data/final_data_aug.csv
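
To sanity-check the generated file, you can load it with pandas. The column names mentioned in the comment are assumptions about the script's output, not a guarantee.

import pandas as pd

df = pd.read_csv("data/final_data_aug.csv")
print(df.shape)    # number of augmented samples
print(df.head())   # expected columns (assumed): text, label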

2. Model Training

Train the BERT emotion classifier:

python bert_emotion.py
  • Choose Option 1 for training.
  • The model will be saved in bert_emotion_model/.
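
bert_emotion.py drives the actual training. The following is only a rough sketch of what fine-tuning distilbert-base-uncased on data/final_data_aug.csv with the Hugging Face Trainer looks like; the column names, hyperparameters, and the absence of an eval split here are assumptions.

import pandas as pd
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Rough training sketch; bert_emotion.py is the authoritative script.
df = pd.read_csv("data/final_data_aug.csv")            # columns assumed: text, label
labels = sorted(df["label"].unique())
label2id = {l: i for i, l in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(labels),
    id2label={i: l for l, i in label2id.items()},
    label2id=label2id,
)

class EmotionDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and integer labels for the Trainer."""
    def __init__(self, texts, label_ids):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.label_ids = label_ids
    def __len__(self):
        return len(self.label_ids)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.label_ids[idx])
        return item

train_ds = EmotionDataset(df["text"].tolist(), df["label"].map(label2id).tolist())

args = TrainingArguments(
    output_dir="bert_emotion_model",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    logging_steps=50,
)
Trainer(model=model, args=args, train_dataset=train_ds).train()
model.save_pretrained("bert_emotion_model")
tokenizer.save_pretrained("bert_emotion_model")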

3. Inference

A. Command-line Interface (CLI)

python bert_emotion.py
  • Choose Option 2 for interactive detection.
  • Type sentences and press Enter for results.
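
The interactive loop behind Option 2 is roughly equivalent to this sketch; the real prompt text and menu handling live in bert_emotion.py.

from transformers import pipeline

# Sketch of the interactive CLI loop over the saved model.
clf = pipeline("text-classification",
               model="bert_emotion_model", tokenizer="bert_emotion_model")

while True:
    text = input("Enter a sentence (or 'quit' to exit): ").strip()
    if not text or text.lower() == "quit":
        break
    pred = clf(text)[0]          # top prediction: {'label': ..., 'score': ...}
    print(f"{pred['label']} (confidence: {pred['score']:.2f})")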

B. Web Application (Streamlit)

For a more visual experience, run the web app:

streamlit run app.py
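
app.py is the actual application; a minimal Streamlit front end over the saved model could look like this sketch (widget layout and labels are assumptions).

import streamlit as st
from transformers import pipeline

# Minimal Streamlit sketch; app.py is the actual web application.
@st.cache_resource
def load_classifier():
    return pipeline("text-classification",
                    model="bert_emotion_model", tokenizer="bert_emotion_model")

st.title("Emotion Detection")
text = st.text_area("Enter some text:")
if st.button("Analyze") and text.strip():
    pred = load_classifier()(text)[0]          # top prediction
    st.write(f"**{pred['label']}** (confidence: {pred['score']:.2%})")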

Requirements

The pipeline requires the following Python libraries:

  • pandas
  • nltk
  • torch
  • transformers
  • scikit-learn
  • accelerate
  • streamlit (for the web app)

Tips

  • GPU Acceleration: The scripts automatically detect CUDA-enabled GPUs for faster training and inference (see the snippet after this list).
  • Custom Labels: You can tune emotion labels in prepare_data.py.
  • Model Tuning: Adjust epochs and batch sizes in bert_emotion.py within TrainingArguments.
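
The GPU detection mentioned above typically boils down to a check like the one below; this is a generic sketch, not a copy of the scripts.

import torch

# Generic device-selection sketch; the Hugging Face Trainer picks up an
# available GPU automatically, so the scripts handle this internally.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")
# For manual inference you would move the model with model.to(device).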

License

MIT License


Developed with automation and ease of use in mind.
