This repository contains our project for SemEval 2025 Task 11-A, which focuses on emotion detection and intensity prediction from text. The goal is to classify the emotions expressed in a text snippet and predict the intensity of each. The target emotions are:
- Joy
- Sadness
- Fear
- Anger
- Surprise
- Disgust
This project involves multi-label classification and ordinal intensity prediction for a single language (English).
- Develop a machine learning model capable of:
  - Detecting multiple emotions in a text.
  - Predicting the intensity of each detected emotion.
- Focus on a single-language (English) dataset to optimize accuracy.
- Provide explainability and insights into the model's predictions using tools like SHAP.
- Evaluate the model's performance using the F-score and explore comparisons with human judgments.
- Preprocess and explore the dataset to prepare it for model training.
- Fine-tune transformer models (e.g., RoBERTa) for:
  - Emotion detection (multi-label classification).
  - Intensity prediction (ordinal regression or classification).
- Conduct model evaluation and error analysis.
- Perform explainability analysis to interpret the model's predictions.
- Document findings, results, and challenges.
- Source: Dataset provided by the SemEval 2025 Task 11-A competition (download from Codabench).
- Language: English.
- Structure: Text snippets with labels for emotions and optional intensity levels.
- Emotion Classes: Joy, sadness, fear, anger, surprise, disgust.
- Intensity Levels:
  - 0: No emotion.
  - 1: Low intensity.
  - 2: Moderate intensity.
  - 3: High intensity.
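
The label scheme above could be represented in code roughly as follows. This is a minimal sketch: the record layout (a dict mapping emotion name to intensity) and the helper name `encode_labels` are assumptions made for illustration, and the actual Codabench files may be laid out differently.

```python
# Hypothetical label layout; the real dataset columns may differ.
EMOTIONS = ["joy", "sadness", "fear", "anger", "surprise", "disgust"]


def encode_labels(intensities):
    """Turn {emotion: intensity} into two aligned vectors.

    Returns (presence, levels): presence is a multi-hot 0/1 vector and
    levels holds the ordinal intensity (0-3) for each emotion.
    Emotions missing from the input are treated as intensity 0.
    """
    levels = []
    for emotion in EMOTIONS:
        level = intensities.get(emotion, 0)
        if level not in (0, 1, 2, 3):
            raise ValueError(f"intensity for {emotion} must be 0-3, got {level}")
        levels.append(level)
    presence = [1 if level > 0 else 0 for level in levels]
    return presence, levels


presence, levels = encode_labels({"joy": 2, "surprise": 1})
print(presence)  # [1, 0, 0, 0, 1, 0]
print(levels)    # [2, 0, 0, 0, 1, 0]
```

Keeping both vectors aligned to one fixed emotion order lets the same index address the classification target and the intensity target.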
- Download and preprocess the dataset.
- Explore label distributions and handle class imbalance.
- Tokenize text using RoBERTa's tokenizer.
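
One possible way to handle class imbalance in the multi-label setting is to weight the positive class of each emotion by the ratio of negative to positive examples. The sketch below computes those ratios in plain Python; the function name `positive_class_weights` is hypothetical, and using the result as the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss` is an assumption about the training setup, not something this project has settled on.

```python
def positive_class_weights(label_rows):
    """Compute negatives / positives for each label column.

    label_rows is a list of multi-hot vectors of equal length.
    Labels with no positive examples get weight 1.0 to avoid
    division by zero.
    """
    if not label_rows:
        raise ValueError("label_rows must not be empty")
    n_rows = len(label_rows)
    n_labels = len(label_rows[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_rows)
        negatives = n_rows - positives
        weights.append(negatives / positives if positives else 1.0)
    return weights


# Toy data with three labels: frequent, rare, and never present.
rows = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(positive_class_weights(rows))  # [0.3333333333333333, 3.0, 1.0]
```

Rare labels receive weights above 1, so their positive examples contribute more to the loss.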
- Fine-tune RoBERTa for emotion detection and intensity prediction.
- Add dual-output heads:
  - Multi-label classification for emotions.
  - Ordinal regression or classification for intensity.
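
To make the two output heads concrete, the sketch below shows how their raw outputs might be turned into predictions. It assumes a cumulative (threshold) encoding for the ordinal head, which is one common way to frame ordinal regression; the plan above does not commit to this method, and all function names here are hypothetical.

```python
import math

NUM_LEVELS = 4  # intensities 0-3


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ordinal_targets(level):
    """Encode an intensity as cumulative binary targets:
    [level > 0, level > 1, level > 2]."""
    return [1 if level > k else 0 for k in range(NUM_LEVELS - 1)]


def decode_ordinal(logits, threshold=0.5):
    """Decode threshold logits into an intensity.

    Counts thresholds passed in order and stops at the first failure,
    so the result stays consistent even if a later logit is high.
    """
    level = 0
    for logit in logits:
        if sigmoid(logit) <= threshold:
            break
        level += 1
    return level


def decode_presence(logits, threshold=0.5):
    """Decode the multi-label head: one independent sigmoid per emotion."""
    return [1 if sigmoid(logit) > threshold else 0 for logit in logits]


print(ordinal_targets(2))                 # [1, 1, 0]
print(decode_ordinal([2.0, 0.3, -1.5]))   # 2
print(decode_ordinal([-0.5, 3.0, 3.0]))   # 0
print(decode_presence([1.2, -0.7, 0.1]))  # [1, 0, 1]
```

With this encoding the ordinal head emits three logits per emotion rather than four class scores, so the ordering of intensity levels is built into the targets.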
- Evaluate the model using:
  - F-score for emotion detection.
  - Mean Squared Error or Weighted F-score for intensity.
- Compare results with baseline models.
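
The evaluation metrics above can be sketched in plain Python. Macro averaging is an assumption here, since the text only says "F-score", and the helper names are hypothetical. Scoring a label with no true or predicted positives as 0.0 is a convention chosen for this sketch; libraries such as scikit-learn make this configurable.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over label columns of multi-hot vectors."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); 0.0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


def mean_squared_error(true_levels, pred_levels):
    """MSE over all (sample, emotion) intensity pairs."""
    flat_true = [v for row in true_levels for v in row]
    flat_pred = [v for row in pred_levels for v in row]
    total = sum((t - p) ** 2 for t, p in zip(flat_true, flat_pred))
    return total / len(flat_true)


y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [0, 1]]
print(round(macro_f1(y_true, y_pred), 4))  # 0.8333

print(mean_squared_error([[2, 0], [1, 3]], [[1, 0], [1, 1]]))  # 1.25
```

Running the same functions on a baseline's predictions gives directly comparable numbers.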
- Use SHAP to analyze model predictions and identify key text features contributing to emotions and intensity.
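
SHAP estimates Shapley values, which assign each input feature its average marginal contribution to the model output. The sketch below computes exact Shapley values per token for a toy scorer to illustrate the idea. It does not use the `shap` library, and `toy_joy_score` is a hypothetical stand-in for a trained model; exact enumeration is only feasible for very short inputs.

```python
from itertools import permutations


def toy_joy_score(tokens):
    """Hypothetical stand-in for a model: a hand-written lexicon score."""
    weights = {"love": 0.6, "great": 0.3, "not": -0.4}
    return sum(weights.get(tok, 0.0) for tok in tokens)


def shapley_values(tokens, score_fn):
    """Exact Shapley value per token position.

    Averages each token's marginal contribution over every ordering
    in which tokens can be added (n! orderings, so keep inputs short).
    """
    n = len(tokens)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        present = set()
        prev = score_fn([])
        for idx in order:
            present.add(idx)
            # Score the partial input with tokens kept in original order.
            current = score_fn([tokens[i] for i in sorted(present)])
            totals[idx] += current - prev
            prev = current
    return [total / len(orderings) for total in totals]


tokens = ["i", "love", "it"]
values = shapley_values(tokens, toy_joy_score)
print([round(v, 3) for v in values])  # [0.0, 0.6, 0.0]
```

The attributions sum to the difference between the score of the full input and the score of the empty input, which is the property that makes them readable as per-token contributions.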
- Develop APIs for emotion and intensity prediction.
- Document the model's performance, insights, and challenges.
- Prepare a final report for submission.
- Python 3.8+
- Hugging Face Transformers
- PyTorch or TensorFlow
- SHAP for explainability
- Set up the repository structure.
- Collaborate on preprocessing scripts and baseline models.
- Divide tasks among team members for efficient progress.
- SemEval 2025 Task 11-A Details
- Competition GitHub Repository
- Guidelines about submission and dataset
- BERT Text Classification Tutorial
- RoBERTa Explanation
- SHAP for Text Analysis
- Google Doc for Documentation