This repository contains the source code, data, models, and analysis for our sentiment classification project submitted for the Computational Intelligence Lab (FS2025) at ETH Zürich.
We tackle ternary sentiment classification (positive, neutral, negative) using a dataset of 100,000+ sentences. Our approach compares classical ML models with state-of-the-art transformer-based architectures, explores preprocessing strategies, and evaluates ensembling techniques and LLM-generated paraphrasing.
Our best-performing model is a softmax-averaged ensemble of multiple fine-tuned transformer models:
- `distilbert-base-multilingual-cased`
- `deberta-v3-base`
- `deberta-v3-large`
- `roberta-large`
This ensemble achieves:
- L score: 0.9034
- Weighted F1 score: 0.83
For details, refer to our 📄 Project Report.
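The softmax-averaging strategy converts each model's raw logits into probabilities, averages those probabilities across the ensemble, and takes the arg-max class. A minimal sketch in plain Python (the logits and the three-model setup below are illustrative, not taken from the repository):

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_average(per_model_logits):
    """Average softmax probabilities across models; return the winning class index."""
    probs = [softmax(l) for l in per_model_logits]
    n = len(probs)
    avg = [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]
    return max(range(len(avg)), key=avg.__getitem__)

# Illustrative logits for one sentence from three models
# (classes: 0 = negative, 1 = neutral, 2 = positive)
logits = [
    [0.2, 1.5, 0.1],   # model A leans neutral
    [0.1, 0.4, 2.0],   # model B leans positive
    [0.0, 1.2, 1.1],   # model C is torn between neutral and positive
]
print(softmax_average(logits))  # → 2 (positive)
```

Averaging in probability space rather than logit space keeps models with differently scaled logits from dominating the vote.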
```
.
├── config/               # Configuration scripts
├── data/                 # Raw training and test datasets
├── data_loader/          # Data loading logic
├── data_preprocessing/   # Preprocessing, language detection, LLM-based augmentation
├── fine_tuned_models/    # Checkpoints of fine-tuned transformer models
├── generated/            # Generated paraphrases, language info, misclassifications
├── models/               # Classical and transformer model definitions
├── notebooks/            # Jupyter notebooks for all experiments and analysis
├── scripts/              # Scripts to run jobs on cluster and to train LLM models
├── submissions/          # CSV submissions for the Kaggle competition
├── utils/                # Utility functions
├── visualizations/       # Plots and visual analysis functions
├── requirements.txt      # Project dependencies
└── README.md             # You're here!
```
```bash
git clone https://github.com/Thosam1/SentimentClassification.git
cd SentimentClassification
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

All experiments are reproducible through the provided Jupyter notebooks.
- Launch Jupyter:

  ```bash
  jupyter notebook
  ```

- Navigate to the `notebooks/` folder and open:
| Notebook | Purpose |
|---|---|
| `1_data_exploration.ipynb` | Dataset overview, class distribution |
| `2_basic_machine_learning.ipynb` | Classical ML models (Logistic Regression, Random Forest, XGBoost) |
| `3_large_language_models.ipynb` | Fine-tuning transformer models (BERT, RoBERTa, DeBERTa, etc.) |
| `4_roberta_analysis.ipynb` | RoBERTa-large misclassifications and error analysis |
| `5_inference_time_augmentation.ipynb` | LLM-generated paraphrasing for correcting misclassifications |
| `submission_models.ipynb` | Aggregation, ensembling (softmax & majority voting), submission generation |
To run fine-tuning jobs on a cluster:
```bash
# Bash entry point
bash scripts/batch.sh

# OR Python launcher
python scripts/run_job_on_cluster.py
```

All fine-tuned model weights are stored under `fine_tuned_models/`. These include:
- BERT (base, multilingual, large)
- DistilBERT (base, multilingual)
- RoBERTa (base, large)
- DeBERTa v3 (base, large)
- XLM-RoBERTa (base)
Use these directly via Hugging Face’s `AutoModelForSequenceClassification`.
Submission files (CSV format) using various ensembling strategies are found in:
```
submissions/
├── deberta_large_submission.csv
├── majority_voting_submission.csv
└── softmax_averaging_submission.csv
```
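Majority voting, the other ensembling strategy listed above, simply picks the label that the most models agree on. A minimal sketch (the tie-breaking rule here is an assumption; the project's actual rule may differ):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most models.

    Ties are broken toward the label that appears first in the input;
    this tie-breaking choice is illustrative, not the project's.
    """
    return Counter(predictions).most_common(1)[0][0]

# Per-sentence labels from four models
print(majority_vote(["positive", "neutral", "positive", "negative"]))  # → positive
```

Unlike softmax averaging, majority voting discards each model's confidence, which is why the two strategies can produce different submission files.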
- Test different classification heads and attention mechanisms
- Apply LLM translation for sanitizing input data
- Improve paraphrasing strategies using cheaper LLMs or distillation
- Apply selective LLM augmentation only to samples likely to be misclassified
- Thösam Norlha-Tsang
- Afonso Ferreira da Silva Domingues
- Rahul Kaundal
Group: Siuuupremacy
ETH Zürich — Computational Intelligence Lab FS2025