This repository contains an ML solution for detecting stance on COVID-19 vaccination from Twitter data. The project fine-tunes Google's FLAN-T5-Large model to classify tweets into three categories: "in-favor", "against", and "neutral-or-unclear".
The fine-tuned weights are available upon request. Due to their size, they are not included in the repository.
Predict the stance of 5,751 tweets regarding COVID-19 vaccination using a finetuned FLAN-T5-Large language model. The stance classification task involves determining whether a tweet supports, opposes, or is neutral/unclear about vaccination.
- Size: 5,751 tweets
- Labels: 3 classes (in-favor, against, neutral-or-unclear)
- Format: CSV with tweet text and human-annotated ground truth labels
- Split: 70% train, 15% validation, 15% test (stratified)
- Base Model: Google FLAN-T5-Large (770M parameters)
- Framework: HuggingFace Transformers + PyTorch
- Training: Fine-tuning with custom prompt engineering
The model uses structured prompts to guide stance classification:
Analyze this tweet's stance on COVID-19 vaccination.
Tweet: "{tweet}"
If the tweet clearly supports vaccination, answer: in-favor.
If the tweet clearly opposes vaccination, answer: against.
If the tweet is neutral, unclear, or not about vaccination, answer: neutral-or-unclear.
Answer with only the label: in-favor, against, or neutral-or-unclear.
- Learning Rate: 8e-5
- Batch Size: 8 (effective batch size: 32 with gradient accumulation)
- Epochs: 10
- Optimizer: AdamW with weight decay
- Mixed Precision: FP16 for efficiency
- Early Stopping: Based on validation loss
| Approach | F1 Macro | F1 Weighted | Accuracy |
|---|---|---|---|
| Zero-shot FLAN-T5-Large | 0.427 | 0.534 | 0.594 |
| Fine-tuned FLAN-T5-Large | 0.612 | 0.697 | 0.731 |
precision recall f1-score support
in-favor 0.56 0.58 0.57 271
against 0.61 0.81 0.70 436
neutral-or-unclear 1.00 0.01 0.01 156
Prediction Distribution:
- Model Predictions: in-favor (578), against (284), neutral-or-unclear (1)
- True Labels: in-favor (271), against (436), neutral-or-unclear (156)
precision recall f1-score support
in-favor 0.75 0.76 0.76 271
against 0.73 0.91 0.81 436
neutral-or-unclear 0.57 0.17 0.27 156
- Significant improvement from zero-shot to fine-tuned (F1 Macro: 0.427 → 0.612, +43% improvement)
- Zero-shot bias - model heavily overpredicts "in-favor" (578 vs 271 true) and underpredicts "neutral-or-unclear" (1 vs 156 true)
- Strong fine-tuned performance on majority classes (in-favor, against)
- Class imbalance challenge - neutral-or-unclear class severely underpredicted in both zero-shot and fine-tuned
- Model bias toward majority classes due to imbalanced training data
stance/
├── data/
│ ├── raw_data.csv # Original dataset
│ └── processed/ # Train/val/test splits
├── src/
│ ├── config/
│ │ └── config.py # Configuration settings
│ ├── data/
│ │ ├── data_loader.py # Data loading and preprocessing
│ │ └── data_formatter.py # Prompt formatting
│ └── models/
│ ├── trainer.py # Model training pipeline
│ └── evaluate_model.py # Model evaluation
├── models/
├── results/
pip install -r requirements.txtpython -m src.models.trainerpython -m src.models.evaluate_modelpython -m src.models.test_zero_shot- Class Imbalance: Neutral-or-unclear class severely underpredicted
- Model Bias: Tendency to predict majority classes
- Data Augmentation: Create synthetic neutral-or-unclear examples
- Oversampling: Balance training data through strategic sampling
- Class Weights
- Hardware: NVIDIA V100 GPU (Colab Pro)
- Training Time: ~1-2 hours for 10 epochs
- Memory Usage: ~16GB GPU memory
- Checkpointing: Best model saved based on validation loss
- Primary: F1 Macro (accounts for class imbalance)
- Secondary: F1 Weighted, Accuracy, Per-class F1 scores
- Analysis: Confusion matrix, prediction distribution
This project successfully demonstrates the effectiveness of fine-tuning large language models for stance detection tasks. The 23% improvement in accuracy (0.594 → 0.731) shows the value of task-specific training. The zero-shot baseline reveals significant model bias toward the "in-favor" class, which fine-tuning helps address. However, the class imbalance challenge persists and highlights the need for more sophisticated data balancing techniques in future developments of this project.