# PFW at SemEval-2026 Task 6: Multi-Seed DeBERTa Ensembles for Political Response Clarity and Evasion Classification
This repository contains the code for our system submission to SemEval-2026 Task 6 (CLARITY), which addresses the classification of response clarity and evasion techniques in political interview question-answer pairs.
Our approach fine-tunes DeBERTa-xlarge (900M) and DeBERTa-v3-large (304M) with a multi-seed ensemble strategy:
- 5-fold cross-validation with 10 random seeds yields 50 models per architecture
- Predictions are combined via simple logit averaging
- No LLM prompting or API calls required — runs on a single GPU
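The ensembling step above can be sketched in a few lines: logits from every fold × seed model are averaged per instance before taking the argmax. The array shapes here are illustrative, not the repository's actual on-disk format.

```python
import numpy as np

def ensemble_predict(logit_stacks: list) -> np.ndarray:
    """Average per-model logits and take the argmax class per instance.

    logit_stacks: one (n_instances, n_classes) array per trained model
    (e.g. 50 arrays for 5 folds x 10 seeds).
    """
    avg = np.mean(np.stack(logit_stacks, axis=0), axis=0)
    return avg.argmax(axis=1)

# Toy example: two "models", two instances, three clarity classes
m1 = np.array([[2.0, 0.5, 0.1], [0.2, 0.1, 1.5]])
m2 = np.array([[1.0, 1.8, 0.3], [0.4, 0.2, 2.0]])
print(ensemble_predict([m1, m2]))  # one class index per instance
```

Averaging raw logits (rather than voting on hard labels) lets confident models outweigh uncertain ones, which is typically why it edges out majority voting.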
## Results

| System | Subtask 1 (Clarity) | Subtask 2 (Evasion) |
|---|---|---|
| Majority class baseline | 0.248 | 0.052 |
| TF-IDF + Logistic Regression | 0.546 | 0.319 |
| DeBERTa-v3-large (single model) | 0.643 ± 0.024 | 0.327 ± 0.040 |
| DeBERTa-xlarge (single model) | 0.663 ± 0.021 | — |
| Multi-seed ensemble (ours) | 0.76 (18/41) | 0.50 (12/33) |
*Macro F1 scores; rank on the official leaderboard in parentheses.*
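Macro F1 (the metric in the table above) averages per-class F1 with equal weight, so rare classes count as much as frequent ones; this is why the 9-class evasion subtask scores so much lower than clarity. A dependency-free sketch of the computation (it mirrors scikit-learn's `f1_score(average="macro")` over the classes present in gold or predictions):

```python
def macro_f1(gold: list, pred: list) -> float:
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    classes = set(gold) | set(pred)
    f1s = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold = ["Clear Reply", "Ambivalent", "Clear Non-Reply", "Clear Reply"]
pred = ["Clear Reply", "Clear Reply", "Clear Non-Reply", "Clear Reply"]
print(round(macro_f1(gold, pred), 3))  # 0.6
```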
## Repository Structure

```
├── src/
│   ├── training/                          # Training scripts
│   │   ├── train_10seed.py                # Primary: unified multi-seed training (Task 1 & 2)
│   │   ├── train_v3large_task1.py         # v3-large Task 1 training
│   │   ├── train_v3large_task2.py         # v3-large Task 2 training
│   │   ├── train_task1_xlarge.py          # xlarge Task 1 training
│   │   └── train_utils.py                 # Shared training utilities
│   ├── models/
│   │   └── encoder_classifier.py          # DeBERTa encoder-classifier architecture
│   ├── data/
│   │   ├── load_dataset.py                # HuggingFace data loading
│   │   ├── preprocess.py                  # Text preprocessing & label normalization
│   │   ├── splits.py                      # GroupKFold CV split generation
│   │   └── stratified_group_kfold.py
│   ├── metrics/
│   │   ├── compute_metrics.py             # Macro F1 computation
│   │   └── local_test_scorer.py           # Local evaluation scorer
│   └── submission/
│       ├── make_prediction_file.py        # Single-model predictions
│       ├── make_prediction_file_ensemble.py  # Ensemble predictions
│       └── zip_submission.py              # Submission packaging
├── scripts/
│   ├── generate_final_ensemble.py         # Final ensemble inference pipeline
│   ├── generate_simple_ensemble.py        # Simple logit-averaging ensemble
│   ├── paper_baselines.py                 # Reproduce paper baselines
│   ├── paper_analysis.py                  # Generate paper figures and tables
│   ├── build_oof_logits.py                # Build OOF logit matrices
│   ├── collect_task1_oof.py               # OOF collection for Task 1
│   ├── collect_v3large_oof.py             # OOF collection for v3-large
│   ├── eval_task1_predictions.py          # Task 1 evaluation
│   ├── eval_task2_predictions.py          # Task 2 evaluation
│   ├── local_eval.py                      # Local evaluation harness
│   └── slurm/                             # SLURM job scripts for HPC
├── latex/                                 # Paper source (ACL format)
├── docs/                                  # Documentation and paper figures
└── requirements.txt
```
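The split generation in `src/data/splits.py` is group-aware: all QA pairs from the same interview stay in the same fold, so no interview leaks across a train/validation boundary. A minimal pure-Python sketch of that idea (the function name and greedy balancing are illustrative, not the repository's exact implementation):

```python
from collections import Counter

def group_kfold_assign(group_ids: list, n_folds: int = 5) -> list:
    """Assign each instance to a fold by its group, balancing fold sizes greedily.

    All instances sharing a group_id land in the same fold.
    """
    sizes = Counter(group_ids)                 # instances per group
    fold_load = [0] * n_folds
    group_fold = {}
    # Largest groups first, each to the currently lightest fold
    for gid, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        f = fold_load.index(min(fold_load))
        group_fold[gid] = f
        fold_load[f] += size
    return [group_fold[g] for g in group_ids]

# Toy example: four interviews of varying length, two folds
groups = ["intA"] * 3 + ["intB"] * 2 + ["intC"] * 2 + ["intD"]
print(group_kfold_assign(groups, n_folds=2))
```

scikit-learn's `GroupKFold` (and the repository's `stratified_group_kfold.py`) implements the same constraint with stratification on the label distribution as well.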
## Requirements

- Python 3.9+
- PyTorch 2.0+ with CUDA support
- 1x NVIDIA A100 (80GB) recommended; runs on any GPU with >= 24GB
## Installation

```bash
pip install -r requirements.txt
```

## Data

The QEvasion dataset is loaded automatically from HuggingFace:
```python
from datasets import load_dataset

dataset = load_dataset("ailsntua/QEvasion")
```

## Usage

### Step 1: Generate cross-validation splits
```bash
python src/data/splits.py
```

### Step 2: Train multi-seed models (example: Task 1, xlarge)
```bash
# Single fold + seed
python src/training/train_10seed.py \
    --task 1 --fold 0 --seed 42 \
    --model_name microsoft/deberta-xlarge \
    --epochs 6 --lr 1e-5 --label_smoothing 0.03

# Or submit all fold x seed combinations via SLURM
sbatch scripts/slurm/task1_10seed.sbatch
```

### Step 3: Collect OOF logits
```bash
python scripts/collect_task1_oof.py
```

### Step 4: Generate ensemble predictions
```bash
python scripts/generate_simple_ensemble.py
```

## Evaluation

```bash
python scripts/local_eval.py --task 1 --prediction_file submissions/task1_prediction
python scripts/local_eval.py --task 2 --prediction_file submissions/task2_prediction
```

## Task Description

SemEval-2026 Task 6 (CLARITY) addresses political question evasion detection:
- Subtask 1: Classify responses into 3 clarity levels (Clear Reply, Ambivalent, Clear Non-Reply)
- Subtask 2: Classify into 9 fine-grained evasion types (Explicit, Dodging, Deflection, etc.)
Both are evaluated using macro F1 on the QEvasion dataset (3,448 training / 237 evaluation instances).
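Raw annotation exports often vary in casing and punctuation, which is why `src/data/preprocess.py` normalizes labels before training. A hypothetical sketch of such a mapping for the three Subtask 1 classes (the exact canonical strings used by QEvasion may differ):

```python
CLARITY_LABELS = ["Clear Reply", "Ambivalent", "Clear Non-Reply"]

def normalize_clarity_label(raw: str) -> str:
    """Map a raw label string to its canonical Subtask 1 class name."""
    key = raw.strip().lower().replace("_", " ").replace("-", " ")
    for canonical in CLARITY_LABELS:
        if canonical.lower().replace("-", " ") == key:
            return canonical
    raise ValueError(f"Unknown clarity label: {raw!r}")

print(normalize_clarity_label(" clear_reply "))  # Clear Reply
```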
## Citation

If you use this code, please cite our paper:
```bibtex
@inproceedings{tamsal2026pfw,
  title = {{PFW} at {SemEval}-2026 Task 6: Multi-Seed {DeBERTa} Ensembles for Political Response Clarity and Evasion Classification},
  author = {Tamsal, Taleef},
  booktitle = {Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026)},
  year = {2026},
  publisher = {Association for Computational Linguistics},
  note = {To appear}
}
```

## License

This project is released for research purposes. The QEvasion dataset is subject to its own license terms.