PFW at SemEval-2026 Task 6: Multi-Seed DeBERTa Ensembles for Political Response Clarity and Evasion Classification

This repository contains the code for our system submission to SemEval-2026 Task 6 (CLARITY), which addresses the classification of response clarity and evasion techniques in political interview question-answer pairs.

System Overview

Our approach fine-tunes DeBERTa-xlarge (900M parameters) and DeBERTa-v3-large (304M parameters) with a multi-seed ensemble strategy:

  • 5-fold cross-validation with 10 random seeds yields 50 models per architecture
  • Predictions are combined via simple logit averaging
  • No LLM prompting or API calls required — runs on a single GPU
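The logit-averaging step can be sketched as follows. This is a minimal illustration, not the repository's actual API: the function name and array shapes are assumptions, and in practice the 50 models' test-set logits would be loaded from disk.

```python
import numpy as np

def ensemble_logits(per_model_logits):
    """Average raw logits across models, then take the argmax.

    per_model_logits: list of (n_examples, n_classes) arrays,
    one per trained model (50 per architecture here).
    """
    stacked = np.stack(per_model_logits)   # (n_models, n_examples, n_classes)
    mean_logits = stacked.mean(axis=0)     # (n_examples, n_classes)
    return mean_logits.argmax(axis=1)      # predicted class ids
```

Averaging raw logits (rather than majority-voting hard labels) lets a confident model outvote several uncertain ones on borderline examples.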

Results

System                            Subtask 1 (Clarity)   Subtask 2 (Evasion)
Majority class baseline           0.248                 0.052
TF-IDF + Logistic Regression      0.546                 0.319
DeBERTa-v3-large (single model)   0.643 ± 0.024         0.327 ± 0.040
DeBERTa-xlarge (single model)     0.663 ± 0.021
Multi-seed ensemble (ours)        0.76 (18/41)          0.50 (12/33)

Macro F1 scores. Rank on the official leaderboard in parentheses.

Project Structure

├── src/
│   ├── training/           # Training scripts
│   │   ├── train_10seed.py          # Primary: unified multi-seed training (Task 1 & 2)
│   │   ├── train_v3large_task1.py   # v3-large Task 1 training
│   │   ├── train_v3large_task2.py   # v3-large Task 2 training
│   │   ├── train_task1_xlarge.py    # xlarge Task 1 training
│   │   └── train_utils.py           # Shared training utilities
│   ├── models/
│   │   └── encoder_classifier.py    # DeBERTa encoder-classifier architecture
│   ├── data/
│   │   ├── load_dataset.py          # HuggingFace data loading
│   │   ├── preprocess.py            # Text preprocessing & label normalization
│   │   ├── splits.py                # GroupKFold CV split generation
│   │   └── stratified_group_kfold.py
│   ├── metrics/
│   │   ├── compute_metrics.py       # Macro F1 computation
│   │   └── local_test_scorer.py     # Local evaluation scorer
│   └── submission/
│       ├── make_prediction_file.py  # Single-model predictions
│       ├── make_prediction_file_ensemble.py  # Ensemble predictions
│       └── zip_submission.py        # Submission packaging
├── scripts/
│   ├── generate_final_ensemble.py   # Final ensemble inference pipeline
│   ├── generate_simple_ensemble.py  # Simple logit-averaging ensemble
│   ├── paper_baselines.py           # Reproduce paper baselines
│   ├── paper_analysis.py            # Generate paper figures and tables
│   ├── build_oof_logits.py          # Build OOF logit matrices
│   ├── collect_task1_oof.py         # OOF collection for Task 1
│   ├── collect_v3large_oof.py       # OOF collection for v3-large
│   ├── eval_task1_predictions.py    # Task 1 evaluation
│   ├── eval_task2_predictions.py    # Task 2 evaluation
│   ├── local_eval.py                # Local evaluation harness
│   └── slurm/                       # SLURM job scripts for HPC
├── latex/                           # Paper source (ACL format)
├── docs/                            # Documentation and paper figures
└── requirements.txt

Reproduction

Requirements

  • Python 3.9+
  • PyTorch 2.0+ with CUDA support
  • 1x NVIDIA A100 (80GB) recommended; any GPU with >= 24 GB of VRAM will work

Install dependencies:

pip install -r requirements.txt

Data

The QEvasion dataset is loaded automatically from HuggingFace:

from datasets import load_dataset
dataset = load_dataset("ailsntua/QEvasion")

Training

Step 1: Generate cross-validation splits

python src/data/splits.py
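The key constraint the split script enforces (per `stratified_group_kfold.py`) is that all question-answer pairs from the same group, e.g. the same interview, land in the same fold, so no interview leaks between train and validation sets. A minimal sketch of that grouping constraint, with an illustrative round-robin assignment rather than the repository's actual stratified logic:

```python
def group_kfold_assignments(groups, n_splits=5):
    """Assign each group id (e.g. an interview id) to one fold so that
    all examples sharing a group end up in the same fold together.

    groups: per-example group ids. Returns a per-example fold id.
    """
    unique = sorted(set(groups))
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique)}
    return [fold_of_group[g] for g in groups]
```

The real script additionally stratifies by label so each fold preserves the class distribution; `sklearn.model_selection.StratifiedGroupKFold` implements the same idea.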

Step 2: Train multi-seed models (example: Task 1, xlarge)

# Single fold + seed
python src/training/train_10seed.py \
    --task 1 --fold 0 --seed 42 \
    --model_name microsoft/deberta-xlarge \
    --epochs 6 --lr 1e-5 --label_smoothing 0.03

# Or submit all fold x seed combinations via SLURM
sbatch scripts/slurm/task1_10seed.sbatch
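The SLURM batch script simply enumerates every fold x seed combination of the single-run command above. A sketch of that enumeration (the seed values here are illustrative; only seed 42 appears in the source):

```python
from itertools import product

FOLDS = range(5)
SEEDS = [42, 43, 44, 45, 46, 47, 48, 49, 50, 51]  # 10 seeds; illustrative values

# One training command per (fold, seed) pair: 5 x 10 = 50 runs per architecture.
jobs = [
    f"python src/training/train_10seed.py --task 1 --fold {fold} --seed {seed} "
    f"--model_name microsoft/deberta-xlarge --epochs 6 --lr 1e-5 --label_smoothing 0.03"
    for fold, seed in product(FOLDS, SEEDS)
]
print(len(jobs))
```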

Step 3: Collect OOF logits

python scripts/collect_task1_oof.py
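Because each training example sits in exactly one fold's validation set, the per-fold validation logits tile the whole training set. A minimal sketch of assembling that out-of-fold (OOF) matrix for a single seed (function name and input layout are assumptions, not the script's actual interface):

```python
import numpy as np

def build_oof_logits(n_examples, n_classes, fold_results):
    """Assemble one out-of-fold logit matrix covering the training set.

    fold_results: list of (val_indices, logits) pairs, one per fold,
    where logits has shape (len(val_indices), n_classes).
    """
    oof = np.full((n_examples, n_classes), np.nan)
    for val_idx, logits in fold_results:
        oof[val_idx] = logits
    assert not np.isnan(oof).any(), "some examples missing from every fold"
    return oof
```

With 10 seeds, one such matrix is built per seed and the matrices are averaged, giving unbiased training-set predictions for tuning the ensemble.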

Step 4: Generate ensemble predictions

python scripts/generate_simple_ensemble.py

Evaluation

python scripts/local_eval.py --task 1 --prediction_file submissions/task1_prediction
python scripts/local_eval.py --task 2 --prediction_file submissions/task2_prediction
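Both scorers report macro F1, the unweighted mean of per-class F1 scores, so rare evasion types count as much as frequent ones. A self-contained sketch of the metric (the repository's `compute_metrics.py` likely delegates to a library implementation such as `sklearn.metrics.f1_score`):

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over the given label set."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)
```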

Task Description

SemEval-2026 Task 6 (CLARITY) addresses political question evasion detection:

  • Subtask 1: Classify responses into 3 clarity levels (Clear Reply, Ambivalent, Clear Non-Reply)
  • Subtask 2: Classify into 9 fine-grained evasion types (Explicit, Dodging, Deflection, etc.)

Both are evaluated using macro F1 on the QEvasion dataset (3,448 training / 237 evaluation instances).

Citation

If you use this code, please cite our paper:

@inproceedings{tamsal2026pfw,
  title     = {{PFW} at {SemEval}-2026 Task 6: Multi-Seed {DeBERTa} Ensembles for Political Response Clarity and Evasion Classification},
  author    = {Tamsal, Taleef},
  booktitle = {Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026)},
  year      = {2026},
  publisher = {Association for Computational Linguistics},
  note      = {To appear}
}

License

This project is released for research purposes. The QEvasion dataset is subject to its own license terms.
