TL;DR: SeizureTransformer won EpilepsyBench 2025 with 1 FA/24h on Dianalund. We evaluated it on TUSZ v2.0.3 using the clinical standard NEDC v6.0.0 scorer. Result: 26.89 FA/24h — a 27× gap from the reported performance.
The same model that achieves 1 FA/24h on Dianalund (Nordic dataset, SzCORE scoring) produces 26.89 FA/24h on TUSZ (Temple dataset, NEDC scoring). This isn't a model failure — it's a demonstration of how dataset and scoring choices fundamentally shape reported metrics.
- TUSZ + NEDC is the clinical gold standard for seizure detection evaluation
- EpilepsyBench doesn't report TUSZ results for models trained on it (marked with 🚂)
- First evaluation of SeizureTransformer on TUSZ using proper clinical scoring (NEDC v6.0.0)
SeizureTransformer ranks #1 on EpilepsyBench, but TUSZ evaluation is blocked (🚂)
| Dataset | Scorer | Sensitivity | FA/24h | F1 | Note |
|---|---|---|---|---|---|
| Dianalund | SzCORE¹ | 37% | 1 | 0.43 | EpilepsyBench Winner |
| TUSZ eval | NEDC OVERLAP² | 45.63% | 26.89 | 0.396 | Clinical Standard |
| TUSZ eval | SzCORE Event¹ | 52.35% | 8.59 | 0.485 | With Tolerances |
| TUSZ eval | NEDC TAES² | 65.21% | 136.73 | 0.237 | Strictest (Partial Credit) |
¹ SzCORE: 30 s pre-ictal and 60 s post-ictal tolerances; merges events <90 s apart.
² NEDC: Temple's clinical scorer for TUSZ. OVERLAP = any-overlap scoring; TAES = time-aligned event scoring.
Key insight: Same predictions, different scorers → 3.1× difference in FA/24h (26.89 vs 8.59). This demonstrates how evaluation methodology fundamentally shapes reported performance.
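To make the scorer effect concrete, here is a minimal sketch (not the actual NEDC or SzCORE implementations) of how tolerance padding alone changes the false-alarm count; the event intervals and function names are illustrative only:

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (start_sec, stop_sec)

def overlap_false_alarms(refs: List[Event], hyps: List[Event]) -> int:
    """NEDC OVERLAP-style: a hypothesis overlapping no reference is a false alarm."""
    return sum(
        1 for h in hyps
        if not any(h[0] < r[1] and r[0] < h[1] for r in refs)
    )

def tolerant_false_alarms(refs: List[Event], hyps: List[Event],
                          pre: float = 30.0, post: float = 60.0) -> int:
    """SzCORE-style: references are padded by pre/post tolerances before matching."""
    padded = [(r[0] - pre, r[1] + post) for r in refs]
    return overlap_false_alarms(padded, hyps)

refs = [(100.0, 160.0)]                  # one reference seizure
hyps = [(80.0, 95.0), (300.0, 320.0)]    # one near-miss, one far from any seizure
print(overlap_false_alarms(refs, hyps))  # 2: both detections miss under strict overlap
print(tolerant_false_alarms(refs, hyps)) # 1: the near-miss lands inside the 30 s pre-ictal pad
```

The same predictions yield different false-alarm counts purely from the matching rule, which is the mechanism behind the 26.89 vs 8.59 FA/24h gap above.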
- 10 FA/24h target: 33.90% sensitivity (NEDC OVERLAP)
- 2.5 FA/24h target: 14.50% sensitivity (NEDC OVERLAP)
- AUROC: 0.9021 (strong sample-level discrimination)
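The operating points above come from trading sensitivity against a false-alarm budget. A minimal sketch of that bookkeeping, with hypothetical thresholds (only the sensitivity/FA pairs mirror the numbers reported here):

```python
def fa_per_24h(false_alarms: int, record_seconds: float) -> float:
    """Normalize a false-alarm count to a per-24-hour rate."""
    return false_alarms * 86400.0 / record_seconds

# Hypothetical operating curve: (threshold, sensitivity, FA/24h).
# Real values come from sweeping the post-processing threshold on TUSZ eval.
curve = [
    (0.50, 0.4563, 26.89),
    (0.72, 0.3390, 10.00),
    (0.90, 0.1450, 2.50),
]

def sensitivity_at_fa_target(curve, fa_target: float):
    """Best sensitivity among operating points within the FA/24h budget."""
    feasible = [sens for _, sens, fa in curve if fa <= fa_target]
    return max(feasible) if feasible else None

print(sensitivity_at_fa_target(curve, 10.0))  # 0.339
print(fa_per_24h(5, 2 * 86400.0))             # 2.5 FA over two days of recording
```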
```bash
# 1. Setup
git clone https://github.com/Clarity-Digital-Twin/SeizureTransformer
cd SeizureTransformer
make setup-dev && source .venv/bin/activate

# 2. Get pretrained weights from https://github.com/keruiwu/SeizureTransformer
#    Place at: wu_2025/src/wu_2025/model.pth

# 3. Get TUSZ eval data (5.2 GB, requires Temple data agreement)
#    From https://isip.piconepress.com/projects/nedc/html/tuh_eeg/
rsync -auxvL nedc-tuh-eeg@www.isip.piconepress.com:data/tuh_eeg/tuh_eeg_seizure/v2.0.3/edf/eval .
#    Place at: wu_2025/data/tusz/v2.0.3/

# 4. Run evaluation
tusz-eval \
  --data_dir wu_2025/data/tusz/v2.0.3/edf/eval \
  --out_dir experiments/baseline

# 5. Score with NEDC v6.0.0
nedc-run \
  --checkpoint experiments/baseline/checkpoint.pkl \
  --outdir results/nedc_baseline
```

Or run everything in Docker:

```bash
# CPU
make docker-build && make docker-run

# GPU (requires NVIDIA Container Toolkit)
make docker-build-gpu && make docker-run-gpu
```

Repository structure:

```text
SeizureTransformer/
├── wu_2025/                      # Original SeizureTransformer (vendored, unmodified)
├── evaluation/
│   └── nedc_eeg_eval/
│       ├── v6.0.0/               # Temple's official NEDC scorer (unmodified)
│       └── nedc_scoring/         # Our wrapper scripts for NEDC
├── src/seizure_evaluation/       # Our evaluation pipeline
├── literature/arxiv_submission/  # Full paper and analysis
└── docs/                         # Comprehensive technical documentation
```
- Full Paper — Complete arXiv submission
- Results Table — All metrics across scoring methods
- Methods — Tuning and evaluation details
```bibtex
@article{clarity2025scoring,
  title   = {Scoring Matters: A Reproducible NEDC Evaluation of SeizureTransformer on TUSZ},
  author  = {{Clarity Digital Twin Team}},
  journal = {arXiv preprint arXiv:2025.XXXXX},
  year    = {2025}
}
```

Also cite: Wu et al. 2025 (original model), Shah et al. 2021 (NEDC), Shah et al. 2018 (TUSZ dataset).
Apache-2.0 (our code) • MIT (SeizureTransformer) • Temple License (NEDC)
Kerui Wu for the model • Temple NEDC for tools and dataset • EpilepsyBench for infrastructure
Questions? Open an issue • Paper: arXiv:2025.XXXXX