NEDC-BENCH implements five EEG annotation scoring algorithms originally developed by Temple University's Neural Engineering Data Consortium (NEDC). These algorithms evaluate the agreement between reference (ground truth) and hypothesis (predicted) annotations for EEG seizure detection systems.
| Algorithm | Type | Scoring Unit | Confusion Matrix | Key Metric | Use Case |
|---|---|---|---|---|---|
| TAES | Event-based | Fractional events | No | TP (fractional) | Clinical seizure detection |
| Epoch | Sample-based | Fixed epochs | Yes (NxN) | TP (integer) | Time-series classification |
| DP Alignment | Sequence-based | Label sequences | Yes (substitution) | Edit distance | Sequence comparison |
| Overlap | Event-based | Binary events | No | Hits (integer) | Simple event detection |
| IRA | Sample-based | Epochs | Yes (NxN) | Cohen's Kappa | Inter-rater agreement |
All algorithms operate on event annotations with:
- Channel: EEG channel identifier (e.g., "TERM")
- Start/Stop Time: Temporal boundaries in seconds
- Label: Event classification (e.g., "seiz", "bckg", "null")
- Confidence: Prediction confidence score (0-1)
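The annotation fields listed above can be pictured as a small record type. The following is a minimal sketch only: `EventAnnotation` and its attribute names are hypothetical and are not taken from the NEDC-BENCH source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventAnnotation:
    """Hypothetical container for one annotated event (illustration only)."""

    channel: str        # EEG channel identifier, e.g. "TERM"
    start_time: float   # seconds
    stop_time: float    # seconds
    label: str          # e.g. "seiz", "bckg", "null"
    confidence: float = 1.0

    def __post_init__(self) -> None:
        # Basic sanity checks; the real loader may validate differently.
        if self.stop_time < self.start_time:
            raise ValueError("stop_time must not precede start_time")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")

    @property
    def duration(self) -> float:
        return self.stop_time - self.start_time


event = EventAnnotation("TERM", 10.0, 25.5, "seiz", 0.9)
print(event.duration)
```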

Common metrics:
- True Positives (TP): Correctly identified events
- False Positives (FP): Incorrectly predicted events
- False Negatives (FN): Missed events
- FA/24h: False alarms per 24 hours (critical clinical metric)
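As a rough illustration of how these counts turn into rates, the sketch below computes sensitivity and FA/24h. The function names are hypothetical, and the handling of empty denominators is an assumption rather than documented NEDC behavior.

```python
SECONDS_PER_DAY = 86400.0


def sensitivity(tp: float, fn: float) -> float:
    """TP / (TP + FN); returns 0.0 when there are no reference events (assumption)."""
    total = tp + fn
    return tp / total if total > 0 else 0.0


def false_alarms_per_24h(fp: float, total_duration_s: float) -> float:
    """Scale the false-positive count to a 24-hour recording length."""
    if total_duration_s <= 0:
        raise ValueError("total_duration_s must be positive")
    return fp * SECONDS_PER_DAY / total_duration_s


# 3 false alarms in a one-hour recording correspond to 72 per 24 hours.
print(false_alarms_per_24h(3, 3600.0))
```

Floats are accepted for the counts so that the same helpers work with fractional TAES values as well as integer counts.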

Use TAES when:
- Clinical accuracy is paramount
- Events have variable durations
- Fractional credit for partial overlaps is needed
- Multi-overlap sequencing behavior is acceptable
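To make "fractional credit for partial overlaps" concrete, here is a deliberately simplified sketch that credits a hypothesis event with the fraction of the reference event it covers. This is not the NEDC TAES algorithm, which also deals with multi-overlap sequencing; `overlap_fraction` is a hypothetical helper.

```python
def overlap_fraction(ref: tuple[float, float], hyp: tuple[float, float]) -> float:
    """Fraction of the reference interval covered by the hypothesis interval."""
    ref_start, ref_stop = ref
    hyp_start, hyp_stop = hyp
    ref_duration = ref_stop - ref_start
    if ref_duration <= 0:
        return 0.0
    # Length of the intersection, clamped at zero when the intervals are disjoint.
    overlap = min(ref_stop, hyp_stop) - max(ref_start, hyp_start)
    return max(0.0, overlap) / ref_duration


# The hypothesis covers the second half of the reference event.
print(overlap_fraction((10.0, 20.0), (15.0, 30.0)))
```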

Use Epoch when:
- Fixed-width time windows are natural
- Confusion matrix analysis is required
- Background augmentation is needed
- Integer counts are preferred
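The idea of fixed-width windows with background augmentation can be sketched as follows. The sketch samples each epoch at its midpoint and treats unannotated time as `"bckg"`; both choices, along with the function names, are assumptions for illustration and may differ from the NEDC implementation.

```python
from collections import Counter

Event = tuple[float, float, str]  # (start, stop, label)


def label_at(events: list[Event], t: float, background: str = "bckg") -> str:
    """Label active at time t; gaps are filled with the background label."""
    for start, stop, label in events:
        if start <= t < stop:
            return label
    return background


def epoch_confusion(ref: list[Event], hyp: list[Event],
                    duration: float, epoch: float = 1.0) -> Counter:
    """Count (reference label, hypothesis label) pairs over fixed epochs."""
    counts: Counter = Counter()
    for i in range(int(duration / epoch)):
        t = (i + 0.5) * epoch  # sample at the epoch midpoint (assumption)
        counts[(label_at(ref, t), label_at(hyp, t))] += 1
    return counts


ref = [(2.0, 6.0, "seiz")]
hyp = [(4.0, 8.0, "seiz")]
confusion = epoch_confusion(ref, hyp, duration=10.0, epoch=1.0)
print(confusion)
```

Every epoch contributes exactly one count, which is why this style of scoring yields integer confusion matrices.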

Use DP Alignment when:
- Sequence-level comparison is needed
- Edit operations (ins/del/sub) are meaningful
- Order of events matters
- Detailed error analysis is required
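Sequence-level comparison via insertions, deletions, and substitutions is the classic edit-distance recurrence. The sketch below uses unit costs for all three operations, which is an assumption; the NEDC implementation may weight them differently.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of ins/del/sub operations turning ref into hyp."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = distance between ref[:i] and hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all remaining reference labels
    for j in range(n + 1):
        dp[0][j] = j  # insert all remaining hypothesis labels
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
                dp[i - 1][j - 1] + sub_cost,  # match or substitution
            )
    return dp[m][n]


ref_labels = ["bckg", "seiz", "bckg"]
hyp_labels = ["bckg", "seiz", "bckg", "seiz", "bckg"]
print(edit_distance(ref_labels, hyp_labels))
```

The table has (m+1)×(n+1) cells, each filled in constant time.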

Use Overlap when:
- Simple binary hit/miss is sufficient
- ANY temporal overlap counts as detection
- Fast computation is needed
- Event boundaries are less critical
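A binary "any overlap counts" rule can be sketched in a few lines. The function name is hypothetical, and the strict comparison used here (intervals that merely touch do not count) is an assumption; NEDC's boundary handling may differ.

```python
Interval = tuple[float, float]  # (start, stop)


def count_hits(ref: list[Interval], hyp: list[Interval]) -> tuple[int, int]:
    """Return (hits, misses): a reference event is a hit if any hypothesis overlaps it."""
    hits = 0
    for r_start, r_stop in ref:
        # Two intervals overlap when each starts before the other stops.
        if any(h_start < r_stop and r_start < h_stop for h_start, h_stop in hyp):
            hits += 1
    return hits, len(ref) - hits


ref_events = [(10.0, 20.0), (40.0, 50.0)]
hyp_events = [(18.0, 19.0), (60.0, 70.0)]
print(count_hits(ref_events, hyp_events))
```

Note that a one-second hypothesis inside a ten-second reference event scores a full hit, which is exactly the boundary insensitivity described above.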

Use IRA when:
- Inter-rater reliability is the goal
- Cohen's Kappa is the standard metric
- Per-label and overall agreement are needed
- Statistical significance testing is required
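Cohen's Kappa compares observed agreement with the agreement expected by chance, both read off an NxN confusion matrix. The sketch below shows the overall (multi-class) form only; the function name and the handling of the degenerate case are assumptions, not NEDC behavior.

```python
def cohens_kappa(matrix: list[list[int]]) -> float:
    """Overall Cohen's Kappa for a square confusion matrix (rows: rater A, cols: rater B)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    if expected == 1.0:
        # Degenerate case (all mass in one class); returning 1.0 is a convention
        # chosen for this sketch only.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Observed agreement 0.7, chance agreement 0.5, so kappa is approximately 0.4.
kappa = cohens_kappa([[20, 5], [10, 15]])
print(round(kappa, 3))
```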

The algorithm implementations live under `src/nedc_bench/algorithms/`:

    src/nedc_bench/algorithms/
    ├── taes.py           # Time-Aligned Event Scoring
    ├── epoch.py          # Epoch-based scoring
    ├── dp_alignment.py   # Dynamic Programming alignment
    ├── overlap.py        # Binary overlap detection
    └── ira.py            # Inter-Rater Agreement

Design principles:
- Exact NEDC Parity: Bit-for-bit matching of original algorithms
- SOLID Principles: Clean, maintainable code architecture
- Type Safety: Full MyPy type annotations
- Testability: Comprehensive pytest coverage
- Documentation: Inline references to NEDC source lines

NEDC-specific conventions:
- Inclusive boundaries: NEDC uses `<=` for stop time comparisons
- Bitwise operators: Some algorithms use `&` for historical reasons
- NULL_CLASS handling: Special "null" label for gaps/sentinels
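The practical effect of the first two conventions can be seen in a small sketch. The functions below are hypothetical and only illustrate the difference between inclusive and exclusive stop times, and how `&` behaves on boolean comparisons; they are not copied from the NEDC code.

```python
def within_event(t: float, start: float, stop: float,
                 inclusive_stop: bool = True) -> bool:
    """Check whether time t falls inside [start, stop] or [start, stop)."""
    if inclusive_stop:
        return start <= t <= stop
    return start <= t < stop


def within_event_bitwise(t: float, start: float, stop: float) -> bool:
    # `&` on two bools yields the same truth value as `and`, but it always
    # evaluates both operands and binds tighter than comparisons, so the
    # parentheses are required.
    return (start <= t) & (t <= stop)


# A sample exactly at the stop time is inside the event only with `<=`.
print(within_event(20.0, 10.0, 20.0))
print(within_event(20.0, 10.0, 20.0, inclusive_stop=False))
```

Boundary samples are where a port most easily drifts from the original, so cases like these are worth covering explicitly in parity tests.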

Return types:
- TAES: Returns float values for fractional scoring
- Epoch/DP/Overlap: Return integer counts
- IRA: Integer confusion matrix, float kappa values

Performance characteristics:
- All algorithms are O(n²) worst case for overlap detection
- Epoch/IRA benefit from pre-sorting events
- DP Alignment has O(m×n) dynamic programming complexity

All algorithms achieve 100% parity with NEDC v6.0.0 on the SSOT parity set. See `docs/archive/bugs/FINAL_PARITY_RESULTS.md` for details.

Related documentation:
- TAES Algorithm: Fractional event scoring
- Epoch Algorithm: Fixed-width window scoring
- DP Alignment: Sequence alignment
- Overlap Algorithm: Binary detection
- IRA Algorithm: Inter-rater agreement
- Metrics Calculation: FA/24h and other metrics