TAES (Time-Aligned Event Scoring) is the most clinically relevant scoring algorithm in the NEDC suite, designed specifically for evaluating seizure detection systems. It gives fractional credit when predicted and actual seizure events overlap only partially in time.
TAES evaluates event-level agreement using fractional scoring:
- Partial credit for events that partially overlap in time
- Multi-overlap sequencing for complex overlap scenarios
- Fractional penalties based on duration ratios
Key characteristics:
- Fractional Scoring: Returns float values (e.g., TP=133.84)
- Duration-Aware: Penalties proportional to event durations
- Clinical Focus: Optimized for seizure detection evaluation
- No Confusion Matrix: Direct hit/miss/FA calculation only
The critical innovation of TAES is handling multiple overlapping events:
```text
# Scenario 1: One hypothesis spans multiple references
refs: <--> <--> <--> <-->  # 4 separate seizures
hyp:    <--------------->  # 1 long detection
# Result:
# - First ref: fractional hit (e.g., 0.8)
# - Refs 2-4: each adds +1.0 to the miss total!
# - Total: hit=0.8, miss=3.2, fa=fractional
```
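The Scenario 1 arithmetic can be reproduced in a few lines. This is a minimal sketch assuming the first reference's fractional hit is already known (0.8 is the illustrative value from the scenario, not library output); `tally_one_hyp_many_refs` is a hypothetical helper, not part of `nedc_bench`.

```python
def tally_one_hyp_many_refs(first_hit: float, n_refs: int) -> tuple[float, float]:
    """Hypothetical helper: hit/miss totals when one hypothesis spans n_refs refs.

    The first reference earns fractional credit (first_hit) and contributes
    its uncovered remainder (1.0 - first_hit) to the miss total; every
    additional reference overlapped by the same hypothesis adds a full 1.0 miss.
    """
    if n_refs < 1:
        raise ValueError("n_refs must be at least 1")
    if not 0.0 <= first_hit <= 1.0:
        raise ValueError("first_hit must be in [0, 1]")
    hit = first_hit
    miss = (1.0 - first_hit) + 1.0 * (n_refs - 1)
    return hit, miss


hit, miss = tally_one_hyp_many_refs(first_hit=0.8, n_refs=4)
print(f"hit={hit:.1f}, miss={miss:.1f}")  # hit=0.8, miss=3.2
```

The 3.2 is 0.2 of uncovered time in the first seizure plus three whole misses, which is why a single long detection scores poorly against several short events.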
For each reference event:
- Find all overlapping hypotheses with matching label
- Determine overlap type (hyp extends vs ref extends)
- Apply appropriate scoring function
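The first two steps above (finding matching overlaps and deciding which event extends further) can be sketched as follows. This is an illustration under stated assumptions, not the `nedc_bench` code: `Event`, `overlaps`, and `classify_overlaps` are hypothetical names, the flag bookkeeping is omitted, and the sketch assumes the Case A / Case B split is decided by which event ends later.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical minimal event record (not nedc_bench's EventAnnotation)."""
    start: float
    stop: float
    label: str


def overlaps(a: Event, b: Event) -> bool:
    # Intervals overlap when each starts before the other stops;
    # events that merely touch at an endpoint do not count (assumption).
    return a.start < b.stop and b.start < a.stop


def classify_overlaps(refs: list[Event], hyps: list[Event], label: str):
    """Return (ref_index, hyp_index, case) for every matching overlap.

    case "A": the hypothesis runs past the reference's end (ovlp_ref_seqs-like)
    case "B": the reference ends at or after the hypothesis (ovlp_hyp_seqs-like)
    """
    pairs = []
    for i, ref in enumerate(refs):
        if ref.label != label:
            continue
        for j, hyp in enumerate(hyps):
            # Step 1: keep only overlapping hypotheses with the same label.
            if hyp.label != label or not overlaps(ref, hyp):
                continue
            # Step 2 (assumed rule): compare end times to pick the case.
            case = "A" if hyp.stop > ref.stop else "B"
            pairs.append((i, j, case))
    return pairs


refs = [Event(0, 10, "seiz"), Event(20, 30, "seiz"), Event(40, 50, "bckg")]
hyps = [Event(5, 25, "seiz"), Event(22, 28, "seiz")]
print(classify_overlaps(refs, hyps, "seiz"))
# [(0, 0, 'A'), (1, 0, 'B'), (1, 1, 'B')]
```

Hypothesis 0 appears twice, once per reference it touches. A real scorer relies on processed-event flags to avoid double-counting such a pair; this sketch deliberately leaves that out.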
Two overlap cases:
Case A: `ovlp_ref_seqs` (hypothesis extends beyond reference)

```text
ref: <----->
hyp: <----------->
```

- Calculate fractional hit/FA for the primary overlap
- Add +1.0 miss for each additional reference overlapped

Case B: `ovlp_hyp_seqs` (reference extends beyond hypothesis)

```text
ref: <----------->
hyp:    <----->
```

- Multiple hypotheses can contribute to a single reference
- Each hypothesis adds to the hit total and reduces the miss
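For Case B, the way several hypotheses inside one reference accumulate credit can be sketched like this. It is a minimal illustration assuming the hypotheses lie inside the reference and do not overlap one another, so each contributes hit credit and no false alarm (as in the under-prediction branch of `calc_hf`); `score_hyps_within_ref` is a hypothetical name.

```python
def score_hyps_within_ref(ref_start, ref_stop, hyp_intervals):
    """Hypothetical Case B helper: several hyps contributing to one ref.

    Each hypothesis adds (overlap / ref duration) to the hit fraction, and
    the miss fraction is whatever part of the reference remains uncovered.
    """
    ref_dur = ref_stop - ref_start
    if ref_dur <= 0:
        raise ValueError("reference must have positive duration")
    hit = 0.0
    for hyp_start, hyp_stop in hyp_intervals:
        # Clip to the reference window so edges outside it add no credit.
        overlap = min(hyp_stop, ref_stop) - max(hyp_start, ref_start)
        hit += max(0.0, overlap) / ref_dur
    hit = min(hit, 1.0)  # guard: credit for one reference never exceeds 1.0
    return hit, 1.0 - hit


hit, miss = score_hyps_within_ref(0.0, 20.0, [(2.0, 6.0), (10.0, 15.0)])
print(f"hit={hit:.2f}, miss={miss:.2f}")  # hit=0.45, miss=0.55
```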
Unmatched events:
- Any reference event with no overlapping hypothesis adds +1.0 to miss
- Any hypothesis event with no overlapping reference adds +1.0 to false alarms
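The unmatched-event rules translate directly into two counts. A minimal sketch, assuming events are `(start, stop)` pairs already filtered to the target label and that touching endpoints do not count as overlap; `unmatched_penalties` is a hypothetical helper.

```python
def unmatched_penalties(refs, hyps):
    """Return (miss, fa) contributed by events that overlap nothing.

    refs, hyps: lists of (start, stop) tuples for the target label only.
    """
    def overlaps(a, b):
        # Strict inequality: intervals that merely touch do not overlap.
        return a[0] < b[1] and b[0] < a[1]

    # Each lonely reference costs a full miss; each lonely hypothesis a full FA.
    miss = sum(1.0 for r in refs if not any(overlaps(r, h) for h in hyps))
    fa = sum(1.0 for h in hyps if not any(overlaps(h, r) for r in refs))
    return miss, fa


print(unmatched_penalties([(0, 10), (30, 40)], [(5, 12), (50, 60)]))  # (1.0, 1.0)
```

The nested `any` scans make this O(n × m), the same order as the complexity figure quoted for TAES.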
The core scoring calculation for overlapping events:
```python
def calc_hf(ref, hyp):
    """Return the (hit, fa) fractions for one overlapping ref/hyp pair.

    ref and hyp are any objects with start/stop attributes (seconds).
    Both fractions are normalized by the reference duration, and the
    false-alarm fraction is capped at 1.0.
    """
    ref_dur = ref.stop - ref.start
    # Case 1: Pre-prediction (hyp starts at/before ref and ends within it)
    if hyp.start <= ref.start and hyp.stop <= ref.stop:
        hit = (hyp.stop - ref.start) / ref_dur
        fa = min(1.0, (ref.start - hyp.start) / ref_dur)
    # Case 2: Post-prediction (hyp starts within ref and ends at/after it)
    elif hyp.start >= ref.start and hyp.stop >= ref.stop:
        hit = (ref.stop - hyp.start) / ref_dur
        fa = min(1.0, (hyp.stop - ref.stop) / ref_dur)
    # Case 3: Over-prediction (hyp extends past ref on both sides)
    elif hyp.start < ref.start and hyp.stop > ref.stop:
        hit = 1.0
        fa = min(1.0, ((hyp.stop - ref.stop) + (ref.start - hyp.start)) / ref_dur)
    # Case 4: Under-prediction (hyp lies strictly inside ref)
    else:
        hit = (hyp.stop - hyp.start) / ref_dur
        fa = 0.0
    return hit, fa
```

Example usage:

```python
from nedc_bench.algorithms.taes import TAESScorer
from nedc_bench.models.annotations import EventAnnotation

# Create scorer
scorer = TAESScorer(target_label="seiz")

# Define events
reference = [
    EventAnnotation(
        channel="TERM", start_time=100.0, stop_time=120.0, label="seiz", confidence=1.0
    )
]
hypothesis = [
    EventAnnotation(
        channel="TERM", start_time=105.0, stop_time=125.0, label="seiz", confidence=0.9
    )
]

# Score: the hyp starts inside the ref and runs past it (post-prediction)
result = scorer.score(reference, hypothesis)
print(f"TP: {result.true_positives:.2f}")  # TP: 0.75  (15 s overlap / 20 s ref)
print(f"FP: {result.false_positives:.2f}")  # FP: 0.25  (5 s overrun / 20 s ref)
print(f"FN: {result.false_negatives:.2f}")  # FN: 0.25  (1.0 - 0.75)
```

Implementation details that were critical for matching NEDC:
- Multi-overlap penalty: the initial implementation missed the +1.0 miss penalty for each additional reference overlapped by a single hypothesis
- Flag tracking: boolean flags mark processed reference and hypothesis events so that none is scored twice
- Fractional boundaries: hit/FA fractions follow NEDC's `calc_hf` formula exactly
Known issues and their resolutions:
- FA/24h duration mismatch (P0, 2025-09-15): Beta summed only the longest event per file, inflating false alarm rates by 5.6×. The fix mirrors the legacy implementation by summing durations across all files. See `docs/archive/bugs/P0_CRITICAL_BUG_DURATION.md` for the original write-up; parity evidence is tracked in `docs/reference/parity.md`.
- Microscopic float differences: investigations showed ≤0.0014 drift due to floating-point rounding. TAES tolerances in the tests reflect the NEDC behaviour documented in `docs/archive/bugs/TAES_INVESTIGATION.md`. Any larger delta should trigger a regression investigation.
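A parity check in the spirit of the drift note might compare each Beta value with its legacy NEDC counterpart under an absolute tolerance. This sketch assumes the 0.0014 bound is applied as an absolute per-metric difference; `TAES_ABS_TOL` and `within_taes_tolerance` are hypothetical names, and the real test suite may encode its tolerances differently.

```python
import math

# Hypothetical constant: the <=0.0014 drift bound quoted above, applied as an
# absolute difference between Beta and legacy NEDC values.
TAES_ABS_TOL = 0.0014


def within_taes_tolerance(beta_value: float, legacy_value: float,
                          abs_tol: float = TAES_ABS_TOL) -> bool:
    """True when the Beta value is within the allowed float drift of NEDC."""
    # rel_tol=0.0 so only the absolute bound applies, whatever the magnitude.
    return math.isclose(beta_value, legacy_value, rel_tol=0.0, abs_tol=abs_tol)


print(within_taes_tolerance(133.8400, 133.8412))  # True: 0.0012 drift
print(within_taes_tolerance(133.8400, 133.8500))  # False: exceeds the bound
```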
Performance:
- Time Complexity: O(n × m), where n = number of reference events and m = number of hypothesis events
- Space Complexity: O(n + m) for flag arrays
- Typical Runtime: <100ms for clinical datasets
Use TAES for:
- Clinical seizure detection evaluation
- Variable-duration event scoring
- Systems where partial detection has value
- FDA submission and regulatory compliance

Consider another algorithm for:
- Fixed-window classification tasks
- Multi-class confusion analysis
- Systems requiring integer counts
- Real-time streaming evaluation
Parity and metric notes:
- Parity: Beta matches NEDC v6.0.0 TAES exactly on the SSOT parity set. See `docs/archive/bugs/FINAL_PARITY_RESULTS.md`.
- False alarm rate (FA/24h) uses event FP counts directly (no epoch scaling). See `docs/algorithms/metrics.md`.
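The FA/24h arithmetic, with the FP count used as-is and the duration denominator summed over every scored file as the P0 fix describes, can be sketched as follows. The helper name and signature are hypothetical; `docs/algorithms/metrics.md` remains the reference for the actual computation.

```python
SECONDS_PER_DAY = 86_400.0


def fa_per_24h(false_positives: float, file_durations_s: list[float]) -> float:
    """Hypothetical helper: false alarms per 24 hours of recording.

    false_positives: TAES FP total (fractional), used directly with no epoch scaling.
    file_durations_s: duration of every scored file, in seconds.
    """
    total_s = sum(file_durations_s)  # full duration summed across all files
    if total_s <= 0:
        raise ValueError("total recording duration must be positive")
    return false_positives * SECONDS_PER_DAY / total_s


print(f"{fa_per_24h(3.0, [3600.0, 7200.0, 1800.0]):.2f}")  # 20.57
```

Because the denominator is total recording time, undercounting it (as the P0 bug did) inflates FA/24h in direct proportion.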
Related documentation:
- Algorithm Overview: comparison of all algorithms
- Metrics Calculation: FA/24h computation
- Source: `nedc_bench/algorithms/taes.py`
- NEDC Reference: `nedc_eeg_eval_taes.py` (v6.0.0):
  - `ovlp_ref_seqs` (lines ~669–736)
  - `ovlp_hyp_seqs` (lines ~740–891)
  - `calc_hf` (lines ~926–1006)