This document details the computation of all metrics used in NEDC-BENCH, with special emphasis on the critical FA/24h (False Alarms per 24 hours) metric used in clinical evaluation.
Core Metrics
True Positives (TP)
Definition: Correctly identified positive events
Computation: Algorithm-specific
TAES: Fractional sum of overlaps
Epoch/DP/Overlap/IRA: Integer counts
False Positives (FP)
Definition: Incorrectly predicted positive events
Computation: Algorithm-specific
TAES: Fractional false alarm portions
Epoch: Integer count from confusion matrix
Overlap: Unmatched hypothesis events
False Negatives (FN)
Definition: Missed positive events
Computation: Algorithm-specific
TAES: Miss fraction (1.0 - hit) plus multi-overlap penalties
Epoch: Misses from compressed sequences
DP: Deletions + substitutions
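To make the integer-count case concrete, here is a minimal sketch of per-epoch counting. It is not the NEDC implementation: it assumes both annotations have already been sampled into equal-length label lists, and the function name `count_epoch_errors` and the labels `"seiz"` and `"bckg"` are illustrative assumptions. It also skips the sequence compression mentioned above for the Epoch algorithm.

```python
def count_epoch_errors(ref_labels, hyp_labels, positive="seiz"):
    """Count TP, FP, FN by comparing two equal-length per-epoch label lists."""
    if len(ref_labels) != len(hyp_labels):
        raise ValueError("label sequences must have the same length")
    tp = fp = fn = 0
    for ref, hyp in zip(ref_labels, hyp_labels):
        if hyp == positive and ref == positive:
            tp += 1  # predicted positive, actually positive
        elif hyp == positive:
            fp += 1  # predicted positive, actually negative
        elif ref == positive:
            fn += 1  # missed a positive epoch
    return tp, fp, fn


# Hypothetical five-epoch example
ref = ["bckg", "seiz", "seiz", "bckg", "bckg"]
hyp = ["bckg", "seiz", "bckg", "seiz", "bckg"]
print(count_epoch_errors(ref, hyp))  # (1, 1, 1)
```

The fractional TAES counts differ in kind: they are floats accumulated from partial overlaps, so they cannot be reproduced by a simple tally like this one.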
FA/24h (False Alarms per 24 hours)
Critical Clinical Metric
FA/24h is the most important metric for clinical seizure detection systems. It expresses the false-positive count as a rate normalized to 24 hours of recording:
    def fa_per_24h(false_positives, total_duration_seconds, epoch_duration=None):
        """
        Compute FA/24h according to NEDC definitions.

        Args:
            false_positives: FP count (float or int)
            total_duration_seconds: Total recording duration
            epoch_duration: For epoch-based algorithms only

        Returns:
            False alarms per 24 hours
        """
        if total_duration_seconds <= 0:
            return 0.0

        # For epoch-based algorithms, scale by epoch duration
        if epoch_duration is not None:
            numerator = false_positives * epoch_duration
        else:
            numerator = false_positives

        # Convert to 24-hour rate
        return (numerator / total_duration_seconds) * 86400.0
Algorithm-Specific FA/24h
| Algorithm | FP Units | Epoch Scaling | Example |
|---|---|---|---|
| TAES | Fractional events | No | 134.20 / duration * 86400 |
| Epoch | Epoch counts | Yes | 31989 * 1.0 / duration * 86400 |
| DP Alignment | Event counts | No | 3 / duration * 86400 |
| Overlap | Event counts | No | 1 / duration * 86400 |
| IRA | N/A | N/A | Not computed |
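The example expressions in the table leave `duration` symbolic. As a worked sketch, the snippet below plugs in a hypothetical total duration of 100 hours (360,000 seconds) and a 1.0-second epoch. Both values are assumptions chosen for round arithmetic, not figures from an actual NEDC run.

```python
import math


def fa_per_24h(false_positives, total_duration_seconds, epoch_duration=None):
    """Same logic as the function defined earlier, repeated to keep this runnable."""
    if total_duration_seconds <= 0:
        return 0.0
    numerator = false_positives
    if epoch_duration is not None:
        numerator = false_positives * epoch_duration
    return (numerator / total_duration_seconds) * 86400.0


DURATION_S = 100 * 3600.0  # hypothetical: 100 hours of recordings

rates = {
    "TAES": fa_per_24h(134.20, DURATION_S),
    "Epoch": fa_per_24h(31989, DURATION_S, epoch_duration=1.0),
    "DP Alignment": fa_per_24h(3, DURATION_S),
    "Overlap": fa_per_24h(1, DURATION_S),
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} FA/24h")
# TAES: 32.21 FA/24h
# Epoch: 7677.36 FA/24h
# DP Alignment: 0.72 FA/24h
# Overlap: 0.24 FA/24h
```

With these inputs the Epoch figure is orders of magnitude larger than the event-based ones because it counts every false-positive epoch rather than every false-positive event, so FA/24h values are not directly comparable across algorithms.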
Standard Metrics
Sensitivity (Recall, TPR)
sensitivity = TP / (TP + FN)
Range: 0.0 to 1.0
Interpretation: Proportion of actual positives correctly identified
Clinical Target: >0.90 for seizure detection
Precision (PPV)
precision = TP / (TP + FP)
Range: 0.0 to 1.0
Interpretation: Proportion of positive predictions that are correct
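Both ratios are undefined when their denominator is zero. The sketch below shows one way to guard against that, following the convention listed under Edge Cases of returning 0.0. The helper name `safe_ratio` and the sample counts are illustrative assumptions.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def sensitivity(tp, fn):
    """TP / (TP + FN); 0.0 when there are no positive reference events."""
    return safe_ratio(tp, tp + fn)


def precision(tp, fp):
    """TP / (TP + FP); 0.0 when nothing was predicted positive."""
    return safe_ratio(tp, tp + fp)


tp, fp, fn = 90.0, 30.0, 10.0  # hypothetical counts (floats, as TAES would give)
print(sensitivity(tp, fn))  # 0.9
print(precision(tp, fp))    # 0.75
print(sensitivity(0, 0))    # 0.0
```

Accepting floats as well as integers lets the same helpers serve the fractional TAES counts and the integer counts from the other algorithms.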
    def get_total_duration(annotations):
        """Extract total recording duration from annotations"""
        if not annotations:
            return 0.0

        # Method 1: From file duration header
        if hasattr(annotations[0], "file_duration"):
            return annotations[0].file_duration

        # Method 2: From max stop time
        max_time = max(event.stop_time for event in annotations)
        return max_time

        # Method 3: From explicit duration parameter
        # Passed separately to scoring functions
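A short usage sketch may help here. The `Event` dataclass below is a hypothetical stand-in for whatever annotation type the scoring code actually uses; only the `stop_time` attribute and the optional `file_duration` attribute are taken from the function above.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical annotation record; field names are assumptions."""
    start_time: float
    stop_time: float
    label: str


def get_total_duration(annotations):
    """Same logic as above, repeated so this example runs on its own."""
    if not annotations:
        return 0.0
    if hasattr(annotations[0], "file_duration"):
        return annotations[0].file_duration
    return max(event.stop_time for event in annotations)


events = [Event(10.0, 42.5, "seiz"), Event(300.0, 361.0, "seiz")]
print(get_total_duration(events))  # 361.0
print(get_total_duration([]))      # 0.0
```

Note that the max-stop-time fallback only gives the true recording length when the last annotated event runs to the end of the file. If the annotations list only seizure events, the result understates the duration and inflates FA/24h, which is presumably why an explicit duration can also be passed in.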
Clinical Thresholds
FDA/Clinical Standards
| Metric | Acceptable | Good | Excellent |
|---|---|---|---|
| Sensitivity | >0.85 | >0.90 | >0.95 |
| FA/24h | <10 | <5 | <1 |
| F1 Score | >0.70 | >0.80 | >0.90 |
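The tiers above can be applied mechanically. The following is a minimal sketch that maps a sensitivity or FA/24h value to its tier. The function names and the "Below acceptable" label for values outside every tier are assumptions made for illustration.

```python
def grade_sensitivity(value):
    """Map a sensitivity value to the clinical tiers in the table above."""
    if value > 0.95:
        return "Excellent"
    if value > 0.90:
        return "Good"
    if value > 0.85:
        return "Acceptable"
    return "Below acceptable"  # hypothetical label, not from the table


def grade_fa_per_24h(rate):
    """Map an FA/24h rate to the clinical tiers (lower is better)."""
    if rate < 1:
        return "Excellent"
    if rate < 5:
        return "Good"
    if rate < 10:
        return "Acceptable"
    return "Below acceptable"  # hypothetical label, not from the table


print(grade_sensitivity(0.92))  # Good
print(grade_fa_per_24h(0.72))   # Excellent
print(grade_fa_per_24h(32.21))  # Below acceptable
```

The comparisons are strict, matching the `>` and `<` signs in the table, so a value exactly on a boundary falls into the lower tier.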
Research Standards
| Metric | Minimum | Target | State-of-Art |
|---|---|---|---|
| Cohen's κ | >0.40 | >0.60 | >0.80 |
| Precision | >0.50 | >0.70 | >0.90 |
| Accuracy | >0.80 | >0.90 | >0.95 |
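Cohen's κ appears in the table without a formula. For reference, this sketch computes the standard two-class κ from a 2x2 confusion matrix: observed agreement minus chance agreement, divided by one minus chance agreement. Whether NEDC-BENCH computes κ this way is not stated here, so treat the function and its zero-return conventions as assumptions.

```python
def cohens_kappa(tp, fp, fn, tn):
    """Two-class Cohen's kappa from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        return 0.0  # assumed convention for empty input
    # Observed agreement (identical to accuracy in the two-class case)
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total * total)
    if expected == 1.0:
        return 0.0  # assumed convention when kappa is undefined
    return (observed - expected) / (1.0 - expected)


# Hypothetical balanced example: 80% accuracy, 50% chance agreement
print(round(cohens_kappa(tp=40, fp=10, fn=10, tn=40), 4))  # 0.6
```

The example shows why κ is listed with lower thresholds than accuracy: 80% accuracy on balanced data corresponds to κ of only 0.6.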
Implementation Notes
Floating Point Precision
Use float64 for fractional metrics
Round display to 2-4 decimal places
Exact comparison for parity testing
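A small illustration of why these three points matter together. Python's built-in `float` is already a 64-bit double, but the order and method of accumulation can still change the last bits of a result. An exact parity check only passes if the computation mirrors the reference, and rounding should be applied for display only. The values below are arbitrary, not TAES output.

```python
import math

# Ten hypothetical fractional contributions of 0.1 each
fractions = [0.1] * 10

naive = sum(fractions)        # left-to-right float64 accumulation
exact = math.fsum(fractions)  # correctly rounded sum of the same values

print(naive == 1.0)     # False
print(exact == 1.0)     # True
print(f"{naive:.4f}")   # 1.0000
```

The two sums differ in exact comparison yet look identical once rounded to four decimals, so rounding before comparing would hide a real difference in how the fractional metric was accumulated.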
Edge Cases
Empty annotations: Return 0.0 for all metrics
Zero duration: Return 0.0 for FA/24h
No positive class: Undefined sensitivity (return 0.0)
Performance Optimization
Cache duration calculations
Vectorize confusion matrix operations
Use numpy for large-scale computations
Validation
See docs/archive/bugs/FINAL_PARITY_RESULTS.md for parity-confirmed metrics across all algorithms.