Commit 342bd7a
Add CalibrationErrorMetric and CalibrationError handler (Project-MONAI#8707)
## Description
Addresses Project-MONAI#8505
### Overview
This PR adds calibration error metrics and an Ignite handler to MONAI,
enabling users to evaluate and monitor model calibration for
segmentation and other multi-class probabilistic tasks with shape `(B,
C, spatial...)`.
### What's Included
#### 1. Calibration Metrics (`monai/metrics/calibration.py`)
- **`calibration_binning()`**: Core function that computes calibration bins
  with mean predictions, mean ground truths, and bin counts. Exported to
  support research workflows where users need per-bin statistics for
  plotting reliability diagrams.
- **`CalibrationReduction`**: Enum supporting three reduction methods:
  - `EXPECTED` - Expected Calibration Error (ECE): average of per-bin
    errors weighted by bin count
  - `AVERAGE` - Average Calibration Error (ACE): simple average across
    bins
  - `MAXIMUM` - Maximum Calibration Error (MCE): maximum error across bins
- **`CalibrationErrorMetric`**: A `CumulativeIterationMetric` subclass
  supporting:
  - Configurable number of bins
  - Background channel exclusion (`include_background`)
  - All standard MONAI metric reductions (`mean`, `sum`, `mean_batch`,
    etc.)
  - Batched, per-class computation
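
To make the three reduction modes concrete, here is a minimal pure-Python sketch that computes ECE, ACE, and MCE from per-bin statistics. It is independent of the MONAI code: the function name `reduce_calibration_bins` is hypothetical, and skipping empty bins is an assumption made for illustration rather than a statement about how `CalibrationErrorMetric` treats them.

```python
def reduce_calibration_bins(mean_pred, mean_gt, counts):
    """Return (ece, ace, mce) from per-bin statistics.

    Illustrative only: empty bins (count == 0) are skipped, which is an
    assumption of this sketch.
    """
    # Absolute gap between mean prediction and mean ground truth per non-empty bin.
    gaps = [abs(p - g) for p, g, n in zip(mean_pred, mean_gt, counts) if n > 0]
    weights = [n for n in counts if n > 0]
    if not gaps:
        nan = float("nan")
        return nan, nan, nan
    total = sum(weights)
    ece = sum(w * e for w, e in zip(weights, gaps)) / total  # weighted by bin count
    ace = sum(gaps) / len(gaps)  # simple average over non-empty bins
    mce = max(gaps)  # worst bin
    return ece, ace, mce


# Three bins; the middle one is empty and is skipped.
ece, ace, mce = reduce_calibration_bins(
    mean_pred=[0.125, 0.0, 0.875],
    mean_gt=[0.25, 0.0, 0.625],
    counts=[3, 0, 1],
)
```

The heavily populated first bin pulls ECE toward its smaller gap, ACE treats both non-empty bins equally, and MCE reports only the worst bin.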
#### 2. Ignite Handler (`monai/handlers/calibration.py`)
- **`CalibrationError`**: An `IgniteMetricHandler` wrapper that:
  - Attaches to PyTorch Ignite engines for training/validation loops
  - Supports `save_details` for per-sample/per-channel metric details via
    the metric buffer
  - Integrates with MONAI's existing handler ecosystem
#### 3. Comprehensive Tests
- **`tests/metrics/test_calibration_metric.py`**: Tests covering:
  - Binning function correctness with NaN handling
  - ECE/ACE/MCE reduction modes
  - Background exclusion
  - Cumulative iteration behavior
  - Input validation (shape mismatch, ndim, num_bins)
- **`tests/handlers/test_handler_calibration_error.py`**: Tests
  covering:
  - Handler attachment and computation via `engine.run()`
  - All calibration reduction modes
  - `save_details` functionality
  - Optional Ignite dependency handling (tests skip if Ignite is not
    installed)
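
For readers unfamiliar with the skip-if-missing pattern, the following is a generic standard-library sketch of how a test class can be skipped when Ignite is absent. It does not reproduce the helpers the MONAI test suite uses; the names `HAS_IGNITE` and `TestNeedsIgnite` are hypothetical.

```python
import importlib.util
import unittest

# True only if the `ignite` package can be found; nothing is imported yet.
HAS_IGNITE = importlib.util.find_spec("ignite") is not None


@unittest.skipUnless(HAS_IGNITE, "PyTorch Ignite is not installed")
class TestNeedsIgnite(unittest.TestCase):
    def test_engine_import(self):
        # Imported lazily so the module still loads without Ignite.
        from ignite.engine import Engine

        self.assertTrue(callable(Engine))
```

With Ignite missing, the test runner reports the class as skipped instead of failing at import time.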
### Public API
Exposes the following via `monai.metrics`:
- `CalibrationErrorMetric`
- `CalibrationReduction`
- `calibration_binning`
Exposes via `monai.handlers`:
- `CalibrationError`
### Implementation Notes
- Uses `scatter_add` + counts instead of `scatter_reduce("mean")` for
better PyTorch version compatibility
- Includes input validation with clear error messages
- Clamps bin indices to prevent out-of-range errors with slightly
out-of-bound probabilities
- Uses `torch.nan_to_num` instead of in-place operations for cleaner
code
- Ignite is treated as an optional dependency in tests (skipped if not
installed)
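
The first four notes can be illustrated with a small PyTorch sketch of per-bin means built from `scatter_add` and a count tensor. This is a simplified 1-D illustration under stated assumptions, not the code in `monai/metrics/calibration.py`: the function name `bin_means` is hypothetical, equal-width bins on [0, 1] are assumed, and replacing empty-bin NaNs with 0 via `torch.nan_to_num` is a choice made here for demonstration.

```python
import torch


def bin_means(probs: torch.Tensor, targets: torch.Tensor, num_bins: int = 10):
    """Per-bin mean prediction, mean target, and count for flattened inputs."""
    if probs.shape != targets.shape:
        raise ValueError(f"shape mismatch: {probs.shape} vs {targets.shape}")
    if num_bins < 1:
        raise ValueError("num_bins must be >= 1")
    probs = probs.flatten().float()
    targets = targets.flatten().float()
    # Equal-width bins on [0, 1]; clamp so p == 1.0 or slightly
    # out-of-range values still map to a valid bin index.
    idx = torch.clamp((probs * num_bins).long(), 0, num_bins - 1)
    # Sum and count per bin, then divide: a "mean" reduction without
    # relying on scatter_reduce.
    counts = probs.new_zeros(num_bins).scatter_add(0, idx, torch.ones_like(probs))
    pred_sum = probs.new_zeros(num_bins).scatter_add(0, idx, probs)
    gt_sum = probs.new_zeros(num_bins).scatter_add(0, idx, targets)
    # Empty bins yield 0/0 = NaN; replace them out-of-place (sketch assumption).
    mean_pred = torch.nan_to_num(pred_sum / counts, nan=0.0)
    mean_gt = torch.nan_to_num(gt_sum / counts, nan=0.0)
    return mean_pred, mean_gt, counts


mean_pred, mean_gt, counts = bin_means(
    torch.tensor([0.1, 0.2, 0.9, 1.0]),
    torch.tensor([0.0, 1.0, 1.0, 1.0]),
    num_bins=4,
)
```

In this example the probability `1.0` would index bin 4 without the clamp, which is out of range for four bins; clamping places it in the last bin.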
### Related Work
The algorithmic approach follows the calibration metrics from
[Average-Calibration-Losses](https://github.com/cai4cai/Average-Calibration-Losses/),
with related publications:
- [MICCAI 2024
Paper](https://papers.miccai.org/miccai-2024/091-Paper3075.html)
- [arXiv Paper](https://arxiv.org/abs/2506.03942v1)
### Future Work
As discussed in the issue, calibration losses will be added in a
separate PR to keep changes focused and easier to review.
## Checklist
- [x] Code follows MONAI style guidelines (ruff passes)
- [x] All new code has appropriate license headers
- [x] Public API is exported in `__init__.py` files
- [x] Docstrings include examples with proper transforms usage
- [x] Unit tests cover main functionality
- [x] Tests handle optional Ignite dependency gracefully
- [x] No breaking changes to existing API
## Example Usage
```python
from monai.metrics import CalibrationErrorMetric
from monai.transforms import Activations, AsDiscrete

# `num_classes`, `model`, and `dataloader` are assumed to be defined by the user.

# Setup transforms
softmax = Activations(softmax=True)
to_onehot = AsDiscrete(to_onehot=num_classes)

# Create metric
metric = CalibrationErrorMetric(
    num_bins=15,
    include_background=False,
    calibration_reduction="expected",  # ECE
)

# In evaluation loop
# Note: y_pred should be probabilities in [0,1], y should be one-hot/binarized
for batch_data in dataloader:
    logits, labels = model(batch_data)
    preds = softmax(logits)
    labels_onehot = to_onehot(labels)
    metric(y_pred=preds, y=labels_onehot)

ece = metric.aggregate()
```
### With Ignite Handler
```python
from monai.handlers import CalibrationError, from_engine

calibration_handler = CalibrationError(
    num_bins=15,
    include_background=False,
    calibration_reduction="expected",
    output_transform=from_engine(["pred", "label"]),
    save_details=True,
)
calibration_handler.attach(evaluator, name="calibration_error")
```
---------
Signed-off-by: Theo Barfoot <theo.barfoot@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>

Parent commit: 4b1777f

8 files changed, +1096 -0 lines:
- docs/source
- monai
  - handlers
  - metrics
- tests
  - handlers
  - metrics