Commit f619d63
feat: implement Calibration Metrics for uncertainty quantification
Add comprehensive calibration metrics module (molax/metrics) including:
Metrics:
- negative_log_likelihood: Proper scoring rule for probabilistic predictions
- expected_calibration_error: Average gap between confidence and accuracy
- compute_calibration_curve: Data for reliability diagrams
- sharpness: Average predicted uncertainty
- calibration_error_per_sample: Per-sample z-scores
- evaluate_calibration: Comprehensive metrics in one call
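As a rough illustration of what the metrics above measure, here is a minimal pure-Python sketch. It rests on assumptions the commit message does not confirm: predictions are Gaussian (a mean and a variance per sample), sharpness is the mean predicted standard deviation, and "confidence vs accuracy" is read as nominal vs observed coverage of central intervals. The function names mirror the list, but the signatures are hypothetical and may differ from those in molax/metrics.

```python
import math
from statistics import NormalDist


def gaussian_nll(mean, var, target):
    """Mean negative log-likelihood of targets under N(mean, var)."""
    terms = [
        0.5 * (math.log(2 * math.pi * v) + (y - m) ** 2 / v)
        for m, v, y in zip(mean, var, target)
    ]
    return sum(terms) / len(terms)


def z_scores(mean, var, target):
    """Per-sample standardized residuals; roughly N(0, 1) if calibrated."""
    return [(y - m) / math.sqrt(v) for m, v, y in zip(mean, var, target)]


def sharpness(var):
    """Average predicted standard deviation (assumed definition)."""
    return sum(math.sqrt(v) for v in var) / len(var)


def calibration_curve(mean, var, target, n_levels=10):
    """Nominal vs observed coverage of central Gaussian intervals."""
    abs_z = [abs(z) for z in z_scores(mean, var, target)]
    expected, observed = [], []
    for i in range(1, n_levels + 1):
        p = i / (n_levels + 1)  # nominal coverage, strictly inside (0, 1)
        half_width = NormalDist().inv_cdf(0.5 + p / 2)
        expected.append(p)
        observed.append(sum(z <= half_width for z in abs_z) / len(abs_z))
    return expected, observed


def expected_calibration_error(mean, var, target, n_levels=10):
    """Average absolute gap between nominal and observed coverage."""
    expected, observed = calibration_curve(mean, var, target, n_levels)
    return sum(abs(e - o) for e, o in zip(expected, observed)) / len(expected)
```

The `(expected, observed)` pairs from `calibration_curve` are the kind of data a reliability diagram plots; a perfectly calibrated model would lie on the diagonal.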
Calibration:
- TemperatureScaling: Post-hoc calibration via temperature optimization on a validation set to minimize NLL
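The idea behind temperature scaling can be sketched as follows. This is not the TemperatureScaling class from the commit. It assumes a Gaussian regression setting in which a single temperature T rescales every predicted variance to T² · var. Under that assumption the NLL-minimizing T has a closed form, the root mean squared z-score, so no iterative optimizer is needed here. `fit_temperature` and `apply_temperature` are hypothetical names.

```python
import math


def gaussian_nll(mean, var, target):
    """Mean negative log-likelihood of targets under N(mean, var)."""
    terms = [
        0.5 * (math.log(2 * math.pi * v) + (y - m) ** 2 / v)
        for m, v, y in zip(mean, var, target)
    ]
    return sum(terms) / len(terms)


def fit_temperature(mean, var, target):
    """Return T minimizing validation NLL when var is rescaled to T**2 * var.

    Setting d(NLL)/d(T**2) = 0 gives T**2 = mean(z**2), where z is the
    standardized residual (target - mean) / sqrt(var).
    """
    z_squared = [(y - m) ** 2 / v for m, v, y in zip(mean, var, target)]
    return math.sqrt(sum(z_squared) / len(z_squared))


def apply_temperature(var, temperature):
    """Rescale predicted variances with a fitted temperature."""
    return [temperature ** 2 * v for v in var]


# Overconfident toy model: predicted std is 1, but errors are all of size 2.
val_mean = [0.0, 0.0, 0.0, 0.0]
val_var = [1.0, 1.0, 1.0, 1.0]
val_target = [2.0, -2.0, 2.0, -2.0]

T = fit_temperature(val_mean, val_var, val_target)
calibrated_var = apply_temperature(val_var, T)
```

A fitted T above 1 indicates the model was overconfident on the validation set, and a T below 1 indicates it was underconfident. The predicted means are untouched, so accuracy metrics do not change.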
Visualization:
- plot_reliability_diagram: Calibration quality visualization
- plot_calibration_comparison: Compare multiple models side-by-side
- plot_uncertainty_vs_error: Scatter of predicted uncertainty vs actual error
- plot_confidence_histogram: Distribution of predicted uncertainties
- plot_z_score_histogram: Z-score distribution vs expected N(0,1)
- create_calibration_report: Comprehensive multi-plot report
Also includes:
- 43 comprehensive tests (all passing)
- Example script comparing MC Dropout, Ensemble, and Evidential calibration
- Updated roadmap marking 1.3 as complete
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent ea3f94a commit f619d63
File tree
6 files changed (+2036 -53 lines changed)
- docs
- examples
- molax/metrics
- tests