
Commit 4e5b96d

Author: Thibault Cordier
Merge branch 'master' into 452-coverage-validity
2 parents: cda1250 + 1110092
19 files changed: +432 −226 lines

HISTORY.rst

Lines changed: 1 addition & 0 deletions

@@ -10,6 +10,7 @@ History
 * Reduce precision for test in `MapieCalibrator`.
 * Fix invalid certificate when downloading data.
 * Add citations utility to the documentation.
+* Add documentation for metrics.
 * Add explanation and example for symmetry argument in CQR.

 0.8.3 (2024-03-01)

README.rst

Lines changed: 10 additions & 5 deletions

@@ -172,23 +172,28 @@ and with the financial support from Région Ile de France and Confiance.ai.
 |Quantmetry| |Michelin| |ENS| |Confiance.ai| |IledeFrance|

 .. |Quantmetry| image:: https://www.quantmetry.com/wp-content/uploads/2020/08/08-Logo-quant-Texte-noir.svg
-    :height: 35
+    :height: 35px
+    :width: 140px
     :target: https://www.quantmetry.com/

 .. |Michelin| image:: https://agngnconpm.cloudimg.io/v7/https://dgaddcosprod.blob.core.windows.net/corporate-production/attachments/cls05tqdd9e0o0tkdghwi9m7n-clooe1x0c3k3x0tlu4cxi6dpn-bibendum-salut.full.png
-    :height: 35
+    :height: 50px
+    :width: 45px
     :target: https://www.michelin.com/en/

 .. |ENS| image:: https://file.diplomeo-static.com/file/00/00/01/34/13434.svg
-    :height: 35
+    :height: 35px
+    :width: 140px
     :target: https://ens-paris-saclay.fr/en

 .. |Confiance.ai| image:: https://pbs.twimg.com/profile_images/1443838558549258264/EvWlv1Vq_400x400.jpg
-    :height: 35
+    :height: 45px
+    :width: 45px
     :target: https://www.confiance.ai/

 .. |IledeFrance| image:: https://www.iledefrance.fr/sites/default/files/logo/2024-02/logoGagnerok.svg
-    :height: 35
+    :height: 35px
+    :width: 140px
     :target: https://www.iledefrance.fr/

doc/index.rst

Lines changed: 7 additions & 0 deletions

@@ -58,6 +58,13 @@
    examples_calibration/index
    notebooks_calibration

+.. toctree::
+   :maxdepth: 2
+   :hidden:
+   :caption: METRICS
+
+   theoretical_description_metrics
+
 .. toctree::
    :maxdepth: 2
    :hidden:

doc/notebooks_classification.rst

Lines changed: 4 additions & 4 deletions

@@ -6,8 +6,8 @@ problems for computer vision settings that are too heavy to be included in the e
 galleries.


-1. Estimating prediction sets on the Cifar10 dataset : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/Cifar10.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
+1. Estimating prediction sets on the Cifar10 dataset : `cifar_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/Cifar10.ipynb>`_
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-2. Top-label calibration for outputs of ML models : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/top_label_calibration.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+2. Top-label calibration for outputs of ML models : `top_label_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/top_label_calibration.ipynb>`_
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

doc/notebooks_multilabel_classification.rst

Lines changed: 4 additions & 4 deletions

@@ -5,8 +5,8 @@ The following examples present advanced analyses
 on multi-label classification problems with different
 methods proposed in MAPIE.

-1. Overview of Recall Control for Multi-Label Classification : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_recall.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+1. Overview of Recall Control for Multi-Label Classification : `recall_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_recall.ipynb>`_
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-2. Overview of Precision Control for Multi-Label Classification : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_precision.ipynb>`_
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+2. Overview of Precision Control for Multi-Label Classification : `precision_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_precision.ipynb>`_
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

doc/notebooks_regression.rst

Lines changed: 4 additions & 4 deletions

@@ -8,11 +8,11 @@ This section lists a series of Jupyter notebooks hosted on the MAPIE Github repo
 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


-2. Estimating the uncertainties in the exoplanet masses : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/exoplanets.ipynb>`_
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+2. Estimating the uncertainties in the exoplanet masses : `exoplanet_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/exoplanets.ipynb>`_
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


-3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `ts_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

doc/quick_start.rst

Lines changed: 4 additions & 6 deletions

@@ -7,11 +7,9 @@ In regression settings, **MAPIE** provides prediction intervals on single-output
 In classification settings, **MAPIE** provides prediction sets on multi-class data.
 In any case, **MAPIE** is compatible with any scikit-learn-compatible estimator.

-Estimate your prediction intervals
-==================================

 1. Download and install the module
-----------------------------------
+==================================

 Install via ``pip``:

@@ -33,7 +31,7 @@ To install directly from the github repository :


 2. Run MapieRegressor
----------------------
+=====================

 Let us start with a basic regression problem.
 Here, we generate one-dimensional noisy data that we fit with a linear model.

@@ -114,8 +112,8 @@ It is given by the alpha parameter defined in ``MapieRegressor``, here equal to
 thus giving target coverages of ``0.95`` and ``0.68``.
 The effective coverage is the actual fraction of true labels lying in the prediction intervals.

-2. Run MapieClassifier
-----------------------
+3. Run MapieClassifier
+=======================

 Similarly, it's possible to do the same for a basic classification problem.

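The quick_start.rst hunks above renumber the install, MapieRegressor and MapieClassifier walkthroughs. For orientation, here is a minimal sketch of the MapieRegressor workflow those sections describe, assuming the pre-1.0 ``mapie.regression`` API and synthetic data in place of the guide's exact example::

    # Sketch only: pre-1.0 MAPIE API, illustrative data.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from mapie.regression import MapieRegressor

    # One-dimensional noisy data fitted with a linear model, as in the guide.
    rng = np.random.default_rng(42)
    X = rng.uniform(0, 10, size=(500, 1))
    y = 2 * X.ravel() + rng.normal(scale=1.0, size=500)

    mapie = MapieRegressor(estimator=LinearRegression())
    mapie.fit(X, y)

    # alpha values of 0.05 and 0.32 target coverages of 0.95 and 0.68.
    y_pred, y_pis = mapie.predict(X, alpha=[0.05, 0.32])

    # Effective coverage: fraction of true labels inside the intervals.
    inside = (y >= y_pis[:, 0, 0]) & (y <= y_pis[:, 1, 0])
    print(f"Effective coverage at alpha=0.05: {inside.mean():.3f}")
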
doc/theoretical_description_binary_classification.rst

Lines changed: 5 additions & 5 deletions

@@ -1,10 +1,10 @@
-.. title:: Theoretical Description : contents
+.. title:: Theoretical Description Binary Classification : contents

 .. _theoretical_description_binay_classification:

-=======================
+#######################
 Theoretical Description
-=======================
+#######################

 There are mainly three different ways to handle uncertainty quantification in binary classification:
 calibration (see :doc:`theoretical_description_calibration`), confidence interval (CI) for the probability

@@ -83,8 +83,8 @@ for the labels of test objects which are guaranteed to be well-calibrated under
 that the observations are generated independently from the same distribution [2].


-4. References
--------------
+References
+----------

 [1] Gupta, Chirag, Aleksandr Podkopaev, and Aaditya Ramdas.
 "Distribution-free binary classification: prediction sets, confidence intervals, and calibration."

doc/theoretical_description_calibration.rst

Lines changed: 7 additions & 110 deletions

@@ -2,10 +2,9 @@

 .. _theoretical_description_calibration:

-=======================
+#######################
 Theoretical Description
-=======================
-
+#######################

 One method for multi-class calibration has been implemented in MAPIE so far:
 Top-Label Calibration [1].

@@ -34,8 +33,8 @@ To apply calibration directly to a multi-class context, Gupta et al. propose a f
 a multi-class calibration to multiple binary calibrations (M2B).


-1. Top-Label
-------------
+Top-Label
+---------

 Top-Label calibration is a calibration technique introduced by Gupta et al. to calibrate the model according to the highest score and
 the corresponding class (see [1] Section 2). This framework offers to apply binary calibration techniques to multi-class calibration.

@@ -50,109 +49,8 @@ according to Top-Label calibration if:
     Pr(Y = c(X) \mid h(X), c(X)) = h(X)


-2. Metrics for calibration
---------------------------
-
-**Expected calibration error**
-
-The main metric to check whether the calibration is correct is the Expected Calibration Error (ECE). It is based on two
-components, accuracy and confidence per bin. The number of bins is a hyperparameter :math:`M`, and we refer to a specific bin by
-:math:`B_m`.
-
-.. math::
-    \text{acc}(B_m) &= \frac{1}{\left| B_m \right|} \sum_{i \in B_m} y_i \\
-    \text{conf}(B_m) &= \frac{1}{\left| B_m \right|} \sum_{i \in B_m} \hat{f}(x)_i
-
-The ECE combines these two metrics:
-
-.. math::
-    \text{ECE} = \sum_{m=1}^M \frac{\left| B_m \right|}{n} \left| \text{acc}(B_m) - \text{conf}(B_m) \right|
-
-In simple terms, once the bins over the confidence scores have been created, we check the mean accuracy of each bin.
-The ECE is the weighted absolute difference between accuracy and confidence. Hence, the lower the ECE, the better the calibration.
-
-**Top-Label ECE**
-
-In Top-Label calibration, we only calculate the ECE for the top-label class. Hence, per top-label class, we condition the calculation
-of the accuracy and confidence on the top label, and take the average ECE over the top labels.
-
-3. Statistical tests for calibration
-------------------------------------
-
-**Kolmogorov-Smirnov test**
-
-The Kolmogorov-Smirnov test was derived in [2, 3, 4]. The idea is to consider the cumulative differences between sorted scores :math:`s_i`
-and their corresponding labels :math:`y_i`, and to compare their properties to those of a standard Brownian motion. Let us consider the
-cumulative differences on sorted scores:
-
-.. math::
-    C_k = \frac{1}{N}\sum_{i=1}^k (s_i - y_i)
-
-We also introduce a typical normalization scale :math:`\sigma`:
-
-.. math::
-    \sigma = \frac{1}{N}\sqrt{\sum_{i=1}^N s_i(1 - s_i)}
-
-The Kolmogorov-Smirnov statistic is then defined as:
-
-.. math::
-    G = \max_k |C_k| / \sigma
-
-It can be shown [2] that, under the null hypothesis of well-calibrated scores, this quantity asymptotically (i.e. when N goes to infinity)
-converges to the maximum absolute value of a standard Brownian motion over the unit interval :math:`[0, 1]`. [3, 4] also provide closed-form
-formulas for the cumulative distribution function (CDF) of the maximum absolute value of such a standard Brownian motion.
-We then state the p-value associated with the statistical test of well calibration as:
-
-.. math::
-    p = 1 - CDF(G)
-
-**Kuiper test**
-
-The Kuiper test was derived in [2, 3, 4] and is very similar to Kolmogorov-Smirnov. This time, the statistic is defined as:
-
-.. math::
-    H = (\max_k C_k - \min_k C_k)/\sigma
-
-It can be shown [2] that, under the null hypothesis of well-calibrated scores, this quantity asymptotically (i.e. when N goes to infinity)
-converges to the range of a standard Brownian motion over the unit interval :math:`[0, 1]`. [3, 4] also provide closed-form
-formulas for the cumulative distribution function (CDF) of the range of such a standard Brownian motion.
-We then state the p-value associated with the statistical test of well calibration as:
-
-.. math::
-    p = 1 - CDF(H)
-
-**Spiegelhalter test**
-
-The Spiegelhalter test was derived in [6]. It is based on a decomposition of the Brier score:
-
-.. math::
-    B = \frac{1}{N}\sum_{i=1}^N (y_i - s_i)^2
-
-where scores are denoted :math:`s_i` and their corresponding labels :math:`y_i`. This can be decomposed into two terms:
-
-.. math::
-    B = \frac{1}{N}\sum_{i=1}^N (y_i - s_i)(1 - 2 s_i) + \frac{1}{N}\sum_{i=1}^N s_i (1 - s_i)
-
-It can be shown that the first term has an expected value of zero under the null hypothesis of well calibration. So we interpret
-the second term as the expected value :math:`E(B)` of the Brier score under the null hypothesis. The variance of the Brier score can be
-computed as:
-
-.. math::
-    Var(B) = \frac{1}{N^2}\sum_{i=1}^N (1 - 2 s_i)^2 s_i (1 - s_i)
-
-So we can build a Z-score as follows:
-
-.. math::
-    Z = \frac{B - E(B)}{\sqrt{Var(B)}} = \frac{\sum_{i=1}^N (y_i - s_i)(1 - 2 s_i)}{\sqrt{\sum_{i=1}^N (1 - 2 s_i)^2 s_i (1 - s_i)}}
-
-Under the null hypothesis, this statistic follows a standard normal distribution, with CDF its cumulative distribution function, so we state the associated p-value:
-
-.. math::
-    p = 1 - CDF(Z)
-
-3. References
--------------
+References
+----------

 [1] Gupta, Chirag, and Aaditya K. Ramdas.
 "Top-label calibration and multiclass-to-binary reductions."

@@ -171,8 +69,7 @@ arXiv preprint arXiv:2202.00100.

 [4] D. A. Darling. A. J. F. Siegert.
 The First Passage Problem for a Continuous Markov Process.
-Ann. Math. Statist. 24 (4) 624 - 639, December,
-1953.
+Ann. Math. Statist. 24 (4) 624 - 639, December, 1953.

 [5] William Feller.
 The Asymptotic Distribution of the Range of Sums of

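The large deletion above accompanies the dedicated metrics page this commit adds to the toctree in doc/index.rst. For reference, a minimal NumPy sketch of the ECE formula from the removed section (not MAPIE's own implementation; the equal-width binning is an illustrative assumption)::

    # Sketch of the Expected Calibration Error defined in the removed section.
    import numpy as np

    def expected_calibration_error(y_true, y_score, n_bins=10):
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        # Map each confidence score to a bin B_m (equal-width bins).
        bin_ids = np.minimum(np.digitize(y_score, bins[1:]), n_bins - 1)
        n = len(y_score)
        ece = 0.0
        for m in range(n_bins):
            mask = bin_ids == m
            if mask.any():
                acc = y_true[mask].mean()    # acc(B_m): mean label in the bin
                conf = y_score[mask].mean()  # conf(B_m): mean score in the bin
                ece += (mask.sum() / n) * abs(acc - conf)
        return ece
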
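In the same spirit, a sketch of the Kolmogorov-Smirnov statistic :math:`G` and the Spiegelhalter p-value as defined in the removed section (the closed-form Brownian-motion CDF needed for the KS p-value is omitted, so only the statistic is computed)::

    # Sketch of two test statistics from the removed section.
    import numpy as np
    from scipy.stats import norm

    def ks_calibration_statistic(y_true, y_score):
        order = np.argsort(y_score)       # sort scores and labels together
        s, y = y_score[order], y_true[order]
        n = len(s)
        c = np.cumsum(s - y) / n          # C_k = (1/N) * sum_{i<=k} (s_i - y_i)
        sigma = np.sqrt(np.sum(s * (1 - s))) / n
        return np.max(np.abs(c)) / sigma  # G = max_k |C_k| / sigma

    def spiegelhalter_p_value(y_true, y_score):
        s, y = y_score, y_true
        z = np.sum((y - s) * (1 - 2 * s)) / np.sqrt(
            np.sum((1 - 2 * s) ** 2 * s * (1 - s))
        )
        return 1 - norm.cdf(z)            # p = 1 - CDF(Z)
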
doc/theoretical_description_classification.rst

Lines changed: 7 additions & 6 deletions

@@ -1,11 +1,10 @@
-.. title:: Theoretical Description : contents
+.. title:: Theoretical Description Classification : contents

 .. _theoretical_description_classification:

-=======================
+#######################
 Theoretical Description
-=======================
-
+#######################

 Three methods for multi-class uncertainty quantification have been implemented in MAPIE so far:
 LAC (that stands for Least Ambiguous set-valued Classifier) [1], Adaptive Prediction Sets [2, 3] and Top-K [3].

@@ -141,8 +140,10 @@ Despite the RAPS method having a relatively small set size, its coverage tends t
 of the last label in the prediction set. This randomization is done as follows:

 - First: define the :math:`V` parameter:
+
 .. math::
     V_i = (s_i(X_i, Y_i) - \hat{q}_{1-\alpha}) / \left(\hat{\mu}(X_i)_{\pi_k} + \lambda \mathbb{1} (k > k_{reg})\right)
+
 - Compare each :math:`V_i` to :math:`U \sim` Unif(0, 1)
 - If :math:`V_i \leq U`, the last included label is removed, else we keep the prediction set as it is.

@@ -227,8 +228,8 @@ where :

 .. TO BE CONTINUED

-5. References
--------------
+References
+----------

 [1] Mauricio Sadinle, Jing Lei, & Larry Wasserman.
 "Least Ambiguous Set-Valued Classifiers With Bounded Error Levels."

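The RAPS hunk above only adds blank lines so the :math:`V_i` formula renders as a math block, but the randomization it frames is simple to see in code. A toy sketch of the last-label trimming step (inputs are illustrative, not MAPIE internals)::

    # Toy sketch of the randomized last-label removal described above.
    import numpy as np

    rng = np.random.default_rng(0)

    def randomized_trim(prediction_set, v_last):
        """Drop the last included label when V_i <= U, with U ~ Unif(0, 1)."""
        u = rng.uniform()
        if v_last <= u:
            return prediction_set[:-1]  # remove the last label
        return prediction_set           # keep the set as it is

    # Example: three candidate labels, V = 0.2 for the last included one.
    print(randomized_trim([3, 7, 1], v_last=0.2))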