
Commit dc774e4

Merge pull request #175 from scikit-learn-contrib/add-classif-theoretical-description
Update theoretical description and fix bugs in classification tutorial
2 parents 8661f7e + b6f8770 commit dc774e4

File tree

11 files changed: +351 −190 lines changed

doc/api.rst

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ Regression
   :template: class.rst

   regression.MapieRegressor
   quantile_regression.MapieQuantileRegressor
   time_series_regression.MapieTimeSeriesRegressor

Classification
==============
(binary image file added, 285 KB)

doc/theoretical_description_classification.rst

Lines changed: 186 additions & 5 deletions
@@ -6,15 +6,196 @@
Theoretical Description
=======================

Three methods for multi-class uncertainty quantification have been implemented in MAPIE so far:
LABEL [1], Adaptive Prediction Sets (APS) [2, 3] and Top-K [3].
The difference between these methods lies in the way the conformity scores are computed.
The figure below illustrates the three methods implemented in MAPIE:

.. image:: images/classification_methods.png
    :width: 600
    :align: center

For a classification problem in a standard independent and identically distributed (i.i.d) case,
our training data :math:`(X, Y) = \{(x_1, y_1), \ldots, (x_n, y_n)\}` has an unknown distribution :math:`P_{X, Y}`.

For any risk level :math:`\alpha` between 0 and 1, the methods implemented in MAPIE allow the user to construct a prediction
set :math:`\hat{C}_{n, \alpha}(X_{n+1})` for a new observation :math:`\left( X_{n+1}, Y_{n+1} \right)` with a guarantee
on the marginal coverage such that:

.. math::
    P \{Y_{n+1} \in \hat{C}_{n, \alpha}(X_{n+1}) \} \geq 1 - \alpha


In words, for a typical risk level :math:`\alpha` of 10 %, we want to construct prediction sets that contain the true label
for at least 90 % of the new test data points.
Note that the guarantee holds only for the marginal coverage, and not for the conditional coverage
:math:`P \{Y_{n+1} \in \hat{C}_{n, \alpha}(X_{n+1}) | X_{n+1} = x_{n+1} \}`, which depends on the location of the new test point in the distribution.

1. LABEL
--------

In the LABEL method, the conformity score is defined as one minus the softmax score of the true label. For each point :math:`i` of the calibration set:

.. math::
    s_i(X_i, Y_i) = 1 - \hat{\mu}(X_i)_{Y_i}

Once the conformity scores :math:`{s_1, ..., s_n}` are estimated for all calibration points, we compute the :math:`\frac{\lceil(n+1)(1-\alpha)\rceil}{n}` quantile
:math:`\hat{q}` as follows:

.. math::
    \hat{q} = Quantile \left(s_1, ..., s_n ; \frac{\lceil(n+1)(1-\alpha)\rceil}{n}\right)


Finally, we construct a prediction set by including all labels whose softmax score is at least :math:`1 - \hat{q}`, i.e. whose conformity score is below the estimated quantile:

.. math::
    \hat{C}(X_{test}) = \{y : \hat{\mu}(X_{test})_y \geq 1 - \hat{q}\}


This simple approach allows us to construct prediction sets that come with a theoretical guarantee on the marginal coverage.
However, although this method generally results in small prediction sets, it tends to produce empty ones when the model is uncertain,
for example at the border between two classes.

61+
2. Adaptive Prediction Sets (APS)
62+
---------------------------------
63+
64+
The so-called Adaptive Prediction Set (APS) method overcomes the problem encountered by the LABEL method through the construction of
65+
prediction sets which are by definition non-empty.
66+
The conformity scores are computed by summing the ranked scores of each label, from the higher to the lower until reaching the true
67+
label of the observation :
68+
69+
.. math::
70+
s_i(X_i, Y_i) = \sum^k_{j=1} \hat{\mu}(X_i)_{\pi_j} \quad \text{where} \quad Y_i = \pi_j
71+
72+
73+
The quantile :math:`\hat{q}` is then computed the same way as the LABEL method.
74+
For the construction of the prediction sets for a new test point, the same procedure of ranked summing is applied until reaching the quantile,
75+
as described in the following equation :
76+
77+
78+
.. math::
79+
\hat{C}(X_{test}) = \{\pi_1, ..., \pi_k\} \quad \text{where} \quad k = \text{inf}\{k : \sum^k_{j=1} \hat{\mu}(X_{test})_{\pi_j} \geq \hat{q}\}
80+
81+
82+
By default, the label whose cumulative score is above the quantile is included in the prediction set.
83+
However, its incorporation can also be chosen randomly based on the difference between its cumulative score and the quantile so the effective
84+
coverage remains close to the target (marginal) coverage. We refer the reader to [2, 3] for more details about this aspect.
85+
86+
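For illustration, here is a NumPy sketch of the APS scores and prediction sets, without the optional randomization mentioned above. It is an assumption-laden toy version, not the MAPIE implementation.

.. code-block:: python

    import numpy as np

    def aps_prediction_sets(proba_cal, y_cal, proba_test, alpha=0.1):
        """Toy sketch of APS without randomization (illustrative only)."""
        n, n_classes = proba_cal.shape
        # Rank the softmax scores of each calibration point from highest to lowest.
        order = np.argsort(-proba_cal, axis=1)
        cumsum = np.cumsum(np.take_along_axis(proba_cal, order, axis=1), axis=1)
        # Conformity score: cumulated softmax mass down to the rank of the true label.
        true_rank = np.argmax(order == y_cal[:, None], axis=1)
        scores = cumsum[np.arange(n), true_rank]
        k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
        q_hat = np.sort(scores)[k - 1]
        # For each new point, keep the top-ranked labels until the mass reaches q_hat.
        order_test = np.argsort(-proba_test, axis=1)
        cumsum_test = np.cumsum(np.take_along_axis(proba_test, order_test, axis=1), axis=1)
        reached = cumsum_test >= q_hat
        n_kept = np.where(reached.any(axis=1), reached.argmax(axis=1) + 1, n_classes)
        sets = np.zeros_like(proba_test, dtype=bool)
        for i in range(len(proba_test)):
            sets[i, order_test[i, : n_kept[i]]] = True
        return sets
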
3. Top-K
--------

Introduced in [3], the specificity of the Top-K method is that it returns prediction sets of the same size for all observations.
The conformity score is the rank of the true label, with the labels ranked from the highest to the lowest softmax score.
The prediction sets are built by taking the :math:`\hat{q}` highest-ranked labels. The procedure is described by the following equations:

.. math::
    s_i(X_i, Y_i) = j \quad \text{where} \quad Y_i = \pi_j \quad \text{and} \quad \hat{\mu}(X_i)_{\pi_1} > ... > \hat{\mu}(X_i)_{\pi_j} > ... > \hat{\mu}(X_i)_{\pi_n}


.. math::
    \hat{q} = \left \lceil Quantile \left(s_1, ..., s_n ; \frac{\lceil(n+1)(1-\alpha)\rceil}{n}\right) \right\rceil


.. math::
    \hat{C}(X_{test}) = \{\pi_1, ..., \pi_{\hat{q}}\}

As with the other methods, this procedure allows the user to build prediction sets with guarantees on the marginal coverage.

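Here as well, a short NumPy sketch may help fix ideas. It is illustrative only, with assumed inputs, and is not the MAPIE implementation.

.. code-block:: python

    import numpy as np

    def topk_prediction_sets(proba_cal, y_cal, proba_test, alpha=0.1):
        """Toy sketch of the Top-K method (illustrative only)."""
        n = len(y_cal)
        order = np.argsort(-proba_cal, axis=1)
        # Conformity score: rank of the true label (1 = most probable class).
        ranks = np.argmax(order == y_cal[:, None], axis=1) + 1
        k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
        q_hat = int(np.sort(ranks)[k - 1])  # quantile of the integer ranks
        # Every test point receives its q_hat top-ranked labels.
        order_test = np.argsort(-proba_test, axis=1)
        sets = np.zeros_like(proba_test, dtype=bool)
        rows = np.arange(len(proba_test))[:, None]
        sets[rows, order_test[:, :q_hat]] = True
        return sets
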
4. Split- and cross-conformal methods
-------------------------------------

It should be noted that MAPIE includes split- and cross-conformal strategies for the LABEL and APS methods,
but only the split-conformal one for Top-K.
The implementation of the cross-conformal method follows Algorithm 2 of [2].
In short, conformity scores are calculated for all training instances in a cross-validation fashion from their corresponding out-of-fold models.
By analogy with the CV+ method for regression, estimating the prediction sets is performed in four main steps:

- We split the training set into *K* disjoint subsets :math:`S_1, S_2, ..., S_K` of equal size.

- *K* classifiers :math:`\hat{\mu}_{-S_k}` are fitted on the training set with the
  corresponding :math:`k^{th}` fold removed.

- The corresponding *out-of-fold* conformity score is computed for each :math:`i^{th}` training point.

- The conformity scores of the training instances are compared with the scores of each label for each new test point in order to
  decide whether or not the label should be included in the prediction set.

For the APS method, the prediction set is constructed as follows (see equation 11 of [3]):

.. math::
    C_{n, \alpha}(X_{n+1}) =
    \Big\{ y \in \mathcal{Y} : \sum_{i=1}^n {\rm 1} \Big[ E(X_i, Y_i, U_i; \hat{\pi}^{k(i)}) < E(X_{n+1}, y, U_{n+1}; \hat{\pi}^{k(i)}) \Big] < (1-\alpha)(n+1) \Big\}

where:

- :math:`E(X_i, Y_i, U_i; \hat{\pi}^{k(i)})` is the conformity score of training instance :math:`i`,

- :math:`E(X_{n+1}, y, U_{n+1}; \hat{\pi}^{k(i)})` is the conformity score of label :math:`y` for a new test point.

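To make the decision rule above concrete, here is a small NumPy sketch of the comparison step for a single new point, assuming the out-of-fold conformity scores have already been computed. The variable names are hypothetical and this is not the MAPIE code.

.. code-block:: python

    import numpy as np

    def cross_conformal_set(scores_train, scores_new, alpha=0.1):
        """Toy sketch of the cross-conformal decision rule (illustrative only).

        scores_train: array (n,) of out-of-fold scores E(X_i, Y_i, U_i; pi_hat^{k(i)}).
        scores_new:   array (n, n_classes) where entry (i, y) is the score of label y
                      for the new point, computed with the out-of-fold model k(i).
        """
        n = len(scores_train)
        # Count, for each candidate label y, how many training scores are strictly smaller.
        counts = (scores_train[:, None] < scores_new).sum(axis=0)
        # Keep the labels for which this count stays below (1 - alpha)(n + 1).
        return np.where(counts < (1 - alpha) * (n + 1))[0]
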
.. The :class:`mapie.regression.MapieClassifier` class implements several conformal methods
.. for estimating predictions sets, i.e. a set of possibilities that include the true label
.. with a given confidence level.
.. The full-conformal methods being computationally intractable, we will focus on the split-
.. and cross-conformal methods.

.. Before describing the methods, let's briefly present the mathematical setting.
.. For a classification problem in a standard independent and identically distributed
.. (i.i.d) case, our training data :math:`(X, Y) = \{(x_1, y_1), \ldots, (x_n, y_n)\}`
.. has an unknown distribution :math:`P_{X, Y}`.

.. Given some target quantile :math:`\alpha` or associated target coverage level :math:`1-\alpha`,
.. we aim at constructing a set of possible labels :math:`\hat{T}_{n, \alpha} \in {1, ..., K}`
.. for a new feature vector :math:`X_{n+1}` such that

.. .. math::
..     P \{Y_{n+1} \in \hat{T}_{n, \alpha}(X_{n+1}) \} \geq 1 - \alpha


.. 1. Split-conformal method
.. -------------------------

.. - In order to estimate prediction sets, one needs to "calibrate" so-called conformity scores
..   on a given calibration set. The alpha-quantile of these conformity scores is then estimated
..   and compared with the conformity scores of new test points output by the base model to assess
..   whether a label must be included in the prediction set

.. - The split-conformal methodology can be summarized in the scheme below :
..   - The training set is first split into a training set and a calibration set
..   - The training set is used for training the model
..   - The calibration set is only used for getting distribution of conformity scores output by
..     the model trained only on the training set.


.. 2. The "score" method
.. ---------------------

.. 3. The "cumulated score" method
.. -------------------------------

.. 4. The cross-conformal method
.. -----------------------------



.. TO BE CONTINUED

5. References
-------------

[1] Mauricio Sadinle, Jing Lei, & Larry Wasserman.
"Least Ambiguous Set-Valued Classifiers With Bounded Error Levels."
Journal of the American Statistical Association, 114:525, 223-234, 2019.

[2] Yaniv Romano, Matteo Sesia and Emmanuel J. Candès.
"Classification with Valid and Adaptive Coverage."
NeurIPS 2020 (spotlight).

[3] Anastasios Nikolas Angelopoulos, Stephen Bates, Michael Jordan and Jitendra Malik.
"Uncertainty Sets for Image Classifiers using Conformal Prediction."
International Conference on Learning Representations, 2021.

doc/tutorial_classification.rst

Lines changed: 27 additions & 32 deletions
@@ -2,9 +2,7 @@ Tutorial for classification
===========================

In this tutorial, we compare the prediction sets estimated by the
conformal methods implemented in MAPIE on a toy two-dimensional dataset.

Throughout this tutorial, we will answer the following questions:

@@ -13,31 +11,31 @@ Throughout this tutorial, we will answer the following questions:

- Is the chosen conformal method well calibrated?

- What are the pros and cons of the conformal methods included in MAPIE?

1. Conformal Prediction method using the softmax score of the true label
------------------------------------------------------------------------

We will use MAPIE to estimate a prediction set of several classes such
that the probability that the true label of a new test point is included
in the prediction set is always higher than the target confidence level:
:math:`P(Y \in C) \geq 1 - \alpha`. We start by using the softmax
score output by the base classifier as the conformity score on a toy
two-dimensional dataset. We estimate the prediction sets as follows:

- First we generate a dataset with train, calibration and test sets; the
  model is fitted on the training set.
- We set the conformity score :math:`S_i = \hat{f}(X_{i})_{y_i}` to the
  softmax output of the true class for each sample in the calibration
  set.
- Then we define :math:`\hat{q}` as the
  :math:`(n + 1) (\alpha) / n` lower empirical quantile of
  :math:`S_{1}, ..., S_{n}` (this is essentially the quantile
  :math:`\alpha`, but with a small sample correction).
- Finally, for a new test data point (where :math:`X_{n + 1}` is known
  but :math:`Y_{n + 1}` is not), we create a prediction set
  :math:`C(X_{n+1}) = \{y: \hat{f}(X_{n+1})_{y} > \hat{q}\}` which
  includes all the classes with a sufficiently high softmax output.

We use a two-dimensional toy dataset with three labels. The distribution
@@ -93,11 +91,10 @@ Let’s see our training data.

We fit our training data with a Gaussian Naive Bayes estimator. We then
apply MAPIE on the calibration data with the method `score`, indicating
with `cv="prefit"` that the estimator has already been fitted. Finally,
we estimate the prediction sets for different alpha values with a `fit`
and `predict` process.

.. code-block:: python

@@ -113,9 +110,9 @@ the prediction sets with differents alpha values with a ``fit`` and
    alpha = [0.2, 0.1, 0.05]
    y_pred_score, y_ps_score = mapie_score.predict(X_test_mesh, alpha=alpha)

- `y_pred_score`: the predictions of the base estimator on the test set.
- `y_ps_score`: the prediction sets estimated by MAPIE with the
  “score” method.

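Since most of the original code cell is elided in this diff, here is a minimal, self-contained sketch of the workflow described above. The toy dataset and variable names are assumptions and may differ from the tutorial's actual code.

.. code-block:: python

    from sklearn.datasets import make_blobs
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from mapie.classification import MapieClassifier

    # Toy stand-in for the tutorial's two-dimensional, three-class dataset.
    X, y = make_blobs(n_samples=1000, centers=3, random_state=42)
    X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.5, random_state=42)

    # Fit the base classifier on the training set only, then hand it to MAPIE as prefit.
    clf = GaussianNB().fit(X_train, y_train)
    mapie_score = MapieClassifier(estimator=clf, method="score", cv="prefit")
    mapie_score.fit(X_cal, y_cal)

    # Prediction sets for several risk levels alpha.
    alpha = [0.2, 0.1, 0.05]
    y_pred_score, y_ps_score = mapie_score.predict(X_cal, alpha=alpha)
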
.. code-block:: python
@@ -212,10 +209,9 @@ prediction sets highlighting the uncertain behaviour of the base
classifier.

Let’s now study the effective coverage and the mean prediction set
widths as a function of the :math:`1-\alpha` target coverage. To this aim,
we use once again the `.predict()` method of MAPIE to estimate
prediction sets on a large number of :math:`\alpha` values.

.. code-block:: python

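The code cell above is elided in this diff. A minimal sketch of such a coverage and width computation could look as follows, assuming the fitted `mapie_score` object from before and test data `X_test`, `y_test` (hypothetical names).

.. code-block:: python

    import numpy as np

    alphas = np.arange(0.02, 1.0, 0.02).tolist()
    # y_ps has shape (n_samples, n_classes, n_alphas): one boolean mask per alpha.
    _, y_ps = mapie_score.predict(X_test, alpha=alphas)

    rows = np.arange(len(y_test))  # y_test assumed to hold integer labels
    # Effective coverage: fraction of test points whose true label is in the set.
    coverages = [y_ps[rows, y_test, i].mean() for i in range(len(alphas))]
    # Mean prediction set width: average number of labels kept in the set.
    widths = [y_ps[:, :, i].sum(axis=1).mean() for i in range(len(alphas))]
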
@@ -260,13 +256,12 @@ on a large number of :math:``\alpha`` values.
We saw in the previous section that the “score” method is well
calibrated and provides accurate coverage levels. However, it tends to
give null prediction sets for uncertain regions, especially when the
:math:`\alpha` value is high. MAPIE includes another method, called
Adaptive Prediction Sets (APS), whose conformity score is the cumulated
softmax score of the ranked labels until the true label is reached (see
the theoretical description for more details). We will see in this
section that this method no longer produces null prediction sets, at the
cost of slightly bigger prediction sets.

Let’s visualize the prediction sets obtained with the APS method on the
test set after fitting MAPIE on the calibration set.
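A sketch of how this can be done, reusing the prefit estimator `clf` and calibration data from the earlier sketch; note that the method name used here for APS ("cumulated_score") is an assumption about this MAPIE version and should be checked against the installed API.

.. code-block:: python

    from mapie.classification import MapieClassifier

    # Same prefit workflow as before, with the APS ("cumulated score") conformity score.
    mapie_aps = MapieClassifier(estimator=clf, method="cumulated_score", cv="prefit")
    mapie_aps.fit(X_cal, y_cal)
    y_pred_aps, y_ps_aps = mapie_aps.predict(X_test_mesh, alpha=alpha)
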
(binary image file updated, −407 bytes)
