doc/quick_start.rst (7 additions, 1 deletion)

@@ -40,4 +40,10 @@ Here, we generate one-dimensional noisy data that we fit with a MLPRegressor: `U
 3. Classification
 =======================

-Similarly, it's possible to do the same for a basic classification problem: `Use MAPIE to plot prediction sets <https://mapie.readthedocs.io/en/stable/examples_classification/1-quickstart/plot_quickstart_classification.html>`_
+Similarly, it's possible to do the same for a basic classification problem: `Use MAPIE to plot prediction sets <https://mapie.readthedocs.io/en/stable/examples_classification/1-quickstart/plot_quickstart_classification.html>`_
+
+
+4. Risk Control
+=======================
+
+MAPIE implements risk control methods for multilabel classification (in particular, image segmentation) and binary classification: `Use MAPIE to control risk for a binary classifier <https://mapie.readthedocs.io/en/stable/examples_risk_control/1-quickstart/plot_risk_control_binary_classification.html>`_
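As a minimal sketch of the quantity these methods keep under control (plain scikit-learn only, not the MAPIE API; the linked example shows the actual MAPIE usage, and the data and model below are placeholders), the snippet fits a toy binary classifier and measures the empirical risk, here 1 - precision, at a fixed decision threshold:

.. code-block:: python

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_score
    from sklearn.model_selection import train_test_split

    # Toy data and a fitted binary classifier (placeholders for your own model and data).
    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_calib, y_train, y_calib = train_test_split(X, y, random_state=0)
    clf = LogisticRegression().fit(X_train, y_train)

    # The "risk" to be controlled can be, for instance, 1 - precision at a given threshold.
    threshold = 0.5
    y_pred = (clf.predict_proba(X_calib)[:, 1] >= threshold).astype(int)
    risk = 1 - precision_score(y_calib, y_pred)
    print(f"Empirical risk (1 - precision) at threshold {threshold}: {risk:.3f}")

Risk control then consists in choosing the model parameter (here the threshold) so that this quantity stays below a target level with high probability on unseen data, which is what the linked example demonstrates.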

doc/theoretical_description_risk_control.rst (32 additions, 13 deletions)

@@ -13,26 +13,43 @@ Getting started with risk control in MAPIE
 Overview
 ========

+This section provides an overview of risk control in MAPIE. For readers unfamiliar with the concept, the next section provides an introduction to the topic.
+
 Three methods of risk control have been implemented in MAPIE so far:
 **Risk-Controlling Prediction Sets** (RCPS) [1], **Conformal Risk Control** (CRC) [2] and **Learn Then Test** (LTT) [3].
-The difference between these methods is the way the conformity scores are computed.

-As of now, MAPIE supports risk control for two machine learning tasks: **binary classification**, as well as **multi-label classification** (including applications like image segmentation).
+As of now, MAPIE supports risk control for two machine learning tasks: **binary classification**, as well as **multi-label classification** (in particular applications like image segmentation).
 The table below details the available methods for each task:

+.. |br| raw:: html
+
+   <br />
+
 .. list-table:: Available risk control methods in MAPIE for each ML task
    :header-rows: 1

-   * - Risk control method
-     - Binary classification
-     - Multi-label classification (image segmentation)
+   * - Risk control |br| method
+     - Type of |br| control
+     - Assumption |br| on the data
+     - Non-monotonic |br| risks
+     - Binary |br| classification
+     - Multi-label |br| classification
    * - RCPS
+     - Probability
+     - i.i.d.
+     - ❌
      - ❌
      - ✅
    * - CRC
+     - Expectation
+     - Exchangeable
+     - ❌
      - ❌
      - ✅
    * - LTT
+     - Probability
+     - i.i.d.
+     - ✅
      - ✅
      - ✅

@@ -41,7 +58,7 @@ In MAPIE for multi-label classification, CRC and RCPS are used for recall contro
 1. What is risk control?
 ========================

-Before diving into risk control, let's take the simple example of a binary classification model, which separates the incoming data into the two classes thanks to its threshold: predictions above it are classified as 1, and those below as 0. Suppose we want to find a threshold that guarantees that our model achieves a certain level of precision. A naive, yet straightforward approach to do this is to evaluate how precision varies with different threshold values on a validation dataset. By plotting this relationship (see plot below), we can identify the range of thresholds that meet our desired precision requirement (green zone on the graph).
+Before diving into risk control, let's take the simple example of a binary classification model, which separates the incoming data into two classes. Predicted probabilities above a given threshold (e.g., 0.5) correspond to predicting the "positive" class and probabilities below correspond to the "negative" class. Suppose we want to find a threshold that guarantees that our model achieves a certain level of precision. A naive, yet straightforward approach to do this is to evaluate how precision varies with different threshold values on a validation dataset. By plotting this relationship (see plot below), we can identify the range of thresholds that meet our desired precision requirement (green zone on the graph).
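The following sketch illustrates that naive approach; `clf`, `X_val` and `y_val` are hypothetical placeholders for a fitted probabilistic classifier and a validation set. It sweeps candidate thresholds and keeps those whose validation precision meets the target.

.. code-block:: python

    import numpy as np
    from sklearn.metrics import precision_score

    def naive_valid_thresholds(clf, X_val, y_val, target_precision=0.8):
        """Naive search: keep the thresholds whose precision on the validation
        set reaches the target. No statistical guarantee is attached to the result."""
        probas = clf.predict_proba(X_val)[:, 1]
        valid = []
        for t in np.linspace(0.01, 0.99, 99):
            y_pred = (probas >= t).astype(int)
            if y_pred.sum() == 0:
                continue  # precision is undefined when nothing is predicted positive
            if precision_score(y_val, y_pred) >= target_precision:
                valid.append(t)
        return valid  # the "green zone" of thresholds on this validation data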
@@ -54,7 +71,7 @@ So far, so good. But here is the catch: while the chosen threshold effectively k
 Risk control is the science of adjusting a model's parameter, typically denoted :math:`\lambda`, so that a given risk stays below a desired level with high probability on unseen data.
 Note that here, the term *risk* is used to describe an undesirable outcome of the model (e.g., type I error): therefore, it is a value we want to minimize, and in our case, keep under a certain level. Also note that risk control can easily be applied to metrics we want to maximize (e.g., precision), simply by controlling the complement (e.g., 1-precision).

-The strength of risk control lies in the statistical guarantees it provides on unseen data. Unlike the naive method presented earlier, it determines a value of :math:`\lambda` that ensures the risk is controlled *beyond* the training data.
+The strength of risk control lies in the statistical guarantees it provides on unseen data. Unlike the naive method presented earlier, it determines a value of :math:`\lambda` that ensures the risk is controlled *beyond* the validation data.

 Applying risk control to the previous example would allow us to get a new — albeit narrower — range of thresholds (blue zone on the graph) that are **statistically guaranteed**.

@@ -66,7 +83,7 @@ This guarantee is critical in a wide range of use cases (especially in high-stak

 —

-To express risk control in mathematical terms, we denote by R the risk we want to control, and introduce the following two parameters:
+To express risk control in mathematical terms, we denote by :math:`R` the risk we want to control, and introduce the following two parameters:

 - :math:`\alpha`: the target level below which we want the risk to remain, as shown in the figure below;

@@ -76,13 +93,13 @@ To express risk control in mathematical terms, we denote by R the risk we want t

 - :math:`\delta`: the confidence level associated with the risk control.

-In other words, the risk is said to be controlled if :math:`R \leq \alpha` with probability at least :math:`1 - \delta`.
+In other words, the risk is said to be controlled if :math:`R \leq \alpha` with probability at least :math:`1 - \delta`, where the probability is over the randomness in the sampling of the dataset.

 The three risk control methods implemented in MAPIE — RCPS, CRC and LTT — rely on different assumptions, and offer slightly different guarantees:

 - **CRC** requires the data to be **exchangeable**, and gives a guarantee on the **expectation of the risk**: :math:`\mathbb{E}(R) \leq \alpha`;

-- **RCPS** and **LTT** both impose stricter assumptions, requiring the data to be **independent and identically distributed** (i.i.d.), which implies exchangeability. The guarantee they provide is on the **probability that the risk does not exceed :math:`\alpha`**: :math:`\mathbb{P}(R \leq \alpha) \geq 1 - \delta`.
+- **RCPS** and **LTT** both impose stricter assumptions, requiring the data to be **independent and identically distributed** (i.i.d.), which implies exchangeability. The guarantee they provide is on the **probability that the risk does not exceed** :math:`\boldsymbol{\alpha}`: :math:`\mathbb{P}(R \leq \alpha) \geq 1 - \delta`.

 .. image:: images/risk_distribution.png
    :width: 600
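To make the difference between the two guarantees concrete, here is a small numerical sketch (the numbers are synthetic and chosen for illustration only): given the risk values obtained over many hypothetical draws of unseen data, control in expectation looks at their mean, while control in probability looks at the fraction of draws for which the risk stays below :math:`\alpha`.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, delta = 0.1, 0.1

    # Synthetic risk values over many hypothetical draws of calibration/test data.
    risks = rng.beta(2, 25, size=10_000)

    mean_risk = risks.mean()                    # ~0.074 with this arbitrary choice
    prob_below_alpha = (risks <= alpha).mean()  # ~0.75 with this arbitrary choice

    print(f"Controlled in expectation (CRC-style): E[R] = {mean_risk:.3f}, "
          f"E[R] <= alpha: {mean_risk <= alpha}")
    print(f"Controlled in probability (RCPS/LTT-style): P(R <= alpha) = {prob_below_alpha:.3f}, "
          f"P(R <= alpha) >= 1 - delta: {prob_below_alpha >= 1 - delta}")

With these arbitrary numbers the risk happens to be controlled in expectation but not in probability at confidence level :math:`1 - \delta = 0.9`, which shows that the two notions of control can disagree.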
@@ -94,12 +111,13 @@ The plot above gives a visual representation of the difference between the two t

 - The risk is controlled in probability (RCPS/LTT) if at least a :math:`1 - \delta` fraction of its distribution over unseen data is below :math:`\alpha`.

-Note that at the opposite of the other two methods, LTT allows to control any non-monotonic risk.
+Note that, contrary to the other two methods, LTT makes it possible to control any non-monotonic risk.

 The following section provides a detailed overview of each method.

 2. Theoretical description
 ==========================
+
 2.1 Risk-Controlling Prediction Sets
 ------------------------------------
 2.1.1 General settings
@@ -234,7 +252,7 @@ We are going to present the Learn Then Test framework that allows the user to co
 This method has been introduced in article [3].
 The settings here are the same as for RCPS and CRC; we just need to introduce some new parameters:

-- Let :math:`\Lambda` be a discretized for our :math:`\lambda`, meaning that :math:`\Lambda = \{\lambda_1, ..., \lambda_n\}`.
+- Let :math:`\Lambda` be a discretized set for our :math:`\lambda`, meaning that :math:`\Lambda = \{\lambda_1, ..., \lambda_n\}`.

 - Let :math:`p_\lambda` be a valid p-value for the null hypothesis :math:`\mathbb{H}_j: R(\lambda_j)>\alpha`.

@@ -250,7 +268,7 @@ In order to find all the parameters :math:`\lambda` that satisfy the above condi
   :math:`\{(x_1, y_1), \dots, (x_n, y_n)\}`.

 - For each :math:`\lambda_j` in a discrete set :math:`\Lambda = \{\lambda_1, \lambda_2,\dots, \lambda_n\}`, we associate the null hypothesis
-  :math:`\mathcal{H}_j: R(\lambda_j) > \alpha`, as rejecting the hypothesis corresponds to selecting :math:`\lambda_j` as a point where risk the risk
+  :math:`\mathcal{H}_j: R(\lambda_j) > \alpha`, as rejecting the hypothesis corresponds to selecting :math:`\lambda_j` as a point where the risk
   is controlled.

 - For each null hypothesis, we compute a valid p-value :math:`p_{\lambda_j}` using a concentration inequality. Here we choose to compute the Hoeffding-Bentkus p-value
@@ -259,6 +277,7 @@ In order to find all the parameters :math:`\lambda` that satisfy the above condi
 - Return :math:`\hat{\Lambda} = \mathcal{A}(\{p_j\}_{j\in\{1,\dots,\lvert\Lambda\rvert\}})`, where :math:`\mathcal{A}` is an algorithm
   that controls the family-wise error rate (FWER), for example, the Bonferroni correction.

+Note that a notebook testing theoretical guarantees of risk control in binary classification using a random classifier and synthetic data is available here: `theoretical_validity_tests.ipynb <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/risk_control/theoretical_validity_tests.ipynb>`__.
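As a rough, self-contained sketch of the procedure above (this is not MAPIE's implementation; for brevity it uses the Hoeffding p-value alone instead of the Hoeffding-Bentkus one, and it assumes per-sample losses bounded in [0, 1]):

.. code-block:: python

    import numpy as np

    def ltt_hoeffding(losses_per_lambda, alpha, delta):
        """Learn Then Test sketch with Hoeffding p-values and Bonferroni correction.

        losses_per_lambda: array of shape (n_lambdas, n_calib) containing per-sample
        losses in [0, 1] for each candidate lambda_j on the calibration set.
        Returns the indices j for which H_j: R(lambda_j) > alpha is rejected,
        i.e. the candidates for which the risk is deemed controlled.
        """
        losses = np.asarray(losses_per_lambda, dtype=float)
        n_lambdas, n = losses.shape
        r_hat = losses.mean(axis=1)  # empirical risk for each lambda_j

        # Hoeffding p-value for H_j: R(lambda_j) > alpha (valid for losses in [0, 1]).
        p_values = np.exp(-2.0 * n * np.clip(alpha - r_hat, 0.0, None) ** 2)

        # Bonferroni correction controls the family-wise error rate at level delta.
        return np.where(p_values <= delta / n_lambdas)[0]

Here ``losses_per_lambda[j]`` would contain, for example, 0/1 error indicators of the model at parameter :math:`\lambda_j` on each calibration point; the method described above uses the Hoeffding-Bentkus p-value of [3] instead of the plain Hoeffding one used in this sketch.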