
Commit c7ae0cd

ENH - First iteration of the new binary risk control feature (#749)
See PR description for detailed content.
1 parent 9093a34 commit c7ae0cd

23 files changed (+1735, -109 lines)

.gitignore

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@ doc/_build/
 doc/examples_classification/
 doc/examples_regression/
 doc/examples_calibration/
-doc/examples_multilabel_classification/
+doc/examples_risk_control/
 doc/examples_mondrian/
 doc/auto_examples/
 doc/modules/generated/

AUTHORS.rst

Lines changed: 4 additions & 3 deletions

@@ -5,17 +5,17 @@ Credits
 Development Lead
 ----------------

-* Thibault Cordier <[email protected]>
 * Vincent Blot <[email protected]>
-* Louis Lacombe <[email protected]>
 * Valentin Laurent <[email protected]>
-* Hussein Jawad <[email protected]>
+* Adrien Le Coz <[email protected]>

 Emeritus Core Developers
 ------------------------

 * Grégoire Martinon <[email protected]>
 * Vianney Taquet <[email protected]>
+* Thibault Cordier <[email protected]>
+* Louis Lacombe <[email protected]>

 Contributors
 ------------

@@ -43,6 +43,7 @@ Contributors
 * Ambros Marzetta <ambrosm>
 * Carl McBride Ellis <Carl-McBride-Ellis>
 * Baptiste Calot <[email protected]>
+* Hussein Jawad <[email protected]>
 * Leonardo Garma <[email protected]>
 * Mohammed Jawhar <[email protected]>
 * Syed Affan <[email protected]>

HISTORY.rst

Lines changed: 1 addition & 0 deletions

@@ -16,6 +16,7 @@ History
 * MAPIE now supports Python versions up to the latest release (currently 3.13)
 * Change `prefit` default value to `True` in split methods' docstrings to remain consistent with the implementation
 * Fix issue 699 to replace `TimeSeriesRegressor.partial_fit` with `TimeSeriesRegressor.update`
+* Revert incorrect renaming of calibration to conformalization in risk_control.py

 1.0.1 (2025-05-22)
 ------------------

README.rst

Lines changed: 1 addition & 1 deletion

@@ -60,7 +60,7 @@ MAPIE relies notably on the fields of Conformal Prediction and Distribution-Free

 MAPIE runs on:

-- Python >=3.9, <3.12
+- Python >=3.9
 - NumPy >=1.23
 - scikit-learn >=1.4
6666

doc/Makefile

Lines changed: 1 addition & 1 deletion

@@ -50,7 +50,7 @@ clean:
 	-rm -rf $(BUILDDIR)/*
 	-rm -rf examples_regression/
 	-rm -rf examples_classification/
-	-rm -rf examples_multilabel_classification/
+	-rm -rf examples_risk_control/
 	-rm -rf examples_calibration/
 	-rm -rf examples_mondrian/
 	-rm -rf generated/*

doc/api.rst

Lines changed: 2 additions & 0 deletions

@@ -102,6 +102,8 @@ Risk Control
    :template: class.rst

    mapie.risk_control.PrecisionRecallController
+   mapie.risk_control.BinaryClassificationController
+   mapie.risk_control.BinaryClassificationRisk

 Calibration
 ===========

doc/conf.py

Lines changed: 2 additions & 2 deletions

@@ -321,14 +321,14 @@
     "examples_dirs": [
         "../examples/regression",
         "../examples/classification",
-        "../examples/multilabel_classification",
+        "../examples/risk_control",
         "../examples/calibration",
         "../examples/mondrian",
     ],
     "gallery_dirs": [
         "examples_regression",
         "examples_classification",
-        "examples_multilabel_classification",
+        "examples_risk_control",
         "examples_calibration",
         "examples_mondrian",
     ],

doc/index.rst

Lines changed: 2 additions & 1 deletion

@@ -24,7 +24,8 @@
    :caption: Control prediction errors

    theoretical_description_risk_control
-   examples_multilabel_classification/1-quickstart/plot_tutorial_risk_control
+   examples_risk_control/1-quickstart/plot_risk_control_binary_classification
+   examples_risk_control/index
    external_risk_control_package

 .. toctree::

doc/quick_start.rst

Lines changed: 7 additions & 1 deletion

@@ -40,4 +40,10 @@ Here, we generate one-dimensional noisy data that we fit with a MLPRegressor: `U
 3. Classification
 =======================

-Similarly, it's possible to do the same for a basic classification problem: `Use MAPIE to plot prediction sets <https://mapie.readthedocs.io/en/stable/examples_classification/1-quickstart/plot_quickstart_classification.html>`_
+Similarly, it's possible to do the same for a basic classification problem: `Use MAPIE to plot prediction sets <https://mapie.readthedocs.io/en/stable/examples_classification/1-quickstart/plot_quickstart_classification.html>`_
+
+
+4. Risk Control
+=======================
+
+MAPIE implements risk control methods for multilabel classification (in particular, image segmentation) and binary classification: `Use MAPIE to control risk for a binary classifier <https://mapie.readthedocs.io/en/stable/examples_risk_control/1-quickstart/plot_risk_control_binary_classification.html>`_

doc/theoretical_description_risk_control.rst

Lines changed: 32 additions & 13 deletions

@@ -13,26 +13,43 @@ Getting started with risk control in MAPIE
 Overview
 ========

+This section provides an overview of risk control in MAPIE. For those unfamiliar with the concept of risk control, the next section provides an introduction to the topic.
+
 Three methods of risk control have been implemented in MAPIE so far :
 **Risk-Controlling Prediction Sets** (RCPS) [1], **Conformal Risk Control** (CRC) [2] and **Learn Then Test** (LTT) [3].
-The difference between these methods is the way the conformity scores are computed.

-As of now, MAPIE supports risk control for two machine learning tasks: **binary classification**, as well as **multi-label classification** (including applications like image segmentation).
+As of now, MAPIE supports risk control for two machine learning tasks: **binary classification**, as well as **multi-label classification** (in particular applications like image segmentation).
 The table below details the available methods for each task:

+.. |br| raw:: html
+
+   <br />
+
 .. list-table:: Available risk control methods in MAPIE for each ML task
    :header-rows: 1

-   * - Risk control method
-     - Binary classification
-     - Multi-label classification (image segmentation)
+   * - Risk control |br| method
+     - Type of |br| control
+     - Assumption |br| on the data
+     - Non-monotonic |br| risks
+     - Binary |br| classification
+     - Multi-label |br| classification
    * - RCPS
+     - Probability
+     - i.i.d.
+     - ❌
      - ❌
      - ✅
    * - CRC
+     - Expectation
+     - Exchangeable
+     - ❌
      - ❌
      - ✅
    * - LTT
+     - Probability
+     - i.i.d.
+     - ✅
      - ✅
      - ✅

@@ -41,7 +58,7 @@ In MAPIE for multi-label classification, CRC and RCPS are used for recall contro
 1. What is risk control?
 ========================

-Before diving into risk control, let's take the simple example of a binary classification model, which separates the incoming data into the two classes thanks to its threshold: predictions above it are classified as 1, and those below as 0. Suppose we want to find a threshold that guarantees that our model achieves a certain level of precision. A naive, yet straightforward approach to do this is to evaluate how precision varies with different threshold values on a validation dataset. By plotting this relationship (see plot below), we can identify the range of thresholds that meet our desired precision requirement (green zone on the graph).
+Before diving into risk control, let's take the simple example of a binary classification model, which separates the incoming data into two classes. Predicted probabilities above a given threshold (e.g., 0.5) correspond to predicting the "positive" class and probabilities below correspond to the "negative" class. Suppose we want to find a threshold that guarantees that our model achieves a certain level of precision. A naive, yet straightforward approach to do this is to evaluate how precision varies with different threshold values on a validation dataset. By plotting this relationship (see plot below), we can identify the range of thresholds that meet our desired precision requirement (green zone on the graph).

 .. image:: images/example_without_risk_control.png
    :width: 600
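The naive approach described in the paragraph added above — sweeping thresholds on a validation set and keeping those that meet the precision target — can be sketched in a few lines of NumPy. The synthetic scores and the 0.8 precision target below are illustrative assumptions, not values taken from the MAPIE docs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical validation set: binary labels and model scores in [0, 1]
y_true = rng.integers(0, 2, size=1000)
scores = np.clip(0.3 * y_true + rng.normal(0.35, 0.25, size=1000), 0.0, 1.0)

# Sweep candidate thresholds and measure precision on the validation set
thresholds = np.linspace(0.0, 0.99, 100)
precisions = []
for t in thresholds:
    pred_pos = scores >= t  # predictions above the threshold -> positive class
    # precision = TP / (TP + FP); define it as 1.0 when nothing is predicted positive
    precisions.append(y_true[pred_pos].mean() if pred_pos.any() else 1.0)
precisions = np.array(precisions)

# "Green zone": thresholds meeting the target on THIS validation set only
target_precision = 0.8
naive_valid = thresholds[precisions >= target_precision]
```

Any threshold in `naive_valid` meets the target on this particular validation set, but carries no guarantee on unseen data; closing that gap is exactly what risk control is about.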
@@ -54,7 +71,7 @@ So far, so good. But here is the catch: while the chosen threshold effectively k
 Risk control is the science of adjusting a model's parameter, typically denoted :math:`\lambda`, so that a given risk stays below a desired level with high probability on unseen data.
 Note that here, the term *risk* is used to describe an undesirable outcome of the model (e.g., type I error): therefore, it is a value we want to minimize, and in our case, keep under a certain level. Also note that risk control can easily be applied to metrics we want to maximize (e.g., precision), simply by controlling the complement (e.g., 1-precision).

-The strength of risk control lies in the statistical guarantees it provides on unseen data. Unlike the naive method presented earlier, it determines a value of :math:`\lambda` that ensures the risk is controlled *beyond* the training data.
+The strength of risk control lies in the statistical guarantees it provides on unseen data. Unlike the naive method presented earlier, it determines a value of :math:`\lambda` that ensures the risk is controlled *beyond* the validation data.

 Applying risk control to the previous example would allow us to get a new — albeit narrower — range of thresholds (blue zone on the graph) that are **statistically guaranteed**.

@@ -66,7 +83,7 @@ This guarantee is critical in a wide range of use cases (especially in high-stak


-To express risk control in mathematical terms, we denote by R the risk we want to control, and introduce the following two parameters:
+To express risk control in mathematical terms, we denote by :math:`R` the risk we want to control, and introduce the following two parameters:

 - :math:`\alpha`: the target level below which we want the risk to remain, as shown in the figure below;

@@ -76,13 +93,13 @@ To express risk control in mathematical terms, we denote by R the risk we want t

 - :math:`\delta`: the confidence level associated with the risk control.

-In other words, the risk is said to be controlled if :math:`R \leq \alpha` with probability at least :math:`1 - \delta`.
+In other words, the risk is said to be controlled if :math:`R \leq \alpha` with probability at least :math:`1 - \delta`, where the probability is over the randomness in the sampling of the dataset.

 The three risk control methods implemented in MAPIE — RCPS, CRC and LTT — rely on different assumptions, and offer slightly different guarantees:

 - **CRC** requires the data to be **exchangeable**, and gives a guarantee on the **expectation of the risk**: :math:`\mathbb{E}(R) \leq \alpha`;

-- **RCPS** and **LTT** both impose stricter assumptions, requiring the data to be **independent and identically distributed** (i.i.d.), which implies exchangeability. The guarantee they provide is on the **probability that the risk does not exceed :math:`\alpha`**: :math:`\mathbb{P}(R \leq \alpha) \geq 1 - \delta`.
+- **RCPS** and **LTT** both impose stricter assumptions, requiring the data to be **independent and identically distributed** (i.i.d.), which implies exchangeability. The guarantee they provide is on the **probability that the risk does not exceed** :math:`\boldsymbol{\alpha}`: :math:`\mathbb{P}(R \leq \alpha) \geq 1 - \delta`.

 .. image:: images/risk_distribution.png
    :width: 600

@@ -94,12 +111,13 @@ The plot above gives a visual representation of the difference between the two t

 - The risk is controlled in probability (RCPS/LTT) if at least :math:`1 - \delta` percent of its distribution over unseen data is below :math:`\alpha`.

-Note that at the opposite of the other two methods, LTT allows to control any non-monotonic risk.
+Note that contrary to the other two methods, LTT allows to control any non-monotonic risk.

 The following section provides a detailed overview of each method.

 2. Theoretical description
 ==========================
+
 2.1 Risk-Controlling Prediction Sets
 ------------------------------------
 2.1.1 General settings
@@ -234,7 +252,7 @@ We are going to present the Learn Then Test framework that allows the user to co
 This method has been introduced in article [3].
 The settings here are the same as RCPS and CRC, we just need to introduce some new parameters:

-- Let :math:`\Lambda` be a discretized for our :math:`\lambda`, meaning that :math:`\Lambda = \{\lambda_1, ..., \lambda_n\}`.
+- Let :math:`\Lambda` be a discretized set for our :math:`\lambda`, meaning that :math:`\Lambda = \{\lambda_1, ..., \lambda_n\}`.

 - Let :math:`p_\lambda` be a valid p-value for the null hypothesis :math:`\mathbb{H}_j: R(\lambda_j)>\alpha`.

@@ -250,7 +268,7 @@ In order to find all the parameters :math:`\lambda` that satisfy the above condi
 :math:`\{(x_1, y_1), \dots, (x_n, y_n)\}`.

 - For each :math:`\lambda_j` in a discrete set :math:`\Lambda = \{\lambda_1, \lambda_2,\dots, \lambda_n\}`, we associate the null hypothesis
-  :math:`\mathcal{H}_j: R(\lambda_j) > \alpha`, as rejecting the hypothesis corresponds to selecting :math:`\lambda_j` as a point where risk the risk
+  :math:`\mathcal{H}_j: R(\lambda_j) > \alpha`, as rejecting the hypothesis corresponds to selecting :math:`\lambda_j` as a point where the risk
   is controlled.

 - For each null hypothesis, we compute a valid p-value using a concentration inequality :math:`p_{\lambda_j}`. Here we choose to compute the Hoeffding-Bentkus p-value

@@ -259,6 +277,7 @@ In order to find all the parameters :math:`\lambda` that satisfy the above condi
 - Return :math:`\hat{\Lambda} = \mathcal{A}(\{p_j\}_{j\in\{1,\dots,\lvert \Lambda \rvert})`, where :math:`\mathcal{A}`, is an algorithm
   that controls the family-wise error rate (FWER), for example, Bonferonni correction.

+Note that a notebook testing theoretical guarantees of risk control in binary classification using a random classifier and synthetic data is available here: `theoretical_validity_tests.ipynb <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/risk_control/theoretical_validity_tests.ipynb>`__.

 References
 ==========
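The LTT recipe documented above — discretize lambda, compute a p-value per candidate, apply an FWER correction — can be sketched end to end with a plain Hoeffding p-value and a Bonferroni correction. This is a simplified stand-in, not MAPIE's implementation (MAPIE uses the tighter Hoeffding-Bentkus p-value); the synthetic data, the choice of 1-precision as the risk, the use of the predicted-positive count in the bound, and the alpha/delta values are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical calibration set: binary labels and classifier scores in [0, 1]
y = rng.integers(0, 2, size=n)
scores = np.clip(0.3 * y + rng.normal(0.35, 0.25, size=n), 0.0, 1.0)

alpha, delta = 0.3, 0.1               # target risk level and error budget
lambdas = np.linspace(0.0, 0.99, 50)  # discretized candidate thresholds (Lambda)


def hoeffding_pvalue(r_hat, m, alpha):
    """p-value for H_j: R(lambda_j) > alpha via Hoeffding's inequality,
    valid for a risk bounded in [0, 1] estimated from m samples."""
    return 1.0 if r_hat >= alpha else float(np.exp(-2.0 * m * (alpha - r_hat) ** 2))


valid_lambdas = []
for lam in lambdas:
    pred_pos = scores >= lam
    m = int(pred_pos.sum())
    if m == 0:
        continue  # no predicted positives: precision undefined, skip
    risk = 1.0 - y[pred_pos].mean()   # risk = 1 - precision
    # Bonferroni FWER correction: reject H_j only if p_j <= delta / |Lambda|
    if hoeffding_pvalue(risk, m, alpha) <= delta / len(lambdas):
        valid_lambdas.append(float(lam))
```

Every lambda kept in `valid_lambdas` is a threshold whose null hypothesis "risk exceeds alpha" was rejected at the Bonferroni-corrected level, i.e. a point where 1-precision is controlled below alpha with confidence at least 1-delta under the i.i.d. assumption.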
