
Commit 4e5b96d

Author: Thibault Cordier
Merge branch 'master' into 452-coverage-validity
2 parents: cda1250 + 1110092
19 files changed: +432 −226 lines

HISTORY.rst

Lines changed: 1 addition & 0 deletions

@@ -10,6 +10,7 @@ History
 * Reduce precision for test in `MapieCalibrator`.
 * Fix invalid certificate when downloading data.
 * Add citations utility to the documentation.
+* Add documentation for metrics.
 * Add explanation and example for symmetry argument in CQR.

 0.8.3 (2024-03-01)

README.rst

Lines changed: 10 additions & 5 deletions

@@ -172,23 +172,28 @@ and with the financial support from Région Ile de France and Confiance.ai.
 |Quantmetry| |Michelin| |ENS| |Confiance.ai| |IledeFrance|

 .. |Quantmetry| image:: https://www.quantmetry.com/wp-content/uploads/2020/08/08-Logo-quant-Texte-noir.svg
-    :height: 35
+    :height: 35px
+    :width: 140px
     :target: https://www.quantmetry.com/

 .. |Michelin| image:: https://agngnconpm.cloudimg.io/v7/https://dgaddcosprod.blob.core.windows.net/corporate-production/attachments/cls05tqdd9e0o0tkdghwi9m7n-clooe1x0c3k3x0tlu4cxi6dpn-bibendum-salut.full.png
-    :height: 35
+    :height: 50px
+    :width: 45px
     :target: https://www.michelin.com/en/

 .. |ENS| image:: https://file.diplomeo-static.com/file/00/00/01/34/13434.svg
-    :height: 35
+    :height: 35px
+    :width: 140px
     :target: https://ens-paris-saclay.fr/en

 .. |Confiance.ai| image:: https://pbs.twimg.com/profile_images/1443838558549258264/EvWlv1Vq_400x400.jpg
-    :height: 35
+    :height: 45px
+    :width: 45px
     :target: https://www.confiance.ai/

 .. |IledeFrance| image:: https://www.iledefrance.fr/sites/default/files/logo/2024-02/logoGagnerok.svg
-    :height: 35
+    :height: 35px
+    :width: 140px
     :target: https://www.iledefrance.fr/

doc/index.rst

Lines changed: 7 additions & 0 deletions

@@ -58,6 +58,13 @@
    examples_calibration/index
    notebooks_calibration

+.. toctree::
+   :maxdepth: 2
+   :hidden:
+   :caption: METRICS
+
+   theoretical_description_metrics
+
 .. toctree::
    :maxdepth: 2
    :hidden:

doc/notebooks_classification.rst

Lines changed: 4 additions & 4 deletions

@@ -6,8 +6,8 @@ problems for computer vision settings that are too heavy to be included in the e
 galleries.


-1. Estimating prediction sets on the Cifar10 dataset : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/Cifar10.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
+1. Estimating prediction sets on the Cifar10 dataset : `cifar_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/Cifar10.ipynb>`_
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-2. Top-label calibration for outputs of ML models : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/top_label_calibration.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+2. Top-label calibration for outputs of ML models : `top_label_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/top_label_calibration.ipynb>`_
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

doc/notebooks_multilabel_classification.rst

Lines changed: 4 additions & 4 deletions

@@ -5,8 +5,8 @@ The following examples present advanced analyses
 on multi-label classification problems with different
 methods proposed in MAPIE.

-1. Overview of Recall Control for Multi-Label Classification : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_recall.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+1. Overview of Recall Control for Multi-Label Classification : `recall_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_recall.ipynb>`_
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-2. Overview of Precision Control for Multi-Label Classification : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_precision.ipynb>`_
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+2. Overview of Precision Control for Multi-Label Classification : `precision_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_precision.ipynb>`_
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

doc/notebooks_regression.rst

Lines changed: 4 additions & 4 deletions

@@ -8,11 +8,11 @@ This section lists a series of Jupyter notebooks hosted on the MAPIE Github repo
 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


-2. Estimating the uncertainties in the exoplanet masses : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/exoplanets.ipynb>`_
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+2. Estimating the uncertainties in the exoplanet masses : `exoplanet_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/exoplanets.ipynb>`_
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


-3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `ts_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

doc/quick_start.rst

Lines changed: 4 additions & 6 deletions

@@ -7,11 +7,9 @@ In regression settings, **MAPIE** provides prediction intervals on single-output
 In classification settings, **MAPIE** provides prediction sets on multi-class data.
 In any case, **MAPIE** is compatible with any scikit-learn-compatible estimator.

-Estimate your prediction intervals
-==================================

 1. Download and install the module
-----------------------------------
+==================================

 Install via ``pip``:

@@ -33,7 +31,7 @@ To install directly from the github repository :


 2. Run MapieRegressor
----------------------
+=====================

 Let us start with a basic regression problem.
 Here, we generate one-dimensional noisy data that we fit with a linear model.

@@ -114,8 +112,8 @@ It is given by the alpha parameter defined in ``MapieRegressor``, here equal to
 thus giving target coverages of ``0.95`` and ``0.68``.
 The effective coverage is the actual fraction of true labels lying in the prediction intervals.

-2. Run MapieClassifier
-----------------------
+3. Run MapieClassifier
+=======================

 Similarly, it's possible to do the same for a basic classification problem.

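The quick_start.rst hunks above renumber the install, MapieRegressor and MapieClassifier walkthroughs. For orientation, here is a minimal sketch of the MapieRegressor workflow those sections describe, assuming the pre-1.0 ``mapie.regression`` API and synthetic data in place of the guide's exact example::

    # Sketch only: pre-1.0 MAPIE API, illustrative data.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from mapie.regression import MapieRegressor

    # One-dimensional noisy data fitted with a linear model, as in the guide.
    rng = np.random.default_rng(42)
    X = rng.uniform(0, 10, size=(500, 1))
    y = 2 * X.ravel() + rng.normal(scale=1.0, size=500)

    mapie = MapieRegressor(estimator=LinearRegression())
    mapie.fit(X, y)

    # alpha values of 0.05 and 0.32 target coverages of 0.95 and 0.68.
    y_pred, y_pis = mapie.predict(X, alpha=[0.05, 0.32])

    # Effective coverage: fraction of true labels inside the intervals.
    inside = (y >= y_pis[:, 0, 0]) & (y <= y_pis[:, 1, 0])
    print(f"Effective coverage at alpha=0.05: {inside.mean():.3f}")
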
doc/theoretical_description_binary_classification.rst

Lines changed: 5 additions & 5 deletions

@@ -1,10 +1,10 @@
-.. title:: Theoretical Description : contents
+.. title:: Theoretical Description Binary Classification : contents

 .. _theoretical_description_binay_classification:

-=======================
+#######################
 Theoretical Description
-=======================
+#######################

 There are mainly three different ways to handle uncertainty quantification in binary classification:
 calibration (see :doc:`theoretical_description_calibration`), confidence interval (CI) for the probability

@@ -83,8 +83,8 @@ for the labels of test objects which are guaranteed to be well-calibrated under
 that the observations are generated independently from the same distribution [2].


-4. References
--------------
+References
+----------

 [1] Gupta, Chirag, Aleksandr Podkopaev, and Aaditya Ramdas.
 "Distribution-free binary classification: prediction sets, confidence intervals, and calibration."

doc/theoretical_description_calibration.rst

Lines changed: 7 additions & 110 deletions

@@ -2,10 +2,9 @@

 .. _theoretical_description_calibration:

-=======================
+#######################
 Theoretical Description
-=======================
-
+#######################

 One method for multi-class calibration has been implemented in MAPIE so far:
 Top-Label Calibration [1].

@@ -34,8 +33,8 @@ To apply calibration directly to a multi-class context, Gupta et al. propose a f
 a multi-class calibration to multiple binary calibrations (M2B).


-1. Top-Label
-------------
+Top-Label
+---------

 Top-Label calibration is a calibration technique introduced by Gupta et al. to calibrate the model according to the highest score and
 the corresponding class (see [1] Section 2). This framework offers to apply binary calibration techniques to multi-class calibration.

@@ -50,109 +49,8 @@ according to Top-Label calibration if:
     Pr(Y = c(X) \mid h(X), c(X)) = h(X)


-2. Metrics for calibration
---------------------------
-
-**Expected calibration error**
-
-The main metric to check whether the calibration is correct is the Expected Calibration Error (ECE). It is based on two
-components, accuracy and confidence per bin. The number of bins is a hyperparameter :math:`M`, and we refer to a specific bin by
-:math:`B_m`.
-
-.. math::
-    \text{acc}(B_m) &= \frac{1}{\left| B_m \right|} \sum_{i \in B_m} y_i \\
-    \text{conf}(B_m) &= \frac{1}{\left| B_m \right|} \sum_{i \in B_m} \hat{f}(x)_i
-
-The ECE combines these two metrics:
-
-.. math::
-    \text{ECE} = \sum_{m=1}^M \frac{\left| B_m \right|}{n} \left| \text{acc}(B_m) - \text{conf}(B_m) \right|
-
-In simple terms, once the bins over the confidence scores have been created, we check the mean accuracy of each bin.
-The ECE is the weighted absolute difference between accuracy and confidence. Hence, the lower the ECE, the better the calibration.
-
-**Top-Label ECE**
-
-In Top-Label calibration, we only calculate the ECE for the top-label class. Hence, per top-label class, we condition the calculation
-of the accuracy and confidence on the top label, and take the average ECE over the top labels.
-
-3. Statistical tests for calibration
-------------------------------------
-
-**Kolmogorov-Smirnov test**
-
-The Kolmogorov-Smirnov test was derived in [2, 3, 4]. The idea is to consider the cumulative differences between sorted scores :math:`s_i`
-and their corresponding labels :math:`y_i`, and to compare their properties to those of a standard Brownian motion. Let us consider the
-cumulative differences on sorted scores:
-
-.. math::
-    C_k = \frac{1}{N}\sum_{i=1}^k (s_i - y_i)
-
-We also introduce a typical normalization scale :math:`\sigma`:
-
-.. math::
-    \sigma = \frac{1}{N}\sqrt{\sum_{i=1}^N s_i(1 - s_i)}
-
-The Kolmogorov-Smirnov statistic is then defined as:
-
-.. math::
-    G = \max_k |C_k| / \sigma
-
-It can be shown [2] that, under the null hypothesis of well-calibrated scores, this quantity asymptotically (i.e. when N goes to infinity)
-converges to the maximum absolute value of a standard Brownian motion over the unit interval :math:`[0, 1]`. [3, 4] also provide closed-form
-formulas for the cumulative distribution function (CDF) of the maximum absolute value of such a standard Brownian motion.
-We then state the p-value associated with the statistical test of well calibration as:
-
-.. math::
-    p = 1 - CDF(G)
-
-**Kuiper test**
-
-The Kuiper test was derived in [2, 3, 4] and is very similar to Kolmogorov-Smirnov. This time, the statistic is defined as:
-
-.. math::
-    H = (\max_k C_k - \min_k C_k)/\sigma
-
-It can be shown [2] that, under the null hypothesis of well-calibrated scores, this quantity asymptotically (i.e. when N goes to infinity)
-converges to the range of a standard Brownian motion over the unit interval :math:`[0, 1]`. [3, 4] also provide closed-form
-formulas for the cumulative distribution function (CDF) of the range of such a standard Brownian motion.
-We then state the p-value associated with the statistical test of well calibration as:
-
-.. math::
-    p = 1 - CDF(H)
-
-**Spiegelhalter test**
-
-The Spiegelhalter test was derived in [6]. It is based on a decomposition of the Brier score:
-
-.. math::
-    B = \frac{1}{N}\sum_{i=1}^N (y_i - s_i)^2
-
-where scores are denoted :math:`s_i` and their corresponding labels :math:`y_i`. This can be decomposed into two terms:
-
-.. math::
-    B = \frac{1}{N}\sum_{i=1}^N (y_i - s_i)(1 - 2 s_i) + \frac{1}{N}\sum_{i=1}^N s_i (1 - s_i)
-
-It can be shown that the first term has an expected value of zero under the null hypothesis of well calibration. So we interpret
-the second term as the expected value :math:`E(B)` of the Brier score under the null hypothesis. The variance of the Brier score can be
-computed as:
-
-.. math::
-    Var(B) = \frac{1}{N^2}\sum_{i=1}^N (1 - 2 s_i)^2 s_i (1 - s_i)
-
-So we can build a Z-score as follows:
-
-.. math::
-    Z = \frac{B - E(B)}{\sqrt{Var(B)}} = \frac{\sum_{i=1}^N (y_i - s_i)(1 - 2 s_i)}{\sqrt{\sum_{i=1}^N (1 - 2 s_i)^2 s_i (1 - s_i)}}
-
-Under the null hypothesis, this statistic follows a standard normal distribution, with CDF its cumulative distribution function, so we state the associated p-value:
-
-.. math::
-    p = 1 - CDF(Z)
-
-3. References
--------------
+References
+----------

 [1] Gupta, Chirag, and Aaditya K. Ramdas.
 "Top-label calibration and multiclass-to-binary reductions."

@@ -171,8 +69,7 @@ arXiv preprint arXiv:2202.00100.

 [4] D. A. Darling. A. J. F. Siegert.
 The First Passage Problem for a Continuous Markov Process.
-Ann. Math. Statist. 24 (4) 624 - 639, December,
-1953.
+Ann. Math. Statist. 24 (4) 624 - 639, December, 1953.

 [5] William Feller.
 The Asymptotic Distribution of the Range of Sums of

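The large deletion above accompanies the dedicated metrics page this commit adds to the toctree in doc/index.rst. For reference, a minimal NumPy sketch of the ECE formula from the removed section (not MAPIE's own implementation; the equal-width binning is an illustrative assumption)::

    # Sketch of the Expected Calibration Error defined in the removed section.
    import numpy as np

    def expected_calibration_error(y_true, y_score, n_bins=10):
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        # Map each confidence score to a bin B_m (equal-width bins).
        bin_ids = np.minimum(np.digitize(y_score, bins[1:]), n_bins - 1)
        n = len(y_score)
        ece = 0.0
        for m in range(n_bins):
            mask = bin_ids == m
            if mask.any():
                acc = y_true[mask].mean()    # acc(B_m): mean label in the bin
                conf = y_score[mask].mean()  # conf(B_m): mean score in the bin
                ece += (mask.sum() / n) * abs(acc - conf)
        return ece
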
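In the same spirit, a sketch of the Kolmogorov-Smirnov statistic :math:`G` and the Spiegelhalter p-value as defined in the removed section (the closed-form Brownian-motion CDF needed for the KS p-value is omitted, so only the statistic is computed)::

    # Sketch of two test statistics from the removed section.
    import numpy as np
    from scipy.stats import norm

    def ks_calibration_statistic(y_true, y_score):
        order = np.argsort(y_score)       # sort scores and labels together
        s, y = y_score[order], y_true[order]
        n = len(s)
        c = np.cumsum(s - y) / n          # C_k = (1/N) * sum_{i<=k} (s_i - y_i)
        sigma = np.sqrt(np.sum(s * (1 - s))) / n
        return np.max(np.abs(c)) / sigma  # G = max_k |C_k| / sigma

    def spiegelhalter_p_value(y_true, y_score):
        s, y = y_score, y_true
        z = np.sum((y - s) * (1 - 2 * s)) / np.sqrt(
            np.sum((1 - 2 * s) ** 2 * s * (1 - s))
        )
        return 1 - norm.cdf(z)            # p = 1 - CDF(Z)
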
doc/theoretical_description_classification.rst

Lines changed: 7 additions & 6 deletions

@@ -1,11 +1,10 @@
-.. title:: Theoretical Description : contents
+.. title:: Theoretical Description Classification : contents

 .. _theoretical_description_classification:

-=======================
+#######################
 Theoretical Description
-=======================
-
+#######################

 Three methods for multi-class uncertainty quantification have been implemented in MAPIE so far:
 LAC (that stands for Least Ambiguous set-valued Classifier) [1], Adaptive Prediction Sets [2, 3] and Top-K [3].

@@ -141,8 +140,10 @@ Despite the RAPS method having a relatively small set size, its coverage tends t
 of the last label in the prediction set. This randomization is done as follows:

 - First: define the :math:`V` parameter:
+
 .. math::
     V_i = (s_i(X_i, Y_i) - \hat{q}_{1-\alpha}) / \left(\hat{\mu}(X_i)_{\pi_k} + \lambda \mathbb{1} (k > k_{reg})\right)
+
 - Compare each :math:`V_i` to :math:`U \sim` Unif(0, 1)
 - If :math:`V_i \leq U`, the last included label is removed, else we keep the prediction set as it is.

@@ -227,8 +228,8 @@ where :

 .. TO BE CONTINUED

-5. References
--------------
+References
+----------

 [1] Mauricio Sadinle, Jing Lei, & Larry Wasserman.
 "Least Ambiguous Set-Valued Classifiers With Bounded Error Levels."

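The RAPS hunk above only adds blank lines so the :math:`V_i` formula renders as a math block, but the randomization it frames is simple to see in code. A toy sketch of the last-label trimming step (inputs are illustrative, not MAPIE internals)::

    # Toy sketch of the randomized last-label removal described above.
    import numpy as np

    rng = np.random.default_rng(0)

    def randomized_trim(prediction_set, v_last):
        """Drop the last included label when V_i <= U, with U ~ Unif(0, 1)."""
        u = rng.uniform()
        if v_last <= u:
            return prediction_set[:-1]  # remove the last label
        return prediction_set           # keep the set as it is

    # Example: three candidate labels, V = 0.2 for the last included one.
    print(randomized_trim([3, 7, 1], v_last=0.2))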