DOC: use conformalization set instead of conformity set (which is sometimes used as prediction set in the literature) (#706)

Valentin-Laurent · web-flow · commit 77957b9ffd06 · 2025-05-20T18:54:35.000+02:00
diff --git a/doc/split_cross_conformal.rst b/doc/split_cross_conformal.rst
@@ -1,5 +1,5 @@
 ################################################################
-The conformity (or "calibration") set
+The conformalization (or "calibration") set
 ################################################################
 
 **MAPIE** is based on two types of techniques for measuring uncertainty in regression and classification:
@@ -10,14 +10,14 @@ The conformity (or "calibration") set
 In all cases, the training/conformalization process can be broken down as follows:
 
 - Train a model using the training set (or full dataset if cross-conformal).
-- Estimate conformity scores using the conformity set (or full dataset if cross-conformal).
+- Estimate conformity scores using the conformalization set (or full dataset if cross-conformal).
 - Predict target on test data to obtain prediction intervals/sets based on these conformity scores.
 
 
 1. Split conformal predictions
 ==============================
 
-- Compute conformity scores ("conformalization") on a conformity set not seen by the model during training.
+- Compute conformity scores ("conformalization") on a conformalization set not seen by the model during training.
   (Use :func:`~mapie.utils.train_conformalize_test_split` to obtain the different sets.)
 
 **MAPIE** then uses the conformity scores to estimate sets associated with the desired coverage on new data with strong theoretical guarantees.
diff --git a/examples/classification/2-advanced-analysis/plot_crossconformal.py b/examples/classification/2-advanced-analysis/plot_crossconformal.py
@@ -8,7 +8,7 @@
 on the resulting coverage estimated by
 :class:`~mapie_v1.classification.SplitConformalClassifier`.
 We then adopt a cross-validation approach in which the
-conformity scores of all conformity sets are used to
+conformity scores of all conformalization sets are used to
 estimate the quantile. We demonstrate that this second
 "cross-conformal" approach gives more robust prediction
 sets with accurate conformity plots.
@@ -19,7 +19,7 @@
 
 We start the tutorial by splitting our training dataset
 in ``K`` folds, and sequentially use each fold as a
-conformity set, while the ``K-1`` folds remaining are
+conformalization set, while the ``K-1`` folds remaining are
 used for training the base model using
 the ``prefit=True`` option of
 :class:`~mapie_v1.classification.SplitConformalClassifier`.
@@ -103,7 +103,7 @@
 
 ##############################################################################
 # We split our training dataset into 5 folds and use each fold as a
-# conformity set. Each conformity set is therefore used to estimate the
+# conformalization set. Each conformalization set is therefore used to estimate the
 # conformity scores and the given quantiles for the two methods implemented in
 # :class:`~mapie_v1.classification.SplitConformalClassifier`.
 
@@ -165,7 +165,7 @@
 # train/conformalization splitting can slightly impact our results.
 #
 # Let's now visualize this impact on the number of labels included in each
-# prediction set induced by the different conformity sets.
+# prediction set induced by the different conformalization sets.
 
 
 def plot_results(
@@ -206,7 +206,7 @@ def plot_results(
 
 ##############################################################################
 # The prediction sets and the resulting coverages slightly vary among
-# conformity sets. Let's now visualize the coverage score and the
+# conformalization sets. Let's now visualize the coverage score and the
 # prediction set size of each fold and for both conformity scores, when
 # ``confidence_level`` = 0.9.
 
@@ -230,7 +230,7 @@ def plot_results(
 
 ##############################################################################
 # Let's now compare the coverages and prediction set sizes obtained with the
-# different folds used as conformity sets.
+# different folds used as conformalization sets.
 
 
 def plot_coverage_width(
@@ -298,7 +298,7 @@ def plot_coverage_width(
 #
 # - It prevents us from using the whole training set for training our base model;
 #
-# - The prediction sets are impacted by the way we extract the conformity set.
+# - The prediction sets are impacted by the way we extract the conformalization set.
 
 ##############################################################################
 # 2. Aggregating the conformity scores through cross-validation
diff --git a/examples/classification/2-advanced-analysis/plot_main-tutorial-binary-classification.py b/examples/classification/2-advanced-analysis/plot_main-tutorial-binary-classification.py
@@ -57,7 +57,7 @@
 #
 # * We set the conformal score ``Sᵢ = 𝑓̂(Xᵢ)ᵧᵢ``
 #   from the softmax output of the true class for each sample
-#   in the conformity set.
+#   in the conformalization set.
 #
 # * Then we define ``q̂`` as being the
 #   ``(n + 1) (1 - α) / n``
diff --git a/examples/mondrian/1-quickstart/plot_main-tutorial-mondrian-regression.py b/examples/mondrian/1-quickstart/plot_main-tutorial-mondrian-regression.py
@@ -19,8 +19,9 @@
 Please note that the coverage obtained with Mondrian depends on the size of the
 groups: therefore, the groups must be large enough for the coverage to represent the
 model's performance on each of them accurately. If the groups are too small (e.g.,
-fewer than 200 samples within the group's conformity set), the conformalization may
-become unstable, likely resulting in high variance in the effective coverage obtained.
+fewer than 200 samples within the group's conformalization set), the conformalization
+may become unstable, likely resulting in high variance in the effective coverage
+obtained.
 
 
 Throughout this tutorial, we will answer the following questions:
@@ -102,7 +103,7 @@
 plt.show()
 
 #######################################################################################
-# 2. Split the dataset into a training set, a conformity set, and a test set
+# 2. Split the dataset into a training set, a conformalization set, and a test set
 # ------------------------------------------------------------------------------------
 
 (X_train, X_conformalize, X_test,
@@ -118,14 +119,14 @@
 )
 
 ##############################################################################
-# We plot the training set, the conformity set, and the test set.
+# We plot the training set, the conformalization set, and the test set.
 
 
 f, ax = plt.subplots(1, 3, figsize=(15, 5))
 ax[0].scatter(X_train, y_train, c=partition_train)
 ax[0].set_title("Train set")
 ax[1].scatter(X_conformalize, y_conformalize, c=partition_conformalize)
-ax[1].set_title("Conformity set")
+ax[1].set_title("Conformalization set")
 ax[2].scatter(X_test, y_test, c=partition_test)
 ax[2].set_title("Test set")
 plt.show()
@@ -145,7 +146,7 @@
 
 
 #######################################################################################
-# Conformalize a SplitConformalRegressor on the conformity set
+# Conformalize a SplitConformalRegressor on the conformalization set
 # *************************************************************************************
 
 
@@ -212,9 +213,9 @@
 
 
 #######################################################################################
-# Conformalize a SplitConformalRegressor on the conformity set for each group
+# Conformalize a SplitConformalRegressor on the conformalization set for each group
 # *************************************************************************************
-# For each group in the conformity set, we conformalize a distinct
+# For each group in the conformalization set, we conformalize a distinct
 # :class:`~mapie.regression.SplitConformalRegressor`.
 
 
diff --git a/mapie/classification.py b/mapie/classification.py
@@ -48,7 +48,7 @@ class SplitConformalClassifier:
 
     1. The ``fit`` method (optional) fits the base classifier to the training data.
     2. The ``conformalize`` method estimates the uncertainty of the base classifier by
-       computing conformity scores on the conformity set.
+       computing conformity scores on the conformalization set.
     3. The ``predict_set`` method predicts labels and sets of labels.
 
     Parameters
@@ -194,15 +194,15 @@ def conformalize(
     ) -> Self:
         """
         Estimates the uncertainty of the base classifier by computing
-        conformity scores on the conformity set.
+        conformity scores on the conformalization set.
 
         Parameters
         ----------
         X_conformalize : ArrayLike
-            Features of the conformity set.
+            Features of the conformalization set.
 
         y_conformalize : ArrayLike
-            Targets of the conformity set.
+            Targets of the conformalization set.
 
         predict_params : Optional[dict], default=None
             Parameters to pass to the ``predict`` and ``predict_proba`` methods
@@ -788,10 +788,10 @@ def _get_classes_info(
                 )
             if n_classes > n_unique_y_labels:
                 warnings.warn(
-                    "WARNING: your conformity dataset has less labels"
+                    "WARNING: your conformalization dataset has less labels"
                     + " than your training dataset (training"
                     + f" has {n_classes} unique labels while"
-                    + f" conformity have {n_unique_y_labels} unique labels"
+                    + f" conformalization have {n_unique_y_labels} unique labels"
                 )
 
         else:
diff --git a/mapie/conformity_scores/sets/aps.py b/mapie/conformity_scores/sets/aps.py
@@ -19,7 +19,7 @@ class APSConformityScore(NaiveConformityScore):
     """
     Adaptive Prediction Sets (APS) method-based non-conformity score.
     It is based on the sum of the softmax outputs of the labels until the true
-    label is reached, on the conformity set. See [1] for more details.
+    label is reached, on the conformalization set. See [1] for more details.
 
     References
     ----------
diff --git a/mapie/conformity_scores/sets/lac.py b/mapie/conformity_scores/sets/lac.py
@@ -17,7 +17,7 @@ class LACConformityScore(BaseClassificationScore):
     non conformity score (also formerly called ``"score"``).
 
     It is based on the scores (i.e. 1 minus the softmax score of the true
-    label) on the conformity set.
+    label) on the conformalization set.
 
     References
     ----------
diff --git a/mapie/conformity_scores/sets/topk.py b/mapie/conformity_scores/sets/topk.py
@@ -18,7 +18,7 @@ class TopKConformityScore(BaseClassificationScore):
     Top-K method-based non-conformity score.
 
     It is based on the sorted index of the probability of the true label in the
-    softmax outputs, on the conformity set. In case two probabilities are
+    softmax outputs, on the conformalization set. In case two probabilities are
     equal, both are taken, thus, the size of some prediction sets may be
     different from the others.
 
diff --git a/mapie/regression/quantile_regression.py b/mapie/regression/quantile_regression.py
@@ -37,7 +37,7 @@ class ConformalizedQuantileRegressor:
        regressor: a model to predict the target, and models to predict upper
        and lower quantiles around the target.
     2. The ``conformalize`` method estimates the uncertainty of the quantile models
-       using the conformity set.
+       using the conformalization set.
     3. The ``predict_interval`` computes prediction points and intervals.
 
     Parameters
@@ -177,15 +177,15 @@ def conformalize(
     ) -> Self:
         """
         Estimates the uncertainty of the quantile regressors by computing
-        conformity scores on the conformity set.
+        conformity scores on the conformalization set.
 
         Parameters
         ----------
         X_conformalize : ArrayLike
-            Features of the conformity set.
+            Features of the conformalization set.
 
         y_conformalize : ArrayLike
-            Targets of the conformity set.
+            Targets of the conformalization set.
 
         predict_params : Optional[dict], default=None
             Parameters to pass to the ``predict`` method of the regressors.
diff --git a/mapie/regression/regression.py b/mapie/regression/regression.py
@@ -45,7 +45,7 @@ class SplitConformalRegressor:
 
     1. The ``fit`` method (optional) fits the base regressor to the training data.
     2. The ``conformalize`` method estimates the uncertainty of the base regressor by
-       computing conformity scores on the conformity set.
+       computing conformity scores on the conformalization set.
     3. The ``predict_interval`` method predicts points and intervals.
 
     Parameters
@@ -190,15 +190,15 @@ def conformalize(
     ) -> Self:
         """
         Estimates the uncertainty of the base regressor by computing
-        conformity scores on the conformity set.
+        conformity scores on the conformalization set.
 
         Parameters
         ----------
         X_conformalize : ArrayLike
-            Features of the conformity set.
+            Features of the conformalization set.
 
         y_conformalize : ArrayLike
-            Targets of the conformity set.
+            Targets of the conformalization set.
 
         predict_params : Optional[dict], default=None
             Parameters to pass to the ``predict`` method of the base regressor.
diff --git a/mapie/risk_control.py b/mapie/risk_control.py
@@ -593,7 +593,7 @@ def fit(
             Training labels.
 
         conformalize_size: Optional[float]
-            Size of the conformity dataset with respect to X if the
+            Size of the conformalization dataset with respect to X if the
             given model is ``None`` need to fit a LogisticRegression.
 
             By default .3
diff --git a/mapie/tests/test_classification.py b/mapie/tests/test_classification.py
@@ -1792,7 +1792,7 @@ def test_warning_not_all_label_in_calib() -> None:
         cv="prefit", random_state=random_state
     )
     with pytest.warns(
-        UserWarning, match=r".*WARNING: your conformity dataset.*"
+        UserWarning, match=r".*WARNING: your conformalization dataset.*"
     ):
         mapie_clf.fit(X_mapie, y_mapie)
 
diff --git a/mapie/utils.py b/mapie/utils.py
@@ -32,13 +32,13 @@ def train_conformalize_test_split(
     random_state: Optional[int] = None,
     shuffle: bool = True,
 ) -> Tuple[NDArray, NDArray, NDArray, NDArray, NDArray, NDArray]:
-    """Split arrays or matrices into train, conformity and test subsets.
+    """Split arrays or matrices into train, conformalization and test subsets.
 
     Utility similar to sklearn.model_selection.train_test_split
     for splitting data into 3 sets.
 
     We advise to give the major part of the data points to the train set
-    and at least 200 data points to the conformity set.
+    and at least 200 data points to the conformalization set.
 
     Parameters
     ----------
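
Note on the terminology in this diff: the train / conformalize / predict workflow that the renamed docstrings describe can be sketched in plain Python. This is a minimal illustration and not MAPIE's implementation. The helpers `split_indices`, `fit_line` and `conformal_quantile` are hypothetical names introduced here, the absolute residual is assumed as the conformity score, and the 60/20/20 split is an arbitrary choice that leaves 400 points in the conformalization set (above the 200 points advised in `train_conformalize_test_split`).

```python
import math
import random


def split_indices(n, train_frac, conformalize_frac, seed):
    """Shuffle range(n) and cut it into train / conformalization / test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)
    n_conf = int(n * conformalize_frac)
    return idx[:n_train], idx[n_train:n_train + n_conf], idx[n_train + n_conf:]


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (stands in for any base model)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def conformal_quantile(scores, confidence_level):
    """Return the ceil((n + 1) * confidence_level)-th smallest score, or inf if n is too small."""
    n = len(scores)
    rank = math.ceil((n + 1) * confidence_level)
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


# Synthetic data: a noisy line.
rng = random.Random(0)
X = [rng.uniform(0, 10) for _ in range(2000)]
y = [2 * x + 1 + rng.gauss(0, 1) for x in X]

# 1. Train the model on the training set only.
train, conformalize, test = split_indices(len(X), 0.6, 0.2, seed=0)
a, b = fit_line([X[i] for i in train], [y[i] for i in train])

# 2. Compute conformity scores on the conformalization set (unseen during training).
scores = [abs(y[i] - (a * X[i] + b)) for i in conformalize]
q_hat = conformal_quantile(scores, confidence_level=0.9)

# 3. Build intervals [prediction - q_hat, prediction + q_hat] on the test set
#    and measure how often they contain the true target.
covered = sum(abs(y[i] - (a * X[i] + b)) <= q_hat for i in test)
coverage = covered / len(test)
print(f"q_hat={q_hat:.3f}, empirical coverage={coverage:.3f}")
```

The empirical coverage should land near the requested 0.9, and it will vary with the random split, which is the sensitivity the cross-conformal example above discusses.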
