
Commit 77957b9

DOC: use conformalization set instead of conformity set (which is sometimes used as prediction set in the literature) (#706)
1 parent 2c21137 commit 77957b9

13 files changed (+41 -40 lines)

doc/split_cross_conformal.rst

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 ################################################################
-The conformity (or "calibration") set
+The conformalization (or "calibration") set
 ################################################################

 **MAPIE** is based on two types of techniques for measuring uncertainty in regression and classification:
@@ -10,14 +10,14 @@ The conformity (or "calibration") set
 In all cases, the training/conformalization process can be broken down as follows:

 - Train a model using the training set (or full dataset if cross-conformal).
-- Estimate conformity scores using the conformity set (or full dataset if cross-conformal).
+- Estimate conformity scores using the conformalization set (or full dataset if cross-conformal).
 - Predict target on test data to obtain prediction intervals/sets based on these conformity scores.


 1. Split conformal predictions
 ==============================

-- Compute conformity scores ("conformalization") on a conformity set not seen by the model during training.
+- Compute conformity scores ("conformalization") on a conformalization set not seen by the model during training.
 (Use :func:`~mapie.utils.train_conformalize_test_split` to obtain the different sets.)

 **MAPIE** then uses the conformity scores to estimate sets associated with the desired coverage on new data with strong theoretical guarantees.
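
The three steps listed in this hunk (train, estimate conformity scores on the conformalization set, predict on test data) can be illustrated with a minimal NumPy-only sketch. It does not use the MAPIE API: the synthetic data, the least-squares line, the absolute-residual score and the names `predict` and `q_hat` are assumptions made for illustration, and `method="higher"` requires NumPy 1.22 or later.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=600)
y = 2.0 * X + rng.normal(scale=1.0, size=600)

# Three disjoint sets: training, conformalization, test.
X_train, X_conf, X_test = X[:300], X[300:500], X[500:]
y_train, y_conf, y_test = y[:300], y[300:500], y[500:]

# 1. Train a model on the training set (here: a least-squares line).
slope, intercept = np.polyfit(X_train, y_train, deg=1)


def predict(x):
    return slope * x + intercept


# 2. Conformity scores on the conformalization set (absolute residuals).
scores = np.abs(y_conf - predict(X_conf))

# 3. Finite-sample corrected quantile, then intervals on the test set.
alpha = 0.1
n = len(scores)
level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
q_hat = np.quantile(scores, level, method="higher")

lower = predict(X_test) - q_hat
upper = predict(X_test) + q_hat
coverage = np.mean((y_test >= lower) & (y_test <= upper))
```

The model never sees the conformalization samples during fitting, which is what lets the quantile of their scores carry over to new data.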

examples/classification/2-advanced-analysis/plot_crossconformal.py

Lines changed: 7 additions & 7 deletions
@@ -8,7 +8,7 @@
 on the resulting coverage estimated by
 :class:`~mapie_v1.classification.SplitConformalClassifier`.
 We then adopt a cross-validation approach in which the
-conformity scores of all conformity sets are used to
+conformity scores of all conformalization sets are used to
 estimate the quantile. We demonstrate that this second
 "cross-conformal" approach gives more robust prediction
 sets with accurate conformity plots.
@@ -19,7 +19,7 @@

 We start the tutorial by splitting our training dataset
 in ``K`` folds, and sequentially use each fold as a
-conformity set, while the ``K-1`` folds remaining are
+conformalization set, while the ``K-1`` folds remaining are
 used for training the base model using
 the ``prefit=True`` option of
 :class:`~mapie_v1.classification.SplitConformalClassifier`.
@@ -103,7 +103,7 @@

 ##############################################################################
 # We split our training dataset into 5 folds and use each fold as a
-# conformity set. Each conformity set is therefore used to estimate the
+# conformalization set. Each conformalization set is therefore used to estimate the
 # conformity scores and the given quantiles for the two methods implemented in
 # :class:`~mapie_v1.classification.SplitConformalClassifier`.

@@ -165,7 +165,7 @@
 # train/conformalization splitting can slightly impact our results.
 #
 # Let's now visualize this impact on the number of labels included in each
-# prediction set induced by the different conformity sets.
+# prediction set induced by the different conformalization sets.


 def plot_results(
@@ -206,7 +206,7 @@ def plot_results(

 ##############################################################################
 # The prediction sets and the resulting coverages slightly vary among
-# conformity sets. Let's now visualize the coverage score and the
+# conformalization sets. Let's now visualize the coverage score and the
 # prediction set size of each fold and for both conformity scores, when
 # ``confidence_level`` = 0.9.

@@ -230,7 +230,7 @@ def plot_results(

 ##############################################################################
 # Let's now compare the coverages and prediction set sizes obtained with the
-# different folds used as conformity sets.
+# different folds used as conformalization sets.


 def plot_coverage_width(
@@ -298,7 +298,7 @@ def plot_coverage_width(
 #
 # - It prevents us from using the whole training set for training our base model;
 #
-# - The prediction sets are impacted by the way we extract the conformity set.
+# - The prediction sets are impacted by the way we extract the conformalization set.

 ##############################################################################
 # 2. Aggregating the conformity scores through cross-validation

examples/classification/2-advanced-analysis/plot_main-tutorial-binary-classification.py

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@
 #
 # * We set the conformal score ``Sᵢ = 𝑓̂(Xᵢ)ᵧᵢ``
 # from the softmax output of the true class for each sample
-# in the conformalization set.
+# in the conformalization set.
 #
 # * Then we define ``q̂`` as being the
 # ``(n + 1) (1 - α) / n``
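
The score and quantile described in this hunk can be sketched in plain NumPy. The probabilities and labels below are made-up values, and the sketch assumes the "1 minus the probability of the true class" convention stated in the `LACConformityScore` docstring later in this commit, together with the usual `ceil((n + 1)(1 - α)) / n` quantile level. It is an illustration of the idea, not MAPIE's implementation.

```python
import numpy as np

# Hypothetical predicted probabilities on a conformalization set
# (5 samples, 2 classes) and their true labels.
proba_conf = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.60, 0.40],
    [0.30, 0.70],
    [0.55, 0.45],
])
y_conf = np.array([0, 1, 1, 1, 0])

# Probability assigned to the true class of each sample.
true_class_proba = proba_conf[np.arange(len(y_conf)), y_conf]
# Non-conformity score: 1 minus that probability.
scores = 1.0 - true_class_proba

# Finite-sample corrected quantile of the scores.
alpha = 0.2
n = len(scores)
level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
q_hat = np.quantile(scores, level, method="higher")

# A label enters the prediction set when its score does not exceed q_hat.
proba_new = np.array([[0.7, 0.3], [0.5, 0.5]])
prediction_sets = (1.0 - proba_new) <= q_hat
```

With only five conformalization samples the corrected level saturates at 1, so `q_hat` is simply the largest score; larger conformalization sets give a less conservative threshold.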

examples/mondrian/1-quickstart/plot_main-tutorial-mondrian-regression.py

Lines changed: 9 additions & 8 deletions
@@ -19,8 +19,9 @@
 Please note that the coverage obtained with Mondrian depends on the size of the
 groups: therefore, the groups must be large enough for the coverage to represent the
 model's performance on each of them accurately. If the groups are too small (e.g.,
-fewer than 200 samples within the group's conformity set), the conformalization may
-become unstable, likely resulting in high variance in the effective coverage obtained.
+fewer than 200 samples within the group's conformalization set), the conformalization
+may become unstable, likely resulting in high variance in the effective coverage
+obtained.


 Throughout this tutorial, we will answer the following questions:
@@ -102,7 +103,7 @@
 plt.show()

 #######################################################################################
-# 2. Split the dataset into a training set, a conformity set, and a test set
+# 2. Split the dataset into a training set, a conformalization set, and a test set
 # ------------------------------------------------------------------------------------

 (X_train, X_conformalize, X_test,
@@ -118,14 +119,14 @@
 )

 ##############################################################################
-# We plot the training set, the conformity set, and the test set.
+# We plot the training set, the conformalization set, and the test set.


 f, ax = plt.subplots(1, 3, figsize=(15, 5))
 ax[0].scatter(X_train, y_train, c=partition_train)
 ax[0].set_title("Train set")
 ax[1].scatter(X_conformalize, y_conformalize, c=partition_conformalize)
-ax[1].set_title("Conformity set")
+ax[1].set_title("Conformalization set")
 ax[2].scatter(X_test, y_test, c=partition_test)
 ax[2].set_title("Test set")
 plt.show()
@@ -145,7 +146,7 @@


 #######################################################################################
-# Conformalize a SplitConformalRegressor on the conformity set
+# Conformalize a SplitConformalRegressor on the conformalization set
 # *************************************************************************************


@@ -212,9 +213,9 @@


 #######################################################################################
-# Conformalize a SplitConformalRegressor on the conformity set for each group
+# Conformalize a SplitConformalRegressor on the conformalization set for each group
 # *************************************************************************************
-# For each group in the conformity set, we conformalize a distinct
+# For each group in the conformalization set, we conformalize a distinct
 # :class:`~mapie.regression.SplitConformalRegressor`.


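
The per-group conformalization described in this file amounts to computing one quantile of the conformity scores per group rather than a single global one. The sketch below shows that idea with synthetic residuals in plain NumPy. The two-group partition, the noise levels and the helper `conformal_quantile` are assumptions for illustration and are not taken from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical partition of the conformalization set into two groups.
groups = rng.integers(0, 2, size=n)
# Residuals whose spread depends on the group (group 1 is noisier).
residuals = rng.normal(scale=np.where(groups == 0, 1.0, 3.0))
scores = np.abs(residuals)

alpha = 0.1


def conformal_quantile(s, alpha):
    """Finite-sample corrected (1 - alpha) quantile of the scores."""
    m = len(s)
    level = min(1.0, np.ceil((m + 1) * (1 - alpha)) / m)
    return np.quantile(s, level, method="higher")


# One quantile per group (Mondrian) versus a single global quantile.
q_by_group = {
    int(g): conformal_quantile(scores[groups == g], alpha)
    for g in np.unique(groups)
}
q_global = conformal_quantile(scores, alpha)
```

A single global quantile would give intervals that are too wide for the quiet group and too narrow for the noisy one, which is why each group needs enough conformalization samples of its own.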

mapie/classification.py

Lines changed: 6 additions & 6 deletions
@@ -48,7 +48,7 @@ class SplitConformalClassifier:

 1. The ``fit`` method (optional) fits the base classifier to the training data.
 2. The ``conformalize`` method estimates the uncertainty of the base classifier by
-computing conformity scores on the conformity set.
+computing conformity scores on the conformalization set.
 3. The ``predict_set`` method predicts labels and sets of labels.

 Parameters
@@ -194,15 +194,15 @@ def conformalize(
 ) -> Self:
 """
 Estimates the uncertainty of the base classifier by computing
-conformity scores on the conformity set.
+conformity scores on the conformalization set.

 Parameters
 ----------
 X_conformalize : ArrayLike
-Features of the conformity set.
+Features of the conformalization set.

 y_conformalize : ArrayLike
-Targets of the conformity set.
+Targets of the conformalization set.

 predict_params : Optional[dict], default=None
 Parameters to pass to the ``predict`` and ``predict_proba`` methods
@@ -788,10 +788,10 @@ def _get_classes_info(
 )
 if n_classes > n_unique_y_labels:
 warnings.warn(
-"WARNING: your conformity dataset has less labels"
+"WARNING: your conformalization dataset has less labels"
 + " than your training dataset (training"
 + f" has {n_classes} unique labels while"
-+ f" conformity have {n_unique_y_labels} unique labels"
++ f" conformalization have {n_unique_y_labels} unique labels"
 )

 else:
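
The last hunk above warns when the conformalization data contains fewer distinct labels than the training data. A standalone sketch of the same kind of check is shown below. The helper `warn_on_missing_labels` and its message are hypothetical and are not part of MAPIE; the real check lives in `_get_classes_info`, whose full body is not shown in this diff.

```python
import warnings

import numpy as np


def warn_on_missing_labels(classes, y_conformalize):
    """Hypothetical helper: warn when training classes are absent
    from the conformalization targets, and return the missing ones."""
    missing = np.setdiff1d(classes, np.unique(y_conformalize))
    if missing.size > 0:
        warnings.warn(
            f"Conformalization set is missing {missing.size} of "
            f"{len(classes)} training labels: {missing.tolist()}"
        )
    return missing


classes = np.array([0, 1, 2])
y_conformalize = np.array([0, 0, 1, 1, 0])
missing = warn_on_missing_labels(classes, y_conformalize)
```

Reporting which labels are missing, rather than only how many, can make it easier to decide whether the split should be stratified.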

mapie/conformity_scores/sets/aps.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ class APSConformityScore(NaiveConformityScore):
 """
 Adaptive Prediction Sets (APS) method-based non-conformity score.
 It is based on the sum of the softmax outputs of the labels until the true
-label is reached, on the conformity set. See [1] for more details.
+label is reached, on the conformalization set. See [1] for more details.

 References
 ----------
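
The docstring above describes the APS score as the sum of the softmax outputs, taken in decreasing order, until the true label is reached. A minimal NumPy sketch of that computation follows. The probabilities are made-up values, and the sketch assumes the true label's own probability is included in the sum and that no randomization is applied; the library's options may differ.

```python
import numpy as np

# Hypothetical softmax outputs for two conformalization samples.
proba = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
])
y_true = np.array([1, 2])

# Sort class probabilities in decreasing order for each sample.
order = np.argsort(-proba, axis=1)
sorted_proba = np.take_along_axis(proba, order, axis=1)
cumulated = np.cumsum(sorted_proba, axis=1)

# Position of the true label in the sorted order.
true_rank = np.argmax(order == y_true[:, None], axis=1)

# APS-style score: cumulated probability up to and including the true label.
scores = cumulated[np.arange(len(y_true)), true_rank]
```

For the first sample the true label is the second most likely class, so its score is 0.6 + 0.3; for the second sample it is 0.5 + 0.3.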

mapie/conformity_scores/sets/lac.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ class LACConformityScore(BaseClassificationScore):
 non conformity score (also formerly called ``"score"``).

 It is based on the scores (i.e. 1 minus the softmax score of the true
-label) on the conformity set.
+label) on the conformalization set.

 References
 ----------

mapie/conformity_scores/sets/topk.py

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ class TopKConformityScore(BaseClassificationScore):
 Top-K method-based non-conformity score.

 It is based on the sorted index of the probability of the true label in the
-softmax outputs, on the conformity set. In case two probabilities are
+softmax outputs, on the conformalization set. In case two probabilities are
 equal, both are taken, thus, the size of some prediction sets may be
 different from the others.
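
The docstring above can be illustrated with a short NumPy sketch: the score is the rank of the true label among the sorted probabilities, and tied probabilities are kept together when the prediction set is built. The data and the cutoff `k` are made-up values, and the 0-based rank convention is an assumption; this is not the library's code.

```python
import numpy as np

# Hypothetical softmax outputs on a conformalization set.
proba = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
])
y_true = np.array([0, 2, 0])

# 0-based rank of the true label: number of classes that have a
# strictly higher probability than the true class.
true_proba = proba[np.arange(len(y_true)), y_true]
scores = np.sum(proba > true_proba[:, None], axis=1)

# Building prediction sets with a hypothetical cutoff of k = 2 labels:
# every label at least as probable as the k-th largest is kept, so ties
# at the cutoff enlarge the set.
k = 2
proba_new = np.array([
    [0.5, 0.3, 0.2],
    [0.4, 0.3, 0.3],
])
kth_largest = np.sort(proba_new, axis=1)[:, -k]
prediction_sets = proba_new >= kth_largest[:, None]
```

The second new sample has two labels tied at the cutoff, so its prediction set contains three labels instead of two, which is the size difference the docstring mentions.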

mapie/regression/quantile_regression.py

Lines changed: 4 additions & 4 deletions
@@ -37,7 +37,7 @@ class ConformalizedQuantileRegressor:
 regressor: a model to predict the target, and models to predict upper
 and lower quantiles around the target.
 2. The ``conformalize`` method estimates the uncertainty of the quantile models
-using the conformity set.
+using the conformalization set.
 3. The ``predict_interval`` computes prediction points and intervals.

 Parameters
@@ -177,15 +177,15 @@ def conformalize(
 ) -> Self:
 """
 Estimates the uncertainty of the quantile regressors by computing
-conformity scores on the conformity set.
+conformity scores on the conformalization set.

 Parameters
 ----------
 X_conformalize : ArrayLike
-Features of the conformity set.
+Features of the conformalization set.

 y_conformalize : ArrayLike
-Targets of the conformity set.
+Targets of the conformalization set.

 predict_params : Optional[dict], default=None
 Parameters to pass to the ``predict`` method of the regressors.
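
For quantile regressors, a common way to compute conformity scores on the conformalization set is the conformalized quantile regression score, which measures how far each target falls outside the predicted quantile band. The sketch below uses that standard formulation with made-up predictions. It is an assumption that this matches what `conformalize` does internally, since the diff only shows the docstring.

```python
import numpy as np

# Hypothetical targets and predicted lower/upper quantiles
# on a conformalization set.
y_conf = np.array([1.0, 2.5, 0.2, 3.0])
q_low = np.array([0.5, 1.0, 0.5, 1.5])
q_high = np.array([1.5, 2.0, 1.5, 2.5])

# CQR score: positive when the target lies outside the band,
# negative when it lies inside.
scores = np.maximum(q_low - y_conf, y_conf - q_high)

# Finite-sample corrected quantile of the scores.
alpha = 0.2
n = len(scores)
level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
q_hat = np.quantile(scores, level, method="higher")

# Widen (or shrink, if q_hat is negative) the band on new data.
q_low_new = np.array([0.0, 1.0])
q_high_new = np.array([1.0, 3.0])
lower = q_low_new - q_hat
upper = q_high_new + q_hat
```

Because the score can be negative, the correction can also tighten bands that were too wide on the conformalization set.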

mapie/regression/regression.py

Lines changed: 4 additions & 4 deletions
@@ -45,7 +45,7 @@ class SplitConformalRegressor:

 1. The ``fit`` method (optional) fits the base regressor to the training data.
 2. The ``conformalize`` method estimates the uncertainty of the base regressor by
-computing conformity scores on the conformity set.
+computing conformity scores on the conformalization set.
 3. The ``predict_interval`` method predicts points and intervals.

 Parameters
@@ -190,15 +190,15 @@ def conformalize(
 ) -> Self:
 """
 Estimates the uncertainty of the base regressor by computing
-conformity scores on the conformity set.
+conformity scores on the conformalization set.

 Parameters
 ----------
 X_conformalize : ArrayLike
-Features of the conformity set.
+Features of the conformalization set.

 y_conformalize : ArrayLike
-Targets of the conformity set.
+Targets of the conformalization set.

 predict_params : Optional[dict], default=None
 Parameters to pass to the ``predict`` method of the base regressor.
