
Commit 6512312

DOC: fix display of mathematical equations in generated notebooks (#562)
1 parent 0179598 commit 6512312

11 files changed: +85 −85 lines changed

examples/classification/1-quickstart/plot_comp_methods_on_2d_dataset.py
Lines changed: 11 additions & 11 deletions

@@ -13,7 +13,7 @@
 # We will use MAPIE to estimate a prediction set of several classes such that
 # the probability that the true label of a new test point is included in the
 # prediction set is always higher than the target confidence level:
-# :math:`1 - \alpha`.
+# ``1 - α``.
 # Throughout this tutorial, we compare two conformity scores:
 # softmax score or cumulated softmax score.
 # We start by using the softmax score or cumulated score output by the base
@@ -23,18 +23,18 @@
 # * First we generate a dataset with train, calibration and test, the model
 # is fitted in the training set.
 #
-# * We set the conformal score :math:`S_i = \hat{f}(X_{i})_{y_i}`
+# * We set the conformal score ``Sᵢ = 𝑓̂(Xᵢ)ᵧᵢ``
 # from the softmax output of the true class or the cumulated score
 # (by decreasing order) for each sample in the calibration set.
 #
-# * Then we define :math:`\hat{q}` as being the
-# :math:`(n + 1) (1 - \alpha) / n`
-# previous quantile of :math:`S_{1}, ..., S_{n}` (this is essentially the
-# quantile :math:`\alpha`, but with a small sample correction).
+# * Then we define ``q̂`` as being the
+# ``(n + 1)(1 - α) / n``
+# previous quantile of ``S₁, ..., Sₙ`` (this is essentially the
+# quantile ``α``, but with a small sample correction).
 #
-# * Finally, for a new test data point (where :math:`X_{n + 1}` is known but
-# :math:`Y_{n + 1}` is not), create a prediction set
-# :math:`C(X_{n+1}) = \{y: \hat{f}(X_{n+1})_{y} > \hat{q}\}` which includes
+# * Finally, for a new test data point (where ``Xₙ₊₁`` is known but
+# ``Yₙ₊₁`` is not), create a prediction set
+# ``C(Xₙ₊₁) = {y: 𝑓̂(Xₙ₊₁)ᵧ > q̂}`` which includes
 # all the classes with a sufficiently high conformity score.
 #
 # We use a two-dimensional dataset with three labels.
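Read together, the three bullets are the standard split-conformal recipe. A minimal NumPy sketch of it, written in the equivalent nonconformity-score form on synthetic softmax outputs (every name below is illustrative, none comes from the example file):

import numpy as np

rng = np.random.default_rng(0)
n, n_classes, alpha = 1000, 3, 0.1

# Step 1: softmax output of the true class for each calibration sample,
# turned into a nonconformity score (low softmax = high nonconformity).
cal_probas = rng.dirichlet(np.ones(n_classes), size=n)
y_cal = rng.integers(n_classes, size=n)
cal_scores = 1.0 - cal_probas[np.arange(n), y_cal]

# Step 2: quantile of the calibration scores with the (n + 1)/n
# small-sample correction.
level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
q_hat = np.quantile(cal_scores, level, method="higher")

# Step 3: keep every class whose softmax output is high enough, i.e.
# whose nonconformity score does not exceed q_hat.
test_probas = rng.dirichlet(np.ones(n_classes), size=5)
prediction_sets = (1.0 - test_probas) <= q_hat
print(prediction_sets)

With n = 1000 and α = 0.1, the corrected level is 901/1000, i.e. the 901st smallest calibration score, which is exactly the small-sample correction the bullet describes.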
@@ -241,7 +241,7 @@ def plot_results(
 # in ambiguous regions.
 #
 # Let's now compare the effective coverage and the average of prediction set
-# widths as function of the :math:`1-\alpha` target coverage.
+# widths as a function of the ``1 - α`` target coverage.

 alpha_ = np.arange(0.02, 0.98, 0.02)
 coverage, mean_width = {}, {}
@@ -288,6 +288,6 @@ def plot_results(

 ##############################################################################
 # It is seen that both methods give coverages close to the target coverages,
-# regardless of the :math:`\alpha` value. However, the "aps"
+# regardless of the ``α`` value. However, the "aps" method
 # produces slightly bigger prediction sets, but without empty regions
 # (if the selection of the last label is not randomized).

examples/classification/4-tutorials/plot_crossconformal.py
Lines changed: 2 additions & 2 deletions

@@ -18,8 +18,8 @@
 of this documentation.

 We start the tutorial by splitting our training dataset
-in :math:`K` folds and sequentially use each fold as a
-calibration set, the :math:`K-1` folds remaining folds are
+in ``K`` folds and sequentially use each fold as a
+calibration set; the remaining ``K-1`` folds are
 used for training the base model using
 the ``cv="prefit"`` option of
 :class:`~mapie.classification.MapieClassifier`.
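That splitting loop is short to write out. A hedged sketch of the cross-conformal pattern with ``cv="prefit"`` (dataset, model, and the "lac" method name are illustrative, following the MAPIE API at the time of this commit):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from mapie.classification import MapieClassifier

X, y = make_classification(
    n_samples=500, n_informative=4, n_classes=3, random_state=0
)

mapies = []
# Each fold serves once as the calibration set; the K-1 remaining folds
# train the base model, which is then handed to MAPIE already fitted.
for train_idx, calib_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    mapie = MapieClassifier(estimator=clf, cv="prefit", method="lac")
    mapie.fit(X[calib_idx], y[calib_idx])
    mapies.append(mapie)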

examples/classification/4-tutorials/plot_main-tutorial-binary-classification.py
Lines changed: 11 additions & 11 deletions

@@ -45,26 +45,26 @@
 # We will use MAPIE to estimate a prediction set such that
 # the probability that the true label of a new test point is included in the
 # prediction set is always higher than the target confidence level:
-# :math:`1 - \alpha`.
+# ``1 - α``.
 # We start by using the softmax score output by the base
 # classifier as the conformity score on a toy two-dimensional dataset.
 # We estimate the prediction sets as follows:
 #
 # * First we generate a dataset with train, calibration and test, the model
 # is fitted in the training set.
 #
-# * We set the conformal score :math:`S_i = \hat{f}(X_{i})_{y_i}`
+# * We set the conformal score ``Sᵢ = 𝑓̂(Xᵢ)ᵧᵢ``
 # from the softmax output of the true class for each sample
 # in the calibration set.
 #
-# * Then we define :math:`\hat{q}` as being the
-# :math:`(n + 1) (1 - \alpha) / n`
-# previous quantile of :math:`S_{1}, ..., S_{n}` (this is essentially the
-# quantile :math:`\alpha`, but with a small sample correction).
+# * Then we define ``q̂`` as being the
+# ``(n + 1) (1 - α) / n``
+# previous quantile of ``S₁, ..., Sₙ`` (this is essentially the
+# quantile ``α``, but with a small sample correction).
 #
-# * Finally, for a new test data point (where :math:`X_{n + 1}` is known but
-# :math:`Y_{n + 1}` is not), create a prediction set
-# :math:`C(X_{n+1}) = \{y: \hat{f}(X_{n+1})_{y} > \hat{q}\}` which includes
+# * Finally, for a new test data point (where ``Xₙ₊₁`` is known but
+# ``Yₙ₊₁`` is not), create a prediction set
+# ``C(Xₙ₊₁) = {y: 𝑓̂(Xₙ₊₁)ᵧ > q̂}`` which includes
 # all the classes with a sufficiently high conformity score.
 #
 # We use a two-dimensional dataset with two classes (i.e. YES or NO).
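In MAPIE code, the whole recipe collapses to a fit/predict pair. A sketch with a prefit classifier on an illustrative two-class dataset (all names are hypothetical, not the tutorial's):

from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from mapie.classification import MapieClassifier

X, y = make_moons(n_samples=600, noise=0.3, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Fit the base model on the training split only.
clf = LogisticRegression().fit(X_train, y_train)

# Calibrate on held-out data, then request sets at confidence 1 - α.
mapie = MapieClassifier(estimator=clf, cv="prefit", method="lac")
mapie.fit(X_calib, y_calib)
y_pred, y_ps = mapie.predict(X_test, alpha=0.1)  # y_ps: (n_test, n_classes, n_alpha)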
@@ -281,7 +281,7 @@ def plot_results(

 ##############################################################################
 # Let's now compare the effective coverage and the average of prediction set
-# widths as function of the :math:`1-\alpha` target coverage.
+# widths as a function of the ``1 - α`` target coverage.

 alpha_ = np.arange(0.02, 0.98, 0.02)

@@ -332,7 +332,7 @@ def plot_coverages_widths(alpha, coverage, width, method):

 ##############################################################################
 # It is seen that the method gives coverages close to the target coverages,
-# regardless of the :math:`\alpha` value.
+# regardless of the ``α`` value.

 alpha_ = np.arange(0.02, 0.16, 0.01)
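MAPIE ships metrics for exactly this check. A self-contained sketch on hypothetical prediction sets (the two function names come from mapie.metrics; the data is made up):

import numpy as np
from mapie.metrics import classification_coverage_score, classification_mean_width_score

# Hypothetical prediction sets for four test points over three classes.
y_test = np.array([0, 1, 2, 1])
y_ps = np.array([
    [True, False, False],
    [True, True, False],
    [False, False, True],
    [False, True, True],
])

coverage = classification_coverage_score(y_test, y_ps)  # share of y_test inside its set
mean_width = classification_mean_width_score(y_ps)      # average set size
print(coverage, mean_width)  # 1.0 1.5 on this toy data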

examples/classification/4-tutorials/plot_main-tutorial-classification.py
Lines changed: 11 additions & 11 deletions

@@ -33,7 +33,7 @@
 # We will use MAPIE to estimate a prediction set of several classes such
 # that the probability that the true label of a new test point is included
 # in the prediction set is always higher than the target confidence level:
-# :math:`P(Y_{n+1} \in \hat{C}_{n, \alpha}(X_{n+1}) \geq 1 - \alpha`.
+# ``P(Yₙ₊₁ ∈ Ĉₙ,α(Xₙ₊₁)) ≥ 1 - α``.
 # We start by using the softmax score output by the base classifier as the
 # conformity score on a toy two-dimensional dataset.
 #
@@ -42,17 +42,17 @@
 # * Generate a dataset with train, calibration and test, the model is
 # fitted on the training set.
 #
-# * Set the conformal score :math:`S_i = \hat{f}(X_{i})_{y_i}` the softmax
+# * Set the conformal score ``Sᵢ = 𝑓̂(Xᵢ)ᵧᵢ``, the softmax
 # output of the true class for each sample in the calibration set.
 #
-# * Define :math:`\hat{q}` as being the :math:`(n + 1) (\alpha) / n`
-# previous quantile of :math:`S_{1}, ..., S_{n}`
-# (this is essentially the quantile :math:`\alpha`, but with a small sample
+# * Define ``q̂`` as being the ``(n + 1) (α) / n``
+# previous quantile of ``S₁, ..., Sₙ``
+# (this is essentially the quantile ``α``, but with a small sample
 # correction).
 #
-# * Finally, for a new test data point (where :math:`X_{n + 1}` is known but
-# :math:`Y_{n + 1}` is not), create a prediction set
-# :math:`C(X_{n+1}) = \{y: \hat{f}(X_{n+1})_{y} > \hat{q}\}` which includes
+# * Finally, for a new test data point (where ``Xₙ₊₁`` is known but
+# ``Yₙ₊₁`` is not), create a prediction set
+# ``C(Xₙ₊₁) = {y: 𝑓̂(Xₙ₊₁)ᵧ > q̂}`` which includes
 # all the classes with a sufficiently high softmax output.

 # We use a two-dimensional toy dataset with three labels. The distribution of
@@ -205,9 +205,9 @@ def plot_results(alphas, X, y_pred, y_ps):
 # classifier.
 #
 # Let’s now study the effective coverage and the mean prediction set widths
-# as function of the :math:`1-\alpha` target coverage. To this aim, we use once
+# as a function of the ``1 - α`` target coverage. To this aim, we use once
 # again the ``predict`` method of MAPIE to estimate prediction sets on a
-# large number of :math:`\alpha` values.
+# large number of ``α`` values.

 alpha2 = np.arange(0.02, 0.98, 0.02)
 _, y_ps_score2 = mapie_score.predict(X_test, alpha=alpha2)
@@ -243,7 +243,7 @@ def plot_coverages_widths(alpha, coverage, width, method):
 #
 # We saw in the previous section that the "lac" method is well calibrated by
 # providing accurate coverage levels. However, it tends to give null
-# prediction sets for uncertain regions, especially when the :math:`\alpha`
+# prediction sets for uncertain regions, especially when the ``α``
 # value is high.
 # MAPIE includes another method, called Adaptive Prediction Set (APS),
 # whose conformity score is the cumulated score of the softmax output until
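Switching between the two conformity scores is a one-argument change. A hedged sketch comparing set sizes, with illustrative data and "lac"/"aps" as the method names used by the library at the time of this commit:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from mapie.classification import MapieClassifier

X, y = make_classification(n_samples=900, n_informative=4, n_classes=3, random_state=0)
X_fit, X_calib, y_fit, y_calib = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GaussianNB().fit(X_fit, y_fit)

for method in ("lac", "aps"):
    mapie = MapieClassifier(estimator=clf, cv="prefit", method=method)
    mapie.fit(X_calib, y_calib)
    _, y_ps = mapie.predict(X_calib[:5], alpha=0.2)
    # "aps" trades slightly larger sets for the absence of empty ones.
    print(method, y_ps[:, :, 0].sum(axis=1))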

examples/multilabel_classification/1-quickstart/plot_tutorial_multilabel_classification.py
Lines changed: 23 additions & 23 deletions

@@ -102,16 +102,16 @@
 # Bernstein and Waudby-Smith–Ramdas).
 # The two methods give two different guarantees on the risk:
 #
-# * RCPS: :math:`P(R(\mathcal{T}_{\hat{\lambda}})\leq\alpha)\geq 1-\delta`
-# where :math:`R(\mathcal{T}_{\hat{\lambda}})`
-# is the risk we want to control and :math:`\alpha` is the desired risk
+# * RCPS: ``𝒫(R(𝒯λ̂) ≤ α) ≥ 1 − δ``
+# where ``R(𝒯λ̂)``
+# is the risk we want to control and ``α`` is the desired risk
 #
-# * CRC: :math:`\mathbb{E}\left[L_{n+1}(\hat{\lambda})\right] \leq \alpha`
-# where :math:`L_{n+1}(\hat{\lambda})` is the risk of a new observation and
-# :math:`\alpha` is the desired risk
+# * CRC: ``𝐸[Lₙ₊₁(λ̂)] ≤ α``
+# where ``Lₙ₊₁(λ̂)`` is the risk of a new observation and
+# ``α`` is the desired risk
 #
 # In both cases, the objective of the method is to find the optimal value of
-# :math:`\lambda` (threshold above which we consider a label as being present)
+# ``λ`` (threshold above which we consider a label as being present)
 # such that the recall on the test points is at least equal to the required
 # recall.
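A rough pure-NumPy picture of that λ̂ search for recall control, in the CRC spirit (synthetic scores and a simplified finite-sample correction; this is an illustration of the idea, not the library's estimator):

import numpy as np

rng = np.random.default_rng(0)
n, n_labels, alpha = 500, 5, 0.1

# Synthetic multilabel ground truth and predicted probabilities.
y_true = rng.integers(0, 2, size=(n, n_labels))
y_prob = np.clip(y_true * 0.6 + 0.5 * rng.uniform(size=(n, n_labels)), 0, 1)

lambdas = np.linspace(0, 1, 101)
risks = []
for lam in lambdas:
    y_pred = y_prob >= lam
    # Per-sample recall loss: fraction of true labels missed at threshold lam.
    missed = ((~y_pred) & (y_true == 1)).sum(axis=1)
    n_pos = np.maximum(y_true.sum(axis=1), 1)
    risks.append(np.mean(missed / n_pos))
risks = np.array(risks)

# CRC-style choice: the largest threshold whose corrected empirical risk
# stays below alpha (the recall loss is bounded by 1, hence the 1/(n+1) term).
ok = (n / (n + 1)) * risks + 1 / (n + 1) <= alpha
lambda_hat = lambdas[ok].max() if ok.any() else 0.0
print(lambda_hat)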

@@ -156,7 +156,7 @@
 # * The actual recall (which should always be near to the required one):
 # we can see that they are close to each other.
 # * The value of the threshold: we see that the threshold is decreasing as
-# :math:`1 - \alpha` increases, which is what is expected because a
+# ``1 - α`` increases, which is what is expected because a
 # smaller threshold will give larger prediction sets, hence a larger
 # recall.
 #
@@ -179,11 +179,11 @@
 ##############################################################################
 # 2 - Plots where we choose a specific risk value (0.1 in our case) and look at
 # the average risk, the UCB of the risk (for RCPS methods) and the choice of
-# the threshold :math:`\lambda`
+# the threshold ``λ``.
 # * We can see that among the RCPS methods, the Bernstein method
-# gives the best results as for a given value of :math:`\alpha`
+# gives the best results: for a given value of ``α``
 # we are above the required recall but with a larger value of
-# :math:`\lambda` than the two others bounds.
+# ``λ`` than the two other bounds.
 # * The CRC method gives the best results since it guarantees the coverage
 # with a larger threshold.

@@ -223,20 +223,20 @@
 # In this part, we will use LTT to control precision.
 # Unlike the two previous methods, LTT can handle a non-monotonous loss.
 # The procedure consists of multiple hypothesis testing. This is why the output
-# of this procedure isn't reduce to one value of :math:`\lambda`.
+# of this procedure isn't reduced to one value of ``λ``.
 #
-# More precisely, we look after all the :math:`\lambda` that sastisfy the
+# More precisely, we look for all the ``λ`` that satisfy the
 # following:
-# :math:`\mathbb{P}(R(\mathcal{T}_{\lambda}) \leq \alpha ) \geq 1 - \delta`,
-# where :math:`R(\mathcal{T}_{\lambda})` is the risk we want to control and
-# each :math:`\lambda` should satisfy FWER control.
-# :math:`\alpha` is the desired risk.
+# ``𝒫(R(𝒯λ) ≤ α) ≥ 1 − δ``,
+# where ``R(𝒯λ)`` is the risk we want to control and
+# each ``λ`` should satisfy FWER control.
+# ``α`` is the desired risk.
 #
-# Notice that the procedure will diligently examine each :math:`\lambda`
-# such that the risk remains below level :math:`\alpha`, meaning not
-# every :math:`\lambda` will be considered.
-# This means that a for a :math:`\lambda` such that risk is below
-# :math:`\alpha`
+# Notice that the procedure will diligently examine each ``λ``
+# such that the risk remains below level ``α``, meaning not
+# every ``λ`` will be considered.
+# This means that a ``λ`` for which the risk is below
+# ``α``
 # doesn't necessarily pass the FWER control! This is what we are going to
 # explore.
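A toy sketch of that testing step, pairing a Hoeffding-style p-value per candidate λ with a Bonferroni correction for FWER (an illustration of the idea only, not MAPIE's exact LTT implementation):

import numpy as np

rng = np.random.default_rng(1)
n, alpha, delta = 800, 0.2, 0.1
lambdas = np.linspace(0, 1, 51)

# Synthetic per-sample binary losses for each lambda; in the real procedure
# these come from the calibration set and need not be monotone in lambda.
losses = rng.uniform(size=(n, lambdas.size)) < np.linspace(0.05, 0.5, lambdas.size)
r_hat = losses.mean(axis=0)  # empirical risk at each lambda

# Hoeffding p-value for H0: R(lambda) > alpha; Bonferroni keeps FWER <= delta.
p_values = np.where(r_hat < alpha, np.exp(-2 * n * (alpha - r_hat) ** 2), 1.0)
valid_lambdas = lambdas[p_values <= delta / lambdas.size]
print(valid_lambdas)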

@@ -267,7 +267,7 @@
 ##############################################################################
 # 3.2 Valid parameters for precision control
 # ----------------------------------------------------------------------------
-# We can see that not all :math:`\lambda` such that risk is below the orange
+# We can see that not all ``λ`` whose risk is below the orange
 # line are chosen by the procedure. However, all the lambdas that are
 # in the red rectangle verify family-wise error rate control and allow us to
 # control precision at the desired level with a high probability.

examples/regression/1-quickstart/plot_cqr_symmetry_difference.py
Lines changed: 1 addition & 1 deletion

@@ -111,4 +111,4 @@
 # each bound, allowing for more flexible and accurate intervals that reflect
 # the heteroscedastic nature of the data. The resulting effective coverages
 # demonstrate the theoretical guarantee of the target coverage level
-# :math:`1 - \alpha`.
+# ``1 - α``.
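The asymmetric behaviour described here is driven by a single flag. A hedged sketch, assuming MapieQuantileRegressor's predict accepts the symmetry argument as in the MAPIE version this commit targets (data and estimator are illustrative):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from mapie.quantile_regression import MapieQuantileRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(600, 1))
y = X.ravel() * np.sin(X.ravel()) + rng.normal(scale=0.3 + 0.3 * X.ravel())

mapie = MapieQuantileRegressor(
    estimator=GradientBoostingRegressor(loss="quantile"), alpha=0.2
)
mapie.fit(X, y)  # default behaviour splits train/calibration internally

# symmetry=False conformalizes the two bounds separately, letting the
# interval widen only on the side where the residuals demand it.
_, y_pis_sym = mapie.predict(X, symmetry=True)
_, y_pis_asym = mapie.predict(X, symmetry=False)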

examples/regression/1-quickstart/plot_prefit.py
Lines changed: 1 addition & 1 deletion

@@ -74,7 +74,7 @@ def f(x: NDArray) -> NDArray:
 # quantile regression using
 # :class:`~mapie.quantile_regression.MapieQuantileRegressor`. Note that the
 # three estimators need to be trained at quantile values of
-# :math:`(\alpha/2, 1-(\alpha/2), 0.5)`.
+# ``(α/2, 1-(α/2), 0.5)``.


 # Train an MLPRegressor for MapieRegressor
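A sketch of that prefit setup, training three gradient-boosting quantile models at the values just listed and handing them to MAPIE (data and variable names are illustrative):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from mapie.quantile_regression import MapieQuantileRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(600, 1))
y = X.ravel() * np.sin(X.ravel()) + rng.normal(scale=0.3, size=600)
X_train, X_calib, y_train, y_calib = X[:400], X[400:], y[:400], y[400:]

alpha = 0.2
# One estimator per quantile: lower, upper, median.
estimators = [
    GradientBoostingRegressor(loss="quantile", alpha=q).fit(X_train, y_train)
    for q in (alpha / 2, 1 - alpha / 2, 0.5)
]

mapie = MapieQuantileRegressor(estimators, cv="prefit", alpha=alpha)
mapie.fit(X_calib, y_calib)  # calibration only; the models stay untouched
y_pred, y_pis = mapie.predict(X_calib)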

examples/regression/2-advanced-analysis/plot-coverage-width-based-criterion.py
Lines changed: 3 additions & 3 deletions

@@ -33,7 +33,7 @@
 # Estimating the aleatoric uncertainty of heteroscedastic noisy data
 # ---------------------------------------------------------------------
 #
-# Let's define again the :math:`x \times \sin(x)` function and another simple
+# Let's define again the ``x * sin(x)`` function and another simple
 # function that generates one-dimensional data with normal noise uniformly
 # in a given interval.
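A sketch of those two helpers (hypothetical names mirroring the example's intent; the noise standard deviation grows linearly with x):

import numpy as np

def x_sinx(x):
    """The x * sin(x) target function."""
    return x * np.sin(x)

def get_1d_data_with_heteroscedastic_noise(funct, min_x, max_x, n_samples, noise):
    """Sample X uniformly on [min_x, max_x]; noise scales linearly with x."""
    rng = np.random.default_rng(59)
    X = rng.uniform(min_x, max_x, size=n_samples)
    y = funct(X) + rng.normal(0, noise, n_samples) * X
    return X.reshape(-1, 1), y

X, y = get_1d_data_with_heteroscedastic_noise(x_sinx, 0, 5, 300, 0.5)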

@@ -70,7 +70,7 @@ def get_1d_data_with_heteroscedastic_noise(
 ##############################################################################
 # We first generate noisy one-dimensional data uniformly on an interval.
 # Here, the noise is considered as *heteroscedastic*, since it will increase
-# linearly with :math:`x`.
+# linearly with ``x``.

 min_x, max_x, n_samples, noise = 0, 5, 300, 0.5
 (

@@ -92,7 +92,7 @@
 ##############################################################################
 # As mentioned previously, we fit our training data with a simple
 # polynomial function. Here, we choose a degree equal to 10 so the function
-# is able to perfectly fit :math:`x \times \sin(x)`.
+# is able to perfectly fit ``x * sin(x)``.

 degree_polyn = 10
 polyn_model = Pipeline(

examples/regression/2-advanced-analysis/plot_nested-cv.py
Lines changed: 2 additions & 2 deletions

@@ -22,9 +22,9 @@
 cross-validation occurs on the training fold, optimizing hyperparameters.
 This ensures that residuals seen by MAPIE are never seen by the algorithm
 beforehand. However, this method is much heavier computationally since
-it results in :math:`N * P` calculations, where *N* is the number of
+it results in ``N * P`` calculations, where *N* is the number of
 *out-of-fold* models and *P* the number of parameter search cross-validations,
-versus :math:`N + P` for the non-nested approach.
+versus ``N + P`` for the non-nested approach.

 Here, we compare the two strategies on a toy dataset. We use the Random
 Forest Regressor as a base regressor for the CV+ strategy. For the sake of
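The N + P versus N * P difference comes out directly in how the search is wired. A hedged sketch of both wirings, with an illustrative grid and CV+ via method="plus" (sizes and parameters are made up):

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from mapie.regression import MapieRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
param_grid = {"max_depth": [3, 5, 10]}

# Non-nested (N + P fits): tune once, then hand the best model to MAPIE.
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=5)
search.fit(X, y)
mapie_non_nested = MapieRegressor(search.best_estimator_, method="plus", cv=5)
mapie_non_nested.fit(X, y)

# Nested (N * P fits): the whole search is refit inside each MAPIE fold.
mapie_nested = MapieRegressor(
    GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=5),
    method="plus",
    cv=5,
)
mapie_nested.fit(X, y)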

examples/regression/4-tutorials/plot_cqr_tutorial.py
Lines changed: 3 additions & 3 deletions

@@ -230,7 +230,7 @@ def plot_prediction_intervals(

 ##############################################################################
 # We proceed to using MAPIE to return the predictions and prediction intervals.
-# We will use an :math:`\alpha=0.2`, this means a target coverage of 0.8
+# We will use ``α=0.2``, which means a target coverage of 0.8
 # (recall that this parameter needs to be initialized directly when setting
 # :class:`~mapie.quantile_regression.MapieQuantileRegressor` and when using
 # :class:`~mapie.regression.MapieRegressor`, it needs to be set in the
@@ -241,7 +241,7 @@ def plot_prediction_intervals(
 # model on a training set and then calibrates on the calibration set.
 # * ``cv="prefit"`` meaning that you can train your models with the correct
 # quantile values (must be given in the following order:
-# :math:`(\alpha, 1-(\alpha/2), 0.5)` and given to MAPIE as an iterable
+# ``(α/2, 1-(α/2), 0.5)``) and given to MAPIE as an iterable
 # object. (Check the examples for how to use prefit in MAPIE)
 #
 # Additionally, note that there is a list of accepted models by
@@ -413,7 +413,7 @@ def get_coverages_widths_by_bins(

 ##############################################################################
 # What we observe from these results is that none of the methods seems to
-# have conditional coverage at the target :math:`1 - \alpha`. However, we can
+# have conditional coverage at the target ``1 - α``. However, we can
 # clearly notice that the CQR seems to better adapt to large prices. Its
 # conditional coverage is closer to the target coverage not only for higher
 # prices, but also for lower prices where the other methods have a higher
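Effective coverage, whether global or per price bin, is one call with MAPIE's metrics. A self-contained sketch on hypothetical intervals (the function comes from mapie.metrics; the numbers are made up):

import numpy as np
from mapie.metrics import regression_coverage_score

# Hypothetical intervals for five test points: [lower, upper] bounds.
y_test = np.array([2.1, 0.4, 3.3, 1.8, 2.7])
y_pis = np.array([[1.5, 2.6], [0.0, 1.0], [2.0, 3.0], [1.0, 2.5], [2.5, 3.5]])

# Share of y_test falling inside its interval, to compare with 1 - α.
coverage = regression_coverage_score(y_test, y_pis[:, 0], y_pis[:, 1])
print(coverage)  # 0.8 here: the third point falls outside its interval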
