
Commit f08a30c

Merge branch 'main' of github.com:mlr-org/mlr3book
2 parents fff1d68 + 1331789

File tree

8 files changed: +78 -20 lines changed


DESCRIPTION

Lines changed: 5 additions & 0 deletions

@@ -28,6 +28,7 @@ Imports:
     mlr3filters,
     mlr3fselect,
     mlr3hyperband,
+    mlr3inferr,
     mlr3learners,
     mlr3oml,
     mlr3mbo,
@@ -46,6 +47,10 @@ Imports:
     stringi
 Remotes:
     mlr-org/mlr3extralearners,
+    mlr-org/mlr3batchmark,
+    mlr-org/mlr3proba,
+    mlr-org/mlr3fairness,
+    mlr-org/mlr3inferr
     mlr-org/mlr3proba
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)

R/zzz.R

Lines changed: 1 addition & 1 deletion

@@ -5,7 +5,7 @@ NULL
 
 db = new.env()
 db$index = c("base", "utils", "datasets", "data.table", "stats", "batchtools")
-db$hosted = c("paradox", "mlr3misc", "mlr3", "mlr3data", "mlr3db", "mlr3proba", "mlr3pipelines", "mlr3learners", "mlr3filters", "bbotk", "mlr3tuning", "mlr3viz", "mlr3fselect", "mlr3cluster", "mlr3spatiotempcv", "mlr3spatial", "mlr3extralearners", "mlr3tuningspaces", "mlr3hyperband", "mlr3mbo", "mlr3verse", "mlr3benchmark", "mlr3oml", "mlr3batchmark", "mlr3fairness")
+db$hosted = c("paradox", "mlr3misc", "mlr3", "mlr3data", "mlr3db", "mlr3proba", "mlr3pipelines", "mlr3learners", "mlr3filters", "bbotk", "mlr3tuning", "mlr3viz", "mlr3fselect", "mlr3cluster", "mlr3spatiotempcv", "mlr3spatial", "mlr3extralearners", "mlr3tuningspaces", "mlr3hyperband", "mlr3mbo", "mlr3verse", "mlr3benchmark", "mlr3oml", "mlr3batchmark", "mlr3fairness", "mlr3inferr")
 
 lgr = NULL

book/book.bib

Lines changed: 29 additions & 0 deletions

@@ -1436,6 +1436,25 @@ @book{hutter2019automated
   publisher = {Springer},
   keywords = {}
 }
+
+@misc{kuempelfischer2024ciforge,
+  title={Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study},
+  author={Hannah Schulz-Kümpel and Sebastian Fischer and Thomas Nagler and Anne-Laure Boulesteix and Bernd Bischl and Roman Hornung},
+  year={2024},
+  eprint={2409.18836},
+  archivePrefix={arXiv},
+  primaryClass={stat.ML},
+  url={https://arxiv.org/abs/2409.18836},
+}
+
+@article{bayle2020cross,
+  title={Cross-validation confidence intervals for test error},
+  author={Bayle, Pierre and Bayle, Alexandre and Janson, Lucas and Mackey, Lester},
+  journal={Advances in Neural Information Processing Systems},
+  volume={33},
+  pages={16339--16350},
+  year={2020}
+}
 @article{yu_quantile_2003,
   author = {Yu, Keming and Lu, Zudi and Stander, Julian},
   doi = {10.1111/1467-9884.00363},
@@ -1447,3 +1466,13 @@ @article{yu_quantile_2003
   volume = {52},
   year = {2003},
 }
+@book{koenker_quantile_2005,
+  address = {Cambridge},
+  series = {Econometric Society Monographs},
+  title = {Quantile Regression},
+  isbn = {978-0-521-84573-1},
+  publisher = {Cambridge University Press},
+  author = {Koenker, Roger},
+  year = {2005},
+  doi = {10.1017/CBO9780511754098},
+}

book/chapters/appendices/errata.qmd

Lines changed: 1 addition & 0 deletions

@@ -18,6 +18,7 @@ This appendix lists changes to the online version of this book to chapters inclu
 ## 3. Evaluation and Benchmarking
 
 * Use `$encapsulate()` method instead of the `$encapsulate` and `$fallback` fields.
+* A section on the `mlr3inferr` package was added.
 
 ## 4. Hyperparameter Optimization
book/chapters/chapter13/beyond_regression_and_classification.qmd

Lines changed: 1 addition & 1 deletion

@@ -1031,7 +1031,7 @@ ggplot(plot_data, aes(x = x, y = loss, color = tau)) +
 But note: While many ML models based on empirical risk minimization use the pinball loss for estimating quantiles, some model classes might work differently.
 However, since the underlying training procedure of a model is external to `mlr3`, we are more concerned with resampling and evaluating quantile regression models.
 This works in exactly the same manner as for other tasks.
-Because we provide only a brief overview of quantile regression, we recommend @yu_quantile_2003 if you are interested in a methodological introduction to the topic.
+Because we provide only a brief overview of quantile regression, we recommend @yu_quantile_2003 if you are interested in a methodological introduction to the topic and @koenker_quantile_2005 for a more expansive treatment of quantile regression.
 
 ### Synthetic data set generation {#sec-data-generation}

book/chapters/chapter3/evaluation_and_benchmarking.qmd

Lines changed: 40 additions & 0 deletions

@@ -285,6 +285,31 @@ print(plt2)
 ```
 
 
+### Confidence Intervals (+) {#sec-resampling-ci}
+
+Instead of relying solely on point estimates, CIs offer a measure of uncertainty of this estimate, allowing us to understand the reliability of our performance measurement.
+While constructing CIs for the generalization error is challenging due to the complex nature of the inference problem, some methods have been shown to work well in practice [@kuempelfischer2024ciforge].
+When employing such methods, it is important to be aware that they can fail in some cases -- e.g. in the presence of outliers or unstable learning procedures -- and that the resulting CIs can be either too conservative or too liberal.
+
+The `r ref_pkg("mlr3inferr")` package extends the `mlr3` ecosystem with both inference methods and new resampling strategies.
+The inference methods are implemented as `r ref("Measure")` objects that take another measure for which to compute the CI.
+Below, we demonstrate how to use the inference method suggested by @bayle2020cross to compute a CI for the cross-validation result from the previous section.
+Unlike other mlr3 measures, the result is not a scalar value but a vector containing the point estimate as well as the lower and upper bounds of the CI for the specified alpha level.
+
+```{r}
+library(mlr3inferr)
+# alpha = 0.05 is also the default
+msr_ci = msr("ci.wald_cv", msr("classif.acc"), alpha = 0.05)
+rr$aggregate(msr_ci)
+```
+
+We can also use `msr("ci")`, which automatically selects the appropriate inference measure for the given resampling strategy.
+A list of available inference methods can be found on the package website: `r link("https://mlr3inferr.mlr-org.com/")`.
+
+```{r}
+rr$aggregate(msr("ci", msr("classif.acc")))
+```
+
 ### ResampleResult Objects {#sec-resampling-inspect}
 
 As well as being useful for estimating the generalization performance, the `r ref("ResampleResult")` object can also be used for model inspection.
@@ -576,6 +601,21 @@ plt = plt + ggplot2::scale_fill_manual(values = c("grey30", "grey50", "grey70"))
 print(plt)
 ```
 
+It is also possible to plot confidence intervals by setting the type of plot to `"ci"`.
+
+```{r}
+#| fig-height: 3
+#| fig-width: 6
+#| label: fig-benchmark-ci
+#| fig-cap: 'Confidence intervals for accuracy scores for each learner across resampling iterations and the three tasks. The random forest (`lrn("classif.ranger")`) consistently outperforms the other learners.'
+#| fig-alt: Nine confidence intervals, one corresponding to each task/learner combination. In all cases the random forest performs best and the featureless baseline the worst.
+#| echo: false
+#| warning: false
+#| message: false
+autoplot(bmr, "ci", measure = msr("ci", msr("classif.acc")))
+```
+
+
 ## Evaluation of Binary Classifiers {#sec-roc}
 
 In @sec-basics-classif-learner we touched on the concept of a confusion matrix and how it can be used to break down classification errors in more detail.
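The new section's code chunks rely on a `ResampleResult` object `rr` created earlier in the chapter. A minimal, self-contained sketch of the full workflow follows; the task and learner chosen here are illustrative stand-ins, not the ones the book uses, and the chunk assumes `mlr3` and `mlr3inferr` are installed:

```r
library(mlr3)
library(mlr3inferr)

# Resample a decision tree with 10-fold CV on a built-in example task
# (illustrative choices; the chapter uses its own task and learner).
task = tsk("sonar")
rr = resample(task, lrn("classif.rpart"), rsmp("cv", folds = 10))

# Wald-type CI for CV accuracy [@bayle2020cross]; alpha = 0.05 is the default.
rr$aggregate(msr("ci.wald_cv", msr("classif.acc"), alpha = 0.05))

# Or let msr("ci") pick the inference method matching the resampling strategy.
rr$aggregate(msr("ci", msr("classif.acc")))
```

Both calls return a named vector with the point estimate and the lower and upper CI bounds rather than a single scalar score.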

book/chapters/chapter9/preprocessing.qmd

Lines changed: 0 additions & 17 deletions

@@ -236,17 +236,9 @@ invisible(dev.off())
 magick::image_trim(fig)
 ```
 
-::: {.callout-warning}
-
-Currently, there is a bug in the mlr3pipelines package that causes the following code chunk to fail.
-See https://github.com/mlr-org/mlr3pipelines/issues/894 for more details.
-
-:::
-
 Using this pipeline we can now run experiments with `lrn("regr.ranger")`, which cannot handle missing data; we also compare a simpler pipeline that only uses OOR imputation to demonstrate performance differences resulting from different strategies.
 
 ```{r preprocessing-015}
-#| eval: false
 glrn_rf_impute_hist = as_learner(impute_hist %>>% lrn("regr.ranger"))
 glrn_rf_impute_hist$id = "RF_imp_Hist"
 
@@ -450,19 +442,10 @@ tsk_ames_ext$data(1,
   c("energy_means", "energy_mins", "energy_maxs", "energy_vars"))
 ```
 
-::: {.callout-warning}
-
-This code chunk does not work due to the bug in the `mlr3pipelines` package.
-See the warning message above for more details.
-
-:::
-
-
 These outputs look sensible compared to @fig-energy so we can now run our final benchmark experiment using feature extraction.
 We do not need to add the `PipeOp` to each learner as we can apply it once (as above) before any model training by applying it to all available data.
 
 ```{r preprocessing-026, warning=FALSE, R.options = list(datatable.print.nrows = 13, datatable.print.class = FALSE, datatable.print.keys = FALSE, datatable.print.trunc.cols = TRUE)}
-#| eval: false
 learners = list(lrn_baseline, lrn("regr.rpart"), glrn_xgb_impact,
   glrn_rf_impute_oor, glrn_lm_robust, glrn_log_lm_robust)
 

book/common/chap_auths.csv

Lines changed: 1 addition & 1 deletion

@@ -1,7 +1,7 @@
 Chapter Number,Title,Authors
 1,Introduction and Overview,"Lars Kotthoff, Raphael Sonabend, Natalie Foss, Bernd Bischl"
 2,Data and Basic Modeling,"Natalie Foss, Lars Kotthoff"
-3,Evaluation and Benchmarking,"Giuseppe Casalicchio, Lukas Burk"
+3,Evaluation and Benchmarking,"Giuseppe Casalicchio, Lukas Burk, Sebastian Fischer"
 4,Hyperparameter Optimization,"Marc Becker, Lennart Schneider, Sebastian Fischer"
 5,Advanced Tuning Methods and Black Box Optimization,"Lennart Schneider, Marc Becker"
 6,Feature Selection,Marvin N. Wright
