Commit 1331789

feat(section): add information about mlr3inferr (#855)
1 parent b6c3a53 commit 1331789

File tree

6 files changed: +67 -2 lines changed


DESCRIPTION

Lines changed: 5 additions & 0 deletions
@@ -28,6 +28,7 @@ Imports:
     mlr3filters,
     mlr3fselect,
     mlr3hyperband,
+    mlr3inferr,
     mlr3learners,
     mlr3oml,
     mlr3mbo,
@@ -46,6 +47,10 @@ Imports:
     stringi
 Remotes:
     mlr-org/mlr3extralearners,
+    mlr-org/mlr3batchmark,
+    mlr-org/mlr3proba,
+    mlr-org/mlr3fairness,
+    mlr-org/mlr3inferr
     mlr-org/mlr3proba
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)

R/zzz.R

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ NULL
 
 db = new.env()
 db$index = c("base", "utils", "datasets", "data.table", "stats", "batchtools")
-db$hosted = c("paradox", "mlr3misc", "mlr3", "mlr3data", "mlr3db", "mlr3proba", "mlr3pipelines", "mlr3learners", "mlr3filters", "bbotk", "mlr3tuning", "mlr3viz", "mlr3fselect", "mlr3cluster", "mlr3spatiotempcv", "mlr3spatial", "mlr3extralearners", "mlr3tuningspaces", "mlr3hyperband", "mlr3mbo", "mlr3verse", "mlr3benchmark", "mlr3oml", "mlr3batchmark", "mlr3fairness")
+db$hosted = c("paradox", "mlr3misc", "mlr3", "mlr3data", "mlr3db", "mlr3proba", "mlr3pipelines", "mlr3learners", "mlr3filters", "bbotk", "mlr3tuning", "mlr3viz", "mlr3fselect", "mlr3cluster", "mlr3spatiotempcv", "mlr3spatial", "mlr3extralearners", "mlr3tuningspaces", "mlr3hyperband", "mlr3mbo", "mlr3verse", "mlr3benchmark", "mlr3oml", "mlr3batchmark", "mlr3fairness", "mlr3inferr")
 
 lgr = NULL

book/book.bib

Lines changed: 19 additions & 0 deletions
@@ -1436,6 +1436,25 @@ @book{hutter2019automated
   publisher = {Springer},
   keywords = {}
 }
+
+@misc{kuempelfischer2024ciforge,
+  title = {Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study},
+  author = {Hannah Schulz-Kümpel and Sebastian Fischer and Thomas Nagler and Anne-Laure Boulesteix and Bernd Bischl and Roman Hornung},
+  year = {2024},
+  eprint = {2409.18836},
+  archivePrefix = {arXiv},
+  primaryClass = {stat.ML},
+  url = {https://arxiv.org/abs/2409.18836}
+}
+
+@article{bayle2020cross,
+  title = {Cross-validation confidence intervals for test error},
+  author = {Bayle, Pierre and Bayle, Alexandre and Janson, Lucas and Mackey, Lester},
+  journal = {Advances in Neural Information Processing Systems},
+  volume = {33},
+  pages = {16339--16350},
+  year = {2020}
+}
 @article{yu_quantile_2003,
   author = {Yu, Keming and Lu, Zudi and Stander, Julian},
   doi = {10.1111/1467-9884.00363},
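For orientation, the Wald-type construction from @bayle2020cross referenced in the entries above follows the familiar normal-approximation pattern. This is a sketch of the general idea, not the package's exact estimator (the paper's contribution is a valid standard-error estimate under cross-validation's dependent folds):

```latex
% Wald-type interval around the cross-validation point estimate
% \hat{R}_{\mathrm{CV}}: here \hat{\sigma} is an estimate of the standard
% deviation of the per-observation losses, n is the number of test
% predictions, and z_{1-\alpha/2} is the standard normal quantile.
\left[\;
  \hat{R}_{\mathrm{CV}} - z_{1-\alpha/2}\,\frac{\hat{\sigma}}{\sqrt{n}},
  \;\;
  \hat{R}_{\mathrm{CV}} + z_{1-\alpha/2}\,\frac{\hat{\sigma}}{\sqrt{n}}
\;\right]
```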

book/chapters/appendices/errata.qmd

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ This appendix lists changes to the online version of this book to chapters inclu
 
 ## 3. Evaluation and Benchmarking
 
 * Use `$encapsulate()` method instead of the `$encapsulate` and `$fallback` fields.
+* A section on the `mlr3inferr` package was added.
 
 ## 4. Hyperparameter Optimization
 
book/chapters/chapter3/evaluation_and_benchmarking.qmd

Lines changed: 40 additions & 0 deletions
@@ -285,6 +285,31 @@ print(plt2)
 ```
 
 
+### Confidence Intervals (+) {#sec-resampling-ci}
+
+Instead of relying solely on point estimates, confidence intervals (CIs) offer a measure of the uncertainty of the estimate, allowing us to judge how reliable our performance measurement is.
+While constructing CIs for the generalization error is challenging due to the complex nature of the inference problem, some methods have been shown to work well in practice [@kuempelfischer2024ciforge].
+When employing such methods, it is important to be aware that they can fail in some cases -- e.g., in the presence of outliers or unstable learning procedures -- and that the resulting CIs can be either too conservative or too liberal.
+
+The `r ref_pkg("mlr3inferr")` package extends the `mlr3` ecosystem with both inference methods and new resampling strategies.
+The inference methods are implemented as `r ref("Measure")` objects that wrap another measure for which the CI is computed.
+Below, we demonstrate how to use the inference method suggested by @bayle2020cross to compute a CI for the cross-validation result from the previous section.
+Unlike other `mlr3` measures, the result is not a scalar value but a vector containing the point estimate as well as the lower and upper bounds of the CI for the specified alpha level.
+
+```{r}
+library(mlr3inferr)
+# alpha = 0.05 is also the default
+msr_ci = msr("ci.wald_cv", msr("classif.acc"), alpha = 0.05)
+rr$aggregate(msr_ci)
+```
+
+We can also use `msr("ci")`, which automatically selects the appropriate inference measure for the given resampling strategy.
+A list of available inference methods can be found on the package website: `r link("https://mlr3inferr.mlr-org.com/")`.
+
+```{r}
+rr$aggregate(msr("ci", msr("classif.acc")))
+```
+
 ### ResampleResult Objects {#sec-resampling-inspect}
 
 As well as being useful for estimating the generalization performance, the `r ref("ResampleResult")` object can also be used for model inspection.
@@ -576,6 +601,21 @@ plt = plt + ggplot2::scale_fill_manual(values = c("grey30", "grey50", "grey70"))
 print(plt)
 ```
 
+It is also possible to plot confidence intervals by setting the plot type to `"ci"`.
+
+```{r}
+#| fig-height: 3
+#| fig-width: 6
+#| label: fig-benchmark-ci
+#| fig-cap: 'Confidence intervals for accuracy scores for each learner across resampling iterations and the three tasks. The random forest (`lrn("classif.ranger")`) consistently outperforms the other learners.'
+#| fig-alt: Nine confidence intervals, one corresponding to each task/learner combination. In all cases the random forest performs best and the featureless baseline the worst.
+#| echo: false
+#| warning: false
+#| message: false
+autoplot(bmr, "ci", measure = msr("ci", msr("classif.acc")))
+```
+
 ## Evaluation of Binary Classifiers {#sec-roc}
 
 In @sec-basics-classif-learner we touched on the concept of a confusion matrix and how it can be used to break down classification errors in more detail.
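For readers who want to try the added section outside the book, a minimal end-to-end sketch might look as follows. The measure ids `"ci.wald_cv"` and `"ci"` come from the diff above; the task, learner, and resampling here are illustrative placeholders (in the chapter, `rr` is created in an earlier section):

```r
library(mlr3)
library(mlr3inferr)

# Resample a decision tree with 10-fold cross-validation
task = tsk("sonar")
learner = lrn("classif.rpart")
rr = resample(task, learner, rsmp("cv", folds = 10))

# Wald CI for cross-validation (Bayle et al., 2020): the aggregate
# is a vector with the point estimate and the lower/upper CI bounds
rr$aggregate(msr("ci.wald_cv", msr("classif.acc"), alpha = 0.05))

# msr("ci") picks the appropriate inference method for the
# resampling strategy stored in the ResampleResult
rr$aggregate(msr("ci", msr("classif.acc")))
```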

book/common/chap_auths.csv

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Chapter Number,Title,Authors
 1,Introduction and Overview,"Lars Kotthoff, Raphael Sonabend, Natalie Foss, Bernd Bischl"
 2,Data and Basic Modeling,"Natalie Foss, Lars Kotthoff"
-3,Evaluation and Benchmarking,"Giuseppe Casalicchio, Lukas Burk"
+3,Evaluation and Benchmarking,"Giuseppe Casalicchio, Lukas Burk, Sebastian Fischer"
 4,Hyperparameter Optimization,"Marc Becker, Lennart Schneider, Sebastian Fischer"
 5,Advanced Tuning Methods and Black Box Optimization,"Lennart Schneider, Marc Becker"
 6,Feature Selection,Marvin N. Wright
