diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index ccf4778ef..15bdeccc8 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -3,6 +3,13 @@ aliases: - "/introduction_and_overview.html" --- + +```{r} +# extra packages that must be installed in the docker image +remotes::install_github("mlr-org/mlr3@mirai") +remotes::install_cran("mirai") +``` + # Introduction and Overview {#sec-introduction} {{< include ../../common/_setup.qmd >}} diff --git a/book/chapters/chapter10/advanced_technical_aspects_of_mlr3.qmd b/book/chapters/chapter10/advanced_technical_aspects_of_mlr3.qmd index 7d76f27f2..247d49af9 100644 --- a/book/chapters/chapter10/advanced_technical_aspects_of_mlr3.qmd +++ b/book/chapters/chapter10/advanced_technical_aspects_of_mlr3.qmd @@ -491,6 +491,101 @@ lrn_rpart$parallel_predict = TRUE prediction = lrn_rpart$predict(tsk_sonar) ``` +### Parallelization with `mirai` {#sec-parallel-mirai} + +```{r, include = FALSE} +mirai::daemons(0) +``` + +With `mlr3` 1.0.0, we integrated the `r ref_pkg("mirai")` package as an alternative parallelization backend. +`mirai` provides a lightweight approach to parallelization by starting persistent R sessions called daemons that evaluate tasks in parallel. +These daemons can be launched either locally or on remote machines via SSH or cluster managers. +Compared to the `r ref_pkg("future")` package, `mirai` has significantly lower overhead per task. +Like parallelization with `future`, users only need to configure the backend before starting any computations. +The following sections demonstrate how to use `mirai` for parallelizing resamplings, benchmarks, and tuning. + +To use `mirai` for parallelization, we first need to start the daemons. +We start two daemons and check the status of the daemons. 
+ +```{r, eval = FALSE} +library(mirai) + +mirai::daemons(2) + +mirai::status() +``` + +We parallelize a three-fold CV for a decision tree on the sonar task. + +```{r} +tsk_sonar = tsk("sonar") +lrn_rpart = lrn("classif.rpart") +rsmp_cv3 = rsmp("cv", folds = 3) +system.time({resample(tsk_sonar, lrn_rpart, rsmp_cv3)}) +``` + +One advantage of `mirai` is that it eliminates the need to manually set chunk sizes, as it automatically handles task distribution efficiently. + +Since the daemons are already running, we can proceed directly with the tuning example. + +```{r} +instance = tune( + tnr("random_search", batch_size = 12), + tsk("penguins"), + lrn("classif.rpart", minsplit = to_tune(2, 128)), + rsmp("cv", folds = 3), + term_evals = 20 +) + +instance$archive$n_evals +``` + +`mirai` also supports nested resampling, where the outer loop can be parallelized while the inner loop runs sequentially. +We start one daemon for each outer resampling iteration, so the five outer folds are evaluated in parallel. +Within each daemon, the inner tuning loop runs sequentially. + +```{r} +# reset daemons +mirai::daemons(0) + +mirai::daemons(5) + +lrn_rpart = lrn("classif.rpart", + minsplit = to_tune(2, 128)) + +lrn_rpart_tuned = auto_tuner(tnr("random_search", batch_size = 2), + lrn_rpart, rsmp("cv", folds = 3), msr("classif.ce"), 2) + +rr = resample(tsk("penguins"), lrn_rpart_tuned, rsmp("cv", folds = 5)) +``` + +We can also parallelize both outer and inner loops using the `everywhere()` function to set up daemons for the inner loop on the daemons of the outer loop. + +```{r, eval = FALSE} +# reset daemons +mirai::daemons(0) + +mirai::daemons(5) + +everywhere({ + mirai::daemons(3) +}) +``` + +Note that running the outer loop in the main session while parallelizing the inner loop is currently not supported. 
+However, you can run the outer loop in a single daemon and the inner loop on multiple daemons. + +```{r, eval = FALSE} +# reset daemons +mirai::daemons(0) + +mirai::daemons(1) + +everywhere({ + mirai::daemons(3) +}) +``` + ## Error Handling {#sec-error-handling} In large experiments, it is not uncommon that a model fit or prediction fails with an error.\index{debugging}