Commit 373ab8c

Add 'Launch mirai workers via HPC job scheduler' section to help("mirai_cluster", package = "future.mirai") [#12]
1 parent e162128 commit 373ab8c

5 files changed: +234, -4 lines


DESCRIPTION

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 Package: future.mirai
-Version: 0.10.1-9001
+Version: 0.10.1-9002
 Depends:
     future (>= 1.49.0)
 Imports:
@@ -28,5 +28,5 @@ Language: en-US
 Encoding: UTF-8
 URL: https://future.mirai.futureverse.org, https://github.com/futureverse/future.mirai
 BugReports: https://github.com/futureverse/future.mirai/issues
-RoxygenNote: 7.3.2
+RoxygenNote: 7.3.3
 Roxygen: list(markdown = TRUE)

NEWS.md

Lines changed: 5 additions & 2 deletions
@@ -1,6 +1,9 @@
 # Version (development version)
 
-* ...
+## Documentation
+
+* Add 'Launch mirai workers via HPC job scheduler' section to
+  `help("mirai_cluster", package = "future.mirai")`.
 
 
 # Version 0.10.1
@@ -19,7 +22,7 @@
   in other errors.
 
 * `resolved()` on a mirai future already known to be interrupted
-  would requery the mirai object, instead of returning TRUE
+  would re-query the mirai object, instead of returning TRUE
   immediately.
 
 
R/MiraiFutureBackend-class.R

Lines changed: 110 additions & 0 deletions
@@ -416,6 +416,116 @@ tweak.mirai_cluster <- function(strategy, ..., penvir = parent.frame()) {
 #'
 #' @return Nothing.
 #'
+#' @section Launch mirai workers via HPC job scheduler:
+#'
+#' If you have access to a high-performance compute (HPC) environment
+#' with a job scheduler, you can use **future.mirai** and **mirai**
+#' to run parallel workers distributed on the compute cluster.
+#' How to set these workers up is explained in [mirai::cluster_config()],
+#' which should work for the most common job schedulers, e.g. Slurm,
+#' Sun/Son of/Oracle/Univa/Altair Grid Engine (SGE), OpenLava,
+#' Load Sharing Facility (LSF), and TORQUE/PBS.
+#'
+#' _Note: Not all compute clusters support running **mirai** workers
+#' this way. This is because **mirai** workers need to establish a
+#' TCP connection back to the machine that launched the workers, but
+#' some systems have security policies disallowing such connections
+#' from being established. This is often configured in the firewall
+#' and can only be controlled by the admins. If your system has such
+#' rules, you will find that the mirai jobs are launched and running
+#' on the scheduler, but you will wait forever for the mirai workers
+#' to connect back, i.e. `mirai::info()[["connections"]]` is always
+#' zero in the example below._
+#'
+#' Briefly, to launch a cluster of mirai workers on an HPC cluster, we
+#' need to:
+#'
+#' 1. configure [mirai::cluster_config()] for the job scheduler,
+#'
+#' 2. use the configuration to launch workers using [mirai::daemons()].
+#'
+#' The first step is specific to each job scheduler and this is where
+#' you control things such as how much memory each worker gets, for
+#' how long they may run, and which environment modules to load, if any.
+#' The second step is the same regardless of job scheduler.
+#' Here is an example of how to run parallel mirai workers on a
+#' Slurm scheduler and then use them in futureverse.
+#'
+#' ```r
+#' # Here we give each worker 200 MiB of RAM and a maximum of one hour
+#' # to run. Unless we specify '--cpus-per-task=N', each mirai worker
+#' # is allotted one CPU core, which impacts nested parallelization.
+#' # R is provided via environment module 'r' on this cluster.
+#' config <- mirai::cluster_config(command = "sbatch", options = "
+#' #SBATCH --job-name=mirai
+#' #SBATCH --time=01:00:00
+#' #SBATCH --mem=200M
+#' module load r
+#' ")
+#'
+#' # -------------------------------------------------------------------
+#' # Launch eight mirai workers, via equally many jobs, wait for all of
+#' # them to become available, and use them in futureverse
+#' # -------------------------------------------------------------------
+#' workers <- 8
+#' mirai::daemons(n = workers, url = mirai::host_url(), remote = config)
+#' while (mirai::info()[["connections"]] < workers) Sys.sleep(1.0)
+#' plan(future.mirai::mirai_cluster)
+#'
+#' # Verify that futures are resolved on a compute node
+#' f <- future({
+#'   data.frame(
+#'     hostname = Sys.info()[["nodename"]],
+#'     os = Sys.info()[["sysname"]],
+#'     cores = unname(parallelly::availableCores()),
+#'     modules = Sys.getenv("LOADEDMODULES")
+#'   )
+#' })
+#' info <- value(f)
+#' print(info)
+#' #>   hostname    os cores modules
+#' #> 1      n12 Linux     1 r/4.5.1
+#'
+#' # Shut down parallel workers
+#' plan(sequential)
+#' mirai::daemons(0)
+#' ```
+#'
+#' If you are on SGE, you can use the following configuration:
+#'
+#' ```r
+#' # -----------------------------------------------------------
+#' # Configure mirai to launch R workers via the job scheduler
+#' # -----------------------------------------------------------
+#' # Here we give each worker 200 MiB of RAM and a maximum of
+#' # one hour to run. Unless we specify '-pe smp N', each
+#' # mirai worker is allotted one CPU core, which impacts
+#' # nested parallelization. To make sure R is available to
+#' # launch the mirai workers, we load environment module 'r'.
+#' config <- mirai::cluster_config(command = "qsub", options = "
+#' #$ -N mirai
+#' #$ -j y
+#' #$ -cwd
+#' #$ -l h_rt=01:00:00
+#' #$ -l mem_free=200M
+#' module load r
+#' ")
+#' ```
+#'
+#' Everything else is the same.
+#'
+#' _Comment_: [mirai::cluster_config()] configures the jobs to run in a
+#' vanilla POSIX shell, i.e. `/bin/sh`. This might be too strict for some
+#' users. If your setup requires your jobs to be run using Bash
+#' (`/bin/bash`), you can tweak the `config` object manually to do so:
+#'
+#' ```r
+#' config$command <- "/bin/bash"
+#' config$args <- sub("/bin/sh", config$command, config$args)
+#' ```
+#'
 #' @example incl/mirai_cluster.R
 #'
 #' @importFrom future future
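
A note on the `while (mirai::info()[["connections"]] < workers) Sys.sleep(1.0)` line in the Slurm example: it blocks indefinitely if the workers never manage to connect back, for instance because of the firewall rules described in the section's note. A minimal sketch of a variant with a timeout, using a hypothetical `wait_for_workers()` helper that is not part of mirai or future.mirai:

```r
## Hypothetical helper (not part of mirai or future.mirai): poll until
## 'workers' mirai daemons have connected back, but give up after
## 'timeout' seconds instead of waiting forever.
wait_for_workers <- function(workers, timeout = 600, interval = 1.0) {
  t0 <- Sys.time()
  while (mirai::info()[["connections"]] < workers) {
    if (difftime(Sys.time(), t0, units = "secs") > timeout) {
      stop("mirai workers did not connect back within ", timeout, " seconds")
    }
    Sys.sleep(interval)
  }
  invisible(TRUE)
}

## Usage, given 'config' and 'workers' as in the Slurm example:
## mirai::daemons(n = workers, url = mirai::host_url(), remote = config)
## wait_for_workers(workers)
```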
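
Both examples point out that each worker is allotted a single CPU core unless more are requested, which matters for nested parallelization. A hypothetical Slurm variant that requests four cores per worker (whether this is permitted depends on your cluster) only changes the `options` string:

```r
## Hypothetical variant of the Slurm configuration that requests four
## CPU cores per mirai worker; parallelly::availableCores() on the
## worker should then report 4, which nested future plans can use.
config <- mirai::cluster_config(command = "sbatch", options = "
#SBATCH --job-name=mirai
#SBATCH --time=01:00:00
#SBATCH --mem=200M
#SBATCH --cpus-per-task=4
module load r
")
```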
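
The section lists TORQUE/PBS and LSF among the supported schedulers but only shows Slurm and SGE configurations. As a hypothetical illustration (resource directives vary between sites, so check your local scheduler documentation), a TORQUE/PBS configuration might look like:

```r
## Hypothetical TORQUE/PBS counterpart of the Slurm and SGE
## configurations above: one-hour walltime, 200 MiB of RAM per worker,
## and the 'r' environment module loaded before launching the worker.
config <- mirai::cluster_config(command = "qsub", options = "
#PBS -N mirai
#PBS -l walltime=01:00:00
#PBS -l mem=200mb
module load r
")
```

Everything else, launching the daemons and setting `plan(future.mirai::mirai_cluster)`, stays the same.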
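
After the Bash tweak at the end of the section, it can be reassuring to confirm that the substitution took effect before launching any workers. A minimal check, assuming the `config` object created in the examples above:

```r
## Inspect the tweaked launch configuration; the arguments should now
## reference '/bin/bash' where they previously referenced '/bin/sh'.
print(config$command)
print(config$args)
```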

inst/WORDLIST

Lines changed: 8 additions & 0 deletions
@@ -1,13 +1,21 @@
+FutureInterruptError
 Futureverse
+HPC
+LSF
 MiraiFuture
 ORCID
+OpenLava
 RJ
+SGE
+Slurm
 TLS
+Univa
 callr
 doFuture
 doi
 finalizers
 furrr
+futureverse
 globals
 mirai
 multisession

man/mirai_cluster.Rd

Lines changed: 109 additions & 0 deletions
Some generated files are not rendered by default.
