66
77# future.batchtools: A Future API for Parallel and Distributed Processing using 'batchtools'
88
9+ ## TL;DR
10+
11+ Here is an example of how to evaluate an R expression on a Slurm
12+ high-performance compute (HPC) cluster.
13+
14+ ``` r
15+ library(future)
16+
17+ # Limit runtime to 10 minutes and memory to 400 MiB per future,
18+ # request a parallel environment with four slots on a single host.
19+ # On this system, R is available via environment module 'r'. By
20+ # specifying 'r/4.5.1', 'module load r/4.5.1' will be added to
21+ # the submitted job script.
22+ plan(future.batchtools::batchtools_slurm, resources = list(
23+   time = "00:10:00", mem = "400M", nodes = 1, ntasks = 4,
24+   modules = c("r/4.5.1")
25+ ))
26+
27+ # Give it a spin
28+ f <- future({
29+   data.frame(
30+     hostname = Sys.info()[["nodename"]],
31+     os = Sys.info()[["sysname"]],
32+     cores = unname(parallelly::availableCores()),
33+     modules = Sys.getenv("LOADEDMODULES")
34+   )
35+ })
36+ info <- value(f)
37+ print(info)
38+ #>   hostname    os cores modules
39+ #> 1      n12 Linux     4 r/4.5.1
40+ ```
41+
942## Introduction
1043
1144The **[future]** package provides a generic API for using futures in
@@ -24,93 +57,50 @@ high-performance computing (HPC) clusters via a simple switch in
2457settings - without having to change any code at all.
2558
2659For instance, if **batchtools** is properly configured, the below two
27- expressions for futures `x` and `y` will be processed on two different
28- compute nodes:
60+ expressions for two futures will be processed on two different compute
61+ nodes:
2962
3063``` r
3164library(future)
3265plan(future.batchtools::batchtools_slurm)
3366
34- x %<-% { Sys.sleep(5); 3.14 }
35- y %<-% { Sys.sleep(5); 2.71 }
67+ f_x <- future({ Sys.sleep(5); 3.14 })
68+ f_y <- future({ Sys.sleep(5); 2.71 })
69+ x <- value(f_x)
70+ y <- value(f_y)
3671x + y
3772#> [1] 5.85
3873```
3974
4075This is just a toy example to illustrate what futures look like and
4176how to work with them.
4277
43- A more realistic example comes from the field of cancer research
44- where very large data FASTQ files, which hold a large number of short
45- DNA sequence reads, are produced. The first step toward a biological
46- interpretation of these data is to align the reads in each sample
47- (one FASTQ file) toward the human genome. In order to speed this up,
48- we can have each file be processed by a separate compute node and each
49- node we can use 24 parallel processes such that each process aligns a
50- separate chromosome. Here is an outline of how this nested parallelism
51- could be implemented using futures.
52-
53- ``` r
54- library(future)
55- library(listenv)
56-
57- ## The first level of futures should be submitted to the
58- ## cluster using batchtools. The second level of futures
59- ## should be using multisession, where the number of
60- ## parallel processes is automatically decided based on
61- ## what the cluster grants to each compute node.
62- plan(list(future.batchtools::batchtools_slurm, multisession))
63-
64- ## Find all samples (one FASTQ file per sample)
65- fqs <- dir(pattern = "[.]fastq$")
66-
67- ## The aligned results are stored in BAM files
68- bams <- listenv()
69-
70- ## For all samples (FASTQ files) ...
71- for (ss in seq_along(fqs)) {
72-   fq <- fqs[ss]
73-
74-   ## ... use futures to align them ...
75-   bams[[ss]] %<-% {
76-     bams_ss <- listenv()
77-     ## ... and for each FASTQ file use a second layer
78-     ## of futures to align the individual chromosomes
79-     for (cc in 1:24) {
80-       bams_ss[[cc]] %<-% htseq::align(fq, chr = cc)
81-     }
82-     ## Resolve the "chromosome" futures and return as a list
83-     as.list(bams_ss)
84-   }
85- }
86- ## Resolve the "sample" futures and return as a list
87- bams <- as.list(bams)
88- ```
78+ For an introduction as well as full details on how to use futures,
79+ please see <https://www.futureverse.org> or consult the package
80+ vignettes of the **[future]** package.
8981
90- Note that a user who do not have access to a cluster could use the
91- same script processing samples sequentially and chromosomes in
92- parallel on a single machine using:
9382
94- ``` r
95- plan(list(sequential, multisession))
96- ```
83+ ## Demos
9784
98- or samples in parallel and chromosomes sequentially using:
85+ The **[future]** package provides a demo using futures for calculating
86+ a set of Mandelbrot planes. The demo does not assume anything about
87+ what type of futures are used. _The user has full control of how
88+ futures are evaluated_. For instance, to use local batchtools
89+ futures, run the demo as:
9990
10091``` r
101- plan(list(multisession, sequential))
92+ library(future)
93+ plan(future.batchtools::batchtools_local)
94+ demo("mandelbrot", package = "future", ask = FALSE)
10295```
10396
104- For an introduction as well as full details on how to use futures,
105- please consult the package vignettes of the **[future]** package.
106-
107-
10897
109- ## Choosing batchtools backend
98+ ## Available batchtools backends
11099
111100The **future.batchtools** package implements a generic future wrapper
112101for all batchtools backends. Below are the most common types of
113- batchtools backends.
102+ batchtools backends. For other types of parallel and distributed
103+ backends, please see <https://www.futureverse.org/backends.html>.
114104
115105
116106| Backend | Description | Alternative in future package
@@ -125,64 +115,16 @@ batchtools backends.
125115| `batchtools_local` | sequential evaluation in a separate R process (on current machine) | `plan(cluster, workers = I(1))`
126116
127117
128- ### Examples
129-
130- Below is an examples on how use resolve futures via a Slurm scheduler.
131-
132- ``` r
133- library(future)
134-
135- # Limit runtime to 10 minutes and memory to 400 MiB per future,
136- # request a parallel environment with four slots on a single host.
137- # On this system, R is available via environment module 'r'. By
138- # specifying 'r/4.5.1', 'module load r/4.5.1' will be added to
139- # the submitted job script.
140- plan(future.batchtools::batchtools_slurm, resources = list(
141-   time = "00:10:00", mem = "400M", nodes = 1, ntasks = 4,
142-   modules = c("r/4.5.1")
143- ))
144-
145- # Give it a spin
146- f <- future({
147-   data.frame(
148-     hostname = Sys.info()[["nodename"]],
149-     os = Sys.info()[["sysname"]],
150-     cores = unname(parallelly::availableCores()),
151-     modules = Sys.getenv("LOADEDMODULES")
152-   )
153- })
154- info <- value(f)
155- print(info)
156- #>   hostname    os cores modules
157- #> 1      n12 Linux     4 r/4.5.1
158- ```
159-
160- ## Demos
161-
162- The **[future]** package provides a demo using futures for calculating
163- a set of Mandelbrot planes. The demo does not assume anything about
164- what type of futures are used. _The user has full control of how
165- futures are evaluated_. For instance, to use local batchtools
166- futures, run the demo as:
167-
168- ``` r
169- library(future)
170- plan(future.batchtools::batchtools_local)
171- demo("mandelbrot", package = "future", ask = FALSE)
172- ```
173118
174119
175120[batchtools]: https://cran.r-project.org/package=batchtools
176- [brew]: https://cran.r-project.org/package=brew
177121[future]: https://cran.r-project.org/package=future
178122[future.batchtools]: https://cran.r-project.org/package=future.batchtools
179- [batchtools configuration]: https://batchtools.mlr-org.com/articles/batchtools.html
180123[TORQUE]: https://en.wikipedia.org/wiki/TORQUE
181124[Slurm]: https://en.wikipedia.org/wiki/Slurm_Workload_Manager
182125[Sun/Oracle Grid Engine (SGE)]: https://en.wikipedia.org/wiki/Oracle_Grid_Engine
183126[Load Sharing Facility (LSF)]: https://en.wikipedia.org/wiki/Platform_LSF
184127[OpenLava]: https://en.wikipedia.org/wiki/OpenLava
185- [Docker Swarm]: https://docs.docker.com/swarm/
186128
187129## Installation
188130R package future.batchtools is available on [CRAN](https://cran.r-project.org/package=future.batchtools) and can be installed in R as: