README.md
Lines changed: 80 additions & 41 deletions
Original file line number
Diff line number
Diff line change
@@ -7,24 +7,26 @@
7
7
# future.batchtools: A Future API for Parallel and Distributed Processing using 'batchtools'
8
8
9
9
## Introduction
10
-
The [future] package provides a generic API for using futures in R.
11
-
A future is a simple yet powerful mechanism to evaluate an R expression
12
-
and retrieve its value at some point in time. Futures can be resolved
13
-
in many different ways depending on which strategy is used.
14
-
There are various types of synchronous and asynchronous futures to
15
-
choose from in the [future] package.
16
-
17
-
This package, [future.batchtools], provides a type of futures that
18
-
utilizes the [batchtools] package. This means that _any_ type of
19
-
backend that the batchtools package supports can be used as a future.
20
-
More specifically, future.batchtools will allow you or users of your
21
-
package to leverage the compute power of high-performance computing
22
-
(HPC) clusters via a simple switch in settings - without having to
23
-
change any code at all.
24
-
25
-
For instance, if batchtools is properly configured, the below two
10
+
11
+
The **[future]** package provides a generic API for using futures in
12
+
R. A future is a simple yet powerful mechanism to evaluate an R
13
+
expression and retrieve its value at some point in time. Futures can
14
+
be resolved in many different ways depending on which strategy is
15
+
used. There are various types of synchronous and asynchronous futures
16
+
to choose from in the **[future]** package.
17
+
18
+
This package, **[future.batchtools]**, provides a type of futures that
19
+
utilizes the **[batchtools]** package. This means that _any_ type of
20
+
backend that the **batchtools** package supports can be used as a
21
+
future. More specifically, **future.batchtools** will allow you or
22
+
users of your package to leverage the compute power of
23
+
high-performance computing (HPC) clusters via a simple switch in
24
+
settings - without having to change any code at all.
25
+
26
+
For instance, if **batchtools** is properly configured, the below two
26
27
expressions for futures `x` and `y` will be processed on two different
27
28
compute nodes:
29
+
28
30
```r
29
31
> library("future.batchtools")
30
32
> plan(batchtools_torque)
@@ -34,8 +36,9 @@ compute nodes:
34
36
> x + y
35
37
[1] 5.85
36
38
```
37
-
This is obviously a toy example to illustrate what futures look like
38
-
and how to work with them.
39
+
40
+
This is just a toy example to illustrate what futures look like and
41
+
how to work with them.
39
42
40
43
A more realistic example comes from the field of cancer research
41
44
where very large FASTQ data files, which hold a large number of short
@@ -46,6 +49,7 @@ we can have each file be processed by a separate compute node and on each
46
49
node we can use 24 parallel processes such that each process aligns a
47
50
separate chromosome. Here is an outline of how this nested parallelism
48
51
could be implemented using futures.
52
+
49
53
```r
50
54
library("future")
51
55
library("listenv")
@@ -81,22 +85,29 @@ for (ss in seq_along(fqs)) {
81
85
## Resolve the "sample" futures and return as a list
82
86
bams <- as.list(bams)
83
87
```
84
-
Note that a user who does not have access to a cluster could use the same script processing samples sequentially and chromosomes in parallel on a single machine using:
88
+
89
+
Note that a user who does not have access to a cluster could use the
90
+
same script processing samples sequentially and chromosomes in
91
+
parallel on a single machine using:
92
+
85
93
```r
86
94
plan(list(sequential, multisession))
87
95
```
96
+
88
97
or samples in parallel and chromosomes sequentially using:
98
+
89
99
```r
90
100
plan(list(multisession, sequential))
91
101
```
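
As a rough, self-contained sketch of how such a nested plan plays out on a single machine, the following uses placeholder work instead of real alignment. The sample names, the chromosome indices, and the `sprintf()` stand-in are all hypothetical; only `plan()`, `future()`, and `value()` come from the **future** package.

```r
library("future")

## Samples one at a time, chromosomes in parallel (as described above)
plan(list(sequential, multisession))

samples <- c("a", "b")  # hypothetical sample names
chrs <- 1:3             # hypothetical chromosome indices

sample_futures <- lapply(samples, function(s) {
  future({
    ## Inner futures are resolved by the second strategy in the plan
    chr_futures <- lapply(chrs, function(cc) {
      future(sprintf("%s-chr%d", s, cc))  # placeholder for real work
    })
    vapply(chr_futures, value, character(1))
  })
})

## Collect the results: one character vector per sample
res <- lapply(sample_futures, value)

## Return to sequential processing, which also stops background workers
plan(sequential)
```

Swapping the order of the two strategies in `plan()` changes where the parallelism happens without touching the rest of the script.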
92
102
93
103
For an introduction as well as full details on how to use futures,
94
-
please consult the package vignettes of the [future] package.
104
+
please consult the package vignettes of the **[future]** package.
95
105
96
106
97
107
98
108
## Choosing batchtools backend
99
-
The future.batchtools package implements a generic future wrapper
109
+
110
+
The **future.batchtools** package implements a generic future wrapper
100
111
for all batchtools backends. Below are the most common types of
101
112
batchtools backends.
102
113
@@ -109,18 +120,22 @@ batchtools backends.
109
120
| `batchtools_lsf` | Futures are evaluated via a [Load Sharing Facility (LSF)] job scheduler | N/A
110
121
| `batchtools_openlava` | Futures are evaluated via an [OpenLava] job scheduler | N/A
111
122
| `batchtools_custom` | Futures are evaluated via a custom batchtools configuration R script or via a set of cluster functions | N/A
112
-
| `batchtools_interactive` | sequential evaluation in the calling R environment | `plan(transparent)`
113
123
| `batchtools_multicore` | parallel evaluation by forking the current R process | `plan(multicore)`
114
124
| `batchtools_local` | sequential evaluation in a separate R process (on current machine) | `plan(cluster, workers = "localhost")`
115
125
116
126
117
127
### Examples
118
128
119
-
Below is an example illustrating how to use `batchtools_torque` to configure the batchtools backend. For further details and examples on how to configure batchtools, see the [batchtools configuration] wiki page.
129
+
Below is an example illustrating how to use `batchtools_torque` to
130
+
configure the batchtools backend. For further details and examples on
131
+
how to configure batchtools, see the [batchtools configuration] wiki
132
+
page.
133
+
134
+
To configure **batchtools** for job schedulers we need to set up a
135
+
`*.tmpl` template file that is used to generate the script used by the
136
+
scheduler. This is what a template file for TORQUE / PBS may look
137
+
like:
120
138
121
-
To configure batchtools for job schedulers we need to set up a `*.tmpl` template
122
-
file that is used to generate the script used by the scheduler.
123
-
This is what a template file for TORQUE / PBS may look like:
124
139
```sh
125
140
#!/bin/bash
126
141
@@ -147,45 +162,69 @@ This is what a template file for TORQUE / PBS may look like:
147
162
## Launch R and evaluate the batchtools R job
148
163
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
149
164
```
150
-
If this template is saved to file `batchtools.torque.tmpl` (without period)
151
-
in the working directory or as `.batchtools.torque.tmpl` (with period) in the
152
-
user's home directory, then it will be automatically located by the
153
-
batchtools framework and loaded when doing:
165
+
166
+
If this template is saved to file `batchtools.torque.tmpl` (without
167
+
period) in the working directory or as `.batchtools.torque.tmpl` (with
168
+
period) in the user's home directory, then it will be automatically
169
+
located by the **batchtools** framework and loaded when doing:
170
+
154
171
```r
155
172
> plan(batchtools_torque)
156
173
```
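
To check which of the two locations would be picked up before calling `plan()`, something along these lines may help. This is a simplified sketch in base R: `find_template()` is a hypothetical helper, not part of **batchtools**, and it only mirrors the two locations named above (working directory first, then home directory), whereas the actual lookup may consider additional locations.

```r
## Hypothetical helper: return the first existing template file for a
## scheduler, or NA if neither of the two documented locations has one
find_template <- function(scheduler, dirs = c(wd = ".", home = "~")) {
  candidates <- c(
    ## Without leading period in the working directory
    file.path(dirs[["wd"]], sprintf("batchtools.%s.tmpl", scheduler)),
    ## With leading period in the user's home directory
    file.path(dirs[["home"]], sprintf(".batchtools.%s.tmpl", scheduler))
  )
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0L) NA_character_ else found[[1]]
}

find_template("torque")
```

An `NA` result suggests that `plan(batchtools_torque)` would not find a template in either of these two places.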
157
-
Resource parameters can be specified via argument `resources` which should be a named list and is passed as is to the template file. For example, to request that each job would get allotted 12 cores (on a single machine) and up to 5 GiB of memory, use:
174
+
175
+
Resource parameters can be specified via argument `resources` which
176
+
should be a named list and is passed as is to the template file. For
177
+
example, to request that each job would get allotted 12 cores (on a
To specify the `resources` argument at the same time as using nested future strategies, one can use `tweak()` to tweak the default arguments. For instance,
184
+
To specify the `resources` argument at the same time as using nested
185
+
future strategies, one can use `tweak()` to tweak the default
causes the first level of futures to be submitted via the TORQUE job scheduler requesting 12 cores and 5 GiB of memory per job. The second level of futures will be evaluated using multisession using the 12 cores given to each job by the scheduler.
170
194
171
-
A similar filename format is used for the other types of job schedulers supported. For instance, for Slurm the template file should be named `./batchtools.slurm.tmpl` or `~/.batchtools.slurm.tmpl` in order for
195
+
causes the first level of futures to be submitted via the TORQUE job
196
+
scheduler requesting 12 cores and 5 GiB of memory per job. The second
197
+
level of futures will be evaluated using multisession using the 12
198
+
cores given to each job by the scheduler.
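
A nested setup of this kind could be sketched as follows. The resource field names `nodes` and `vmem`, and their values, are assumptions: they are only meaningful if the template file reads exactly those fields, so adjust them to whatever your own `*.tmpl` expects. The `plan()` call is left commented out because it needs a working TORQUE installation and template.

```r
library("future.batchtools")

## Assumption: the template reads 'resources$nodes' and 'resources$vmem'
strategies <- list(
  ## First level: one TORQUE job per future, 12 cores and 5 GiB each
  tweak(batchtools_torque, resources = list(nodes = "1:ppn=12", vmem = "5gb")),
  ## Second level: parallel R sessions within each job
  multisession
)

## Uncomment on a system with TORQUE and a matching template file:
## plan(strategies)
```

Building the list separately makes it easy to inspect or reuse the strategies before activating them.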
199
+
200
+
A similar filename format is used for the other types of job
201
+
schedulers supported. For instance, for Slurm the template file
202
+
should be named `./batchtools.slurm.tmpl` or
203
+
`~/.batchtools.slurm.tmpl` in order for
204
+
172
205
```r
173
206
> plan(batchtools_slurm)
174
207
```
175
-
to locate the file automatically. To specify this template file explicitly, use argument `template`, e.g.
208
+
209
+
to locate the file automatically. To specify this template file