Commit 43f7347

README: full revamp
1 parent 5a250e0 commit 43f7347

3 files changed, +109 -226 lines changed

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 Package: future.batchtools
-Version: 0.12.2-9967
+Version: 0.12.2-9968
 Depends:
     R (>= 3.2.0),
     parallelly,

README.md

Lines changed: 54 additions & 112 deletions
@@ -6,6 +6,39 @@
 
 # future.batchtools: A Future API for Parallel and Distributed Processing using 'batchtools'
 
+## TL;DR
+
+Here is an example on how evaluate R expression on a Slurm
+high-performance compute (HPC) cluster.
+
+```r
+library(future)
+
+# Limit runtime to 10 minutes and memory to 400 MiB per future,
+# request a parallel environment with four slots on a single host.
+# On this system, R is available via environment module 'r'. By
+# specifying 'r/4.5.1', 'module load r/4.5.1' will be added to
+# the submitted job script.
+plan(future.batchtools::batchtools_slurm, resources = list(
+  time = "00:10:00", mem = "400M", nodes=1, ntasks=4,
+  modules = c("r/4.5.1")
+))
+
+# Give it a spin
+f <- future({
+  data.frame(
+    hostname = Sys.info()[["nodename"]],
+    os = Sys.info()[["sysname"]],
+    cores = unname(parallelly::availableCores()),
+    modules = Sys.getenv("LOADEDMODULES")
+  )
+})
+info <- value(f)
+print(info)
+#>   hostname    os cores modules
+#> 1      n12 Linux     4 r/4.5.1
+```
+
 ## Introduction
 
 The **[future]** package provides a generic API for using futures in
@@ -24,93 +57,50 @@ high-performance computing (HPC) clusters via a simple switch in
 settings - without having to change any code at all.
 
 For instance, if **batchtools** is properly configured, the below two
-expressions for futures `x` and `y` will be processed on two different
-compute nodes:
+expressions for two futures will be processed on two different compute
+nodes:
 
 ```r
 library(future)
 plan(future.batchtools::batchtools_slurm)
 
-x %<-% { Sys.sleep(5); 3.14 }
-y %<-% { Sys.sleep(5); 2.71 }
+f_x <- future({ Sys.sleep(5); 3.14 })
+f_y <- future({ Sys.sleep(5); 2.71 })
+x <- value(f_x)
+y <- value(f_y)
 x + y
 #> [1] 5.85
 ```
 
 This is just a toy example to illustrate what futures look like and
 how to work with them.
 
-A more realistic example comes from the field of cancer research
-where very large data FASTQ files, which hold a large number of short
-DNA sequence reads, are produced. The first step toward a biological
-interpretation of these data is to align the reads in each sample
-(one FASTQ file) toward the human genome. In order to speed this up,
-we can have each file be processed by a separate compute node and each
-node we can use 24 parallel processes such that each process aligns a
-separate chromosome. Here is an outline of how this nested parallelism
-could be implemented using futures.
-
-```r
-library(future)
-library(listenv)
-
-## The first level of futures should be submitted to the
-## cluster using batchtools. The second level of futures
-## should be using multisession, where the number of
-## parallel processes is automatically decided based on
-## what the cluster grants to each compute node.
-plan(list(future.batchtools::batchtools_slurm, multisession))
-
-## Find all samples (one FASTQ file per sample)
-fqs <- dir(pattern = "[.]fastq$")
-
-## The aligned results are stored in BAM files
-bams <- listenv()
-
-## For all samples (FASTQ files) ...
-for (ss in seq_along(fqs)) {
-  fq <- fqs[ss]
-
-  ## ... use futures to align them ...
-  bams[[ss]] %<-% {
-    bams_ss <- listenv()
-    ## ... and for each FASTQ file use a second layer
-    ## of futures to align the individual chromosomes
-    for (cc in 1:24) {
-      bams_ss[[cc]] %<-% htseq::align(fq, chr = cc)
-    }
-    ## Resolve the "chromosome" futures and return as a list
-    as.list(bams_ss)
-  }
-}
-## Resolve the "sample" futures and return as a list
-bams <- as.list(bams)
-```
+For an introduction as well as full details on how to use futures,
+please see <https://www.futureverse.org> or consult the package
+vignettes of the **[future]** package.
 
-Note that a user who do not have access to a cluster could use the
-same script processing samples sequentially and chromosomes in
-parallel on a single machine using:
 
-```r
-plan(list(sequential, multisession))
-```
+## Demos
 
-or samples in parallel and chromosomes sequentially using:
+The **[future]** package provides a demo using futures for calculating
+a set of Mandelbrot planes. The demo does not assume anything about
+what type of futures are used. _The user has full control of how
+futures are evaluated_. For instance, to use local batchtools
+futures, run the demo as:
 
 ```r
-plan(list(multisession, sequential))
+library(future)
+plan(future.batchtools::batchtools_local)
+demo("mandelbrot", package = "future", ask = FALSE)
 ```
 
-For an introduction as well as full details on how to use futures,
-please consult the package vignettes of the **[future]** package.
-
-
 
-## Choosing batchtools backend
+## Available batchtools backend
 
 The **future.batchtools** package implements a generic future wrapper
 for all batchtools backends. Below are the most common types of
-batchtools backends.
+batchtools backends. For other types of parallel and distributed
+backends, please see <https://www.futureverse.org/backends.html>.
 
 
 | Backend | Description | Alternative in future package
@@ -125,64 +115,16 @@ batchtools backends.
 | `batchtools_local` | sequential evaluation in a separate R process (on current machine) | `plan(cluster, workers = I(1))`
 
 
-### Examples
-
-Below is an examples on how use resolve futures via a Slurm scheduler.
-
-```r
-library(future)
-
-# Limit runtime to 10 minutes and memory to 400 MiB per future,
-# request a parallel environment with four slots on a single host.
-# On this system, R is available via environment module 'r'. By
-# specifying 'r/4.5.1', 'module load r/4.5.1' will be added to
-# the submitted job script.
-plan(future.batchtools::batchtools_slurm, resources = list(
-  time = "00:10:00", mem = "400M", nodes=1, ntasks=4,
-  modules = c("r/4.5.1")
-))
-
-# Give it a spin
-f <- future({
-  data.frame(
-    hostname = Sys.info()[["nodename"]],
-    os = Sys.info()[["sysname"]],
-    cores = unname(parallelly::availableCores()),
-    modules = Sys.getenv("LOADEDMODULES")
-  )
-})
-info <- value(f)
-print(info)
-#>   hostname    os cores modules
-#> 1      n12 Linux     4 r/4.5.1
-```
-
-## Demos
-
-The **[future]** package provides a demo using futures for calculating
-a set of Mandelbrot planes. The demo does not assume anything about
-what type of futures are used. _The user has full control of how
-futures are evaluated_. For instance, to use local batchtools
-futures, run the demo as:
-
-```r
-library(future)
-plan(future.batchtools::batchtools_local)
-demo("mandelbrot", package = "future", ask = FALSE)
-```
 
 
 [batchtools]: https://cran.r-project.org/package=batchtools
-[brew]: https://cran.r-project.org/package=brew
 [future]: https://cran.r-project.org/package=future
 [future.batchtools]: https://cran.r-project.org/package=future.batchtools
-[batchtools configuration]: https://batchtools.mlr-org.com/articles/batchtools.html
 [TORQUE]: https://en.wikipedia.org/wiki/TORQUE
 [Slurm]: https://en.wikipedia.org/wiki/Slurm_Workload_Manager
 [Sun/Oracle Grid Engine (SGE)]: https://en.wikipedia.org/wiki/Oracle_Grid_Engine
 [Load Sharing Facility (LSF)]: https://en.wikipedia.org/wiki/Platform_LSF
 [OpenLava]: https://en.wikipedia.org/wiki/OpenLava
-[Docker Swarm]: https://docs.docker.com/swarm/
 
 ## Installation
 R package future.batchtools is available on [CRAN](https://cran.r-project.org/package=future.batchtools) and can be installed in R as:
