dev/vignettes/_mirai.Rmd
74 additions & 39 deletions
Original file line number
Diff line number
Diff line change
@@ -39,7 +39,7 @@ A mirai is either *unresolved* if the result has yet to be received, or *resolve
39
39
For a mirai `m`, the result is available at `m$data` once it has resolved.
40
40
Normally this will be the return value of the evaluated expression.
41
41
If the expression errored, caused the process to crash, or timed out then this will be an 'errorValue' instead.
42
-
See the section [Errors in a mirai](#errors-in-a-mirai) below.
42
+
See the section [Error Handling](#error-handling) below.
43
43
44
44
Rather than repeatedly checking `unresolved(m)`, it is more efficient to wait for and collect its value by using `m[]`.
45
45
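The difference between polling and waiting can be sketched as follows. This is a minimal illustration, assuming the mirai package is installed and no daemons have been set, so that the task runs in a temporary background process; the expression and the sleep duration are arbitrary.

```r
library(mirai)

# Evaluate a slow expression asynchronously in a background process
m <- mirai({
  Sys.sleep(0.5)
  1 + 1
})

# Polling: returns immediately, TRUE while the result has not yet arrived
unresolved(m)

# Usually preferable: block until the result arrives, then return it
result <- m[]
result
```

Once `m[]` has returned, the same value is also available at `m$data`.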
@@ -161,7 +161,7 @@ for (i in 1:10) {
161
161
By testing the return value of each mirai for errors, error-handling code is able to automate recovery and re-attempts, as above.
162
162
The result is a resilient and fault-tolerant pipeline that minimizes downtime by eliminating interruptions of long computes.
163
163
164
-
### 3. Errors in a mirai
164
+
### 3. Error Handling
165
165
166
166
If execution in a mirai fails, the error message is returned as a character string of class 'miraiError' and 'errorValue' to facilitate debugging.
167
167
@@ -218,7 +218,17 @@ is_error_value(m5$data)
218
218
```
219
219
`is_error_value()` tests for all mirai execution errors, user interrupts and timeouts.
220
220
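A short sketch of testing a result before using it, assuming the mirai package is installed; the error message is made up for illustration. `is_mirai_error()` is the narrower test for execution errors only, whereas `is_error_value()` also covers interrupts and timeouts.

```r
library(mirai)

# A task that is certain to fail
m <- mirai(stop("something went wrong"))
res <- m[]

is_error_value(res)  # TRUE for any 'errorValue'
is_mirai_error(res)  # TRUE only for errors raised during evaluation

# Branch on the test instead of assuming success
if (is_error_value(res)) {
  message("task failed: ", as.character(res))
}
```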
221
-
### 4. Local Daemons
221
+
### 4. Random Number Generation
222
+
223
+
mirai employs L'Ecuyer-CMRG streams for random number generation in the same way as base R's own parallel package. This is a widely-adopted, statistically-sound method, suitable for parallel computation.
224
+
225
+
Streams essentially cut into the RNG's period (a very long sequence of pseudo-random numbers) at intervals so far apart from each other that they do not in practice overlap. This ensures that statistical results obtained from parallel computations remain correct and valid. The method of generating streams is recursive.
226
+
227
+
By default (when the `seed` argument to `daemons()` is `NULL`) mirai initiates a new stream for each daemon launched, in the same manner as base R. This guarantees that the results are statistically-sound, although it does not guarantee numerical reproducibility between parallel runs. Firstly, using different numbers of daemons would cause mirai tasks to be sent to different daemons. Secondly, when using dispatcher, mirai tasks are sent dynamically to the next available daemon, and this is not guaranteed to be the same one on each run.
228
+
229
+
Supplying an explicit integer `seed` to `daemons()` turns on reproducible RNG. Instead of initiating a new stream for each daemon, a stream is now initiated for each `mirai()`. This is slightly computationally wasteful (although the effect on performance is negligible), but it does guarantee the same results across runs, regardless of the number of daemons used.
230
+
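The reproducibility described above can be checked with a small experiment. This is a sketch only, assuming a version of mirai in which `daemons()` accepts an integer `seed`; the helper `run_once()`, the seed value and the daemon counts are hypothetical choices made for illustration.

```r
library(mirai)

# Hypothetical helper: run four RNG-dependent tasks on `n_daemons` daemons
run_once <- function(n_daemons) {
  daemons(n_daemons, seed = 42L)  # integer seed turns on reproducible RNG
  on.exit(daemons(0))             # tear the daemons down on exit
  tasks <- lapply(1:4, function(i) mirai(rnorm(1)))
  vapply(tasks, function(m) m[], numeric(1))
}

a <- run_once(2)
b <- run_once(3)

# With an integer seed, the two runs are expected to agree
identical(a, b)
```

With the default `seed = NULL`, the same comparison would generally not be expected to hold, as each daemon draws from its own stream and tasks are assigned dynamically.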
231
+
### 5. Local Daemons
222
232
223
233
Daemons, or persistent background processes, may be set to receive `mirai()` requests.
224
234
@@ -286,7 +296,7 @@ Requesting the status now shows 6 connections, along with the host URL:
286
296
status()
287
297
```
288
298
289
-
#### Everywhere
299
+
#### everywhere()
290
300
291
301
`everywhere()` may be used to evaluate an expression on all connected daemons and persist the resultant state, regardless of a daemon's 'cleanup' setting.
292
302
```{r}
@@ -330,27 +340,7 @@ everywhere(
330
340
daemons(0)
331
341
```
332
342
333
-
#### With Method
334
-
335
-
`daemons()` has a `with()` method, which evaluates an expression with daemons created for the duration of the expression and automatically torn down upon completion.
336
-
337
-
It was originally designed for running a Shiny app with the desired number of daemons, as in the example below:
338
-
339
-
```{r}
340
-
#| label: withshiny
341
-
#| eval: false
342
-
with(daemons(4), shiny::runApp(app))
343
-
```
344
-
345
-
> Note: it is assumed the app is already created.
346
-
Wrapping a call to `shiny::shinyApp()` would not work as `runApp()` is implicitly called when the app is printed, however printing occurs only after `with()` has returned, hence the app would run outside of the scope of the `with()` statement.
347
-
348
-
In the case of a Shiny app, all mirai calls will be executed before the app returns as the app itself is blocking.
349
-
In the case of other expressions, be sure to call the results (or collect the values) of all mirai within the expression to ensure that they all complete before the daemons are torn down.
350
-
351
-
If specifying a [compute profile](#compute-profiles) for the `daemons()` call, all calls with `.compute = NULL` within the `with()` clause will default to this compute profile.
352
-
353
-
### 5. Remote Daemons
343
+
### 6. Remote Daemons
354
344
355
345
The daemons interface may also be used to send tasks for computation to remote daemon processes on the network.
356
346
@@ -381,9 +371,9 @@ daemons(0)
381
371
```
382
372
Closing the connection causes all connected daemons to exit automatically. If using dispatcher, it will cause dispatcher to exit, and in turn all connected daemons when their respective connections with the dispatcher are terminated.
383
373
384
-
### 6. Launching Remote Daemons
374
+
### 7. Launching Remote Daemons
385
375
386
-
The launcher analogy is appropriate, as these are ways of executing a daemon on the machine of your choice, very much like launching a satellite. Once deployed, the daemon connects back to your host process through it's own communications (TCP or TLS over TCP).
376
+
The launcher analogy is appropriate, as these are ways of deploying a daemon on the machine of your choice, very much like launching a satellite. Once deployed, the daemon connects back to your host process through its own communications (TCP or TLS over TCP).
387
377
388
378
The local launcher simply runs an `Rscript` instance via a local shell. The remote launcher uses a method to run this `Rscript` command on a remote machine.
389
379
@@ -397,7 +387,7 @@ There are currently 3 options for generating remote launch configurations:
397
387
398
388
The return value of all of these functions is a simple list. This means that they may be pre-constructed, saved and re-used whenever the same configuration is required.
399
389
400
-
#### i. SSH Direct Connection
390
+
#### SSH Direct Connection
401
391
402
392
This method is appropriate for internal networks and in trusted, properly-configured environments where it is safe for your machine to accept incoming connections on certain ports.
403
393
In the examples below, the remote daemons connect back directly to port 5555 on the local machine.
@@ -425,7 +415,7 @@ daemons(
425
415
)
426
416
```
427
417
428
-
#### ii. SSH Tunnelling
418
+
#### SSH Tunnelling
429
419
430
420
Use SSH tunnelling to launch daemons on any machine you are able to access via SSH, whether on the local network or the cloud.
431
421
SSH key-based authentication must already be in place, but no other configuration is required.
@@ -456,7 +446,7 @@ daemons(
456
446
)
457
447
```
458
448
459
-
#### iii. HPC Cluster Resource Managers
449
+
#### HPC Cluster Resource Managers
460
450
461
451
`cluster_config()` may be used to deploy daemons using a cluster resource manager / scheduler.
462
452
@@ -514,7 +504,7 @@ daemons(
514
504
)
515
505
```
516
506
517
-
#### iv. Generic Remote Configuration
507
+
#### Generic Remote Configuration
518
508
519
509
`remote_config()` provides a generic, flexible framework for running any shell command that may be used to deploy daemons.
520
510
@@ -534,7 +524,7 @@ daemons(
534
524
)
535
525
```
536
526
537
-
#### v. Manual Deployment
527
+
#### Manual Deployment
538
528
539
529
As an alternative to automated launches, calling `launch_remote()` without specifying 'remote' may be used to return the shell commands for deploying daemons manually.
540
530
@@ -546,7 +536,7 @@ launch_remote()
546
536
daemons(0)
547
537
```
548
538
549
-
### 7. TLS Secure Connections
539
+
### 8. TLS Secure Connections
550
540
551
541
TLS provides a robust solution for securing communications from the local machine to remote daemons.
552
542
@@ -594,7 +584,7 @@ The CA may be a public CA or internal to an organisation.
594
584
- If these are concatenated together as a single character string `certchain`, then the character vector comprising this and an empty character string `c(certchain, "")` may be supplied to 'tlscert'.
595
585
- Alternatively, if these are written to a file (and the file replicated on the remote machines), then the 'tlscert' argument may also be specified as a path/filename (assuming these are the same on each machine).
596
586
597
-
### 8. Compute Profiles
587
+
### 9. Compute Profiles
598
588
599
589
`daemons()` has a `.compute` argument to specify separate sets of daemons (*compute profiles*) that operate totally independently. This is useful for managing tasks with heterogeneous compute requirements:
600
590
@@ -608,12 +598,57 @@ To create a 'mirai' task using a specific compute profile, specify the `.compute
608
598
609
599
Similarly, functions such as `status()`, `launch_local()` or `launch_remote()` should be specified with the desired `.compute` argument.
610
600
611
-
###9. Random Number Generation
601
+
#### `with_daemons()` and `local_daemons()`
612
602
613
-
mirai employs L'Ecuyer-CMRG streams for random number generation. This is a widely-adopted, statistically-sound method deemed safe for parallel computation, and the same as that employed by base R's own parallel package.
603
+
`daemons()` returns (invisibly) the compute profile of daemons created as a character string. Supplying this to `with_daemons()` or `local_daemons()` automatically sets the default compute profile of all package functions within the relevant scope.
614
604
615
-
Streams essentially cut into the RNG's period (a very long sequence of pseudo-random numbers) at intervals that are far apart from each other that they do not in practice overlap. This ensures that statistical results obtained from parallel computations remain correct and valid. The method of generating streams is recursive.
605
+
```{r}
606
+
#| label: withdaemons
607
+
d1 <- daemons(1, .compute = "cpu")
608
+
d2 <- daemons(1, .compute = "gpu")
616
609
617
-
By default (when the `seed` argument to `daemons()` is `NULL`) mirai initiates a new stream for each daemon launched, in the same manner as base R. This guarantees that the results are statistically-sound, although it does not guarantee numerical reproducibility between parallel runs. Firstly, using different numbers or workers would cause mirai to be sent to different workers. Secondly, when using dispatcher, mirai are sent dynamically to the next available daemon, and this is not guaranteed to be the same each time.
610
+
with_daemons(d1, {
611
+
  s1 <- status()
612
+
  m1 <- mirai(Sys.getpid())
613
+
})
618
614
619
-
Supplying an explicit integer `seed` to `daemons()` turns on reproducible RNG. Instead of initiating a new stream for each daemon, now a stream is initiated for each `mirai()`. This is slightly computationally wasteful (although posing a negligible effect on performance), but it does guarantee the same results across runs, and regardless of the number of daemons used.
615
+
with_daemons(d2, {
616
+
  s2 <- status()
617
+
  m2 <- mirai(Sys.getpid())
618
+
  m3 <- mirai(Sys.getpid(), .compute = "cpu")
619
+
  local_daemons(d1)
620
+
  m4 <- mirai(Sys.getpid())
621
+
})
622
+
623
+
s1$daemons
624
+
m1[]
625
+
626
+
s2$daemons
627
+
m2[] # different to m1
628
+
629
+
m3[] # same as m1
630
+
m4[] # same as m1
631
+
632
+
with_daemons("cpu", daemons(0))
633
+
with_daemons("gpu", daemons(0))
634
+
```
635
+
636
+
#### With Method
637
+
638
+
`daemons()` also has a `with()` method, which evaluates an expression with daemons created for the duration of the expression and automatically torn down upon completion. All package functions within the `with()` scope default to using the compute profile of the daemons created.
639
+
640
+
It was originally designed for running a Shiny app with the desired number of daemons, as in the example below:
Wrapping a call to `shiny::shinyApp()` would not work, as `runApp()` is implicitly called when the app is printed; however, printing occurs only after `with()` has returned, so the app would run outside the scope of the `with()` statement.
652
+
653
+
In the case of a Shiny app, all mirai calls will be executed before the app returns as the app itself is blocking.
654
+
In the case of other expressions, be sure to call or collect the values of all mirai within the expression to ensure that they all complete before the daemons are torn down.