Commit 13cb8d0

Update vignettes for error behaviour and random number generation (#409)
* Update about mirai errors * Update for RNG
1 parent 621ebea commit 13cb8d0

4 files changed

+69
-55
lines changed

dev/vignettes/_mirai.Rmd

Lines changed: 11 additions & 2 deletions
@@ -187,8 +187,7 @@ m3[]
187187
188188
m3$data$stack.trace
189189
```
190-
Elements of the original error condition are also accessible via `$` on the error object.
191-
For example, additional metadata recorded by `rlang::abort()` is preserved:
190+
A 'miraiError' inherits from the original condition classes and hence can be caught or re-thrown. The elements of the original error condition are also accessible via `$` on the error object. Additional metadata recorded by `rlang::abort()` is preserved:
192191
```{r}
193192
#| label: metaexample
194193
f <- function(x) if (x > 0) stop("positive")
@@ -608,3 +607,13 @@ The daemons settings are saved under the named profile.
608607
To create a 'mirai' task using a specific compute profile, specify the `.compute` argument to `mirai()`, which uses the 'default' compute profile if this is `NULL`.
609608

610609
Similarly, functions such as `status()`, `launch_local()` or `launch_remote()` should be specified with the desired `.compute` argument.
610+
611+
### 9. Random Number Generation
612+
613+
mirai employs L'Ecuyer-CMRG streams for random number generation. This is a widely-adopted, statistically-sound method deemed safe for parallel computation, and the same as that employed by base R's own parallel package.
614+
615+
Streams essentially cut into the RNG's period (a very long sequence of pseudo-random numbers) at intervals so far apart from each other that they do not in practice overlap. This ensures that statistical results obtained from parallel computations remain correct and valid. The method of generating streams is recursive.
616+
617+
By default (when the `seed` argument to `daemons()` is `NULL`) mirai initiates a new stream for each daemon launched, in the same manner as base R. This guarantees that the results are statistically-sound, although it does not guarantee numerical reproducibility between parallel runs. Firstly, using a different number of workers would cause mirai to be sent to different workers. Secondly, when using dispatcher, mirai are sent dynamically to the next available daemon, and this is not guaranteed to be the same each time.
618+
619+
Supplying an explicit integer `seed` to `daemons()` turns on reproducible RNG. Instead of initiating a new stream for each daemon, now a stream is initiated for each `mirai()`. This is slightly computationally wasteful (although posing a negligible effect on performance), but it does guarantee the same results across runs, and regardless of the number of daemons used.
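
The recursive stream generation described in this section can be sketched with base R alone. This is a minimal illustration of L'Ecuyer-CMRG streams using `parallel::nextRNGStream()`, not mirai's internal implementation; the helper `draw_from()` and the seed value `42L` are hypothetical names and values chosen for the example.

```r
# Illustrative sketch using base R only; not taken from the mirai source.
RNGkind("L'Ecuyer-CMRG")
set.seed(42L)

# Each stream is derived from the previous one, hence "recursive".
stream1 <- .Random.seed
stream2 <- parallel::nextRNGStream(stream1)
stream3 <- parallel::nextRNGStream(stream2)

# Hypothetical helper: install a stream as the current RNG state, then draw.
draw_from <- function(stream, n = 3L) {
  assign(".Random.seed", stream, envir = globalenv())
  rnorm(n)
}

draws_a <- draw_from(stream2)
draws_b <- draw_from(stream2)  # same stream state, so the same draws
draws_c <- draw_from(stream3)  # a different stream, so different draws

identical(draws_a, draws_b)
identical(draws_a, draws_c)
```

Assigning one stream per task rather than one per worker is what makes the draws independent of which worker happens to run the task.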

dev/vignettes/_v06-questions.Rmd

Lines changed: 3 additions & 5 deletions
@@ -36,9 +36,8 @@ On the other hand, if your code previously used the `globals` argument to supply
3636
Note that this would only work in the case of a named list and not the other forms that `globals` can take.
3737

3838
Regardless of using a `mirai()` or `future_promise()`, we recommend that you pass globals explicitly in production code.
39-
This is as globals detection is never 100% perfect, and there is always some element of guesswork.
40-
Edge cases can lead to unpredictable failures or silently incorrect results.
41-
Explicit passing of variables allows for transparent and reliable behaviour, that remains completely robust over time.
39+
This is as globals detection is never 100% perfect, and there is always some element of guesswork, with edge cases leading to unpredictable results.
40+
Explicit passing of variables allows for transparent and reliable behaviour, remaining robust over time.
4241

4342
**Capture globals using `environment()`:**
4443

@@ -104,8 +103,7 @@ The random seed is not reset after each mirai call to ensure that however many r
104103

105104
Hence normally, the random seed should be set once on the host process when daemons are created, rather than in each daemon.
106105

107-
If it is required to set the seed in each daemon, this should be done using an independent method and set each time random draws are required.
108-
Another option would be to set the random seed within a local execution scope to prevent the global random seed on each daemon from being affected.
106+
For numerical reproducibility, set the `seed` argument to `daemons()` (see the Random Number Generation section of the reference vignette for further details).
109107

110108
### 3. Accessing package functions during development
111109

vignettes/mirai.Rmd

Lines changed: 50 additions & 41 deletions
@@ -83,9 +83,9 @@ unresolved(m)
8383
# Do other stuff
8484

8585
collect_mirai(m)
86-
#> [1] 3.981811 3.211179 4.608007 4.129569 2.873390
86+
#> [1] 2.825820 3.842733 5.808984 4.915793 2.771071
8787
m[]
88-
#> [1] 3.981811 3.211179 4.608007 4.129569 2.873390
88+
#> [1] 2.825820 3.842733 5.808984 4.915793 2.771071
8989
```
9090

9191
For easy programmatic use of `mirai()`, '.expr' accepts a pre-constructed language object, and also a list of named arguments passed via '.args'.
@@ -99,9 +99,9 @@ args <- list(time = 2L, mean = 4)
9999
m1 <- mirai(.expr = expr, .args = args)
100100
m2 <- mirai(.expr = expr, .args = args)
101101
m1[]
102-
#> [1] 2.439538 5.664712 5.820245 4.405887 4.009880
102+
#> [1] 5.747155 3.096813 3.011055 3.950488 3.467498
103103
m2[]
104-
#> [1] 4.575898 3.943637 3.164972 1.542555 4.505653
104+
#> [1] 3.812540 2.567866 4.761930 3.714069 2.813790
105105
```
106106
By running the above two calculations in parallel, they take roughly half the time as running sequentially (minus a relatively inconsequential parallelization overhead).
107107

@@ -165,9 +165,9 @@ for (i in 1:10) {
165165
#> iteration 3 successful
166166
#> iteration 4 successful
167167
#> iteration 5 successful
168+
#> Error: random error
168169
#> iteration 6 successful
169170
#> iteration 7 successful
170-
#> Error: random error
171171
#> iteration 8 successful
172172
#> iteration 9 successful
173173
#> iteration 10 successful
@@ -211,8 +211,7 @@ m3$data$stack.trace
211211
#> [[2]]
212212
#> f(1)
213213
```
214-
Elements of the original error condition are also accessible via `$` on the error object.
215-
For example, additional metadata recorded by `rlang::abort()` is preserved:
214+
A 'miraiError' inherits from the original condition classes and hence can be caught or re-thrown. The elements of the original error condition are also accessible via `$` on the error object. Additional metadata recorded by `rlang::abort()` is preserved:
216215

217216
``` r
218217
f <- function(x) if (x > 0) stop("positive")
@@ -287,7 +286,7 @@ status()
287286
#> [1] 6
288287
#>
289288
#> $daemons
290-
#> [1] "abstract://b1379cc31bd3ab70ab8177ce"
289+
#> [1] "ipc:///tmp/130164c8ebee629bd7eab602"
291290
#>
292291
#> $mirai
293292
#> awaiting executing completed
@@ -325,7 +324,7 @@ status()
325324
#> [1] 6
326325
#>
327326
#> $daemons
328-
#> [1] "abstract://8eea003087abef8dd2dbcca0"
327+
#> [1] "ipc:///tmp/2ef3f6b12e08ecf8ef29e17f"
329328
```
330329

331330
#### Everywhere
@@ -417,7 +416,7 @@ status()
417416
#> [1] 0
418417
#>
419418
#> $daemons
420-
#> [1] "tcp://192.168.1.71:39247"
419+
#> [1] "tcp://10.246.62.139:53122"
421420
#>
422421
#> $mirai
423422
#> awaiting executing completed
@@ -592,7 +591,7 @@ The printed return values may then be copy / pasted directly to a remote machine
592591
daemons(url = host_url())
593592
launch_remote()
594593
#> [1]
595-
#> Rscript -e 'mirai::daemon("tcp://192.168.1.71:44331")'
594+
#> Rscript -e 'mirai::daemon("tcp://10.246.62.139:53127")'
596595
daemons(0)
597596
```
598597

@@ -615,36 +614,36 @@ The generated self-signed certificate is available via `launch_remote()`, where
615614
``` r
616615
launch_remote(1)
617616
#> [1]
618-
#> Rscript -e 'mirai::daemon("tls+tcp://192.168.1.71:33845",tlscert=c("-----BEGIN CERTIFICATE-----
619-
#> MIIFPzCCAyegAwIBAgIBATANBgkqhkiG9w0BAQsFADA3MRUwEwYDVQQDDAwxOTIu
620-
#> MTY4LjEuNzExETAPBgNVBAoMCE5hbm9uZXh0MQswCQYDVQQGEwJKUDAeFw0wMTAx
621-
#> MDEwMDAwMDBaFw0zMDEyMzEyMzU5NTlaMDcxFTATBgNVBAMMDDE5Mi4xNjguMS43
622-
#> MTERMA8GA1UECgwITmFub25leHQxCzAJBgNVBAYTAkpQMIICIjANBgkqhkiG9w0B
623-
#> AQEFAAOCAg8AMIICCgKCAgEA0BKJWPjcIw3arv13ASXaclBtWvNI9+rcMfS6MsaD
624-
#> qQU15NuudCwzKnWe85TqYCfcFHbZh/n3Oa55zT/uFNd9Qu2sCq+P3+X6VycR2VNg
625-
#> sRroHze2TICBItL5LqGRk6NF+65VstODhEioG8oHyWI288GUnx2RvhUZda5vv1m7
626-
#> 8S3iKnTXFDBVf3sGxJnn4GZEfiZCBsCQcjSWuBomQMJOmFLlzcspJlLPDBsXFqud
627-
#> BuHILT6xGhehRNcTpTz5tU1BXlbhE8jG+kgQEZ4Ixzzztj2VfxC3zJgbgaB0bkmH
628-
#> Ch2M9Mw0+BmbVgaquz7Px+hfA9ifR/ByCT8EjPwdf+7Byx6xtLvsnnCFDcSUg+St
629-
#> TqGHWblfdeZDSBXIAwfFbqtGQRgObw1aY5x9u73b4gTtHyTW/WtSnr2slhfGYcph
630-
#> GKGW45EAAjMNi4EDyCeyqlE6mNeFDXRiyfnFq+fmBGVCpqZJIbrdEd1XX+gnCFqL
631-
#> 2aB89L2JR0X0A6bN74FpJv5xvr4hIS+xeQTjdXnQu4F0Ddor4OZkAru7tOBbE+w0
632-
#> zq2HkjAzZbMBRh3378q+mERh+RaZhvLm0nEkdCg9edwLAdtSc3rFwBRn5a+2mLqD
633-
#> bzrPsjoJUsNd42Sratr+miQ8iaEJoYrM1qUroq6Jzju3Um4rh2lE62ck2vd8+JBZ
634-
#> f20CAwEAAaNWMFQwEgYDVR0TAQH/BAgwBgEB/wIBADAdBgNVHQ4EFgQUL5j3PAL4
635-
#> uGO/brZqyILmGEswJHQwHwYDVR0jBBgwFoAUL5j3PAL4uGO/brZqyILmGEswJHQw
636-
#> DQYJKoZIhvcNAQELBQADggIBAA5v5NDYSF8MVw7t2q64Pi1QmXWEEk6sl4O0+B05
637-
#> CXKuxScRdoO1hJMG81kBt16FHwAQtaQaGG7Gkgf2aYbJsbykQXEqpdOfgUn+9DAJ
638-
#> 6l+rj7kX2STZa6ab2jZp2qnG1Usb7B8qMR0lfMVSl/WYVKH2tTT2t+2Kaigmr8E4
639-
#> FsUvkxHqjww7Vhn5Tnxlo7CerHMGlh/MtzY1NItTdfglBLSwigXwFFYlikVaOLS0
640-
#> iUfb/SVpjXdTdg8YBzDPdVrbMVPyHNqqOfmWUvWwXPp7JbB9D/ptCVRL7iiIrvQG
641-
#> zSBxf1ONTSrG+9juWzGimhlpDjI/4c+Qp3wy0a3W20kpzDnRH+APOWwA4+GixxtS
642-
#> j8ofbnkc1CEnJ/qau5fAt3JocNcth1UGnsRXFAtrH7KqB+mOnNwY4oD4cVmNRqIo
643-
#> 19CkKJD+5CVO70z6s0N3CZpL0g8BiKEpnvvzLrktS/1CVquN4XHesMOeeIhu93tX
644-
#> 4aCA3CCJzCvWoZihzicvT7Ol8jDibCQqnrLVPqravqOLMk9Iz7xUNheMG6njr/Zv
645-
#> T3O9Dg5nV9FmtIicSHMa1tcGxJ0Z6U/ubpwCFOrbvSV3oHD3SUeIe6CexjPq9Cql
646-
#> ZtoALoQUdrCCtBuzDeXo3rjo4i4f8MQnQFYjHV4RKDpE5N1t4i9wYK+uqskKet4Y
647-
#> ncJt
617+
#> Rscript -e 'mirai::daemon("tls+tcp://10.246.62.139:53131",tlscert=c("-----BEGIN CERTIFICATE-----
618+
#> MIIFQTCCAymgAwIBAgIBATANBgkqhkiG9w0BAQsFADA4MRYwFAYDVQQDDA0xMC4y
619+
#> NDYuNjIuMTM5MREwDwYDVQQKDAhOYW5vbmV4dDELMAkGA1UEBhMCSlAwHhcNMDEw
620+
#> MTAxMDAwMDAwWhcNMzAxMjMxMjM1OTU5WjA4MRYwFAYDVQQDDA0xMC4yNDYuNjIu
621+
#> MTM5MREwDwYDVQQKDAhOYW5vbmV4dDELMAkGA1UEBhMCSlAwggIiMA0GCSqGSIb3
622+
#> DQEBAQUAA4ICDwAwggIKAoICAQDVROsUMaz/zZopWtvzkfJPyMYpHm3CmFIVkcmn
623+
#> dfSaOiEqOe34Odcki76GdToIgbUsockBrzUIn0zpuUOWfRnBfbrXu62N8G7QAuJ3
624+
#> 6nSM8frjOfkxLZs+QyIXsbr1XDh62qbJaW8NlFjqbrc+YejxM+WjtHun6bEgK+N8
625+
#> 3Cq/AVebylxBn49r6Agh7e8tmFdMWN3TCUZKwXZxi8a+8aIIseejmQ4U8Kpq6RCJ
626+
#> xt7cbc34P0a39I+oqOLUWfoy70Ytj82uQfZIzyYP0lmTNJGozmz2+/sMpTJUS6zj
627+
#> CgKuBieG0H2vZe7RZlNPT8ClLle5yG8gFgUqAm3fCJGND3E2PrrXpfmiEE84MjAn
628+
#> qfEsa/+7heRV7jjLk4IxRZIZw3UV7w/RG8z7vMfCiWYhhIELZmoxauKIOg/mc2eR
629+
#> upVAeK/nhrqBo+vQ/yfh4GnPy5HkDWvo4POduIYOfi94GXHFaIMkuqy0Ojzoae9r
630+
#> 4FzZtTaO3uEy1SaywBdc7CEVgE1BJi8ka/+s0GlCs9WXCKrF+dIlHdOSR+1TO4Ko
631+
#> jqASUlnrupIaC8Q5FgYg8TJDlliK+rfR1kdWQSlvvCxlHbntXyoEQbP80SEuobgW
632+
#> Y6GIEfWJ+ZNnPDjnveytHNS2tQoNzL6y4qprHz9ksfsqRc3Nke7k3xSjo6rjKeqb
633+
#> HJQIZQIDAQABo1YwVDASBgNVHRMBAf8ECDAGAQH/AgEAMB0GA1UdDgQWBBTQPSj2
634+
#> pW0KmX+EV0PSwTCpH/9Q2TAfBgNVHSMEGDAWgBTQPSj2pW0KmX+EV0PSwTCpH/9Q
635+
#> 2TANBgkqhkiG9w0BAQsFAAOCAgEAqVGgBIV9eugmGGv6FGEf0tmm8HxCf5ywFJ+B
636+
#> 7KtgoIAq9GeDjKXhpc1H9RHB3Zv5+Baxq2ecnYkVr8qeVLFM/ileiYB3skod0Pgh
637+
#> uK2hP0qkZO5lWaYu72YiGPM/i/ub33kSOhuEae3c3H0+dEbNa/gizrV08BUXks+D
638+
#> PsA0eryTL4Cxr+gzh7EuggJokd7OAkYV9ZiE2uw3T6NCx0atb5uMFyHi63Q5bmG7
639+
#> lsmJ066J8KuB0HwwQZL290PLXuQBGmGijCSo1CESyKCBpmXlSCavj/DGZuPmwzny
640+
#> G4cWFY0Sft6I5+kNj3QP+nIpOT2C5o5NRPmuK2/OAYuX2jTXtMdG6AooWeCuinOz
641+
#> raLA7Gj91knOQqwkceVFKdZvvnmUB4niAS9OGxB9hCkJt3NSAJrrM2A5Y7/xLjDj
642+
#> wElwM1QCCILoxYNqpXSRiRYrvtzaM56zxxkURgP3QzxLPHU4AtQWUJQKpDtSJ8lF
643+
#> Fbc9czfVEnBsi1RlVCEt2P7I3QxTVjFBkQuFCjlWHpgMq7QYGNLlzRAdkHn03KBp
644+
#> VzbUUDytJdKERQKJaR24BzG77ptHDS2vejPmnJ1wWGvjjeTDn37KRnkBM/tJOnHc
645+
#> fhbgkijCfojLOFiTCV+iX3V+Z7XLWlLBihNbGNxF1xXRegQB1WddzC+UVj04fgZa
646+
#> B2o1ZUQ=
648647
#> -----END CERTIFICATE-----
649648
#> ",""))'
650649
```
@@ -690,3 +689,13 @@ The daemons settings are saved under the named profile.
690689
To create a 'mirai' task using a specific compute profile, specify the `.compute` argument to `mirai()`, which uses the 'default' compute profile if this is `NULL`.
691690

692691
Similarly, functions such as `status()`, `launch_local()` or `launch_remote()` should be specified with the desired `.compute` argument.
692+
693+
### 9. Random Number Generation
694+
695+
mirai employs L'Ecuyer-CMRG streams for random number generation. This is a widely-adopted, statistically-sound method deemed safe for parallel computation, and the same as that employed by base R's own parallel package.
696+
697+
Streams essentially cut into the RNG's period (a very long sequence of pseudo-random numbers) at intervals so far apart from each other that they do not in practice overlap. This ensures that statistical results obtained from parallel computations remain correct and valid. The method of generating streams is recursive.
698+
699+
By default (when the `seed` argument to `daemons()` is `NULL`) mirai initiates a new stream for each daemon launched, in the same manner as base R. This guarantees that the results are statistically-sound, although it does not guarantee numerical reproducibility between parallel runs. Firstly, using a different number of workers would cause mirai to be sent to different workers. Secondly, when using dispatcher, mirai are sent dynamically to the next available daemon, and this is not guaranteed to be the same each time.
700+
701+
Supplying an explicit integer `seed` to `daemons()` turns on reproducible RNG. Instead of initiating a new stream for each daemon, now a stream is initiated for each `mirai()`. This is slightly computationally wasteful (although posing a negligible effect on performance), but it does guarantee the same results across runs, and regardless of the number of daemons used.
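
A minimal sketch of the reproducible mode described above, assuming a mirai version in which `daemons()` accepts an integer `seed` as documented in this section. The wrapper `run_once()`, the seed value `2025L` and the choice of two daemons are hypothetical and only serve the illustration.

```r
library(mirai)

# Hypothetical wrapper: start daemons with a fixed seed, run two tasks,
# collect their results, then reset the daemons on exit.
run_once <- function(seed) {
  daemons(2, seed = seed)
  on.exit(daemons(0))
  m1 <- mirai(rnorm(3))
  m2 <- mirai(rnorm(3))
  list(m1[], m2[])
}

first  <- run_once(2025L)
second <- run_once(2025L)

# With an integer seed, each mirai() is stated to receive its own stream,
# so the two runs should agree.
identical(first, second)
```

Leaving `seed` as `NULL` in the same wrapper would be expected to give statistically sound but not necessarily identical results across runs.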

vignettes/v06-questions.Rmd

Lines changed: 5 additions & 7 deletions
@@ -29,9 +29,8 @@ On the other hand, if your code previously used the `globals` argument to supply
2929
Note that this would only work in the case of a named list and not the other forms that `globals` can take.
3030

3131
Regardless of using a `mirai()` or `future_promise()`, we recommend that you pass globals explicitly in production code.
32-
This is as globals detection is never 100% perfect, and there is always some element of guesswork.
33-
Edge cases can lead to unpredictable failures or silently incorrect results.
34-
Explicit passing of variables allows for transparent and reliable behaviour, that remains completely robust over time.
32+
This is as globals detection is never 100% perfect, and there is always some element of guesswork, with edge cases leading to unpredictable results.
33+
Explicit passing of variables allows for transparent and reliable behaviour, remaining robust over time.
3534

3635
**Capture globals using `environment()`:**
3736

@@ -79,10 +78,10 @@ vec2 <- 4:6
7978
# Returns different values: good
8079
mirai_map(list(vec, vec2), \(x) rnorm(x))[]
8180
#> [[1]]
82-
#> [1] 0.2112876 0.9041800 0.7834014
81+
#> [1] 0.38714685 0.09582403 0.85062845
8382
#>
8483
#> [[2]]
85-
#> [1] -0.3150949 -1.5628536 -0.3860887
84+
#> [1] 0.3188942 0.2086956 0.5288199
8685

8786
# Set the seed in the function
8887
mirai_map(list(vec, vec2), \(x) {
@@ -113,8 +112,7 @@ The random seed is not reset after each mirai call to ensure that however many r
113112

114113
Hence normally, the random seed should be set once on the host process when daemons are created, rather than in each daemon.
115114

116-
If it is required to set the seed in each daemon, this should be done using an independent method and set each time random draws are required.
117-
Another option would be to set the random seed within a local execution scope to prevent the global random seed on each daemon from being affected.
115+
For numerical reproducibility, set the `seed` argument to `daemons()` (see the Random Number Generation section of the reference vignette for further details).
118116

119117
### 3. Accessing package functions during development
120118
