-**Note:** Older versions of cmdstanr use `num_cores`and `num_chains` instead of `cores` and `chains`. If you get an error, [update](https://mc-stan.org/cmdstanr/)`cmdstanr`.
+**Note:** Older versions of cmdstanr use `num_cores`, `cores`, and `num_chains` instead of `parallel_chains`, `threads_per_chain`, and `chains`. If you get an error, [update](https://mc-stan.org/cmdstanr/)`cmdstanr`.

-In this case, we'll compute four chains using four cores. This computer has eight
-cores though, and we'll see how we can make use of those other cores later.
-The `elapsed` time is the time that we would have recorded if we were
-timing this process with a stop-watch, so that is the one relevant to understanding
-performance here (`system` time is time spent on system functions like I/O,
-and `user` time is for parallel calculations).
+In this case, we'll compute four chains running in parallel on four cores.
+This computer has eight cores though, and we'll see how we can make use of
+those other cores later. The `elapsed` time is the time that we would have
+recorded if we were timing this process with a stop-watch, so that is the
+one relevant to understanding performance here (`system` time is time spent
+on system functions like I/O, and `user` time is for parallel calculations).
 **Note:** If you get an error `Error in self$compile(...) : unused argument (cpp_options = list(stan_threads = TRUE))`, [update](https://mc-stan.org/cmdstanr/) your `cmdstanr` to the latest version (the threading interface was changed in May 2020, shortly after this tutorial was originally published).

-
-Set the number of threads each chain will use with `set_num_threads`:
-
-```{r}
-set_num_threads(2)
-```
-
-As this computer has 8 cores and we intend to run the usual 4 chains,
-we will use 2 threads per chain to make full use of the processor
-(4 chains with 2 threads each can make use of the full eight cores).
-The within-chain parallelism is generally less effiecient as compared to
-running chains in parallel, but if greater single-chain speedups are desired,
-then the user can choose to run fewer chains (i.e. 2 chains with 4 threads
-each) or use more machines.
+The computer I'm on has 8 cores and we want to make use of all of them. Running 4
+chains in parallel gives us 2 threads per chain (which makes full
+use of all 8 cores on the processor). The within-chain parallelism is generally
+less efficient as compared to running chains in parallel, but if greater
+single-chain speedups are desired, then the user can choose to run fewer
+chains (i.e. 2 chains in parallel with 4 threads each).
-**Note:** Older versions of cmdstanr use `num_cores`and `num_chains` instead of `cores` and `chains`. If you get an error, [update](https://mc-stan.org/cmdstanr/)`cmdstanr`.
+**Note:** Older versions of cmdstanr use `num_cores`, `cores`, and `num_chains` instead of `parallel_chains`, `threads_per_chain`, and `chains`. If you get an error, [update](https://mc-stan.org/cmdstanr/)`cmdstanr`.

 Again, `elapsed` time is the time recorded as if by a stopwatch. Computing the
-ratios of the two times gives the speedup on eight cores of:
+ratios of the two times gives a speedup on eight cores of:

 ```{r}
 time0[["elapsed"]] / time1[["elapsed"]]
@@ -312,8 +305,8 @@ or model block, or do not do very much computation inside the reduce function,
 or does not get lucky with caching, the speedup will be much more limited.

 We can always get speedup in terms of effective sample size per time
-by running multiple chains on different cores. `reduce_sum` is not a
-replacement for that, and it is still important to run multiple chains
+by running multiple chains in parallel on different cores. `reduce_sum` is not
+a replacement for that, and it is still important to run multiple chains
 to check diagnostics. `reduce_sum` is a tool for speeding up single chain
 calculations, which can be useful for model development and on computers with