
Commit 4aab2cb

Updated reduce_sum tutorial for latest cmdstanr changes (stan-dev/cmdstanr#185)
1 parent 506a9ab commit 4aab2cb


1 file changed (+21, -28 lines)


knitr/reduce-sum/reduce_sum_tutorial.Rmd

Lines changed: 21 additions & 28 deletions
@@ -1,6 +1,6 @@
 ---
 title: "Reduce Sum: A Minimal Example"
-date: "13 May 2020"
+date: "16 June 2020"
 output: html_document
 ---
 
@@ -125,20 +125,20 @@ To sample from the model run:
 
 ```{r}
 time0 = system.time(fit0 <- logistic0$sample(redcard_data,
-cores = 4,
 chains = 4,
+parallel_chains = 4,
 refresh = 1000))
 
 time0
 ```
-**Note:** Older versions of cmdstanr use `num_cores` and `num_chains` instead of `cores` and `chains`. If you get an error, [update](https://mc-stan.org/cmdstanr/) `cmdstanr`.
+**Note:** Older versions of cmdstanr use `num_cores`, `cores`, and `num_chains` instead of `parallel_chains`, `threads_per_chain`, and `chains`. If you get an error, [update](https://mc-stan.org/cmdstanr/) `cmdstanr`.
 
-In this case, we'll compute four chains using four cores. This computer has eight
-cores though, and we'll see how we can make use of those other cores later.
-The `elapsed` time is the time that we would have recorded if we were
-timing this process with a stop-watch, so that is the one relevant to understanding
-performance here (`system` time is time spent on system functions like I/O,
-and `user` time is for parallel calculations).
+In this case, we'll compute four chains running in parallel on four cores.
+This computer has eight cores though, and we'll see how we can make use of
+those other cores later. The `elapsed` time is the time that we would have
+recorded if we were timing this process with a stop-watch, so that is the
+one relevant to understanding performance here (`system` time is time spent
+on system functions like I/O, and `user` time is for parallel calculations).
 
 ## Rewriting the Model to Enable Multithreading
 
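The hunk above rests on the difference between `elapsed` time and CPU time. A minimal base-R sketch, independent of Stan and cmdstanr (the variable names `t_sleep` and `t_busy` are illustrative only), makes that difference visible by timing a sleep, which takes wall-clock time but almost no CPU, and a busy loop, which takes both:

```r
# Minimal base-R sketch: elapsed (wall-clock) time versus CPU time.
# Sys.sleep() waits without computing, so elapsed time grows while
# user CPU time stays close to zero.
t_sleep <- system.time(Sys.sleep(0.5))
print(t_sleep)

# system.time() returns a "proc_time" object: a named numeric vector with
# components user.self, sys.self, elapsed, user.child and sys.child.
t_sleep[["elapsed"]]    # about 0.5 seconds, what a stopwatch would show
t_sleep[["user.self"]]  # near 0, since almost no CPU work was done

# A CPU-bound loop, by contrast, also accumulates user time.
t_busy <- system.time(for (i in 1:2e6) sqrt(i))
print(t_busy)
```

For measuring parallel speedup, `elapsed` is the value to compare: adding cores is meant to shorten the wall-clock time, not to reduce the total CPU work.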
@@ -267,36 +267,29 @@ logistic1 <- cmdstan_model("logistic1.stan", cpp_options = list(stan_threads = T
 
 **Note:** If you get an error `Error in self$compile(...) : unused argument (cpp_options = list(stan_threads = TRUE))`, [update](https://mc-stan.org/cmdstanr/) your `cmdstanr` to the latest version (the threading interface was changed in May 2020, shortly after this tutorial was originally published).
 
-
-Set the number of threads each chain will use with `set_num_threads`:
-
-```{r}
-set_num_threads(2)
-```
-
-As this computer has 8 cores and we intend to run the usual 4 chains,
-we will use 2 threads per chain to make full use of the processor
-(4 chains with 2 threads each can make use of the full eight cores).
-The within-chain parallelism is generally less effiecient as compared to
-running chains in parallel, but if greater single-chain speedups are desired,
-then the user can choose to run fewer chains (i.e. 2 chains with 4 threads
-each) or use more machines.
+The computer I'm on has 8 cores and we want to make use of all of them. Running 4
+chains in parallel gives us 2 threads per chain (which makes full
+use of all 8 cores on the processor). The within-chain parallelism is generally
+less efficient as compared to running chains in parallel, but if greater
+single-chain speedups are desired, then the user can choose to run fewer
+chains (i.e. 2 chains in parallel with 4 threads each).
 
 Run and time the model with:
 
 ```{r}
 redcard_data$grainsize <- 1
 time1 = system.time(fit1 <- logistic1$sample(redcard_data,
-cores = 4,
 chains = 4,
+parallel_chains = 4,
+threads_per_chain = 2,
 refresh = 1000))
 
 time1
 ```
-**Note:** Older versions of cmdstanr use `num_cores` and `num_chains` instead of `cores` and `chains`. If you get an error, [update](https://mc-stan.org/cmdstanr/) `cmdstanr`.
+**Note:** Older versions of cmdstanr use `num_cores`, `cores`, and `num_chains` instead of `parallel_chains`, `threads_per_chain`, and `chains`. If you get an error, [update](https://mc-stan.org/cmdstanr/) `cmdstanr`.
 
 Again, `elapsed` time is the time recorded as if by a stopwatch. Computing the
-ratios of the two times gives the speedup on eight cores of:
+ratios of the two times gives a speedup on eight cores of:
 
 ```{r}
 time0[["elapsed"]] / time1[["elapsed"]]
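The new paragraph in this hunk divides 8 cores into 4 parallel chains with 2 threads each, or alternatively 2 chains with 4 threads each. A small R sketch of that arithmetic follows; `plan_threads()` is a hypothetical helper written for illustration and is not part of cmdstanr:

```r
# Hypothetical helper (not part of cmdstanr): split a machine's cores between
# chains that run in parallel and threads within each chain.
plan_threads <- function(total_cores, parallel_chains) {
  stopifnot(total_cores >= 1, parallel_chains >= 1)
  # Integer division keeps chains x threads within total_cores whenever
  # there are at least as many cores as chains; each chain always gets
  # at least one thread.
  threads_per_chain <- max(1L, as.integer(total_cores %/% parallel_chains))
  list(
    parallel_chains   = as.integer(parallel_chains),
    threads_per_chain = threads_per_chain,
    cores_used        = as.integer(parallel_chains) * threads_per_chain
  )
}

str(plan_threads(8, 4))  # 4 chains x 2 threads: the tutorial's setup
str(plan_threads(8, 2))  # 2 chains x 4 threads: faster single chains

# On the current machine. detectCores() may count logical CPUs
# (hyperthreads) and can return NA on some platforms.
n_cores <- parallel::detectCores()
if (!is.na(n_cores)) str(plan_threads(n_cores, 4))
```

The resulting values correspond to the `parallel_chains` and `threads_per_chain` arguments of `$sample()` introduced in this commit, with `chains` set to match when fewer chains are run. The threads are only used if the model was compiled with `cpp_options = list(stan_threads = TRUE)`, as in the hunk header above.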
@@ -312,8 +305,8 @@ or model block, or do not do very much computation inside the reduce function,
 or does not get lucky with caching, the speedup will be much more limited.
 
 We can always get speedup in terms of effective sample size per time
-by running multiple chains on different cores. `reduce_sum` is not a
-replacement for that, and it is still important to run multiple chains
+by running multiple chains in parallel on different cores. `reduce_sum` is not
+a replacement for that, and it is still important to run multiple chains
 to check diagnostics. `reduce_sum` is a tool for speeding up single chain
 calculations, which can be useful for model development and on computers with
 large numbers of cores.
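The last hunk notes that speedups can be much more limited than in this example, and that running multiple chains in parallel always buys speedup in effective sample size per unit time. A sketch of three hypothetical helpers (`speedup()`, `efficiency()` and `ess_per_second()`, none of which come from the tutorial or cmdstanr) for putting numbers on both ideas, assuming timings come from `system.time()` as in the tutorial:

```r
# Hypothetical helpers (not from the tutorial or cmdstanr) for comparing
# timings taken with system.time(), like time0 and time1 above.
speedup <- function(t_baseline, t_new) {
  # Ratio of wall-clock times; a value above 1 means t_new was faster.
  t_baseline[["elapsed"]] / t_new[["elapsed"]]
}

efficiency <- function(observed_speedup, threads_baseline, threads_new) {
  # Observed speedup divided by the ideal speedup from adding threads.
  # 1 means perfect scaling; values above 1 can occur (e.g. cache effects).
  observed_speedup / (threads_new / threads_baseline)
}

ess_per_second <- function(ess, timing) {
  # Effective sample size per second of wall-clock time.
  ess / timing[["elapsed"]]
}

# Self-contained demo using sleeps in place of model fits:
t_slow <- system.time(Sys.sleep(0.4))
t_fast <- system.time(Sys.sleep(0.2))
s <- speedup(t_slow, t_fast)   # roughly 2
s
efficiency(s, threads_baseline = 1, threads_new = 2)

# With the tutorial's objects this would read:
# efficiency(speedup(time0, time1), threads_baseline = 1, threads_new = 2)
```

Here the ideal speedup for going from 1 to 2 threads per chain is taken to be 2. The ESS value passed to `ess_per_second()` would come from a diagnostics summary, for instance the `ess_bulk` column that cmdstanr's `$summary()` reports; that is an assumption about the reader's setup, not something this commit shows.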
