Commit 3edafec

Responded to reviews (design-doc #17)
1 parent 021d364 commit 3edafec

2 files changed: 16 additions and 21 deletions
src/functions-reference/higher-order_functions.Rmd

Lines changed: 3 additions & 3 deletions
@@ -135,7 +135,7 @@ package MINPACK-1 [@minpack:1980].
 
 The Jacobian of the solution with respect to auxiliary parameters is
 computed using the implicit function theorem. Intermediate Jacobians
-(of the the algebraic function's output with respect to the unknowns y
+(of the algebraic function's output with respect to the unknowns y
 and with respect to the auxiliary parameters theta) are computed using
 Stan's automatic differentiation.
 
@@ -422,7 +422,7 @@ available in two variants:
 The higher-order reduce function takes a partial sum function `f`, an array argument `x`
 (with one array element for each term in the sum), a recommended
 `grainsize`, and a set of shared arguments. This representation allows
-to parallelize the resultant sum.
+parallelization of the resultant sum.
 
 <!-- real; reduce_sum; (F f, T[] x, int grainsize, T1 s1, T2 s2, ...); -->
 \index{{\tt \bfseries reduce\_sum }!{\tt (F f, T[] x, int grainsize, T1 s1, T2 s2, ...): real}|hyperpage}
@@ -437,7 +437,7 @@ partial sums. `s1, s2, ...` are shared between all terms in the sum.
 * *`f`*: function literal referring to a function specifying the
 partial sum operation. Refer to the [partial sum function](#functions-partial-sum).
 * *`x`*: array of `T`, one for each term of the reduction, `T` can be any type,
-* *`grainsize`*: For `reduce_sum`, `grainsize` is the recommended size of the partial sum. For `reduce_sum_static`, `grainsize` determinse the maximum size of the partial sums, type `int`,
+* *`grainsize`*: For `reduce_sum`, `grainsize` is the recommended size of the partial sum (`grainsize = 1` means pick totally automatically). For `reduce_sum_static`, `grainsize` determines the maximum size of the partial sums, type `int`,
 * *`s1`*: first (optional) shared argument, type `T1`, where `T1` can be any type
 * *`s2`*: second (optional) shared argument, type `T2`, where `T2` can be any type,
 * *`...`*: remainder of shared arguments, each of which can be any type.

src/stan-users-guide/parallelization.Rmd

Lines changed: 13 additions & 18 deletions
@@ -36,14 +36,14 @@ inputs, then that sum looks like:
 
 `g(x1) + g(x2) + ...`
 
-The `reduce_sum` function is a tool for automatically parallelizing these
+`reduce_sum` and `reduce_sum_static` are tools for parallelizing these
 calculations.
 
 For efficiency reasons the reduce function doesn’t work with the
 element-wise evaluated function `g`, but instead the partial
 sum function `f: U[] -> real`, where `f` computes the partial
 sum corresponding to a slice of the sequence `x` passed in. Due to the
-the associativity of the sum reduction it holds that:
+associativity of the sum reduction it holds that:
 
 ```
 g(x1) + g(x2) + g(x3) = f({ x1, x2, x3 })
@@ -68,12 +68,13 @@ reduce summation facility:
 
 `grainsize` is the one tuning parameter. For `reduce_sum`, `grainsize` is
 a suggested partial sum size. A `grainsize` of 1 leaves the partitioning
-entirely up to the scheduler.
+entirely up to the scheduler. This should be the default way of using
+`reduce_sum` unless time is spent carefully picking `grainsize`. For picking a `grainsize`, see details [below](#reduce-sum-grainsize).
 
 For `reduce_sum_static`, `grainsize` specifies the maximal partial sum size.
 With `reduce_sum_static` it is more important to choose `grainsize`
 carefully since it entirely determines the partitioning of work.
-See details in [more detail below](#reduce-sum-grainsize).
+See details [below](#reduce-sum-grainsize).
 
 For efficiency and convenience additional
 shared arguments can be passed to every term in the sum. So for the
@@ -219,15 +220,15 @@ accordingly with `start:end`. With this function, reduce summation can
 be used to automatically parallelize the likelihood:
 
 ```
-int grainsize = 100;
+int grainsize = 1;
 target += reduce_sum(partial_sum, y,
 grainsize,
 x, beta);
 ```
 
-The reduce summation facility automatically breaks the sum into roughly `grainsize` sized pieces
-and computes them in parallel. `grainsize = 1` specifies that the `grainsize` should
-be estimated automatically. The final model looks as:
+The reduce summation facility automatically breaks the sum into pieces
+and computes them in parallel. `grainsize = 1` specifies that the
+`grainsize` should be estimated automatically. The final model looks as:
 
 ```
 functions {
@@ -247,7 +248,7 @@ parameters {
 vector[2] beta;
 }
 model {
-int grainsize = 100;
+int grainsize = 1;
 beta ~ std_normal();
 target += reduce_sum(partial_sum, y,
 grainsize,
@@ -257,25 +258,19 @@ model {
 
 ### Picking the Grainsize {#reduce-sum-grainsize}
 
-For `grainsize` is a recommendation on how large each piece of
-parallel work is (how many terms it contains). When using the
-non-static version, it is recommended to choose 1 as a starting
-point as automatic aggregation of partial sums are performed. However,
-for the static version the `grainsize` defines the maximal size of the
-partial sums, e.g.
-
 The rational for choosing a sensible `grainsize` is based on
 balancing the overhead implied by creating many small tasks versus
 creating fewer large tasks which limits the potential parallelism.
 
 In `reduce_sum`, `grainsize` is a recommendation on how to partition
 the work in the partial sum into smaller pieces. A `grainsize` of 1
-leaves this entirely up to the internal scheduler. Ideally this will be
+leaves this entirely up to the internal scheduler and should be chosen
+if no benchmarking of other grainsizes is done. Ideally this will be
 efficient, but there are no guarantees.
 
 In `reduce_sum_static`, `grainsize` is an upper limit on the worksize.
 Work will be split until all partial sums are just smaller than `grainsize`
-(and the split will happen the same way every time for the same data).
+(and the split will happen the same way every time for the same inputs).
 For the static version it is more important to select a sensible `grainsize`.
 
 In order to figure out an optimal `grainsize`, if there are `N`
