Address review comments on the reduce_sum documentation

weberse2 · weberse2 · commit 88311633c43e · 2020-04-07T09:26:01.000+02:00
diff --git a/src/functions-reference/higher-order_functions.Rmd b/src/functions-reference/higher-order_functions.Rmd
@@ -11,7 +11,7 @@ cat(' * <a href="functions-algebraic-solver.html">Algebraic Equation Solver</a>\
 cat(' * <a href="functions-ode-solver.html">Ordinary Differential Equation (ODE) Solvers</a>\n')
 cat(' * <a href="functions-1d-integrator.html">1D Integrator</a>\n')
 cat(' * <a href="functions-reduce.html">Reduce-Sum</a>\n')
-cat(' * <a href="functions-map.html">Higher-Order Map</a>\n')
+cat(' * <a href="functions-map.html">Map-Rect</a>\n')
 }
 ```
 
@@ -386,11 +386,11 @@ The gradients of the integral are computed in accordance with the Leibniz integr
 ## Reduce-Sum Function {#functions-reduce}
 
 Stan provides a higher-order reduce function for summation. A function
-`g: U -> real`, which returns a scalar, is mapped to every element of
-a list of type `U[]`, `{ x1, x2, ... }` and all the results are
+`g: U -> real` that returns a scalar is mapped to every element of a
+list of type `U[]`, `{ x1, x2, ... }`, and all the results are
 accumulated,
 
-```g(x1) + g(x2) + ...```
+`g(x1) + g(x2) + ...`
 
 For efficiency reasons the reduce function doesn't work with the
 element-wise evaluated function `g` itself, but instead works through
@@ -410,12 +410,12 @@ exactly. This implies that the order of summation determines the exact
 numerical result. For this reason, the higher-order reduce function is
 available in two variants:
 
-* `reduce_sum`: Automatically forms partial sums resulting usually in good
- performance without further tuning.
-* `reduce_sum_static`: Creates for the same input always the same
-call graph resulting in stable numerical evaluation. This version
-requires setting a tuning parameter which controls the maximal size of partial
-sums formed.
+* `reduce_sum`: Compute partial sums automatically. This usually
+ results in good performance without further tuning.
+* `reduce_sum_static`: For the same input, always create the same call
+graph. This results in stable numerical evaluation. This version
+requires setting a tuning parameter which controls the maximal size of
+partial sums formed.
 
 ### Specifying the Reduce-sum Function
 
@@ -438,7 +438,7 @@ partial sums. `s1, s2, ...` are shared between all terms in the sum.
 partial sum operation. Refer to the [partial sum function](#functions-partial-sum).
 * *`x`*: array of `T`, one for each term of the reduction, `T` can be any type,
 * *`grainsize`*: recommended number of terms in each reduce call, set
-to one to estimate automatically for `reduce_sum` while for
+to 1 to estimate automatically for `reduce_sum` while for
 `reduce_sum_static` this determines the maximal size of the partial sums, type `int`,
 * *`s1`*: first (optional) shared argument, type `T1`, where `T1` can be any type
 * *`s2`*: second (optional) shared argument, type `T2`, where `T2` can be any type,
diff --git a/src/stan-users-guide/parallelization.Rmd b/src/stan-users-guide/parallelization.Rmd
@@ -34,7 +34,7 @@ function `g: U -> real`, which returns a scalar, to a list of type
 over the results. For instance, for a sequence of ```x``` values of
 type ```U```, ```{ x1, x2, ... }```, we might compute the sum:
 
-```g(x1) + g(x2) + ...```
+`g(x1) + g(x2) + ...`
 
 In probabilistic modeling this comes up when there are $N$
 conditionally independent terms in a likelihood. Because of the
@@ -71,10 +71,10 @@ call graph resulting in stable numerical evaluation. This version
 requires setting a sensible tuning parameter for good performance.
 
 The tuning parameter is the so-called `grainsize`. For the
-`reduce_sum` version the `grainsize` is merely a suggested partial sum
-size while for the `reduce_sum_static` version the `grainsize`
+`reduce_sum` version, the `grainsize` is merely a suggested partial sum
+size, while for the `reduce_sum_static` version the `grainsize`
 specifies the maximal partial sum size. While for `reduce_sum` a
-`grainsize` of one commonly leads to good performance already (since
+`grainsize` of 1 commonly leads to good performance already (since
 automatic aggregation is performed), the `reduce_sum_static` variant
 requires setting a sensible `grainsize` for good performance as
 explained in [more detail below](#reduce-sum-grainsize).
@@ -156,9 +156,7 @@ for(i in 1:size(x)) {
 
 Logistic regression is a useful example to clarify both the syntax
 and semantics of reduce summation and how it can be used to speed up a typical
-model.
-
-A basic logistic regression can be coded in Stan as:
+model. A basic logistic regression can be coded in Stan as:
 
 ```
 data {
@@ -177,7 +175,6 @@ model {
 
 In this model predictions are made about the `N` outputs `y` using the
 covariate `x`. The intercept and slope of the linear equation are to be estimated.
-
 The key point to getting this calculation to use reduce summation, is recognizing that
 the statement:
 
@@ -194,13 +191,10 @@ for(n in 1:N) {
 
 Now it is clear that the calculation is the sum of a number of conditionally
 independent Bernoulli log probability statements, which is the condition where
-reduce summation is useful.
-
-To use the reduce summation, a function must be written that can be used to compute
-arbitrary partial sums of the total sum.
-
-Using the interface defined in [Reduce-Sum](#reduce-sum), such a function
-can be written like:
+reduce summation is useful. To use the reduce summation, a function
+must be written that can be used to compute arbitrary partial sums of
+the total sum. Using the interface defined in
+[Reduce-Sum](#reduce-sum), such a function can be written like:
 
 ```
 functions {
@@ -213,20 +207,20 @@ functions {
 }
 ```
 
-And the likelihood statement in the model can now be written:
+The likelihood statement in the model can now be written:
 
 ```
 target += partial_sum(1, N, y, x, beta); // Sum terms 1 to N of the likelihood
 ```
 
 In this example, `y` was chosen to be sliced over because there
 is one term in the summation per value of `y`. Technically `x` would  have
-worked as well. Use whatever conceptually makes the most sense.
-
-Because `x` is a shared argument, it is subset accordingly with `start:end`.
-
-With this function, reduce summation can be used to automatically parallelize the
-likelihood:
+worked as well. Use whatever conceptually makes the most
+sense for a given model, e.g. slice over independent terms like
+conditionally independent observations or groups of observations as in
+hierarchical models. Because `x` is a shared argument, it is subset
+accordingly with `start:end`. With this function, reduce summation can
+be used to automatically parallelize the likelihood:
 
 ```
 int grainsize = 100;
@@ -237,7 +231,7 @@ target += reduce_sum(partial_sum, y,
 
 The reduce summation facility automatically breaks the sum into roughly `grainsize` sized pieces
 and computes them in parallel. `grainsize = 1` specifies that the grainsize should
-be estimated automatically. The final model looks like:
+be estimated automatically. The final model is:
 
 ```
 functions {
@@ -269,7 +263,7 @@ model {
 
 The `grainsize` is a recommendation on how large each piece of
 parallel work is (how many terms it contains). When using the
-non-static version, it is recommended to choose one as a starting
+non-static version, it is recommended to choose 1 as a starting
 point as automatic aggregation of partial sums are performed. However,
 for the static version the `grainsize` defines the maximal size of the
 partial sums, e.g. the static variant will split the input sequence