Commit 50cdcae

Merge pull request #168 from torkar/minor_fixes
minor fixes for Part I of User's Guide
2 parents 601b20d + efbece5 commit 50cdcae

16 files changed, 71 additions and 96 deletions

src/bibtex/all.bib

Lines changed: 1 addition & 10 deletions
@@ -1082,15 +1082,6 @@ @article{HoerlKennard:1970
 pages = {55--67}
 }

-@article{Hoffman-Gelman:2011,
-Author = {Hoffman, Matthew D. and Gelman, Andrew},
-Title = {The No-{U}-Turn Sampler: Adaptively Setting Path Lengths in {H}amiltonian {M}onte {C}arlo},
-Journal = {arXiv},
-Volume = {1111.4246},
-url = {http://arxiv.org/abs/1111.4246},
-Year = {2011}
-}
-
 @article{Hoffman-Gelman:2014,
 Title = {{T}he {N}o-{U}-{T}urn {S}ampler: {A}daptively {S}etting {P}ath {L}engths in {H}amiltonian {M}onte {C}arlo},
 Author = {Hoffman, Matthew D. and Gelman, Andrew},
@@ -1413,7 +1404,7 @@ @phdthesis{Schofield:2007
 author = {Schofield, Matthew R.},
 year = {2007},
 title = {Hierarchical Capture-Recapture Models},
-school = {Department of of Statistics, University of Otago, Dunedin}
+school = {Department of Statistics, University of Otago, Dunedin}
 }

 @article{SmithSpiegelhalterThomas:1995,

src/reference-manual/execution.Rmd

Lines changed: 1 addition & 1 deletion
@@ -171,7 +171,7 @@ If the user specifies the number of leapfrog steps (i.e., chooses to
 use standard HMC), that number of leapfrog steps are simulated. If
 the user has not specified the number of leapfrog steps, the No-U-Turn
 sampler (NUTS) will determine the number of leapfrog steps adaptively
-[@Hoffman-Gelman:2011], [@Hoffman-Gelman:2014].
+[@Hoffman-Gelman:2014].


 ### Log Probability and Gradient Calculation {-}

src/reference-manual/mcmc.Rmd

Lines changed: 2 additions & 2 deletions
@@ -326,7 +326,7 @@ using the notation of @Hoffman-Gelman:2014. In practice, the efficacy
 of the optimization is sensitive to the value of these parameters, but
 we do not recommend changing the defaults without experience with the
 dual-averaging algorithm. For more information, see the discussion of
-dual averaging in @Hoffman-Gelman:2011, Hoffman-Gelman:2014.
+dual averaging in Hoffman-Gelman:2014.

 The full set of dual-averaging parameters are

@@ -474,7 +474,7 @@ e.g., @RobertsEtAl:1997) at each step and avoid the random-walk
 behavior that arises in random-walk Metropolis or Gibbs samplers when
 there is correlation in the posterior. For a precise definition of the
 NUTS algorithm and a proof of detailed balance, see
-@Hoffman-Gelman:2011, @Hoffman-Gelman:2014.
+@Hoffman-Gelman:2014.

 NUTS generates a proposal by starting at an initial position
 determined by the parameters drawn in the last iteration. It then

src/stan-users-guide/algebraic-equations.Rmd

Lines changed: 2 additions & 2 deletions
@@ -31,7 +31,7 @@ A system of algebraic equations is coded directly in Stan as a
 function with a strictly specified signature. For example, the
 nonlinear system given above can be coded using the
 following function in Stan (see the [user-defined functions
-section](#functions-programming) for more information on coding
+section](#functions-programming.chapter) for more information on coding
 user-defined functions).

 ```
@@ -136,7 +136,7 @@ do so, the current metropolis proposal gets rejected.

 ## Control Parameters for the Algebraic Solver {#algebra-control.section}

-The call to the algebraic solver shown above uses the default control settings. The solver
+The call to the algebraic solver shown previously uses the default control settings. The solver
 allows three additional parameters, all of which must be supplied if any of them is
 supplied.

src/stan-users-guide/clustering.Rmd

Lines changed: 1 addition & 1 deletion
@@ -422,7 +422,7 @@ parameters.
 ## Latent Dirichlet Allocation

 Latent Dirichlet allocation (LDA) is a mixed-membership multinomial
-clustering model @BleiNgJordan:2003 that generalized naive
+clustering model [@BleiNgJordan:2003] that generalizes naive
 Bayes. Using the topic and document terminology common in discussions of
 LDA, each document is modeled as having a mixture of topics, with each
 word drawn from a topic based on the mixing proportions.

src/stan-users-guide/finite-mixtures.Rmd

Lines changed: 2 additions & 2 deletions
@@ -496,7 +496,7 @@ on the linear scale; it is defined to be equal to `log(exp(lp1) + exp(lp2))`, bu

 The code given above to compute the zero-inflated Poisson
 redundantly calculates all of the Bernoulli terms and also
-`poisson_lpmf(0 \mid lambda)` every time the first condition
+`poisson_lpmf(0 | lambda)` every time the first condition
 body executes. The use of the redundant terms is conditioned on
 `y`, which is known when the data are read in. This allows
 the transformed data block to be used to compute some more convenient
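
Editorial illustration, not part of the commit: the hunk above concerns computing the zero term of a zero-inflated Poisson once rather than once per zero observation. The Python sketch below shows that idea under the usual formulation, which is an assumption here because the diff does not show the model itself: a zero occurs with extra probability `theta`, otherwise the count is Poisson with rate `lam`. The function names mirror the Stan names in the hunk, but these are plain Python helpers written for this example.

```python
import math


def log_sum_exp(lp1, lp2):
    """Return log(exp(lp1) + exp(lp2)) without overflowing."""
    m = max(lp1, lp2)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(lp1 - m) + math.exp(lp2 - m))


def poisson_lpmf(y, lam):
    """Log Poisson probability mass of count y at rate lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def zip_log_lik(ys, theta, lam):
    """Zero-inflated Poisson log likelihood (assumed standard form)."""
    log_theta = math.log(theta)
    log1m_theta = math.log1p(-theta)
    # The zero term does not depend on the observation, so compute it once.
    lp_zero = log_sum_exp(log_theta, log1m_theta + poisson_lpmf(0, lam))
    total = 0.0
    for y in ys:
        if y == 0:
            total += lp_zero
        else:
            total += log1m_theta + poisson_lpmf(y, lam)
    return total
```

Because which observations are zero is known as soon as the data are read, the loop can be replaced by a count of zeros multiplied by `lp_zero`, which is the kind of saving the surrounding text attributes to the transformed data block.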
@@ -650,7 +650,7 @@ transformed data {
 }
 ```

-The model block can then be reduced to three statements.
+The model block is then reduced to three statements.

 ```
 model {

src/stan-users-guide/gaussian-processes.Rmd

Lines changed: 8 additions & 8 deletions
@@ -346,7 +346,7 @@ model {
 }
 ```

-The data block now declares a vector `y` of observed values `y[n]`
+The data block declares a vector `y` of observed values `y[n]`
 for inputs `x[n]`. The transformed data block now only defines the mean
 vector to be zero. The three hyperparameters are defined as parameters
 constrained to be non-negative. The computation of the covariance matrix
@@ -366,7 +366,7 @@ noticeable, but for larger matrices ($N \gtrsim 100$) the Cholesky
 decomposition version will be faster.

 Hamiltonian Monte Carlo sampling is fast and effective for hyperparameter
-inference in this model @Neal:1997. If the posterior is
+inference in this model [@Neal:1997]. If the posterior is
 well-concentrated for the hyperparameters the Stan implementation will fit
 hyperparameters in models with a few hundred data points in seconds.

@@ -419,7 +419,7 @@ model {

 Two differences between the latent variable GP and the marginal likelihood GP
 are worth noting. The first is that we have augmented our parameter block with
-a new parameter vector of length $N$ called $`eta`$. This is used in the model
+a new parameter vector of length $N$ called `eta`. This is used in the model
 block to generate a multivariate normal vector called $f$, corresponding to the
 latent GP. We put a $\textsf{normal}(0,1)$ prior on `eta` like we did in the
 Cholesky-parameterized GP in the simulation section. The second difference is
@@ -482,7 +482,7 @@ $$
 $$

 We can extend our latent variable GP Stan program to deal with classification
-problems. Below $a$ is the bias term, which can help account for imbalanced
+problems. Below `a` is the bias term, which can help account for imbalanced
 classes in the training data:


@@ -513,12 +513,12 @@ $$
 \right)
 + \delta_{i, j}\sigma^2.
 $$
-The estimation of $\rho$ was termed "automatic relevance determination" in
-@Neal:1996, but this is misleading, because the magnitude the scale of
+The estimation of $\rho$ was termed "automatic relevance determination" by
+@Neal:1996, but this is misleading, because the magnitude of the scale of
 the posterior for each $\rho_d$ is dependent on the scaling of the input data
 along dimension $d$. Moreover, the scale of the parameters $\rho_d$ measures
 non-linearity along the $d$-th dimension, rather than "relevance"
-@PiironenVehtari:2016.
+[@PiironenVehtari:2016].

 A priori, the closer $\rho_d$ is to zero, the more nonlinear the
 conditional mean in dimension $d$ is. A posteriori, the actual dependencies
@@ -595,7 +595,7 @@ inherent statistical properties of a GP, the GP's purpose in the model, and the
 numerical issues that may arise in Stan when estimating a GP.

 Perhaps most importantly, the parameters $\rho$ and $\alpha$ are weakly
-identified @zhang-gp:2004. The ratio of the two
+identified [@zhang-gp:2004]. The ratio of the two
 parameters is well-identified, but in practice we put independent priors on the
 two hyperparameters because these two quantities are more interpretable than
 their ratio.

src/stan-users-guide/hyperspherical-models.Rmd

Lines changed: 4 additions & 15 deletions
@@ -73,7 +73,7 @@ set of points in $\mathbb{R}^3$, but each such point may be described
 uniquely by a latitude and longitude. Geometrically, the surface
 defined by $S^2$ in $\mathbb{R}^3$ behaves locally like a plane, i.e.,
 $\mathbb{R}^2$. However, the overall shape of $S^2$ is not like a plane
-in that is compact (i.e., there is a maximum distance between points).
+in that it is compact (i.e., there is a maximum distance between points).
 If you set off around the globe in a "straight line" (i.e., a
 geodesic), you wind up back where you started eventually; that is why
 the geodesics on the sphere ($S^2$) are called "great circles," and
@@ -123,7 +123,7 @@ option built into all of the Stan interfaces.

 Unit vectors correspond directly to angles and thus to rotations.
 This is easy to see in two dimensions, where a point on a circle
-determines a compass direction, or equivalently, an angle $\theta$).
+determines a compass direction, or equivalently, an angle $\theta$.
 Given an angle $\theta$, a matrix can be defined, the
 pre-multiplication by which rotates a point by an angle of $\theta$.
 For angle $\theta$ (in two dimensions), the $2 \times 2$ rotation
@@ -139,17 +139,6 @@ $$
 Given a two-dimensional vector $x$, $R_{\theta} \, x$ is the rotation
 of $x$ (around the origin) by $\theta$ degrees.

-### Unit vector type {-}
-
-In Stan, unit vectors in $K$ dimensions are declared as
-
-```
-unit_vector[K] alpha;
-```
-
-A unit vector has length one (meaning the sum of squared values is
-one, not that its number of elements is one).
-
 ### Angles from unit vectors {-}

 Angles can be calculated from unit vectors. For example, a random
@@ -167,9 +156,9 @@ transformed parameters {
 ```

 If the distribution of $(x, y)$ is uniform over a circle, then the
-distribution of $\arctan \frac{y}{x}$ is uniform over $(-\pi, \pi]$.
+distribution of $\arctan \frac{y}{x}$ is uniform over $(-\pi, \pi)$.

-It might be tempting to try to just declare theta directly as a
+It might be tempting to try to just declare `theta` directly as a
 parameter with the lower and upper bound constraint as given above.
 The drawback to this approach is that the values $-\pi$ and $\pi$ are
 at $-\infty$ and $\infty$ on the unconstrained scale, which can
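
Editorial illustration, not part of the commit: the last hunk above is about recovering an angle from a two-dimensional unit vector. The Python sketch below assumes the two-argument arctangent is what is meant by $\arctan \frac{y}{x}$, and it draws the unit vector by normalizing two standard normal draws, which is one common way to get a point uniformly distributed on the circle. Both function names are hypothetical.

```python
import math
import random


def random_unit_vector_2d(rng):
    """Draw a point uniformly on the unit circle by normalizing two normals."""
    while True:
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.hypot(x, y)
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            return x / norm, y / norm


def angle_of(xy):
    """Angle of the point (x, y) in radians, via the two-argument arctangent."""
    x, y = xy
    return math.atan2(y, x)
```

Working with the pair $(x, y)$ and converting to an angle afterwards keeps the two ends of the interval adjacent, since angles just below $\pi$ and just above $-\pi$ correspond to neighbouring points on the circle.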

src/stan-users-guide/latent-discrete.Rmd

Lines changed: 16 additions & 18 deletions
@@ -4,7 +4,7 @@ Stan does not support sampling discrete parameters. So it is not
 possible to directly translate BUGS or JAGS models with discrete
 parameters (i.e., discrete stochastic nodes). Nevertheless, it is
 possible to code many models that involve bounded discrete
-parameters by marginalizing out the discrete parameters.^[The computations are similar to those involved in expectation maximization (EM) algorithms @dempster-et-al:1977.]
+parameters by marginalizing out the discrete parameters.^[The computations are similar to those involved in expectation maximization (EM) algorithms [@dempster-et-al:1977].]

 This chapter shows how to code several widely-used models involving
 latent discrete parameters. The next chapter, the [clustering
@@ -29,12 +29,12 @@ exactly the marginalization needed for coding the model in Stan.

 ## Change Point Models {#change-point.section}

-The first example is a model of coal mining disasters in the U.K. for the years 1851--1962.^[The source of the data is @Jarret:1979, which itself is a note correcting an earlier data collection.]
+The first example is a model of coal mining disasters in the U.K. for the years 1851--1962.^[The source of the data is [@Jarret:1979], which itself is a note correcting an earlier data collection.]


 ### Model with Latent Discrete Parameter {-}

-[@PyMC:2014 Section 3.1] provides a Poisson model of disaster
+@PyMC:2014[, Section 3.1] provides a Poisson model of disaster
 $D_t$ in year $t$ with two rate parameters, an early rate ($e$)
 and late rate ($l$), that change at a given point in time $s$. The
 full model expressed using a latent discrete parameter $s$ is
@@ -86,7 +86,7 @@ where the likelihood is defined by marginalizing $s$ as
 p(D \mid e,l) &= \sum_{s=1}^T p(s, D \mid e,l) \\
 &= \sum_{s=1}^T p(s) \, p(D \mid s,e,l) \\
 &= \sum_{s=1}^T \textsf{uniform}(s \mid 1,T) \,
-\prod_{t=1}^T \textsf{Poisson}(D_t \mid t < s \; ? \; e \: : \: l)
+\prod_{t=1}^T \textsf{Poisson}(D_t \mid t < s \; ? \; e \: : \: l).
 \end{align*}

 Stan operates on the log scale and thus requires the log likelihood,
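
Editorial illustration, not part of the commit: the marginalization over $s$ in the hunk above can be sketched on the log scale in Python. The sketch follows the displayed sum term by term: a uniform prior over $s \in \{1, \dots, T\}$ and a Poisson rate of $e$ for years $t < s$ and $l$ otherwise. The function names and the toy data in the usage note are made up for this example.

```python
import math


def poisson_lpmf(y, lam):
    """Log Poisson probability mass of count y at rate lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def log_sum_exp(xs):
    """Return log(sum(exp(x) for x in xs)) without overflowing."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def change_point_log_lik(D, e, l):
    """Return (log p(D | e, l), per-s terms log p(s, D | e, l))."""
    T = len(D)
    log_unif = -math.log(T)  # log uniform(s | 1, T)
    lp = []
    for s in range(1, T + 1):
        term = log_unif
        for t in range(1, T + 1):
            term += poisson_lpmf(D[t - 1], e if t < s else l)
        lp.append(term)
    return log_sum_exp(lp), lp
```

For example, `change_point_log_lik([4, 5, 0, 1], 4.0, 1.0)` returns the marginal log likelihood together with the four per-$s$ terms. Exponentiating those terms and normalizing them gives the conditional distribution of $s$ for the given $e$ and $l$.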
@@ -248,7 +248,7 @@ knitr::include_graphics("./img/s-discrete-posterior.png", auto_pdf = TRUE)

 In order their range of estimates be visible, the first plot is on the log
 scale and the second plot on the linear scale; note the narrower range
-of years in the right-hand plot resulting from sampling. The posterior
+of years in the second plot resulting from sampling. The posterior
 mean of $s$ is roughly 1891.


@@ -343,7 +343,7 @@ parameter; just because the population must be finite doesn't mean the
 parameter representing it must be. The parameter will be used to
 produce a real-valued estimate of the population size.

-The Lincoln-Petersen [@Lincoln:1930,@Petersen:1896] method for
+The Lincoln-Petersen [@Lincoln:1930;@Petersen:1896] method for
 estimating population size is
 $$
 \hat{N} = \frac{M C}{R}.
@@ -385,7 +385,7 @@ for this model.

 To ensure the MLE is the Lincoln-Petersen estimate, an improper
 uniform prior for $N$ is used; this could (and should) be replaced
-with a more informative prior if possible based on knowledge of the
+with a more informative prior if possible, based on knowledge of the
 population under study.

 The one tricky part of the model is the lower bound $C - R + M$ placed
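
Editorial illustration, not part of the commit: the two hunks above quote the Lincoln-Petersen estimate $\hat{N} = M C / R$ and the lower bound $C - R + M$. The Python sketch below evaluates both. The diff does not define the symbols, so the usual mark-recapture reading is assumed: $M$ animals marked in the first capture, $C$ animals caught in the second capture, and $R$ of those already marked. The function names are hypothetical.

```python
def lincoln_petersen(M, C, R):
    """Lincoln-Petersen population estimate M * C / R."""
    if R <= 0:
        raise ValueError("R must be positive for the estimate to be defined")
    return M * C / R


def population_lower_bound(M, C, R):
    """Smallest population consistent with the counts: C - R + M."""
    # M marked animals plus the C - R animals first seen in the second capture.
    return C - R + M
```

With `M = 50`, `C = 40`, and `R = 10`, the estimate is `200.0` and the lower bound is `80`, since at least 50 marked animals and 30 previously unseen animals must exist.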
@@ -402,10 +402,9 @@ details of all constrained parameter transforms.

 ### Cormack-Jolly-Seber with Discrete Parameter {-}

-The Cormack-Jolly-Seber (CJS) model
-[@Cormack:1964; Jolly:1965; Seber:1965] is an open-population model
-in which the population may change over time due to death; the
-presentation here draws heavily on @Schofield:2007.
+The Cormack-Jolly-Seber (CJS) model [@Cormack:1964; @Jolly:1965; @Seber:1965]
+is an open-population model in which the population may change over time
+due to death; the presentation here draws heavily on @Schofield:2007.

 The basic data are

@@ -514,7 +513,7 @@ By defining these probabilities in terms of $\chi$ directly, there is
 no need for a latent binary parameter indicating whether an animal is
 alive at time $t$ or not. The definition of $\chi$ is typically used
 to define the likelihood (i.e., marginalize out the latent discrete
-parameter) for the CJS model [@Schofield:2007, page 9].
+parameter) for the CJS model [@Schofield:2007].

 The Stan model defines $\chi$ as a transformed parameter based on
 parameters $\phi$ and $p$. In the model block, the log probability is
@@ -796,8 +795,8 @@ predictors.

 Although seemingly disparate tasks, the rating/coding/annotation of
 items with categories and diagnostic testing for disease or other
-conditions share several characteristics which allow their statistical
-properties to modeled similarly.
+conditions, share several characteristics which allow their statistical
+properties to be modeled similarly.

 ### Diagnostic Accuracy {-}

@@ -877,8 +876,7 @@ z_i \sim \textsf{categorical}(\pi).
 $$

 The rating $y_{i, j}$ provided for item $i$ by rater $j$ is modeled as
-a categorical response of rater $i$ to an item of category $z_i$,^[In the subscript, $z[i]$ is written as $z_i$ to
-improve legibility.]
+a categorical response of rater $i$ to an item of category $z_i$,^[In the subscript, $z_i$ is written as $z[i]$ to improve legibility.]
 $$
 y_{i, j} \sim \textsf{categorical}(\theta_{j,\pi_{z[i]}}).
 $$
@@ -958,7 +956,7 @@ function.

 ### Stan Implementation {-}

-The Stan program for the Dawid and Skene model is provided below @DawidSkene:1979.
+The Stan program for the Dawid and Skene model is provided below [@DawidSkene:1979].

 ```
 data {
@@ -998,7 +996,7 @@ model {
 <a name="id:dawid-skene-model.figure"></a>

 The model marginalizes out the discrete parameter $z$, storing the
-unnormalized conditional probability $\log q(z_i=k|\theta,\pi)$ in\
+unnormalized conditional probability $\log q(z_i=k|\theta,\pi)$ in
 `log_q_z[i, k]`.

 The Stan model converges quickly and mixes well using NUTS starting at

src/stan-users-guide/measurement-error.Rmd

Lines changed: 6 additions & 6 deletions
@@ -4,12 +4,12 @@ Most quantities used in statistical models arise from measurements.
 Most of these measurements are taken with some error. When the
 measurement error is small relative to the quantity being measured,
 its effect on a model is usually small. When measurement error is
-large relative to the quantity being measured, or when precise
+large relative to the quantity being measured, or when precise
 relations can be estimated being measured quantities, it is useful to
 introduce an explicit model of measurement error. One kind of
 measurement error is rounding.

-Meta-analysis plays out statistically much like measurement error
+Meta-analysis plays out statistically much like measurement error
 models, where the inferences drawn from multiple data sets are
 combined to do inference over all of them. Inferences for each data
 set are treated as providing a kind of measurement error with respect
@@ -102,7 +102,7 @@ Rounding may be done in many ways, such as rounding weights to the
 nearest milligram, or to the nearest pound; rounding may even be done
 by rounding down to the nearest integer.

-Exercise 3.5(b) from @GelmanEtAl:2013 provides an example.
+Exercise 3.5(b) by @GelmanEtAl:2013 provides an example.

 \begin{quote}
 3.5. \ Suppose we weigh an object five times and measure
@@ -227,7 +227,7 @@ the studies being analyzed.
 Suppose the data in question arise from a total of $M$ studies
 providing paired binomial data for a treatment and control group. For
 instance, the data might be post-surgical pain reduction under a treatment
-of ibuprofen @WarnThompsonSpiegelhalter:2002 or mortality after
+of ibuprofen [@WarnThompsonSpiegelhalter:2002] or mortality after
 myocardial infarction under a treatment of beta blockers
 [@GelmanEtAl:2013, Section 5.6].

@@ -352,8 +352,8 @@ in each school.

 #### Extensions and Alternatives {-}

-@SmithSpiegelhalterThomas:1995 and [@GelmanEtAl:2013, Section 19.4]
-provides meta-analyses based directly on binomial data.
+@SmithSpiegelhalterThomas:1995 and @GelmanEtAl:2013[, Section 19.4]
+provide meta-analyses based directly on binomial data.
 @WarnThompsonSpiegelhalter:2002 consider the modeling
 implications of using alternatives to the log-odds ratio in
 transforming the binomial data.
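
Editorial illustration, not part of the commit: the hunks above refer to paired binomial data for a treatment and control group and to the log-odds ratio as a transform of those data. The Python sketch below computes the empirical log-odds ratio and its usual large-sample standard error for a single study. The argument names (`r_t`, `n_t`, `r_c`, `n_c` for events and group sizes) are hypothetical and are not taken from the manual.

```python
import math


def log_odds_ratio(r_t, n_t, r_c, n_c):
    """Empirical log-odds ratio of treatment versus control."""
    # Requires 0 < r < n in both groups so that every odds value is finite.
    return math.log(r_t / (n_t - r_t)) - math.log(r_c / (n_c - r_c))


def log_odds_ratio_se(r_t, n_t, r_c, n_c):
    """Large-sample standard error of the empirical log-odds ratio."""
    return math.sqrt(1 / r_t + 1 / (n_t - r_t) + 1 / r_c + 1 / (n_c - r_c))
```

A study with 15 events among 20 treated patients and 10 events among 20 controls has odds of 3 and 1, so its log-odds ratio is $\log 3$. Studies with zero events in a group make the transform undefined, which is one reason to model the binomial counts directly.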
