Commit d76d85a

Merge pull request #177 from stan-dev/case-study/lotka-volterra-edits
Case study/lotka volterra edits

2 parents 1f9e74c + f7761ed

1 file changed: +22 −18 lines

knitr/lotka-volterra/lotka-volterra-predator-prey.Rmd

Lines changed: 22 additions & 18 deletions
@@ -79,7 +79,7 @@ printf <- function(msg = "%5.3f", ...) {
 
 ## Abstract {-}
 
-Lotka (1925) and Volterra (1926) formulated parametric differential equations that characterize the oscillating populations of predators and prey. A statistical model to account for measurement error and unexplained variation uses the deterministic solutions to the Lotka-Volterra equations as expected population sizes. Stan is used to encode the statistical model and perform full Bayesian inference to solve the inverse problem of inferring parameters from noisy data. The model is fit to Canadian lynx^[*Predator*: Canadian lynx
+Lotka (1925) and Volterra (1926) formulated parametric differential equations that characterize the oscillating populations of predators and prey. A statistical model to account for measurement error and unexplained variation uses the deterministic solutions to the Lotka-Volterra equations as expected population sizes. Stan is used to encode the statistical model and perform full Bayesian inference to solve the inverse problem of inferring parameters from noisy data. The model is fit to Canadian lynx^[*Predator*: Canadian lynx
 ![Canadian lynx](img/lynx.jpg)
 <span style="font-size:70%; float:right">&copy; 2009, Keith Williams, CC-BY 2.0</span>
 ] and snowshoe hare^[*Prey*: snowshoe hare
@@ -103,14 +103,12 @@ Following up on the original source, Hewitt (1921) provides plots of the number
 ![](img/hewitt-plot.png)
 <span style="float:right; font-size: 70%">&copy; Scribner's Sons 1921</span>
 ]
-Hewitt's discussion ranges over many natural factors affecting the population sizes, such as plagues, migrations, and weather-related events. Hewitt even discusses measurement confounders such as the fact that lynx are easier to trap when they are hungry and weak, which is correlated with a relative decline in the hare population. The models we consider here for illustrative purposes will not consider any of these factors, though they could be extended to do so through the usual strategy of the inclusion of covariates (see the exercises). Hewitt also discusses many other species; hares and lynxes occupy only a small part of a single chapter.
+Hewitt's discussion ranges over many natural factors affecting the population sizes, such as plagues, migrations, and weather-related events. Hewitt even discusses measurement confounders such as the fact that lynx are easier to trap when they are hungry and weak, which is correlated with a relative decline in the hare population. The models we consider here for illustrative purposes will not consider any of these factors, though they could be extended to do so through the usual strategy of the inclusion of covariates (see the exercises). Hewitt also discusses many other species; hares and lynxes occupy only a small part of a single chapter.
 
-Howard (2009)^[Howard (personal communication) traces the numerical data to Joseph
-M. Mahaffy's (2010) [lecture notes on mathematical
-modeling](http://jmahaffy.sdsu.edu/courses/f09/math636/lectures/lotka/qualde2.html)
-for his San Diego State University course (Math 636), which in turn
-cites Purves, Sadava, Orians, and Heller's book, *Life: The Science of
-Biology.*] provides numerical data for the number of pelts collected by the Hudson's Bay Company in the years 1900-1920, which we have included in comma-separated value (CSV) form in the source repository with the case study.
+Howard (2009)^[Howard (personal communication) traces the numerical data to Joseph M. Mahaffy's (2010) [lecture notes on mathematical modeling](http://jmahaffy.sdsu.edu/courses/f09/math636/lectures/lotka/qualde2.html) for his San Diego State University course (Math 636), which in turn cites Purves, Sadava, Orians, and Heller's book, *Life: The Science of Biology.*]
+provides numerical data for the number of pelts collected by the Hudson's Bay
+Company in the years 1900-1920, which we have included in comma-separated value
+(CSV) form in the source repository with the case study.
 
 ```{r}
 lynx_hare_df <-
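# Reviewer sketch, added as R comments because the diff truncates this chunk.
# Loading the CSV mentioned above could look like the following, though the
# file name here is a guess, not necessarily the repository's actual name:
#   lynx_hare_df <- read.csv("lynx_hare_df.csv")
#   head(lynx_hare_df)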
@@ -204,7 +202,8 @@ As usual in writing differential equations, $u(t)$ and $v(t)$ are rendered as $u
 As measures of population sizes, the values of $u(t)$ and $v(t)$ must be non-negative.^[As in much of ecology, undead are frowned upon.] Nothing in the differential equations here explicitly enforces positivity. Nevertheless, as long as the initial populations are non-negative, i.e., $u(0) \geq 0$ and $v(0) \geq 0$, the values $u(t)$ and $v(t)$ for other times $t$ must also be non-negative.
 This is because the rate of change in each population is a factor of the population size itself.
 
-Although a population following the Lotka-Volterra dynamics will never become extinct,^[The dodo (right), in 1682, before the population became extinct. ![](img/dodo.jpg) <br /> <span style="float:right; font-size:60%">1682, Sir Thomas Herbert, public domain.</span>] it may become arbitrarily small. In reality, random events such as accidents can cause extinction events as can not so random events such as extensive hunting. Therefore, the Lotka-Volterra model should not be expected to adequately model the dynamics of small populations.
+Although a population following the Lotka-Volterra dynamics will never become extinct,^[The dodo (right), in 1682, before the population became extinct. ![](img/dodo.jpg) <br /> <span style="float:right; font-size:60%">1682, Sir Thomas Herbert, public domain.</span>]
+it may become arbitrarily small. In reality, random events such as accidents can cause extinction events, as can not-so-random events such as extensive hunting. Therefore, the Lotka-Volterra model should not be expected to adequately model the dynamics of small populations.
 
 
 ## Behavior in the limit
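A quick numerical sketch of the positivity and never-extinct claims in this hunk, using the deSolve package (the case study itself integrates the system in Stan, so deSolve here is an assumption); parameter values and initial sizes are made up for illustration.

```r
# Integrate du/dt = (alpha - beta v) u, dv/dt = (-gamma + delta u) v
# and check that the trajectories oscillate without crossing zero.
library(deSolve)  # assumed available; not used by the case study itself

lotka_volterra <- function(t, z, theta) {
  u <- z[1]  # prey (hare)
  v <- z[2]  # predator (lynx)
  du <- (theta["alpha"] - theta["beta"] * v) * u
  dv <- (-theta["gamma"] + theta["delta"] * u) * v
  list(c(du, dv))
}

theta <- c(alpha = 1, beta = 0.05, gamma = 1, delta = 0.05)  # illustrative values
z_init <- c(u = 30, v = 5)                                    # illustrative sizes
sol <- ode(y = z_init, times = seq(0, 50, by = 0.1),
           func = lotka_volterra, parms = theta)

min(sol[, c("u", "v")])  # > 0: the populations get small but never reach zero
```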
@@ -226,9 +225,12 @@ Specifically, a Bayesian model requires a mathematical model of what we know abo
 
 Mathematically, a prior density $p(\theta)$ over the sequence of parameters $\theta$ encapsulates our knowledge of the parameters before seeing the data. A sampling distribution,^[The sampling distribution $p(y | \theta)$ is called a likelihood function when considered as a function $\mathcal{L}(\theta)$ of the parameters $\theta$ for fixed data $y$.] which may have a continuous, discrete, or mixed probability function, $p(y | \theta)$ characterizes the distribution of observable data $y$ given parameters $\theta$.
 
-Bayes's rule gives us a general solution to the inverse problem, expressing the posterior $p(\theta | y)$ in terms of the prior $p(\theta)$ and likelihood $p(y | \theta)$.^[Bayes's rule for parameters $\theta$ and observed data $y$ is $$ \begin{array}{rcl} p(\theta\,|\, y) & = & \displaystyle \frac{p(\theta, y)}{p(y)} \\[4pt] & = & \displaystyle \frac{p(y | \theta) \, p(\theta)}{p(y)} \\[4pt] & = & \displaystyle \frac{p(y | \theta) \, p(\theta)}{\int_{\Theta} p(y | \theta) \, p(\theta) \, \mathrm{d}\theta} \\[4pt] & \propto & p(y | \theta) \, p(\theta). \end{array} $$]
+Bayes's rule gives us a general solution to the inverse problem, expressing the posterior $p(\theta | y)$
+in terms of the prior $p(\theta)$ and likelihood $p(y | \theta)$.^[Bayes's rule for parameters $\theta$ and observed data $y$ is $$ \begin{array}{rcl} p(\theta\,|\, y) & = & \displaystyle \frac{p(\theta, y)}{p(y)} \\[4pt] & = & \displaystyle \frac{p(y | \theta) \, p(\theta)}{p(y)} \\[4pt] & = & \displaystyle \frac{p(y | \theta) \, p(\theta)}{\int_{\Theta} p(y | \theta) \, p(\theta) \, \mathrm{d}\theta} \\[4pt] & \propto & p(y | \theta) \, p(\theta). \end{array} $$]
 
-Stan provides a form of Markov chain Monte Carlo (MCMC) sampling that draws a sample $\theta^{(1)}, \ldots, \theta^{(M)}$ from the posterior to use for computational inference.^[The matrix of $\theta^{(m)}$ values (parameter by draw) is what is returned by the `extract()` function in Stan.] Posterior quantities of interest may be expressed as derived random variables using functions $f(\theta)$ of parameters. Such functions may be used for posterior expectation calculations such as parameter estimates that minimize expected square error when $f(\theta) = \theta$, or event probabilities such as the probability of the hare population falling below some fraction of the lynx population, when $f(\theta)$ is some indicator function.^[The convergence result (as well as error bounds) follows from the MCMC central limit theorem when $\theta^{(m)}$ are drawn according to $p(\theta | y)$ with an appropriate MCMC algorithm, $$ \begin{array}{rcl} \displaystyle \mathbb{E}[ \, f(\theta) \mid y \, ] & = & \int_{\Theta} \, f(\theta) \, p(\theta | y) \, \mathrm{d}\theta \\[4pt] & = & \lim_{M \rightarrow \infty} \, \frac{1}{M} \sum_{m=1}^M \, f\!\left(\theta^{(m)}\right) \\[4pt] & \approx & \frac{1}{M} \sum_{m=1}^M \, f\!\left(\theta^{(m)}\right) \ \ \mbox{ for some finite } M \end{array} $$]
+Stan provides a form of Markov chain Monte Carlo (MCMC) sampling that draws a sample $\theta^{(1)}, \ldots, \theta^{(M)}$ from the posterior to use for computational inference.^[The matrix of $\theta^{(m)}$ values (parameter by draw) is what is returned by the `extract()` function in Stan.]
+Posterior quantities of interest may be expressed as derived random variables using functions $f(\theta)$ of parameters.
+Such functions may be used for posterior expectation calculations such as parameter estimates that minimize expected square error when $f(\theta) = \theta$, or event probabilities such as the probability of the hare population falling below some fraction of the lynx population, when $f(\theta)$ is some indicator function.^[The convergence result (as well as error bounds) follows from the MCMC central limit theorem when $\theta^{(m)}$ are drawn according to $p(\theta | y)$ with an appropriate MCMC algorithm, $$ \begin{array}{rcl} \displaystyle \mathbb{E}[ \, f(\theta) \mid y \, ] & = & \int_{\Theta} \, f(\theta) \, p(\theta | y) \, \mathrm{d}\theta \\[4pt] & = & \lim_{M \rightarrow \infty} \, \frac{1}{M} \sum_{m=1}^M \, f\!\left(\theta^{(m)}\right) \\[4pt] & \approx & \frac{1}{M} \sum_{m=1}^M \, f\!\left(\theta^{(m)}\right) \ \ \mbox{ for some finite } M \end{array} $$]
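To make the expectation formulas in this hunk concrete, here is a hedged R sketch of the draw-averaging it describes, assuming `fit` is an rstan `stanfit` object for this model; the parameter layout and the particular event are illustrative assumptions, not the case study's exact code.

```r
# Posterior expectations as averages over MCMC draws: E[f(theta) | y] is
# approximated by mean over m of f(theta^(m)), as in the footnote.
draws <- rstan::extract(fit)   # draw arrays, as mentioned in the text

# f(theta) = theta: the posterior mean minimizes expected square error
mean(draws$theta[, 1])         # posterior mean of theta[1] (alpha, assumed)

# f(theta) an indicator: an event probability, here theta[1] > theta[3]
# (an illustrative event, not one the case study computes)
mean(draws$theta[, 1] > draws$theta[, 3])
```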
 
 ## Measurement Error and Unexplained Variation
 
@@ -250,7 +252,9 @@ y_n & = & x_n \beta + \epsilon_n
 \end{eqnarray}
 $$
 
-The deterministic part of the equation is the linear predictor $x_n \beta$ with predictor $x_n$ (row $n$ of the data matrix $x$) and coefficient (column) vector $\beta$. The stochastic error term, $\epsilon_n$, is assigned a normal distribution located at zero with scale parameter $\sigma > 0$.^[Gauss initially noted that the maximum likelihood estimate derived from the normal error model is identical to the least square error estimate derived by minimizing the sum of squared errors, $\epsilon^{\top} \, \epsilon$. With Markov, Gauss further proved that it was the lowest variance unbiased estimator.]. We typically formulate this model without the latent error variable $\epsilon_n$ as follows,^[The latent error variable may be defined in terms of $x$, $y$, and $\beta$ as $$\epsilon_n = y_n - x_n \beta$$.]
+The deterministic part of the equation is the linear predictor $x_n \beta$ with predictor $x_n$ (row $n$ of the data matrix $x$) and coefficient (column) vector $\beta$.
+The stochastic error term, $\epsilon_n$, is assigned a normal distribution located at zero with scale parameter $\sigma > 0$.^[Gauss initially noted that the maximum likelihood estimate derived from the normal error model is identical to the least square error estimate derived by minimizing the sum of squared errors, $\epsilon^{\top} \, \epsilon$. With Markov, Gauss further proved that it was the lowest variance unbiased estimator.]
+We typically formulate this model without the latent error variable $\epsilon_n$ as follows,^[The latent error variable may be defined in terms of $x$, $y$, and $\beta$ as $$\epsilon_n = y_n - x_n \beta$$.]
 
 $$
 y_n \sim \mathsf{Normal}(x_n \beta, \sigma).
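A quick simulation sketch (base R, not from the case study) of the equivalence this hunk reflows: the latent-error formulation and the direct sampling formulation define the same model. The data, coefficients, and noise scale below are made up.

```r
# y_n = x_n beta + epsilon_n with epsilon_n ~ Normal(0, sigma) is the same
# model as drawing y_n ~ Normal(x_n beta, sigma) directly.
set.seed(42)
N <- 100
x <- cbind(1, rnorm(N))               # design matrix with an intercept column
beta <- c(2, -1)                       # illustrative coefficients
sigma <- 0.5
mu <- as.vector(x %*% beta)            # linear predictor x_n beta

y1 <- mu + rnorm(N, 0, sigma)          # latent-error formulation
y2 <- rnorm(N, mean = mu, sd = sigma)  # direct sampling formulation
coef(lm(y1 ~ x[, 2]))                  # recovers roughly (2, -1)
```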
@@ -347,13 +351,13 @@ Because $\exp(\epsilon_{n,k})$ is positive and the underlying population size $z
 This transform and its associated fractional error are so common that they are jointly known as the lognormal distribution, so that we can simplify the above notation as we did with linear regression and write
 
 $$
-y_{n, k} \sim \mathsf{LogNormal}(z_{n, k}, \sigma_n).
+y_{n, k} \sim \mathsf{LogNormal}(\log(z_{n, k}), \sigma_n).
 $$
 whenever
 $$
-\log y_{n, k} \sim \mathsf{Normal}(z_{n, k}, \sigma_n).
+\log y_{n, k} \sim \mathsf{Normal}(\log(z_{n, k}), \sigma_n).
 $$
-The $\mathsf{LogNormal}$ density accounts for the non-linear change of variables through a Jacobian adjustment.^[The Stan manual chapter on changes of variables works through this particular example.]
+The $\mathsf{LogNormal}$ density accounts for the non-linear change of variables through a Jacobian adjustment.^[The [Stan manual chapter on changes of variables](https://mc-stan.org/docs/stan-users-guide/changes-of-variables.html) works through the Jacobian adjustment for this particular change of variables.]
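A simulation check (base R, not from the case study) of the equivalence this hunk corrects: exponentiating a $\mathsf{Normal}(\log z, \sigma)$ draw yields a $\mathsf{LogNormal}(\log z, \sigma)$ draw. The values of `z` and `sigma` are made up.

```r
# If log(y) ~ Normal(log(z), sigma), then y ~ LogNormal(log(z), sigma).
set.seed(1234)
z <- 50          # illustrative expected population size
sigma <- 0.25    # illustrative noise scale
y <- exp(rnorm(1e5, mean = log(z), sd = sigma))     # transform of a normal
# quantiles of the transformed draws match the lognormal quantiles closely
quantile(y, c(0.25, 0.5, 0.75))
qlnorm(c(0.25, 0.5, 0.75), meanlog = log(z), sdlog = sigma)
```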
 
 ## Weakly informative priors
 
@@ -373,7 +377,7 @@
 
 ## Prior for noise scale
 
-The noise scale is proportional, so the following prior should be weakly informative, as a value smaller than 0.05 or larger than 3 would be unexpected.^[The 95% interval is approximately the mean plus or minus two standard deviations, which here is $\exp(-1 - 2) \approx 0.05$ and $\exp(-1 + 2) \approx 3$]. Because values are positive, this prior adopts the lognormal distribution.^[A lognormal prior on $\sigma$ is not consistent with zero values of $\sigma$, but we do not expect the data to be consistent with values of $\sigma$ near zero because the model will is not particularly accurate. It makes well calibrated predictions, but they are not very sharp.]
+The noise scale is proportional, so the following prior should be weakly informative, as a value smaller than 0.05 or larger than 3 would be unexpected.^[The 95% interval is approximately the mean plus or minus two standard deviations, which here is $\exp(-1 - 2) \approx 0.05$ and $\exp(-1 + 2) \approx 3$.] Because values are positive, this prior adopts the lognormal distribution.^[A lognormal prior on $\sigma$ is not consistent with zero values of $\sigma$, but we do not expect the data to be consistent with values of $\sigma$ near zero because the model is not particularly accurate. It makes well-calibrated predictions, but they are not very sharp.]
 
 $$
 \sigma \sim \mathsf{LogNormal}(-1, 1)
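The footnote's interval can be checked directly in R (a quick aside, not part of the case study):

```r
# 95% central interval of the LogNormal(-1, 1) prior on sigma
qlnorm(c(0.025, 0.975), meanlog = -1, sdlog = 1)
# about 0.052 and 2.61, matching the rougher exp(-1 -/+ 2) bounds of 0.05 and 3
```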
@@ -598,7 +602,7 @@ One inference we would like to make from our data is the size of the lynx and ha
 
 Rather than plugging in point estimates to get point predictions, we will follow the fully Bayesian approach of adjusting for uncertainty. This uncertainty takes two forms in inference. First, there is estimation uncertainty, which is fully characterized by the joint posterior density $p(\alpha, \beta, \gamma, \delta, z^{\mathrm init}, \sigma \mid y)$.^[In well-behaved models in which the data is broadly consistent with the prior and likelihood, estimation uncertainty is reduced by larger samples; as more data is available, it will overwhelm the fixed priors and the posterior will concentrate around the true values of the parameters.]
 
-The second form of uncertainty stems from measurement error and unexplained variation, which are both rolled into a single sampling distribution, $\log y_n \sim \mathsf{Normal}(\log z_n, \sigma)$. As in the Stan implementation, $z_n = (u_n, v_n)$ is the solution to the differential equation conditioned on the parameters $\theta = (\alpha, \beta, \gamma, \delta, \sigma)$ and initial state $z^{\mathrm init}$.
+The second form of uncertainty stems from measurement error and unexplained variation, which are both rolled into a single sampling distribution, $y_n \sim \mathsf{LogNormal}(\log z_n, \sigma)$. As in the Stan implementation, $z_n = (u_n, v_n)$ is the solution to the differential equation conditioned on the parameters $\theta = (\alpha, \beta, \gamma, \delta, \sigma)$ and initial state $z^{\mathrm init}$.
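A hedged sketch of how both forms of uncertainty enter a prediction, assuming posterior draws of `z` and `sigma` extracted from an rstan fit of this model; the object names, array dimensions, and indices chosen are illustrative assumptions, not the case study's exact code.

```r
# Predictive draws for one observation: estimation uncertainty enters by
# varying (z, sigma) across posterior draws; measurement error and
# unexplained variation enter through the lognormal noise term.
draws <- rstan::extract(fit)         # fit assumed to be the stanfit object
n <- 10; k <- 1                      # an arbitrary year and species index
y_rep <- rlnorm(nrow(draws$sigma),
                meanlog = log(draws$z[, n, k]),
                sdlog = draws$sigma[, k])
quantile(y_rep, c(0.05, 0.5, 0.95))  # posterior predictive interval
```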
 
 ## Posterior predictive checks
 
@@ -856,7 +860,7 @@ model {
   theta[{1, 3}] ~ normal(1, 0.5);
   theta[{2, 4}] ~ normal(0.05, 0.05);
   sigma ~ lognormal(-1, 1);
-  z_init ~ lognormal(10, 1);
+  z_init ~ lognormal(log(10), 1);
   for (k in 1:2) {
     y_init[k] ~ lognormal(log(z_init[k]), sigma[k]);
     y[ , k] ~ lognormal(log(z[, k]), sigma[k]);
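The `z_init` prior fix is easy to sanity-check (a quick aside; Stan's `lognormal`, like R's `dlnorm`, takes the mean of the log as its first argument, so the distribution's median is the exponent of that argument):

```r
# Median of lognormal(mu, sigma) is exp(mu):
exp(10)       # 22026.47 -- the old prior centered z_init near 22,026
exp(log(10))  # 10       -- the corrected prior centers it near 10
```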
