You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: knitr/planetary_motion/planetary_motion.rmd
+10-11Lines changed: 10 additions & 11 deletions
Original file line number
Diff line number
Diff line change
@@ -7,12 +7,13 @@ keep_tex: true
7
7
toc: false
8
8
documentclass: article
9
9
bibliography: ref.bib
10
+
urlcolor: blue
10
11
abstract: "The Bayesian model of planetary motion is a simple but powerful example that illustrates important concepts, as well as gaps, in prescribed modeling workflows. Our focus is on Bayesian inference using Markov chains Monte Carlo for a model based on an ordinary differential equations (ODE). Our example presents unexpected multimodality, causing our inference to be unreliable and what is more, dramatically slowing down our ODE integrators. What do we do when our chains do not mix and do not forget their starting points? Reasoning about the computational statistics at hand and the physics of the modeled phenomenon, we diagnose how the modes arise and how to improve our inference. Our process for fitting the model is iterative, starting with a simplification and building the model back up, and makes extensive use of visualization."
11
12
---
12
13
13
14
# Introduction
14
15
15
-
As developers of statistical softwares, we realize that we cannot fully automate modeling.
16
+
As developers of statistical softwares^[The authors are notably members of the Stan development team, see [mc-stan.org](https://mc-stan.org/).], we realize that we cannot fully automate modeling.
16
17
Practitioners need to take bespoke steps to fit, evaluate, and improve their models.
17
18
At the same time, the more modeling we do, the better prepared we usually are for the next project we undertake.
18
19
It's not uncommon to apply hard-learned lessons from a past project to a new problem.
@@ -59,8 +60,7 @@ source("tools.r")
59
60
set.seed(1954)
60
61
```
61
62
62
-
All the requisite code to run this notebook can be found
63
-
on \url{https://github.com/stan-dev/example-models/knitr/planet_motion}.
63
+
All the requisite code to run this notebook can be found online, in the [planetary motion github repository](https://github.com/stan-dev/example-models/tree/case-study/planet/knitr/planetary_motion).
64
64
65
65
# Building the model
66
66
@@ -72,8 +72,7 @@ We would like to estimate the following quantities:
72
72
We assume the gravitational constant is known, $G = 1.0 \times 10^{-3}$ in some unit, and aim to evaluate the star-planet mass ratio.
73
73
To do this, we set the planetary mass to $m = 1$.
74
74
It remains to evaluate the solar mass, $M$.
75
-
* The initial position vector, $q_0$, and the initial momentum vector, $p_0$
76
-
of the planet.
75
+
* The initial position vector, $q_0$, and the initial momentum vector, $p_0$, of the planet.
77
76
* The subsequent position vector, $q(t)$, of the planet over time.
78
77
* The position vector of the star, $q_*$.
79
78
@@ -137,7 +136,7 @@ This turns out to be quite true here where a simple one-parameter model allows u
137
136
There are many ways to simplify a model.
138
137
A general approach is to fix some of the parameters, which we can easily do when working with simulated data.
139
138
We fix all the parameters, except for $k$.
140
-
Our goal is now to characterize the posterior distribution,
139
+
We now want to characterize the posterior distribution,
141
140
$$
142
141
p(k \mid q_\mathrm{obs}),
143
142
$$
@@ -198,7 +197,7 @@ In the latter case, this also means our gradient calculations, and computation o
198
197
199
198
Bearing a slight abuse of language, we use "degenerate" to mean that various values of $k$ roughly produce the same data generating process.
200
199
We can check for degeneracy by looking at the _posterior predictive checks_, split across chains.
201
-
We plot $q_x$ against $t$, and for each chain, compute the median estimate for $q_\mathrm{pred}$, obtained using the `generated quantities block`.
200
+
We plot $q_x$ against $q_y$, and for each chain, compute the median estimate for $q_\mathrm{pred}$, obtained using the `generated quantities block`.
202
201
Note that since we fixed $\sigma = 0.01$, we expect the confidence interval to be very narrow.
203
202
204
203
```{r message=FALSE}
@@ -213,8 +212,8 @@ This is consistent with the much higher log posterior density these chains produ
213
212
So degeneracy alone does not explain the lack of convergence.
214
213
Nevertheless, the chains may still be getting stuck at smaller modes, in the tail of $k$'s distribution.
215
214
216
-
At this point, we have taken "standard" steps to diagnose issues with our inference, notably by taking advantage of recommended tools that rstan supports.
217
-
To fully grasp what prevents the chains from mixing and overcome this challenge, we require a more bespoke analysis.
215
+
At this point, we have taken "standard" steps to diagnose issues with our inference.
216
+
To fully grasp what prevents the chains from mixing, we require a more bespoke analysis.
218
217
We summarize our reasoning, noting it involves unmentioned trials and errors, and long moments of pondering.
219
218
220
219
## The wanderers: how do the chains even find these presumed modes?
@@ -403,7 +402,7 @@ But clearly, for this and other examples, too much dispersion can prevent certai
403
402
One heuristic is to sample the starting point from the prior distribution, or potentially from an overdispered prior.
404
403
405
404
Another perspective is to simply admit that there is no one-size-fits-all solution.
406
-
This is very much true of other tuning parameters of our algorithm, such as the length of the warmup or the _target acceptance rate_ of HMC.
405
+
This is very much true of other tuning parameters of our algorithm, such as the length of the warmup or the target acceptance rate of HMC.
407
406
While defaults exist, a first attempt at fitting the model can motivate adjustments.
408
407
In this sense, we can justify using a tighter distribution to draw the starting points after examining the behavior of the Markov chains with a broad starting distribution.
409
408
@@ -579,7 +578,7 @@ We now have some intuition that elliptical observations allow for local modes, b
579
578
Remember also that for a mode to exist, it doesn't need to induce a particularly good fit; it simply needs to dominate a neighborhood.
580
579
581
580
Conceptually, tweaking $q_*$ means we can move the star closer to the planet and thus increase the gravitational interaction.
582
-
This is not unlike tweaking $k$, except we are the affecting the $r$ term in
581
+
This is not unlike tweaking $k$, except we are affecting the $r$ term in
583
582
$$
584
583
\frac{\mathrm d p}{\mathrm d t} = - \frac{k}{r^3}(q - q_*).
0 commit comments