|
4 | 4 | "cell_type": "markdown", |
5 | 5 | "metadata": {}, |
6 | 6 | "source": [ |
7 | | - "## Using Estimates from Variational, Laplace, or Optimization Methods to Initialize the NUTS-HMC Sampler\n", |
| 7 | + "## Initializing the NUTS-HMC sampler\n", |
8 | 8 | "\n", |
9 | | - "In this example, we show how to use parameter estimates returned by Stan's various posterior approximation or optimization algorithms as initial values for Stan's NUTS-HMC sampler. These include:\n", |
| 9 | + "In this example, we show how to use parameter estimates returned by any of Stan's inference algorithms as initial values for Stan's NUTS-HMC sampler. These include:\n", |
10 | 10 | "\n", |
11 | 11 | "* [Pathfinder](https://mc-stan.org/docs/cmdstan-guide/pathfinder-config.html)\n", |
12 | 12 | "* [ADVI](https://mc-stan.org/docs/cmdstan-guide/variational_config.html)\n", |
13 | 13 | "* [Laplace](https://mc-stan.org/docs/cmdstan-guide/laplace_sample_config.html)\n", |
14 | 14 | "* [Optimization](https://mc-stan.org/docs/cmdstan-guide/optimize_config.html)\n", |
| 15 | + "* [NUTS-HMC MCMC](https://mc-stan.org/docs/cmdstan-guide/mcmc_config.html)\n", |
15 | 16 | "\n", |
16 | | - "By default, the NUTS-HMC sampler randomly initializes all model parameters uniformly in the interval $(-2, 2)$. If this interval is far from the typical set of the posterior, initializing sampling from these approximation algorithms can speed up and improve adaptation.\n", |
| 17 | + "By default, the NUTS-HMC sampler randomly initializes all (unconstrained) model parameters uniformly in the interval (-2, 2). If this interval is far from the typical set of the posterior, initializing the sampler with estimates from these algorithms can speed up and improve adaptation.\n", |
17 | 18 | "\n", |
18 | 19 | "### Model and data\n", |
19 | 20 | "\n", |
|
49 | 50 | "source": [ |
50 | 51 | "### Demonstration with Stan's `pathfinder` method\n", |
51 | 52 | "\n", |
52 | | - "The approximation methods all follow the same general pattern of usage. First, we call the \n", |
| 53 | + "Initializing the sampler with estimates from any previous inference algorithm follows the same general usage pattern. First, we call the \n", |
53 | 54 | "corresponding method on the `CmdStanModel` object. From the resulting fit, we call the `.create_inits()` \n", |
54 | 55 | "method to construct a set of per-chain initializations for the model parameters. To make it explicit, \n", |
55 | 56 | "we will walk through the process using the `pathfinder` method (which wraps the \n", |
|
84 | 85 | "source": [ |
85 | 86 | "Posteriordb provides reference posteriors for all models. For the blr model, conditioned on the dataset `sblri.json`, the reference posteriors can be found in the [sblri-blr.json](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/reference_posteriors/summary_statistics/mean/mean/sblri-blr.json) file.\n", |
86 | 87 | "\n", |
87 | | - "The reference posteriors for all elements of `beta` and `sigma` are all very close to $1.0$.\n", |
| 88 | + "The reference posterior means for all elements of `beta` and `sigma` are very close to 1.0.\n", |
88 | 89 | "\n", |
89 | 90 | "The experiments reported in Figure 3 of the paper [Pathfinder: Parallel quasi-Newton variational inference](https://arxiv.org/abs/2108.03782) by Zhang et al. show that Pathfinder provides a better estimate of the posterior, as measured by the 1-Wasserstein distance to the reference posterior, than 75 iterations of the warmup Phase I algorithm used by the NUTS-HMC sampler.\n", |
90 | 91 | "Furthermore, Pathfinder is more computationally efficient, requiring fewer evaluations of the log density and gradient functions. Therefore, using the Pathfinder estimates to initialize the parameter values for the NUTS-HMC sampler can allow the sampler to do a better job of adapting the stepsize and metric during warmup, resulting in better performance and estimation.\n", |
|
191 | 192 | "cell_type": "markdown", |
192 | 193 | "metadata": {}, |
193 | 194 | "source": [ |
194 | | - "### Other approximation algorithms" |
| 195 | + "### Other inference algorithms" |
195 | 196 | ] |
196 | 197 | }, |
197 | 198 | { |
|
349 | 350 | "cell_type": "markdown", |
350 | 351 | "metadata": {}, |
351 | 352 | "source": [ |
352 | | - "It is also possible to use the output of the `sample()` method to construct inits to be fed into a future sampling run:" |
| 353 | + "It is also possible to use the output of the `sample()` method itself to construct inits to be fed into a future sampling run:" |
353 | 354 | ] |
354 | 355 | }, |
355 | 356 | { |
|
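The per-chain inits discussed above are, per the CmdStanPy documentation, a list of dicts (one per chain) mapping parameter names to initial values, which is what `create_inits()` returns and what `sample(inits=...)` accepts. The sketch below builds such a structure by hand from a single point estimate; the parameter names match the `blr` model (`beta`, `sigma`), but the jittering scheme is purely illustrative and is not CmdStanPy's actual implementation.

```python
import random

def make_inits(estimate, chains=4, jitter=0.1, seed=12345):
    """Build per-chain inits by jittering a single point estimate.

    Returns a list with one dict per chain, each mapping parameter
    names to initial values -- the same shape as the output of
    CmdStanPy's create_inits().
    """
    rng = random.Random(seed)
    inits = []
    for _ in range(chains):
        inits.append({
            name: [v + rng.uniform(-jitter, jitter) for v in value]
            if isinstance(value, list)
            else value + rng.uniform(-jitter, jitter)
            for name, value in estimate.items()
        })
    return inits

# e.g. estimates for the blr model: all betas and sigma near 1.0
estimate = {"beta": [1.0, 1.0, 1.0, 1.0, 1.0], "sigma": 1.0}
inits = make_inits(estimate)
# A structure like this can then be passed to the sampler, e.g.
# model.sample(data=..., inits=inits)
```

In practice there is no need to build this by hand: calling `create_inits()` on the fit object from any of the algorithms listed above produces it directly.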