Commit 3f5aeb0

add new page on sampling keyword arguments

1 parent 6ce7f96

File tree

2 files changed: +272 −0 lines changed

_quarto.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -74,6 +74,7 @@ website:
         collapse-level: 1
         contents:
           - usage/automatic-differentiation/index.qmd
+          - usage/sampling-options/index.qmd
           - usage/submodels/index.qmd
           - usage/custom-distribution/index.qmd
           - usage/probability-interface/index.qmd
@@ -250,6 +251,7 @@ usage-modifying-logprob: usage/modifying-logprob
 usage-performance-tips: usage/performance-tips
 usage-probability-interface: usage/probability-interface
 usage-sampler-visualisation: usage/sampler-visualisation
+usage-sampling-options: usage/sampling-options
 usage-submodels: usage/submodels
 usage-tracking-extra-quantities: usage/tracking-extra-quantities
 usage-troubleshooting: usage/troubleshooting
```

usage/sampling-options/index.qmd

Lines changed: 270 additions & 0 deletions
---
title: "MCMC Sampling Options"
engine: julia
---

```{julia}
#| echo: false
#| output: false
using Pkg;
Pkg.instantiate();
```

Markov chain Monte Carlo sampling in Turing.jl is performed using the `sample()` function.
As described on the [Core Functionality page]({{< meta core-functionality >}}), single-chain and multiple-chain sampling can be done using, respectively,

```julia
sample(model, sampler, niters)
sample(model, sampler, MCMCThreads(), niters, nchains) # or MCMCSerial() or MCMCDistributed()
```
On top of this, both methods accept a number of keyword arguments that allow you to control the sampling process.
This page details these options.

To begin, let's create a simple model:

```{julia}
using Turing

@model function demo_model()
    x ~ Normal()
    y ~ Normal(x)
    4.0 ~ Normal(y)
    return nothing
end
```
## Controlling logging

Progress bars can be controlled with the `progress` keyword argument.
The exact values that can be used depend on whether you are using single-chain or multi-chain sampling.

For single-chain sampling, `progress=true` and `progress=false` enable and disable the progress bar, respectively.

For multi-chain sampling, `progress` can take the following values:

- `:none` or `false`: no progress bar
- `:overall` or `true` (the default): one overall progress bar for all chains
- `:perchain`: one overall progress bar, plus one extra progress bar per chain (note that this can lead to visual clutter if you have many chains)

If you want to globally enable or disable the progress bar, you can use:

```{julia}
Turing.setprogress!(false); # or true
```

(This handily also disables progress logging for the rest of this document.)

For NUTS in particular, you can also specify `verbose=false` to disable the "Found initial step size" info message.
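For instance (a non-executed sketch; the sampler choice and iteration count are arbitrary), both options are passed directly to `sample()`:

```julia
# Disable the progress bar for this call only, and silence NUTS's
# "Found initial step size" message.
chn = sample(demo_model(), NUTS(), 1_000; progress=false, verbose=false)
```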
## Switching the output type

By default, the results of MCMC sampling are bundled up in an `MCMCChains.Chains` object.

```{julia}
chn = sample(demo_model(), HMC(0.1, 20), 5)
typeof(chn)
```

If you wish to use a different chain format provided by another package, you can specify the `chain_type` keyword argument.
You should refer to the documentation of the respective package for exact details.

Another situation where specifying `chain_type` is useful is when you want the raw MCMC outputs as a vector of transitions.
This can help with profiling or debugging, since chain construction can take a surprising amount of time compared to the sampling itself, especially for very simple models.
To skip it, use `chain_type=Any` (i.e., do not convert the output to any specific chain format):

```{julia}
transitions = sample(demo_model(), MH(), 5; chain_type=Any)
typeof(transitions)
```
## Specifying initial parameters

In Turing.jl, initial parameters for MCMC sampling can be specified using the `initial_params` keyword argument.

For single-chain sampling, the AbstractMCMC interface generally expects you to provide a completely flattened vector of parameters:

```{julia}
chn = sample(demo_model(), MH(), 5; initial_params=[1.0, -5.0])
chn[:x][1], chn[:y][1]
```

::: {.callout-note}
Note that a number of samplers perform warm-up steps by default (see the [Thinning and warmup section below](#thinning-and-warmup)), so `chn[:param][1]` may not correspond to the exact initial parameters you provided.
`MH()` does not do this, which is why we use it here.
:::
Note that for Turing models, using a `Vector` can be extremely error-prone, as the order of parameters in the flattened vector is not always obvious (especially if there are parameters with non-trivial types).
In general, parameters should be provided in the order in which they are defined in the model.
A relatively 'safe' way of obtaining parameters in the correct order is to first generate a `VarInfo`, and then linearise that:

```{julia}
using DynamicPPL
vi = VarInfo(demo_model())
initial_params = vi[:]
```
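The resulting vector can then be passed straight to `sample()` (a minimal sketch continuing from the block above):

```julia
# `initial_params` is the flattened vector extracted from the VarInfo.
chn = sample(demo_model(), MH(), 5; initial_params=initial_params)
```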
To avoid this situation, you can also use a `NamedTuple` to specify initial parameters:

```{julia}
chn = sample(demo_model(), MH(), 5; initial_params=(y=2.0, x=-6.0))
chn[:x][1], chn[:y][1]
```

This works even for parameters with more complex types:

```{julia}
using LinearAlgebra: I  # for the identity matrix used below

@model function demo_complex()
    x ~ LKJCholesky(3, 0.5)
    y ~ MvNormal(zeros(3), I)
end

init_x, init_y = rand(LKJCholesky(3, 0.5)), rand(MvNormal(zeros(3), I))
chn = sample(demo_complex(), MH(), 5; initial_params=(x=init_x, y=init_y));
```
::: {.callout-important}
## Upcoming changes in Turing v0.41

In Turing v0.41, instead of providing _initial parameters_, users will have to provide what is conceptually an _initialisation strategy_.
The keyword argument is still `initial_params`, but the permitted values will be:

- `InitFromPrior()`: generate initial parameters by sampling from the prior
- `InitFromUniform(lower, upper)`: generate initial parameters by sampling uniformly from the given bounds in linked space
- `InitFromParams(namedtuple_or_dict)`: use the provided initial parameters, supplied either as a `NamedTuple` or a `Dict{<:VarName}`

Initialisation with a `Vector` will be fully removed due to its inherent ambiguity.
Initialisation with a raw `NamedTuple` will still be supported (it will simply be wrapped in `InitFromParams()`), but we expect to remove this eventually, so using `InitFromParams()` directly is more future-proof.
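For instance, a sketch based on the description above (this uses the upcoming interface, so it will not run on Turing versions before v0.41):

```julia
# Explicitly wrap concrete initial values in an initialisation strategy.
chn = sample(demo_model(), MH(), 5; initial_params=InitFromParams((x=1.0, y=-5.0)))
```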
:::

## Saving and resuming sampling

By default, MCMC sampling starts from scratch, using the initial parameters provided.
You can, however, resume sampling from a previous chain.
This is useful if you want to, for example, perform sampling in batches, or inspect intermediate results.

Firstly, the previous chain _must_ have been run with the `save_state=true` keyword argument:

```{julia}
using Random
rng = Xoshiro(468)

chn1 = sample(rng, demo_model(), MH(), 5; save_state=true);
```
For `MCMCChains.Chains`, this results in the final sampler state being stored inside the chain metadata.
You can access it using `DynamicPPL.loadstate`:

```{julia}
saved_state = DynamicPPL.loadstate(chn1)
typeof(saved_state)
```

::: {.callout-note}
You can also access the saved sampler state directly via `chn1.info.samplerstate`, but we recommend _not_ doing so, as this relies on the internal structure of `MCMCChains.Chains`.
:::

Sampling can then be resumed from this state by passing it as the `initial_state` keyword argument:

```{julia}
chn2 = sample(demo_model(), MH(), 5; initial_state=saved_state)
```
Note that the exact format saved in `chn.info.samplerstate`, and likewise the format expected by `initial_state`, depends on how `sample` was invoked.
For single-chain sampling, the saved state (and the required initial state) is a single sampler state.
For multiple-chain sampling, it is a vector of states, one per chain.

This means that, for example, after sampling a single chain, you could sample three chains that all branch off from its final state:

```{julia}
initial_states = fill(saved_state, 3)
chn3 = sample(demo_model(), MH(), MCMCThreads(), 5, 3; initial_state=initial_states)
```
::: {.callout-note}
## Initial states versus initial parameters

The `initial_state` and `initial_params` keyword arguments are mutually exclusive.
If both are provided, `initial_params` will be silently ignored.

```{julia}
chn2 = sample(rng, demo_model(), MH(), 5;
    initial_state=saved_state, initial_params=(x=0.0, y=0.0)
)
chn2[:x][1], chn2[:y][1]
```

In general, the saved state will contain a set of parameters, namely the last parameters in the previous chain.
However, the saved state specifies not only parameters but also other internal variables required by the sampler.
For example, the MH state contains a cached log-density of the current parameters, which is later used for calculating the acceptance ratio.

Finally, note that the first sample in the resumed chain will not be the same as the last sample in the previous chain; it will be the sample immediately after it.

```{julia}
# In general these will not be the same (although they _could_ be if the MH
# step was rejected -- that is why we seed the sampling in this section).
chn1[:x][end], chn2[:x][1]
```
:::
## Thinning and warmup

The `num_warmup` and `discard_initial` keyword arguments can be used to control MCMC warmup.
Both are integers: they specify, respectively, the number of warmup steps to perform and the number of iterations at the start of the chain to discard.
Note that the value of `discard_initial` should also include the `num_warmup` steps if you want the warmup steps to be discarded.

Here are some examples of how these two keyword arguments interact:

| `num_warmup=` | `discard_initial=` | Description |
| ------------- | ------------------ | ----------- |
| 10 | 10 | Perform 10 warmup steps and discard them; the chain starts from the first non-warmup step |
| 10 | 15 | Perform 10 warmup steps, discard them and the next 5 steps; the chain starts from the 6th non-warmup step |
| 10 | 5 | Perform 10 warmup steps, discard the first 5; the chain contains the last 5 warmup steps followed by the rest of the chain |
| 0 | 10 | No warmup steps; discard the first 10 steps; the chain starts from the 11th step |
| 0 | 0 | No warmup steps and nothing discarded; the chain starts from the 1st step (corresponding to the initial parameters) |

Each sampler has its own default value for `num_warmup`, but `discard_initial` always defaults to `num_warmup`.

Warmup steps differ from 'regular' non-warmup steps in that warmup steps call `AbstractMCMC.step_warmup`, whereas regular steps call `AbstractMCMC.step`.
For all the samplers defined in Turing, these two functions are identical; however, they may in general differ for other samplers.
Please consult the documentation of the respective sampler for details.
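As an illustration (a non-executed sketch; the sampler and step counts are arbitrary):

```julia
# Perform 100 warmup steps and discard all of them; the returned chain
# contains only the requested post-warmup samples.
chn = sample(demo_model(), MH(), 1_000; num_warmup=100, discard_initial=100)
```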
A thinning factor can be specified using the `thinning` keyword argument.
For example, `thinning=10` will keep every tenth sample, discarding the other nine.

Note that thinning is not applied to the first `discard_initial` samples; it is only applied to the remaining samples.
Thus, for example, if you use `discard_initial=50` and `thinning=10`, the chain will contain samples 51, 61, 71, and so on.
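In code, mirroring the numbers above (a non-executed sketch):

```julia
# Discard the first 50 iterations, then keep every 10th sample thereafter
# (i.e. iterations 51, 61, 71, ...).
chn = sample(demo_model(), MH(), 5; discard_initial=50, thinning=10)
```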
## Performing model checks

By default, DynamicPPL performs a number of checks on the model before any sampling is done.
This catches a number of potential errors in a model, such as repeated variables (see [the DynamicPPL documentation](https://turinglang.org/DynamicPPL.jl/stable/api/#DynamicPPL.DebugUtils.check_model_and_trace) for details).

If you wish to disable these checks, you can pass `check_model=false` to `sample()`.
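For example (a non-executed sketch):

```julia
# Skip DynamicPPL's pre-sampling model checks.
chn = sample(demo_model(), MH(), 5; check_model=false)
```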
## Callbacks

The `callback` keyword argument can be used to specify a function that is called at the end of each sampler iteration.
This function should have the signature `callback(rng, model, sampler, sample, iteration::Int; kwargs...)`.

If you are performing multi-chain sampling, `kwargs` will additionally contain `chain_number::Int`, which ranges from 1 to the number of chains.
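As a brief illustration (a non-executed sketch; `my_callback` is a hypothetical helper following the signature above):

```julia
# Log a message every 100 iterations; all other arguments are ignored here.
function my_callback(rng, model, sampler, sample, iteration; kwargs...)
    iteration % 100 == 0 && @info "Reached iteration $iteration"
    return nothing
end

chn = sample(demo_model(), MH(), 500; callback=my_callback)
```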
The [TuringCallbacks.jl package](https://github.com/TuringLang/TuringCallbacks.jl) provides a `TensorBoardCallback`, which can be used to obtain live progress visualisations using [TensorBoard](https://www.tensorflow.org/tensorboard).

## Automatic differentiation

Finally, please note that for samplers which use automatic differentiation (e.g., HMC and NUTS), the AD type should be specified in the sampler constructor itself, rather than as a keyword argument to `sample()`.

In other words, this is correct:

```{julia}
spl = NUTS(; adtype=AutoForwardDiff())
chn = sample(demo_model(), spl, 10);
```

and not this:

```julia
spl = NUTS()
chn = sample(demo_model(), spl, 10; adtype=AutoForwardDiff())
```
