Commit ca9706c

Merge branch 'main' into sg/temp-theme-update
2 parents 54e1c87 + 4e95bc8 commit ca9706c

8 files changed: +289 -122 lines changed

.github/workflows/preview.yml

Lines changed: 0 additions & 3 deletions
@@ -2,9 +2,6 @@ name: PR Preview Workflow
 
 on:
   pull_request:
-    types:
-      - opened
-      - synchronize
 
 concurrency:
   group: docs

core-functionality/index.qmd

Lines changed: 133 additions & 70 deletions
Large diffs are not rendered by default.

developers/compiler/minituring-compiler/index.qmd

Lines changed: 12 additions & 1 deletion
@@ -270,7 +270,18 @@ We define the probabilistic model:
 end;
 ```
 
-We perform inference with data `x = 3.0`:
+The `@mini_model` macro expands this into another function, `m`, which effectively calls either `assume` or `observe` on each variable as needed:
+
+```{julia}
+@macroexpand @mini_model function m(x)
+    a ~ Normal(0.5, 1)
+    b ~ Normal(a, 2)
+    x ~ Normal(b, 0.5)
+    return nothing
+end
+```
+
+We can use this function to construct the `MiniModel`, and then perform inference with data `x = 3.0`:
 
 ```{julia}
 sample(MiniModel(m, (x=3.0,)), MHSampler(), 1_000_000; chain_type=Chains, progress=false)
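
Although not shown in this diff, the object returned by `sample` here is an ordinary MCMCChains `Chains`, so the usual summaries apply. A minimal sketch, assuming the tutorial's `MiniModel`, `MHSampler`, and the `@mini_model`-generated `m` are in scope:

```julia
using MCMCChains, Statistics

chn = sample(MiniModel(m, (x=3.0,)), MHSampler(), 1_000_000; chain_type=Chains, progress=false)

# With x observed at 3.0, the posterior means of `a` and `b` should be pulled
# above their prior means (0.5 for a; b is centred on a under the prior).
mean(chn[:a]), mean(chn[:b])
```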

developers/inference/abstractmcmc-turing/index.qmd

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ using Pkg;
 Pkg.instantiate();
 ```
 
-Prerequisite: [Interface guide]({{<meta using-turing-interface>}}).
+Prerequisite: Interface guide.
 
 ## Introduction
 
@@ -35,7 +35,7 @@ n_samples = 1000
 chn = sample(mod, alg, n_samples, progress=false)
 ```
 
-The function `sample` is part of the AbstractMCMC interface. As explained in the [interface guide]({{<meta using-turing-interface>}}), building a sampling method that can be used by `sample` consists in overloading the structs and functions in `AbstractMCMC`. The interface guide also gives a standalone example of their implementation, [`AdvancedMH.jl`]().
+The function `sample` is part of the AbstractMCMC interface. As explained in the interface guide, building a sampling method that can be used by `sample` consists in overloading the structs and functions in `AbstractMCMC`. The interface guide also gives a standalone example of their implementation, [`AdvancedMH.jl`]().
 
 Turing sampling methods (most of which are written [here](https://github.com/TuringLang/Turing.jl/tree/main/src/mcmc)) also implement `AbstractMCMC`. Turing defines a particular architecture for `AbstractMCMC` implementations, that enables working with models defined by the `@model` macro, and uses DynamicPPL as a backend. The goal of this page is to describe this architecture, and how you would go about implementing your own sampling method in Turing, using Importance Sampling as an example. I don't go into all the details: for instance, I don't address selectors or parallelism.
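
For orientation, here is a minimal, hedged sketch of what "overloading the structs and functions in `AbstractMCMC`" looks like in practice. The names `MySampler` and `MyTransition` are illustrative placeholders (not part of Turing or AbstractMCMC), and the random-walk "proposal" ignores the model entirely:

```julia
using AbstractMCMC
using Random

# Placeholder sampler type; a real sampler would carry tuning parameters.
struct MySampler <: AbstractMCMC.AbstractSampler end

# Placeholder transition; a real sampler would store the current draw plus
# whatever bookkeeping the next step needs.
struct MyTransition
    value::Float64
end

# First step: called without a previous state.
function AbstractMCMC.step(
    rng::Random.AbstractRNG, model::AbstractMCMC.AbstractModel, ::MySampler; kwargs...
)
    t = MyTransition(randn(rng))
    return t, t  # (sample to save, state passed to the next step)
end

# Subsequent steps: the previous state arrives as the fourth argument.
function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model::AbstractMCMC.AbstractModel,
    ::MySampler,
    state::MyTransition;
    kwargs...,
)
    t = MyTransition(state.value + randn(rng))
    return t, t
end

# With these two methods (and a model type <: AbstractMCMC.AbstractModel),
# the generic `sample(model, MySampler(), n)` machinery works.
```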

getting-started/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -93,4 +93,4 @@ A thorough introduction to the field is [*Pattern Recognition and Machine Learni
 :::
 
 The next page on [Turing's core functionality]({{<meta using-turing>}}) explains the basic features of the Turing language.
-From there, you can either look at [worked examples of how different models are implemented in Turing]({{<meta tutorials-intro>}}), or [specific tips and tricks that can help you get the most out of Turing]({{<meta using-turing-mode-estimation>}}).
+From there, you can either look at [worked examples of how different models are implemented in Turing]({{<meta tutorials-intro>}}), or [specific tips and tricks that can help you get the most out of Turing]({{<meta usage-performance-tips>}}).

tutorials/hidden-markov-models/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@ In this case, we use HMC for `m` and `T`, representing the emission and transiti
 The parameter `s` is not a continuous variable.
 It is a vector of **integers**, and thus Hamiltonian methods like HMC and NUTS won't work correctly.
 Gibbs allows us to apply the right tools to the best effect.
-If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]({{<meta using-turing-autodiff>}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space.
+If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]({{<meta usage-automatic-differentiation>}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space.
 
 Time to run our sampler.
 

usage/automatic-differentiation/index.qmd

Lines changed: 83 additions & 44 deletions
@@ -12,68 +12,81 @@ using Pkg;
 Pkg.instantiate();
 ```
 
-## Switching AD Modes
+## What is Automatic Differentiation?
 
-Turing currently supports four automatic differentiation (AD) backends for sampling: [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD; and [Mooncake](https://github.com/compintell/Mooncake.jl) and [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl) for reverse-mode AD.
-`ForwardDiff` is automatically imported by Turing. To utilize `Mooncake` or `ReverseDiff` for AD, users must explicitly import them with `import Mooncake` or `import ReverseDiff`, alongside the usual `using Turing`.
+Automatic differentiation (AD) is a technique used in Turing.jl to evaluate the gradient of a function at a given set of arguments.
+In the context of Turing.jl, the function being differentiated is the log probability density of a model, and the arguments are the parameters of the model (i.e. the values of the random variables).
+The gradient of the log probability density is used by various algorithms in Turing.jl, such as HMC (including NUTS), mode estimation (which uses gradient-based optimization), and variational inference.
 
-As of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of [`AdTypes.jl`](https://github.com/SciML/ADTypes.jl), allowing users to specify the AD backend for individual samplers independently.
-Users can pass the `adtype` keyword argument to the sampler constructor to select the desired AD backend, with the default being `AutoForwardDiff(; chunksize=0)`.
+The Julia ecosystem has a number of AD libraries.
+You can switch between these using the unified [ADTypes.jl](https://github.com/SciML/ADTypes.jl/) interface, which for a given AD backend, provides types such as `AutoBackend` (see [the documentation](https://docs.sciml.ai/ADTypes/stable/) for more details).
+For example, to use the [Mooncake.jl](https://github.com/compintell/Mooncake.jl) package for AD, you can run the following:
 
-For `ForwardDiff`, pass `adtype=AutoForwardDiff(; chunksize)` to the sampler constructor. A `chunksize` of `nothing` permits the chunk size to be automatically determined. For more information regarding the selection of `chunksize`, please refer to [related section of `ForwardDiff`'s documentation](https://juliadiff.org/ForwardDiff.jl/dev/user/advanced/#Configuring-Chunk-Size).
+```{julia}
+# Turing re-exports AutoForwardDiff, AutoReverseDiff, and AutoMooncake.
+# Other ADTypes must be explicitly imported from ADTypes.jl or
+# DifferentiationInterface.jl.
+using Turing
+setprogress!(false)
 
-For `ReverseDiff`, pass `adtype=AutoReverseDiff()` to the sampler constructor. An additional keyword argument called `compile` can be provided to `AutoReverseDiff`. It specifies whether to pre-record the tape only once and reuse it later (`compile` is set to `false` by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.
+# Note that if you specify a custom AD backend, you must also import it.
+import Mooncake
 
-Pre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.
+@model function f()
+    x ~ Normal()
+    # Rest of your model here
+end
 
-Thus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and `if`-statements should consistently execute the same branches.
-For instance, `if`-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data.
-However, `if`-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect.
-Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter.
+sample(f(), HMC(0.1, 5; adtype=AutoMooncake(; config=nothing)), 100)
+```
 
-The previously used interface functions including `ADBackend`, `setadbackend`, `setsafe`, `setchunksize`, and `setrdcache` have been removed.
+By default, if you do not specify a backend, Turing will default to [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl).
+In this case, you do not need to import ForwardDiff, as it is already a dependency of Turing.
 
-For `Mooncake`, pass `adtype=AutoMooncake(; config=nothing)` to the sampler constructor.
+## Choosing an AD Backend
 
-## Compositional Sampling with Differing AD Modes
+There are two aspects to choosing an AD backend: firstly, what backends are available; and secondly, which backend is best for your model.
 
-Turing supports intermixed automatic differentiation methods for different variable spaces. The snippet below shows using `ForwardDiff` to sample the mean (`m`) parameter, and using `ReverseDiff` for the variance (`s`) parameter:
+### Usable AD Backends
 
-```{julia}
-using Turing
-using ReverseDiff
+Turing.jl uses the functionality in [DifferentiationInterface.jl](https://github.com/JuliaDiff/DifferentiationInterface.jl) ('DI') to interface with AD libraries in a unified way.
+In principle, any AD library that DI provides an interface for can be used with Turing; you should consult the [DI documentation](https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface/stable/) for an up-to-date list of compatible AD libraries.
 
-# Define a simple Normal model with unknown mean and variance.
-@model function gdemo(x, y)
-    s² ~ InverseGamma(2, 3)
-    m ~ Normal(0, sqrt(s²))
-    x ~ Normal(m, sqrt(s²))
-    return y ~ Normal(m, sqrt(s²))
-end
+Note, however, that not all AD libraries in there are thoroughly tested on Turing models.
+Thus, it is possible that some of them will either error (because they don't know how to differentiate through Turing's code), or maybe even silently give incorrect results (if you are very unlucky).
+Turing is most extensively tested with **ForwardDiff.jl** (the default), **ReverseDiff.jl**, and **Mooncake.jl**.
+We also run a smaller set of tests with Enzyme.jl.
 
-# Sample using Gibbs and varying autodiff backends.
-c = sample(
-    gdemo(1.5, 2),
-    Gibbs(
-        :m => HMC(0.1, 5; adtype=AutoForwardDiff(; chunksize=0)),
-        :s² => HMC(0.1, 5; adtype=AutoReverseDiff(false)),
-    ),
-    1000,
-    progress=false,
-)
-```
+### ADTests
+
+Before describing how to choose the best AD backend for your model, we should mention that we also publish a table of benchmarks for various models and AD backends in [the ADTests website](https://turinglang.org/ADTests/).
+These models aim to capture a variety of different features of Turing.jl and Julia in general, so that you can see which AD backends may be compatible with your model.
+Benchmarks are also included, although it should be noted that many of the models in ADTests are small and thus the timings may not be representative of larger, real-life models.
+
+If you have suggestions for other models to include, please do let us know by [creating an issue on GitHub](https://github.com/TuringLang/ADTests/issues/new)!
 
-Generally, reverse-mode AD, for instance `ReverseDiff`, is faster when sampling from variables of high dimensionality (greater than 20), while forward-mode AD, for instance `ForwardDiff`, is more efficient for lower-dimension variables. This functionality allows those who are performance sensitive to fine tune their automatic differentiation for their specific models.
+### The Best AD Backend for Your Model
 
-If the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is.
-Currently, this defaults to `ForwardDiff`.
+Given the number of possible backends, how do you choose the best one for your model?
 
-The most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using the functionality in DynamicPPL (see [the API documentation](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities)):
+A simple heuristic is to look at the number of parameters in your model.
+The log density of the model, i.e. the function being differentiated, is a function that goes from $\mathbb{R}^n \to \mathbb{R}$, where $n$ is the number of parameters in your model.
+For models with a small number of parameters (say up to ~ 20), forward-mode AD (e.g. ForwardDiff) is generally faster due to a smaller overhead.
+On the other hand, for models with a large number of parameters, reverse-mode AD (e.g. ReverseDiff or Mooncake) is generally faster as it computes the gradients with respect to all parameters in a single pass.
+
+The most exact way to ensure you are using the fastest AD that works for your problem is to benchmark them using the functionality in DynamicPPL (see [the API documentation](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities)):
 
 ```{julia}
+using ADTypes
 using DynamicPPL.TestUtils.AD: run_ad, ADResult
 using ForwardDiff, ReverseDiff
 
+@model function gdemo(x, y)
+    s² ~ InverseGamma(2, 3)
+    m ~ Normal(0, sqrt(s²))
+    x ~ Normal(m, sqrt(s²))
+    return y ~ Normal(m, sqrt(s²))
+end
 model = gdemo(1.5, 2)
 
 for adtype in [AutoForwardDiff(), AutoReverseDiff()]
@@ -84,6 +97,32 @@ end
 
 In this specific instance, ForwardDiff is clearly faster (due to the small size of the model).
 
-We also have a table of benchmarks for various models and AD backends in [the ADTests website](https://turinglang.org/ADTests/).
-These models aim to capture a variety of different Turing.jl features.
-If you have suggestions for things to include, please do let us know by [creating an issue on GitHub](https://github.com/TuringLang/ADTests/issues/new)!
+::: {.callout-note}
+## A note about ReverseDiff's `compile` argument
+
+The additional keyword argument `compile=true` for `AutoReverseDiff` specifies whether to pre-record the tape only once and reuse it later.
+By default, this is set to `false`, which means no pre-recording.
+Setting `compile=true` can substantially improve performance, but risks silently incorrect results if not used with care.
+Pre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.
+:::
+
+## Compositional Sampling with Differing AD Modes
+
+When using Gibbs sampling, Turing also supports mixed automatic differentiation methods for different variable spaces.
+The following snippet shows how one can use `ForwardDiff` to sample the mean (`m`) parameter, and `ReverseDiff` for the variance (`s`) parameter:
+
+```{julia}
+using Turing
+using ReverseDiff
+
+# Sample using Gibbs and varying autodiff backends.
+c = sample(
+    gdemo(1.5, 2),
+    Gibbs(
+        :m => HMC(0.1, 5; adtype=AutoForwardDiff()),
+        :s² => HMC(0.1, 5; adtype=AutoReverseDiff()),
+    ),
+    1000,
+    progress=false,
+)
+```
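
As a small, hedged illustration of the `compile` flag discussed in the callout above, the following sketch reuses the `gdemo` model defined earlier on that page (the step size and leapfrog count are arbitrary choices, not recommendations):

```julia
using Turing
import ReverseDiff

# Reuse a pre-recorded tape across log-density evaluations.
# This is safe for gdemo because it has no parameter-dependent branching.
chn_compiled = sample(
    gdemo(1.5, 2),
    HMC(0.1, 5; adtype=AutoReverseDiff(; compile=true)),
    1000;
    progress=false,
)
```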

usage/troubleshooting/index.qmd

Lines changed: 57 additions & 0 deletions
@@ -102,3 +102,60 @@ sample(model, NUTS(), 1000; initial_params=rand(Vector, model))
 ```
 
 More generally, you may also consider reparameterising the model to avoid such issues.
+
+## ForwardDiff type parameters
+
+> MethodError: no method matching Float64(::ForwardDiff.Dual{... The type `Float64` exists, but no method is defined for this combination of argument types when trying to construct it.
+
+A common error with ForwardDiff looks like this:
+
+```{julia}
+#| error: true
+@model function forwarddiff_fail()
+    x = Float64[0.0, 1.0]
+    a ~ Normal()
+    @show typeof(a)
+    x[1] = a
+    b ~ MvNormal(x, I)
+end
+sample(forwarddiff_fail(), NUTS(; adtype=AutoForwardDiff()), 10)
+```
+
+The problem here is the line `x[1] = a`.
+When the log probability density of the model is calculated, `a` is sampled from a normal distribution and is thus a Float64; however, when ForwardDiff calculates the gradient of the log density, `a` is a `ForwardDiff.Dual` object.
+However, `x` is _always_ a `Vector{Float64}`, and the call `x[1] = a` attempts to insert a `Dual` object into a `Vector{Float64}`, which is not allowed.
+
+::: {.callout-note}
+In more depth: the basic premise of ForwardDiff is that functions have to accept `Real` parameters instead of `Float64` (since `Dual` is a subtype of `Real`).
+Here, the line `x[1] = a` is equivalent to `setindex!(x, a, 1)`, and although the method `setindex!(::Vector{Float64}, ::Real, ...)` does exist, it attempts to convert the `Real` into a `Float64`, which is where it fails.
+:::
+
+There are two ways around this.
+
+Firstly, you could broaden the type of the container:
+
+```{julia}
+@model function forwarddiff_working1()
+    x = Real[0.0, 1.0]
+    a ~ Normal()
+    x[1] = a
+    b ~ MvNormal(x, I)
+end
+sample(forwarddiff_working1(), NUTS(; adtype=AutoForwardDiff()), 10)
+```
+
+This is generally unfavourable because the `Vector{Real}` type contains an abstract type parameter.
+As a result, memory allocation is less efficient (because the compiler does not know the size of each vector's elements).
+Furthermore, the compiler cannot infer the type of `x[1]`, which can lead to type stability issues (to see this in action, run `x = Real[0.0, 1.0]; @code_warntype x[1]` in the Julia REPL).
+
+A better solution is to pass a type as a parameter to the model:
+
+```{julia}
+@model function forwarddiff_working2(::Type{T}=Float64) where T
+    x = T[0.0, 1.0]
+    a ~ Normal()
+    x[1] = a
+    b ~ MvNormal(x, I)
+end
+sample(forwarddiff_working2(), NUTS(; adtype=AutoForwardDiff()), 10)
+```
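
The same type-parameter pattern extends to containers whose length depends on the data. A hedged sketch, assuming `using Turing` from the page setup (the model name `forwarddiff_working3` and the toy data are illustrative, not part of the page above):

```julia
@model function forwarddiff_working3(y, ::Type{T}=Float64) where {T}
    # A concretely-typed buffer: T becomes ForwardDiff.Dual during gradient evaluation.
    x = Vector{T}(undef, length(y))
    for i in eachindex(y)
        x[i] ~ Normal()
        y[i] ~ Normal(x[i], 1.0)
    end
end
sample(forwarddiff_working3(randn(5)), NUTS(; adtype=AutoForwardDiff()), 10)
```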
