
Commit d420cd1

Add a new page on submodels and how to use them (#627)
* Add page on submodels
* Actually run the linear model bit
* Improvements
1 parent 6df0913 commit d420cd1

4 files changed: +351 −1 lines changed


_quarto.yml

Lines changed: 2 additions & 0 deletions
@@ -57,6 +57,7 @@ website:
     collapse-level: 1
     contents:
       - usage/automatic-differentiation/index.qmd
+      - usage/submodels/index.qmd
       - usage/custom-distribution/index.qmd
       - usage/probability-interface/index.qmd
       - usage/modifying-logprob/index.qmd
@@ -196,6 +197,7 @@ usage-modifying-logprob: usage/modifying-logprob
 usage-performance-tips: usage/performance-tips
 usage-probability-interface: usage/probability-interface
 usage-sampler-visualisation: usage/sampler-visualisation
+usage-submodels: usage/submodels
 usage-tracking-extra-quantities: usage/tracking-extra-quantities
 usage-troubleshooting: usage/troubleshooting

core-functionality/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ end

 Indexing and field access is supported, so that `x[i] ~ distr` and `x.field ~ distr` are valid statements.
 However, in these cases, `x` must be defined in the scope of the model function.
-`distr` is typically either a distribution from Distributions.jl (see [this page]({{< meta usage-custom-distribution >}}) for implementing custom distributions), or another Turing model wrapped in `to_submodel()`.
+`distr` is typically either a distribution from Distributions.jl (see [this page]({{< meta usage-custom-distribution >}}) for implementing custom distributions), or another Turing model wrapped in `to_submodel()` (see [this page]({{< meta usage-submodels >}}) for submodels).

 There are two classes of tilde-statements: _observe_ statements, where the left-hand side contains an observed value, and _assume_ statements, where the left-hand side is not observed.
 These respectively correspond to likelihood and prior terms.

developers/contexts/submodel-condition/index.qmd

Lines changed: 4 additions & 0 deletions
@@ -3,6 +3,10 @@ title: "Conditioning and fixing in submodels"
 engine: julia
 ---

+This page is a technical explanation of how contexts are managed for submodels.
+
+A user-facing guide to submodels is available on [this page]({{< meta usage-submodels >}}).
+
 ## PrefixContext

 Submodels in DynamicPPL come with the notion of _prefixing_ variables: under the hood, this is implemented by adding a `PrefixContext` to the context stack.

usage/submodels/index.qmd

Lines changed: 344 additions & 0 deletions
@@ -0,0 +1,344 @@
---
title: Submodels
engine: julia
---

```{julia}
using Turing
using Random: Xoshiro, seed!
seed!(468)
```

In Turing.jl, you can define models and use them as components of larger models (i.e., _submodels_), using the `to_submodel` function.
In this way, you can (for example) define a model once and use it in multiple places:

```{julia}
@model function inner()
    a ~ Normal()
    return a + 100
end

@model function outer()
    # This line adds the variable `x.a` to the chain.
    # The inner variable `a` is prefixed with the
    # left-hand side of the `~` operator, i.e. `x`.
    x ~ to_submodel(inner())
    # Here, the value of x will be `a + 100` because
    # that is the return value of the submodel.
    b ~ Normal(x)
end
```

If we sample from this model, we would expect `x.a` to be close to zero, and `b` to be close to 100:

```{julia}
rand(outer())
```
## Manipulating submodels

### Conditioning

In general, everything that can be done to a model 'carries over' when it is used as a submodel.
For example, you can condition a variable in a submodel in two ways:

```{julia}
# From the outside; the prefix `x` must be applied because
# from the perspective of `outer`, the variable is called
# `x.a`.
outer_conditioned1 = outer() | (@varname(x.a) => 1);
rand(Xoshiro(468), outer_conditioned1)
```

Or equivalently, from the inside:

```{julia}
@model function outer_conditioned2()
    # The prefix doesn't need to be applied here because
    # `inner` itself has no knowledge of the prefix.
    x ~ to_submodel(inner() | (@varname(a) => 1))
    b ~ Normal(x)
end
rand(Xoshiro(468), outer_conditioned2())
```

In both cases the variable `x.a` does not appear.
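As a quick illustrative check (a sketch, assuming, as in the draws above, that `rand` returns a `NamedTuple`), the conditioned variable is absent from the draw:

```julia
draw = rand(Xoshiro(468), outer_conditioned1)
# The conditioned variable is no longer sampled, so it is
# absent from the draw; only `b` remains.
haskey(draw, Symbol("x.a"))  # expected: false
```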
Note, however, that you cannot condition on the return value of a submodel.
Thus, for example, if we had:

```{julia}
@model function inner_sensible()
    a ~ Normal()
    return a
end

@model function outer()
    x ~ to_submodel(inner_sensible())
    b ~ Normal(x)
end
```

and we tried to condition on `x`, it would be silently ignored, even though `x` is equal to `a`.

The reason is that it is entirely coincidental that the return value of the submodel is equal to `a`.
In general, a return value can be anything, and conditioning on it is in general not a meaningful operation.
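To see this concretely, here is a sketch using the models just defined: conditioning on the return-value name `x` has no effect on sampling.

```julia
# `x` is not a random variable (only `x.a` is), so this
# conditioning is silently ignored and `x.a` is still sampled.
outer_conditioned_on_retval = outer() | (@varname(x) => 1.0)
rand(Xoshiro(468), outer_conditioned_on_retval)
```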
### Prefixing

Prefixing is the only place where submodel behaviour is 'special' compared to that of ordinary models.

By default, all variables in a submodel are prefixed with the left-hand side of the tilde-statement.
This is done to avoid clashes if the same submodel is used multiple times in a model.
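For example, here is a sketch (with a hypothetical `outer_twice`) in which the same submodel is used twice; prefixing keeps the two copies of `a` distinct:

```julia
@model function outer_twice()
    # The two uses of `inner` produce the distinct variables
    # `x.a` and `y.a` instead of clashing on a single `a`.
    x ~ to_submodel(inner())
    y ~ to_submodel(inner())
    b ~ Normal(x + y)
end
rand(outer_twice())
```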
You can disable this prefixing by passing `false` as the second argument to `to_submodel`:

```{julia}
@model function outer_no_prefix()
    x ~ to_submodel(inner(), false)
    b ~ Normal(x)
end
rand(outer_no_prefix())
```
## Accessing submodel variables

In all of the examples above, `x` is equal to `a + 100` because that is the return value of the submodel.
To access the actual latent variables in the submodel itself, the simplest option is to include the variable in the return value of the submodel:

```{julia}
@model function inner_with_retval()
    a ~ Normal()
    # You can return anything you like from the model,
    # but if you want to access the latent variables, they
    # should be included in the return value.
    return (; a=a, a_plus_100=a + 100)
end

@model function outer_with_retval()
    # The variable `x` will now contain the return value of the submodel,
    # which is a named tuple with `a` and `a_plus_100`.
    x ~ to_submodel(inner_with_retval())
    # You can access the value of x.a directly, because
    # x is a NamedTuple which contains `a`. Since `b` is
    # centred on `x.a`, it should be close to 0, not 100.
    b ~ Normal(x.a)
end
rand(Xoshiro(468), outer_with_retval())
```
You can also manually access the value by looking inside the special `__varinfo__` object.

::: {.callout-warning}
This relies on DynamicPPL internals and we do not recommend doing this unless you have no other option, e.g., if the submodel is defined in a different package which you do not control.
:::

```{julia}
@model function outer_with_varinfo()
    x ~ to_submodel(inner())
    # Access the value of x.a
    a_value = __varinfo__[@varname(x.a)]
    b ~ Normal(a_value)
end
rand(Xoshiro(468), outer_with_varinfo())
```
## Example: linear models

Here is a motivating example for the use of submodels.
Suppose we want to fit a (very simplified) regression model to some data $x$ and $y$, where

$$\begin{align}
c_0 &\sim \text{Normal}(0, 5) \\
c_1 &\sim \text{Normal}(0, 5) \\
\mu &= c_0 + c_1x \\
y &\sim d
\end{align}$$

where $d$ is _some_ distribution parameterised by the value of $\mu$, which we don't know the exact form of.

In practice, what we would do is to write several different models, one for each choice of $d$:
```{julia}
@model function normal(x, y)
    c0 ~ Normal(0, 5)
    c1 ~ Normal(0, 5)
    mu = c0 .+ c1 .* x
    # Assume that the mean of y is mu, and that the noise in `y`
    # is normally distributed with standard deviation sigma
    sigma ~ truncated(Cauchy(0, 3); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(mu[i], sigma)
    end
end

@model function logpoisson(x, y)
    c0 ~ Normal(0, 5)
    c1 ~ Normal(0, 5)
    mu = c0 .+ c1 .* x
    # exponentiate mu because the rate parameter of
    # a Poisson distribution must be positive
    for i in eachindex(y)
        y[i] ~ Poisson(exp(mu[i]))
    end
end

# and so on...
```

::: {.callout-note}
You could use `arraydist` to avoid the loops: for example, in `logpoisson`, one could write `y ~ arraydist(Poisson.(exp.(mu)))`, but for simplicity in this tutorial we spell it out fully.
:::
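For concreteness, here is a sketch of what that might look like (the name `logpoisson_arraydist` is ours, for illustration only):

```julia
@model function logpoisson_arraydist(x, y)
    c0 ~ Normal(0, 5)
    c1 ~ Normal(0, 5)
    mu = c0 .+ c1 .* x
    # One vectorised observe statement replaces the loop.
    y ~ arraydist(Poisson.(exp.(mu)))
end
```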
We would then fit all of our models and use some criterion to test which model is most suitable (see e.g. [Wikipedia](https://en.wikipedia.org/wiki/Model_selection), or section 3.4 of Bishop's *Pattern Recognition and Machine Learning*).

However, the code above is quite repetitive.
For example, if we wanted to adjust the priors on `c0` and `c1`, we would have to do it in each model separately.
If this were any other kind of code, we would naturally think of extracting the common parts into a separate function.
In this case we can do exactly that with a submodel:
```{julia}
@model function priors(x)
    c0 ~ Normal(0, 5)
    c1 ~ Normal(0, 5)
    mu = c0 .+ c1 .* x
    return (; c0=c0, c1=c1, mu=mu)
end

@model function normal(x, y)
    ps ~ to_submodel(priors(x))
    sigma ~ truncated(Cauchy(0, 3); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(ps.mu[i], sigma)
    end
end

@model function logpoisson(x, y)
    ps ~ to_submodel(priors(x))
    for i in eachindex(y)
        y[i] ~ Poisson(exp(ps.mu[i]))
    end
end
```
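As a quick illustrative check, the refactored model samples just like before, except that the shared prior parameters are now prefixed with `ps`:

```julia
# With `y` observed, only the latent variables are drawn,
# now under the names `ps.c0` and `ps.c1` (plus `sigma`).
rand(Xoshiro(468), normal([1, 2, 3], [1, 2, 3]))
```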
One could go even further and extract the `y` section into its own submodel as well, which would bring us to a generalised linear modelling interface that does not actually require the user to define their own Turing models at all:

```{julia}
@model function normal_family(mu, y)
    sigma ~ truncated(Cauchy(0, 3); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(mu[i], sigma)
    end
    return nothing
end

@model function logpoisson_family(mu, y)
    for i in eachindex(y)
        y[i] ~ Poisson(exp(mu[i]))
    end
    return nothing
end

# An end-user could just use this function. Of course,
# a more thorough interface would also allow the user to
# specify priors, etc.
function make_model(x, y, family::Symbol)
    if family == :normal
        family_model = normal_family
    elseif family == :logpoisson
        family_model = logpoisson_family
    else
        error("unknown family: `$family`")
    end

    @model function general(x, y)
        ps ~ to_submodel(priors(x), false)
        _n ~ to_submodel(family_model(ps.mu, y), false)
    end
    return general(x, y)
end

sample(make_model([1, 2, 3], [1, 2, 3], :normal), NUTS(), 1000; progress=false)
```
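Switching to a different family is then a one-word change; for example (not executed here):

```julia
sample(make_model([1, 2, 3], [1, 2, 3], :logpoisson), NUTS(), 1000; progress=false)
```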
While this final example really showcases the composability of submodels, it also illustrates a minor syntactic drawback.
In the case of the `family_model`, we do not care about its return value because it is not used anywhere else in the model.
Ideally, we should therefore not need to place anything on the left-hand side of `to_submodel`.
However, because the special behaviour of `to_submodel` relies on the tilde operator, and the tilde operator requires a left-hand side, we have to use a dummy variable (here `_n`).

Furthermore, because the left-hand side of a tilde-statement must be a valid variable name, we cannot use destructuring syntax on the left-hand side of `to_submodel`, even if the return value is a NamedTuple.
Thus, for example, the following is not allowed:

```julia
(; c0, c1, mu) ~ to_submodel(priors(x))
```

To use destructuring syntax, you would have to add a separate line:

```julia
ps ~ to_submodel(priors(x))
(; c0, c1, mu) = ps
```
## Submodels versus distributions

We close with a discussion of why some of the behaviour for submodels above has come about.
This is slightly more behind-the-scenes and therefore will likely be of most interest to Turing developers.

Fundamentally, submodels are to be compared against distributions: both of them can appear on the right-hand side of a tilde statement.
However, distributions only have one 'output', i.e., the value that is sampled from them:

```{julia}
dist = Normal()
rand(dist)
```

Another point to bear in mind is that, given a sample from `dist`, asking for its log-probability is a meaningful calculation.

```{julia}
logpdf(dist, rand(dist))
```
In contrast, models (and hence submodels) have two different outputs: the latent variables, and the return value.
These are accessed respectively using `rand(model)` and `model()`:

```{julia}
@model function f()
    a ~ Normal()
    return "hello, world."
end

model = f()
```

```{julia}
# Latent variables
rand(model)
```

```{julia}
# Return value
model()
```

Just like for distributions, one can indeed ask for the log-probability of the latent variables (although we have to specify whether we want the joint, likelihood, or prior):

```{julia}
logjoint(model, rand(model))
```
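For completeness, the prior and likelihood variants follow the same pattern (a sketch; this particular model has no observations, so its likelihood term is zero and the joint equals the prior):

```julia
draw = rand(model)
logprior(model, draw)        # log-probability of `a` under its prior
loglikelihood(model, draw)   # zero here, since nothing is observed
```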
But it does not make sense to ask for the log-probability of the return value (which in this case is a string, and in general, could be literally any object).

The fact that we have what looks like a unified notation for these is a bit of a lie, since it hides this distinction.
In particular, for `x ~ distr`, `x` is assigned the value of `rand(distr)`; but for `y ~ submodel`, `y` is assigned the value of `submodel()`.
This is why, for example, it is impossible to condition on `y` in `y ~ submodel`; we can only condition on `x` in `x ~ distr`.

Eventually we would like to make this more logically consistent.
In particular, it is clear that `y ~ submodel` should return not one but two objects: the latent variables and the return value.
Furthermore, it should be possible to condition on the latent variables, but not on the return value.
See [this issue](https://github.com/TuringLang/Turing.jl/issues/2485) for an ongoing discussion of the best way to accomplish this.

It should be mentioned that extracting the latent variables from a submodel is not entirely trivial since the submodel is run using the same `VarInfo` as the parent model (i.e., we would have to do a before-and-after comparison to see which _new_ variables were added by the submodel).

Also, we are still working out the exact data structure that should be used to represent the latent variables.
In the examples above `rand(model)` returns a `NamedTuple`, but this actually causes loss of information because the keys of a `NamedTuple` are `Symbol`s, whereas we really want to use `VarName`s.
See [this issue](https://github.com/TuringLang/DynamicPPL.jl/issues/900) for a current proposal.
