---
title: Submodels
engine: julia
---

```{julia}
using Turing
using Random: Xoshiro, seed!
seed!(468)
```

In Turing.jl, you can define models and use them as components of larger models (i.e., _submodels_), using the `to_submodel` function.
In this way, you can (for example) define a model once and use it in multiple places:

```{julia}
@model function inner()
    a ~ Normal()
    return a + 100
end

@model function outer()
    # This line adds the variable `x.a` to the chain.
    # The inner variable `a` is prefixed with the
    # left-hand side of the `~` operator, i.e. `x`.
    x ~ to_submodel(inner())
    # Here, the value of x will be `a + 100` because
    # that is the return value of the submodel.
    b ~ Normal(x)
end
```

If we sample from this model, we would expect that `x.a` should be close to zero, and `b` close to 100:

```{julia}
rand(outer())
```

## Manipulating submodels

### Conditioning

In general, everything that can be done to a model 'carries over' to when it is used as a submodel.
For example, you can condition a variable in a submodel in two ways:

```{julia}
# From the outside; the prefix `x` must be applied because
# from the perspective of `outer`, the variable is called
# `x.a`.
outer_conditioned1 = outer() | (@varname(x.a) => 1);
rand(Xoshiro(468), outer_conditioned1)
```

Or equivalently, from the inside:

```{julia}
@model function outer_conditioned2()
    # The prefix doesn't need to be applied here because
    # `inner` itself has no knowledge of the prefix.
    x ~ to_submodel(inner() | (@varname(a) => 1))
    b ~ Normal(x)
end
rand(Xoshiro(468), outer_conditioned2())
```
| 64 | + |
| 65 | +In both cases the variable `x.a` does not appear. |
| 66 | + |
| 67 | +Note, however, that you cannot condition on the return value of a submodel. |
| 68 | +Thus, for example, if we had: |
| 69 | + |
```{julia}
@model function inner_sensible()
    a ~ Normal()
    return a
end

@model function outer()
    x ~ to_submodel(inner_sensible())
    b ~ Normal(x)
end
```

and we tried to condition on `x`, it would be silently ignored, even though `x` is equal to `a`.

This is because it is entirely coincidental that the return value of the submodel is equal to `a`.
In general, a return value can be anything, and conditioning on it is not a meaningful operation.
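
For instance, conditioning the model above on `x` has no visible effect; `x.a` is still sampled as usual. Here is a quick sketch to illustrate (the conditioned value `1.0` is arbitrary):

```julia
# `x` is a return value, not a random variable, so this
# conditioning is silently ignored and `x.a` still appears.
rand(outer() | (@varname(x) => 1.0))
```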

### Prefixing

Prefixing is the only place where submodel behaviour is 'special' compared to that of ordinary models.

By default, all variables in a submodel are prefixed with the left-hand side of the tilde-statement.
This is done to avoid clashes if the same submodel is used multiple times in a model.

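For example, in the following sketch (the model name `outer_twice` is ours, for illustration) the `inner` model from above is used twice, and prefixing keeps the two sets of latent variables separate as `x.a` and `y.a`:

```{julia}
@model function outer_twice()
    # Each use of `inner` gets its own prefix, so the latent
    # variables become `x.a` and `y.a` rather than clashing.
    x ~ to_submodel(inner())
    y ~ to_submodel(inner())
    b ~ Normal(x + y)
end
rand(outer_twice())
```
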
You can disable this by passing `false` as the second argument to `to_submodel`:

```{julia}
@model function outer_no_prefix()
    x ~ to_submodel(inner(), false)
    b ~ Normal(x)
end
rand(outer_no_prefix())
```

## Accessing submodel variables

In all of the examples above, `x` is equal to `a + 100` because that is the return value of the submodel.
To access the actual latent variables in the submodel itself, the simplest option is to include the variable in the return value of the submodel:

```{julia}
@model function inner_with_retval()
    a ~ Normal()
    # You can return anything you like from the model,
    # but if you want to access the latent variables, they
    # should be included in the return value.
    return (; a=a, a_plus_100=a + 100)
end

@model function outer_with_retval()
    # The variable `x` will now contain the return value of the submodel,
    # which is a named tuple with `a` and `a_plus_100`.
    x ~ to_submodel(inner_with_retval())
    # You can access the value of x.a directly, because
    # x is a NamedTuple which contains `a`. Since `b` is
    # centred on `x.a`, it should be close to 0, not 100.
    b ~ Normal(x.a)
end
rand(Xoshiro(468), outer_with_retval())
```

You can also manually access the value by looking inside the special `__varinfo__` object.

::: {.callout-warning}
This relies on DynamicPPL internals and we do not recommend doing this unless you have no other option, e.g., if the submodel is defined in a different package which you do not control.
:::

```{julia}
@model function outer_with_varinfo()
    x ~ to_submodel(inner())
    # Access the value of x.a
    a_value = __varinfo__[@varname(x.a)]
    b ~ Normal(a_value)
end
rand(Xoshiro(468), outer_with_varinfo())
```

## Example: linear models

Here is a motivating example for the use of submodels.
Suppose we want to fit a (very simplified) regression model to some data $x$ and $y$, where

$$\begin{align}
c_0 &\sim \text{Normal}(0, 5) \\
c_1 &\sim \text{Normal}(0, 5) \\
\mu &= c_0 + c_1 x \\
y &\sim d(\mu)
\end{align}$$

where $d$ is _some_ distribution parameterised by the value of $\mu$, whose exact form we do not know.

In practice, what we would do is to write several different models, one for each choice of $d$:

```{julia}
@model function normal(x, y)
    c0 ~ Normal(0, 5)
    c1 ~ Normal(0, 5)
    mu = c0 .+ c1 .* x
    # Assume that `y` is centred on `mu`, with normally
    # distributed noise of standard deviation `sigma`.
    sigma ~ truncated(Cauchy(0, 3); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(mu[i], sigma)
    end
end

@model function logpoisson(x, y)
    c0 ~ Normal(0, 5)
    c1 ~ Normal(0, 5)
    mu = c0 .+ c1 .* x
    # Exponentiate mu because the rate parameter of
    # a Poisson distribution must be positive.
    for i in eachindex(y)
        y[i] ~ Poisson(exp(mu[i]))
    end
end

# and so on...
```

::: {.callout-note}
You could use `arraydist` to avoid the loops: for example, in `logpoisson`, one could write `y ~ arraydist(Poisson.(exp.(mu)))`, but for simplicity in this tutorial we spell it out fully.
:::

We would then fit all of our models and use some criterion to test which model is most suitable (see e.g. [Wikipedia](https://en.wikipedia.org/wiki/Model_selection), or section 3.4 of Bishop's *Pattern Recognition and Machine Learning*).

However, the code above is quite repetitive.
For example, if we wanted to adjust the priors on `c0` and `c1`, we would have to do it in each model separately.
If this were any other kind of code, we would naturally think of extracting the common parts into a separate function.
In this case we can do exactly that with a submodel:

```{julia}
@model function priors(x)
    c0 ~ Normal(0, 5)
    c1 ~ Normal(0, 5)
    mu = c0 .+ c1 .* x
    return (; c0=c0, c1=c1, mu=mu)
end

@model function normal(x, y)
    ps ~ to_submodel(priors(x))
    sigma ~ truncated(Cauchy(0, 3); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(ps.mu[i], sigma)
    end
end

@model function logpoisson(x, y)
    ps ~ to_submodel(priors(x))
    for i in eachindex(y)
        y[i] ~ Poisson(exp(ps.mu[i]))
    end
end
```

One could go even further and extract the `y` section into its own submodel as well, which would bring us to a generalised linear modelling interface that does not actually require the user to define their own Turing models at all:

```{julia}
@model function normal_family(mu, y)
    sigma ~ truncated(Cauchy(0, 3); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(mu[i], sigma)
    end
    return nothing
end

@model function logpoisson_family(mu, y)
    for i in eachindex(y)
        y[i] ~ Poisson(exp(mu[i]))
    end
    return nothing
end

# An end-user could just use this function. Of course,
# a more thorough interface would also allow the user to
# specify priors, etc.
function make_model(x, y, family::Symbol)
    if family == :normal
        family_model = normal_family
    elseif family == :logpoisson
        family_model = logpoisson_family
    else
        error("unknown family: `$family`")
    end

    @model function general(x, y)
        ps ~ to_submodel(priors(x), false)
        _n ~ to_submodel(family_model(ps.mu, y), false)
    end
    return general(x, y)
end

sample(make_model([1, 2, 3], [1, 2, 3], :normal), NUTS(), 1000; progress=false)
```

While this final example really showcases the composability of submodels, it also illustrates a minor syntactic drawback.
In the case of the `family_model`, we do not care about its return value because it is not used anywhere else in the model.
Ideally, we should therefore not need to place anything on the left-hand side of `to_submodel`.
However, because the special behaviour of `to_submodel` relies on the tilde operator, and the tilde operator requires a left-hand side, we have to use a dummy variable (here `_n`).

Furthermore, because the left-hand side of a tilde-statement must be a valid variable name, we cannot use destructuring syntax on the left-hand side of `to_submodel`, even if the return value is a NamedTuple.
Thus, for example, the following is not allowed:

```julia
(; c0, c1, mu) ~ to_submodel(priors(x))
```

To use destructuring syntax, you would have to add a separate line:

```julia
ps ~ to_submodel(priors(x))
(; c0, c1, mu) = ps
```
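
For example, the `normal` model from the previous section could be rewritten in this style (a sketch for illustration; the name `normal_destructured` is ours):

```julia
@model function normal_destructured(x, y)
    # Run the submodel, then unpack its NamedTuple return value.
    ps ~ to_submodel(priors(x))
    (; c0, c1, mu) = ps
    sigma ~ truncated(Cauchy(0, 3); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(mu[i], sigma)
    end
end
```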

## Submodels versus distributions

Finally, we discuss why some of the submodel behaviour above has come about.
This is slightly more behind-the-scenes, and will therefore likely be of most interest to Turing developers.

Fundamentally, submodels should be compared against distributions: both can appear on the right-hand side of a tilde statement.
However, distributions only have one 'output', i.e., the value that is sampled from them:

```{julia}
dist = Normal()
rand(dist)
```

Another point to bear in mind is that, given a sample from `dist`, asking for its log-probability is a meaningful calculation:

```{julia}
logpdf(dist, rand(dist))
```

In contrast, models (and hence submodels) have two different outputs: the latent variables, and the return value.
These are accessed respectively using `rand(model)` and `model()`:

```{julia}
@model function f()
    a ~ Normal()
    return "hello, world."
end

model = f()
```

```{julia}
# Latent variables
rand(model)
```

```{julia}
# Return value
model()
```

Just like for distributions, one can indeed ask for the log-probability of the latent variables (although we have to specify whether we want the joint, likelihood, or prior):

```{julia}
logjoint(model, rand(model))
```
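
The prior and likelihood components can be queried in the same way. Here is a small illustrative sketch; for this particular model the likelihood is trivially zero, because `f` observes no data:

```{julia}
params = rand(model)
# Log-prior of the latent variables, plus the log-likelihood
# (zero here, since the model has no observations).
(logprior(model, params), loglikelihood(model, params))
```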

But it does not make sense to ask for the log-probability of the return value (which in this case is a string, and in general could be literally any object).

The fact that we have what looks like a unified notation for these is a bit of a lie, since it hides this distinction.
In particular, for `x ~ dist`, `x` is assigned the value of `rand(dist)`; but for `y ~ submodel`, `y` is assigned the value of `submodel()`.
This is why, for example, it is impossible to condition on `y` in `y ~ submodel`; we can only condition on `x` in `x ~ dist`.

Eventually we would like to make this more logically consistent.
In particular, it is clear that `y ~ submodel` should return not one but two objects: the latent variables and the return value.
Furthermore, it should be possible to condition on the latent variables, but not on the return value.
See [this issue](https://github.com/TuringLang/Turing.jl/issues/2485) for an ongoing discussion of the best way to accomplish this.

It should be mentioned that extracting the latent variables from a submodel is not entirely trivial since the submodel is run using the same `VarInfo` as the parent model (i.e., we would have to do a before-and-after comparison to see which _new_ variables were added by the submodel).

Also, we are still working out the exact data structure that should be used to represent the latent variables.
In the examples above `rand(model)` returns a `NamedTuple`, but this actually causes loss of information because the keys of a `NamedTuple` are `Symbol`s, whereas we really want to use `VarName`s.
See [this issue](https://github.com/TuringLang/DynamicPPL.jl/issues/900) for a current proposal.