Commit 926c404

Update interface description

1 parent 6cf3dee commit 926c404

1 file changed: +137 −86 lines changed

interface.md

Lines changed: 137 additions & 86 deletions
## `AbstractProbabilisticProgram` interface

There are at least two somewhat incompatible conventions used for the term “model”. None of this is
particularly exact, but:

- In Turing.jl, if you write down a `@model` function and call it on arguments, you get a model
  object paired with (a possibly empty set of) observations. This can be treated as an instantiated
  “conditioned” object with fixed values for parameters and observations.
- In Soss.jl, “model” is used for a symbolic “generative” object from which concrete functions, such
  as densities and sampling functions, can be derived, _and_ which you can later condition on (and
  in turn get a conditional density etc.).

Relevant discussions:
[1](https://julialang.zulipchat.com/#narrow/stream/234072-probprog/topic/Naming.20the.20.22likelihood.22.20thingy),
[2](https://github.com/TuringLang/AbstractPPL.jl/discussions/10).


### TL/DR:

There are three interrelated aspects that this interface intends to standardize:

- Density calculation
- Sampling
- “Conversions” between different conditionings of models

Therefore, the interface consists of:

- `condition(::Model, ::Trace) -> ConditionedModel`
- `decondition(::ConditionedModel) -> GenerativeModel`
- `sample(::Model, ::Sampler = Exact(), [Int])` (from `AbstractMCMC.sample`)
- `logdensity(::Model, ::Trace)`

### Traces & probability expressions

First, an infrastructural requirement which we will need below to write things out.

The kinds of models we consider are, at least in a theoretical sense, distributions over *traces* –
types which carry collections of values together with their names. Existing realizations of these
are `VarInfo` in Turing.jl, choice maps in Gen.jl, and the usage of named tuples in Soss.jl.

Traces solve the problem of having to name random variables in function calls, and in samples from
models. In essence, every concrete trace type will just be a fancy kind of dictionary from variable
names (ideally, `VarName`s) to values.
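To make the “fancy dictionary” intuition concrete, here is a minimal sketch; `SimpleTrace` and the use of plain strings as stand-ins for `VarName`s are purely hypothetical and not part of the proposed interface:

```julia
# Hypothetical sketch: a trace as a mapping from variable names to values.
# String keys stand in for real `VarName`s; note that they can name indexed
# variables like `Y[1]`, which plain symbols could not express.
struct SimpleTrace
    values::Dict{String,Any}
end

SimpleTrace(pairs::Pair...) = SimpleTrace(Dict{String,Any}(pairs...))

Base.getindex(t::SimpleTrace, name::String) = t.values[name]
Base.haskey(t::SimpleTrace, name::String) = haskey(t.values, name)

t = SimpleTrace("X" => 0.5, "Y[1]" => 1.2)
t["Y[1]"]  # 1.2
```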

Since we have to use this kind of mapping a lot in the specification of the interface, let’s for now
just choose some arbitrary macro-like syntax like the following:

```julia
@T(Y[1] = …, Z = …)
```

Some more ideas for this kind of object can be found at the end.


### “Conversions”

The purpose of this part is to provide common names for how we want a model instance to be
understood. As we have seen, in some modelling languages, model instances are primarily generative,
with some parameters fixed, while other instance types pair model instances conditioned on
observations. What I call “conversions” here is just an interface to transform between these two
views and unify the involved objects under one language.

Let’s start from a generative model with parameter `μ`:

```julia
# (hypothetical) generative spec a la Soss
@generative_model function foo_gen(μ)
    X ~ Normal(0, μ)
    Y[1] ~ Normal(X)
    Y[2] ~ Normal(X + 1)
end
```

Applying the “constructor” `foo_gen` now means to fix the parameter, and should return a concrete
object of the generative type:

```julia
g = foo_gen(μ=…)::SomeGenerativeModel
```

With this kind of object, we should be able to sample and calculate joint log-densities, i.e.,
over the combined trace space of `X`, `Y[1]`, and `Y[2]` – either directly, or by deriving the
respective functions (e.g., by converting from a symbolic representation).

For model types that contain enough structural information, it should then be possible to condition
on observed values and obtain a conditioned model:

```julia
condition(g, @T(Y = …))::SomeConditionedModel
```

For this operation, there will probably exist syntactic sugar in the form of

```julia
g | @T(Y = …)
```
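As a sketch of how such sugar could be implemented, with made-up toy types in place of real model types and a plain `Dict` standing in for a `@T` expression:

```julia
# Hypothetical sketch: `|` as sugar for `condition`, on toy stand-in types.
struct ToyGenerativeModel
    μ::Float64
end

struct ToyConditionedModel
    model::ToyGenerativeModel
    observations::Dict{String,Float64}
end

condition(g::ToyGenerativeModel, obs::Dict{String,Float64}) = ToyConditionedModel(g, obs)

# The sugar itself is then a one-line overload:
Base.:|(g::ToyGenerativeModel, obs::Dict{String,Float64}) = condition(g, obs)

m = ToyGenerativeModel(1.0) | Dict("Y[1]" => 0.3)
m isa ToyConditionedModel  # true
```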

Now, if we start from a Turing.jl-like model instead, with the “observation part” already specified,
we have a situation like this, with the observations `Y` fixed in the instantiation:

```julia
# conditioned spec a la DPPL
@model function foo(Y, μ)
    X ~ Normal(0, μ)
    Y[1] ~ Normal(X)
    Y[2] ~ Normal(X + 1)
end

m = foo(Y=…, μ=…)::SomeConditionedModel
```

From this we can, if supported, go back to the generative form via `decondition`, and back via
`condition`:

```julia
decondition(m) == g::SomeGenerativeModel
m == condition(g, @T(Y = …))
```

(with equality in distribution).

In the case of Turing.jl, the object `m` would at the same time contain the information about the
generative and posterior distribution; `condition` and `decondition` can simply return different
kinds of tagged model types which put the model specification into the respective context.

Soss.jl pretty much already works like the examples above, with one model object underlying both
views.

A hypothetical `DensityModel`, or something like the types from LogDensityProblems.jl, would be a
case for a model type that does not support the structural operations `condition` and
`decondition`.

The invariances between these operations should follow the normal rules of probability theory. Not
all methods or directions need to be supported for every modelling language; in that case, a
`MethodError` or some other runtime error should be raised.

There is no strict requirement for generative models and conditioned models to have different types
or be tagged with variable names etc. This is a choice to be made by the concrete implementation.

Decomposing models into prior and observation distributions is not yet specified; the former is
rather easy, since it is only a marginal of the generative distribution, while the latter requires
more structural information. Perhaps both can be generalized under the `query` function I discuss
at the end.
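The round-trip invariant can be sketched with made-up toy types (no real modelling language involved):

```julia
# Hypothetical sketch: condition/decondition as a round trip on toy types.
struct GenModel
    μ::Float64
end

struct CondModel
    parent::GenModel
    observations::Dict{String,Float64}
end

condition(g::GenModel, obs::Dict{String,Float64}) = CondModel(g, obs)
decondition(m::CondModel) = m.parent  # simply drop the observations again

# `decondition` is not defined for `GenModel` itself, so calling it there
# raises a `MethodError` – the suggested behaviour for unsupported directions.
g = GenModel(0.0)
m = condition(g, Dict("Y[1]" => 1.5))
decondition(m) == g  # true
```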


### Sampling

Sampling in this case refers to producing values from the distribution specified in a model
instance, either following the distribution exactly, or approximating it through a Monte Carlo
algorithm.

All sampleable model instances are assumed to implement the `AbstractMCMC` interface – i.e., at
least [`step`](https://github.com/TuringLang/AbstractMCMC.jl#sampling-step), and accordingly
`sample`, `steps`, `Samples`. The most important aspect is `sample`, though, which plays the role
of `rand` for distributions.

The results of `sample` generalize `rand` – while `rand(d, N)` is assumed to give you iid samples,
`sample(m, sampler, N)` returns a sample from a sequence (known as a chain in the case of MCMC) of
length `N` approximating `m`’s distribution by a specific sampling algorithm (which of course
subsumes the case that `m` can be sampled from exactly, in which case the “chain” actually is iid).

Depending on which kind of sampling is supported, several methods may be supported. In the case of
a (posterior) conditioned model with no known sampling procedure, we just have what is given through
`AbstractMCMC`:

```julia
sample([rng], m, N, sampler; [args…]) # chain of length N using `sampler`
```

When exact sampling is possible, simpler methods without an explicit sampler are natural:

```julia
sample([rng], m; [args…])    # one random sample
sample([rng], m, N; [args…]) # N iid samples; equivalent to `rand` in certain cases
```

It should be possible to implement this by a special sampler, say, `Exact` (name still to be
discussed), that can then also be reused for generative sampling:

```julia
step(g, spl = Exact(), state = nothing) # iid sample from the exact distribution, with trivial state
sample(g, Exact(), [N])
```

with dispatch failing for model types for which exact sampling is not possible (or not
implemented).

This could even be useful for Monte Carlo methods not based on Markov chains, e.g.,
particle-based sampling using a return type with weights, or rejection sampling.
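A minimal sketch of such an `Exact` sampler, using a toy normal “model” and hand-rolled `step`/`sample` functions in place of the real `AbstractMCMC` machinery (all names here are assumptions):

```julia
using Random

# Hypothetical sketch: exact sampling pressed into the step/sample mold.
struct Exact end

struct ToyNormalModel
    μ::Float64
    σ::Float64
end

# Exact sampling is a "chain" whose state carries no information.
step(rng, m::ToyNormalModel, ::Exact, state = nothing) =
    (m.μ + m.σ * randn(rng), nothing)

# `sample` just iterates `step`, so the same loop shape also covers MCMC.
function sample(rng, m::ToyNormalModel, spl::Exact, N::Int)
    draws = Vector{Float64}(undef, N)
    state = nothing
    for i in 1:N
        draws[i], state = step(rng, m, spl, state)
    end
    return draws
end

draws = sample(Random.MersenneTwister(42), ToyNormalModel(0.0, 1.0), Exact(), 10)
length(draws)  # 10
```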
### Density Calculation

Since the different “versions” of how a model is to be understood as generative or conditioned are
to be expressed in the type or dispatch they support, there should be no need for separate functions
`logjoint`, `loglikelihood`, etc., which force these semantic distinctions on the implementor; one
`logdensity` should suffice for all, with the distinction being made by the capabilities of the
concrete model instance.

Note that this generalizes `logpdf`, too, since the posterior density will of course in general be
unnormalized and hence not a probability density.

The evaluation will usually work with the internal, concrete trace type, like `VarInfo` in
Turing.jl. But the user will more likely work on the interface using probability expressions:

```julia
logdensity(m, @T(X = ...))
```

(Note that this would replace the current `prob` string macro in Turing.jl.)

Densities need not be normalized.


#### Implementation notes

It should be possible to make this fall back on the internal method with the right definition and
implementation of `maketrace`:

```julia
logdensity(m, t::ProbabilityExpression) = logdensity(m, maketrace(m, t))
```

There is one open question – should normalized and unnormalized densities be able to be
distinguished? This could be done by dispatch as well, e.g., if the caller wants to ensure
normalization:

```julia
logdensity(g, @T(X = ..., Y = ..., Z = ...); normalized=Val{true})
```

Although there is probably a better way through traits; maybe like for arrays, with
`NormalizationStyle(g, t) = IsNormalized()`?
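A sketch of what such a trait could look like; all names (`NormalizationStyle`, `IsNormalized`, `IsUnnormalized`, the toy model types) are hypothetical, following the pattern of `Base.IndexStyle` for arrays:

```julia
# Hypothetical sketch of a normalization trait in the Holy-traits style.
abstract type NormalizationStyle end
struct IsNormalized <: NormalizationStyle end
struct IsUnnormalized <: NormalizationStyle end

struct ToyPrior end      # stands in for a generative model: normalized
struct ToyPosterior end  # stands in for a conditioned model: unnormalized

NormalizationStyle(::ToyPrior) = IsNormalized()
NormalizationStyle(::ToyPosterior) = IsUnnormalized()

# Callers that require a proper density can then dispatch on the trait:
isnormalized(m) = isnormalized(NormalizationStyle(m), m)
isnormalized(::IsNormalized, m) = true
isnormalized(::IsUnnormalized, m) = false

isnormalized(ToyPrior())  # true
```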


## More on probability expressions

Note that this needs to be a macro, if written this way, since the keys may themselves be more
complex than just symbols (e.g., indexed variables). (Don’t get hung up on the `@T` name, though;
this is just a working draft.)

The idea here is to standardize the construction (and manipulation) of *abstract probability
expressions*, plus the interface for turning them into concrete traces for a specific model – like
[`@formula`](https://juliastats.org/StatsModels.jl/stable/formula/#Modeling-tabular-data) and
[`apply_schema`](https://juliastats.org/StatsModels.jl/stable/internals/#Semantics-time-(apply_schema))
from StatsModels.jl are doing.

Maybe the following would suffice to do that:

```julia
maketrace(m, t)::tracetype(m, t)
```

where `maketrace` produces a concrete trace corresponding to `t` for the model `m`, and `tracetype`
is the corresponding `eltype`-like function giving you the concrete trace type for a certain model
and probability expression combination.
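For illustration only, a toy realization where concrete traces are plain `NamedTuple`s (the `ToyModel` type and the `NamedTuple` choice are assumptions, not part of the proposal):

```julia
# Hypothetical sketch: `maketrace`/`tracetype` for a toy model whose
# concrete traces are plain NamedTuples; `t` plays the role of a
# probability expression that is already a name => value mapping.
struct ToyModel end

tracetype(::ToyModel, t::NamedTuple) = typeof(t)
maketrace(m::ToyModel, t::NamedTuple) = convert(tracetype(m, t), t)

tr = maketrace(ToyModel(), (X = 0.5, Z = 1.0))
tr.X  # 0.5
```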

Possible extensions of this idea:

- Pearl-style do-notation: `@T(Y = y | do(X = x))`
- Allowing free variables, to specify model transformations: `query(m, @T(X | Y))`
- “Graph queries”: `@T(X | Parents(X))`, `@T(Y | Not(X))` (a nice way to express Gibbs conditionals!)
- Predicate style for “measure queries”: `@T(X < Y + Z)`

The latter applications are the reason I originally liked the idea of the macro being called `@P`
(or even `@𝓅` or `@ℙ`), since then it would look like a “Bayesian probability expression”: `@P(X <
Y + Z)`. But this would not be so meaningful in the case of representing a trace instance.

Perhaps both `@T` and `@P` can coexist, and both produce different kinds of `ProbabilityExpression`
objects?

NB: the exact details of this kind of “schema application”, and what results from it, will need to
be specified in the interface of `AbstractModelTrace`, aka “the new `VarInfo`”.
