## `AbstractProbabilisticProgram` interface

There are at least two somewhat incompatible conventions used for the term “model”. None of this is
particularly exact, but:

- In Turing.jl, if you write down a `@model` function and call it on arguments, you get a model
  object paired with (a possibly empty set of) observations. This can be treated as an instantiated
  “conditioned” object with fixed values for parameters and observations.
- In Soss.jl, “model” is used for a symbolic “generative” object from which concrete functions, such
  as densities and sampling functions, can be derived, _and_ which you can later condition on (and
  in turn get a conditional density etc.).

Relevant discussions:
[1](https://julialang.zulipchat.com/#narrow/stream/234072-probprog/topic/Naming.20the.20.22likelihood.22.20thingy),
[2](https://github.com/TuringLang/AbstractPPL.jl/discussions/10).


### TL/DR:

There are three interrelating aspects that this interface intends to standardize:

- Density calculation
- Sampling
- “Conversions” between different conditionings of models

Therefore, the interface consists of:

- `condition(::Model, ::Trace) -> ConditionedModel`
- `decondition(::ConditionedModel) -> GenerativeModel`
- `sample(::Model, ::Sampler = Exact(), [Int])` (from `AbstractMCMC.sample`)
- `logdensity(::Model, ::Trace)`
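To make this list concrete, the four operations can be sketched for a toy normal–normal model in plain Julia. Everything here is an assumption for illustration: the type names (`ToyGenerativeModel`, `ToyConditionedModel`), the use of named tuples as traces, and the hand-written density are made up, and no actual package API is being used.

```julia
# Hypothetical sketch of the proposed interface for the toy model
# X ~ Normal(0, 1), Y ~ Normal(X, 1).  Only Base Julia is used.

struct ToyGenerativeModel end            # joint distribution over (X, Y)
struct ToyConditionedModel               # distribution of X given Y = y
    y::Float64
end

# "Conversions" between the two views:
condition(::ToyGenerativeModel, trace) = ToyConditionedModel(trace.Y)
decondition(::ToyConditionedModel) = ToyGenerativeModel()

# (Possibly unnormalized) log-densities over traces, with named tuples
# standing in for the trace type:
normal_logpdf(x, μ, σ) = -0.5 * ((x - μ) / σ)^2 - log(σ) - 0.5 * log(2π)
logdensity(::ToyGenerativeModel, t) =
    normal_logpdf(t.X, 0, 1) + normal_logpdf(t.Y, t.X, 1)
logdensity(m::ToyConditionedModel, t) =
    logdensity(ToyGenerativeModel(), (X = t.X, Y = m.y))

# Exact sampling, playing the role of `sample(m, Exact())`:
function sample(::ToyGenerativeModel)
    X = randn()
    (X = X, Y = X + randn())
end
```

A user would then write, e.g., `m = condition(ToyGenerativeModel(), (Y = 0.4,))` and evaluate `logdensity(m, (X = 0.0,))` for the (unnormalized) posterior density.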


### Traces & probability expressions

First, an infrastructural requirement which we will need below to write things out.

The kinds of models we consider are, at least in a theoretical sense, distributions over *traces* –
types which carry collections of values together with their names. Existing realizations of these
are `VarInfo` in Turing.jl, choice maps in Gen.jl, and the usage of named tuples in Soss.jl.

Traces solve the problem of having to name random variables in function calls, and in samples from
models. In essence, every concrete trace type will just be a fancy kind of dictionary from variable
names (ideally, `VarName`s) to values.

Since we have to use this kind of mapping a lot in the specification of the interface, let’s for now
just choose some arbitrary macro-like syntax like the following:

```julia
@T(Y[1] = …, Z = …)
```

Some more ideas for this kind of object can be found at the end.
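The “fancy kind of dictionary” view can be sketched in a few lines. This is a deliberately crude stand-in: a real trace type would use proper `VarName` keys with structured indexing, not the plain strings assumed here.

```julia
# Minimal sketch of a trace as a dictionary from variable names to values.
# `ToyTrace` and the string keys are illustrative assumptions only.

struct ToyTrace
    values::Dict{String,Any}      # e.g. "Y[1]" => 0.3, "Z" => 1.2
end

Base.getindex(t::ToyTrace, name) = t.values[name]
Base.keys(t::ToyTrace) = keys(t.values)

t = ToyTrace(Dict("Y[1]" => 0.3, "Z" => 1.2))
t["Z"]      # == 1.2
```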


### “Conversions”

The purpose of this part is to provide common names for how we want a model instance to be
understood. As we have seen, in some modelling languages, model instances are primarily generative,
with some parameters fixed, while other instance types pair model instances conditioned on
observations. What I call “conversions” here is just an interface to transform between these two
views and unify the involved objects under one language.

Let’s start from a generative model with parameter `μ`:

```julia
# (hypothetical) generative spec a la Soss
@generative_model function foo_gen(μ)
    X ~ Normal(0, μ)
    Y[1] ~ Normal(X)
    Y[2] ~ Normal(X + 1)
end
```

Applying the “constructor” `foo_gen` now means to fix the parameter, and should return a concrete
object of the generative type:

```julia
g = foo_gen(μ=…)::SomeGenerativeModel
```

With this kind of object, we should be able to sample and calculate joint log-densities from, i.e.,
over the combined trace space of `X`, `Y[1]`, and `Y[2]` – either directly, or by deriving the
respective functions (e.g., by converting from a symbolic representation).

For model types that contain enough structural information, it should then be possible to condition
on observed values and obtain a conditioned model:

```julia
condition(g, @T(Y = …))::SomeConditionedModel
```

For this operation, there will probably exist syntactic sugar in the form of

```julia
g | @T(Y = …)
```
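Such sugar could be a one-line generic fallback onto `condition`. The placeholder types below are invented for the sketch; only the idea of overloading `Base.:|` is the point.

```julia
# Hypothetical sketch: `|` as sugar for `condition`, with made-up types and
# a named tuple standing in for the `@T(…)` trace expression.

struct MyModel end
struct MyConditionedModel
    model::MyModel
    observations::NamedTuple
end

condition(m::MyModel, obs::NamedTuple) = MyConditionedModel(m, obs)
Base.:|(m::MyModel, obs::NamedTuple) = condition(m, obs)

(MyModel() | (Y = 1.0,)) == condition(MyModel(), (Y = 1.0,))   # true
```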

Now, if we start from a Turing.jl-like model instead, with the “observation part” already specified,
we have a situation like this, with the observations `Y` fixed in the instantiation:

```julia
# conditioned spec a la DPPL
@model function foo(Y, μ)
    X ~ Normal(0, μ)
    Y[1] ~ Normal(X)
    Y[2] ~ Normal(X + 1)
end

m = foo(Y=…, μ=…)::SomeConditionedModel
```

From this we can, if supported, go back to the generative form via `decondition`, and back via
`condition`:

```julia
decondition(m) == g::SomeGenerativeModel
m == condition(g, @T(Y = …))
```

(with equality in distribution).

In the case of Turing.jl, the object `m` would at the same time contain the information about the
generative and posterior distribution; `condition` and `decondition` can simply return different
Soss.jl pretty much already works like the examples above, with one model object

A hypothetical `DensityModel`, or something like the types from LogDensityProblems.jl, would be a
case for a model type that does not support the structural operations `condition` and
`decondition`.

The invariances between these operations should follow normal rules of probability theory. Not all
methods or directions need to be supported for every modelling language; in this case, a
`MethodError` or some other runtime error should be raised.

There is no strict requirement for generative models and conditioned models to have different types
or be tagged with variable names etc. This is a choice to be made by the concrete implementation.

Decomposing models into prior and observation distributions is not yet specified; the former is
rather easy, since it is only a marginal of the generative distribution, while the latter requires
more structural information. Perhaps both can be generalized under the `query` function I discuss
at the end.
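The round-trip invariance and the unsupported-operation behaviour can both be expressed as simple property checks. As before, all type names here are illustrative assumptions, not part of any interface.

```julia
# Sketch: condition/decondition round trip for a toy model, and a model type
# that supports neither operation (so calling them raises a MethodError).

struct ToyGen end
struct ToyCond
    obs::Float64
end
condition(::ToyGen, obs::Float64) = ToyCond(obs)
decondition(::ToyCond) = ToyGen()

struct DensityOnlyModel end      # only densities; no structural operations

m = condition(ToyGen(), 0.5)
decondition(m) == ToyGen()               # back to the generative form
condition(decondition(m), 0.5) == m      # and forth again

# `decondition(DensityOnlyModel())` would simply throw a MethodError.
```

For these simple value types, `==` coincides with the intended equality in distribution; in general that equality is a semantic requirement, not something checkable by `==`.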


### Sampling

Sampling in this case refers to producing values from the distribution specified in a model
instance, either following the distribution exactly, or approximating it through a Monte Carlo
algorithm.

All sampleable model instances are assumed to implement the `AbstractMCMC` interface – i.e., at
least [`step`](https://github.com/TuringLang/AbstractMCMC.jl#sampling-step), and accordingly
`sample`, `steps`, `Samples`. The most important aspect is `sample`, though, which plays the role
of `rand` for distributions.

The results of `sample` generalize `rand` – while `rand(d, N)` is assumed to give you iid samples,
`sample(m, sampler, N)` returns a sample from a sequence (known as a chain in the case of MCMC) of
length `N` approximating `m`’s distribution by a specific sampling algorithm (which of course
subsumes the case that `m` can be sampled from exactly, in which case the “chain” actually is iid).

Depending on which kind of sampling is supported, several methods may be provided. In the case of
a (posterior) conditioned model with no known sampling procedure, we just have what is given through
`AbstractMCMC`:

```julia
sample([rng], m, N, sampler; [args…])  # chain of length N using `sampler`
```

For models which can be sampled from exactly, there should additionally be

```julia
sample([rng], m; [args…])     # one random sample
sample([rng], m, N; [args…])  # N iid samples; equivalent to `rand` in certain cases
```

It should be possible to implement this by a special sampler, say, `Exact` (name still to be
discussed), that can then also be reused for generative sampling:

```
step(g, spl = Exact(), state = nothing)  # IID sample from exact distribution with trivial state
sample(g, Exact(), [N])
```

with dispatch failing for model types for which exact sampling is not possible (or not
implemented).

This could even be useful for Monte Carlo methods not being based on Markov chains, e.g.,
particle-based sampling using a return type with weights, or rejection sampling.
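The `Exact` idea above can be sketched by mimicking the `step`/`sample` pattern without depending on the actual AbstractMCMC package. The model type, its sampling rule, and the simplified signatures are all assumptions for illustration.

```julia
# Sketch: an `Exact` sampler whose `step` is an iid draw with trivial state,
# and a `sample` loop built on top of it, a la AbstractMCMC (not the real API).

struct Exact end

struct StdNormalModel end       # toy model with known exact sampling: N(0, 1)

# One transition: an iid draw; the state is trivially `nothing`.
step(::StdNormalModel, ::Exact, state = nothing) = (randn(), nothing)

function sample(m::StdNormalModel, spl::Exact, N::Int)
    state = nothing
    draws = Vector{Float64}(undef, N)
    for i in 1:N
        draws[i], state = step(m, spl, state)
    end
    draws
end
```

For a model type without a `step(::M, ::Exact, …)` method, `sample(m, Exact(), N)` fails with a `MethodError`, which is exactly the dispatch behaviour described above.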


### Density Calculation

Since the different “versions” of how a model is to be understood as generative or conditioned are
to be expressed in the type or dispatch they support, there should be no need for separate functions
`logjoint`, `loglikelihood`, etc., which force these semantic distinctions on the implementor; one
`logdensity` should suffice for all, with the distinction being made by the capabilities of the
concrete model instance.

Note that this generalizes `logpdf`, too, since the posterior density will of course in general be
unnormalized and hence not a probability density.

The evaluation will usually work with the internal, concrete trace type, like `VarInfo` in Turing.jl:

But the user will more likely work on the interface using probability expressions:

```julia
logdensity(m, @T(X = …))
```

(Note that this would replace the current `prob` string macro in Turing.jl.)

Densities need not be normalized.


#### Implementation notes

It should be possible to make this fall back on the internal method with the right definition and
implementation of `maketrace`:

```
logdensity(m, t::ProbabilityExpression) = logdensity(m, maketrace(m, t))
```

There is one open question – should normalized and unnormalized densities be able to be
distinguished? This could be done by dispatch as well, e.g., if the caller wants to ensure
normalization:

```
logdensity(g, @T(X = ..., Y = ..., Z = ...); normalized=Val{true})
```

Although there is probably a better way through traits; maybe like for arrays, with
`NormalizationStyle(g, t) = IsNormalized()`?
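The fallback pattern above can be made concrete with toy stand-ins. `ToyExpression`, `ToyVarInfo`, and `ToyDensityModel` are invented here; in reality the expression would come from `@T` and the internal trace would be something like `VarInfo`.

```julia
# Sketch: generic fallback from a probability expression to the internal,
# concrete trace type via `maketrace`.  All types are illustrative.

struct ToyExpression          # stands in for the result of `@T(…)`
    assignments::NamedTuple
end
struct ToyVarInfo             # stands in for the internal trace type
    assignments::NamedTuple
end
struct ToyDensityModel end

maketrace(::ToyDensityModel, t::ToyExpression) = ToyVarInfo(t.assignments)

# Internal method on the concrete trace type (an unnormalized log-density):
logdensity(::ToyDensityModel, vi::ToyVarInfo) = -0.5 * vi.assignments.X^2

# Generic fallback, as proposed above:
logdensity(m, t::ToyExpression) = logdensity(m, maketrace(m, t))
```

With this definition, `logdensity(ToyDensityModel(), ToyExpression((X = 2.0,)))` dispatches through `maketrace` to the internal method.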


## More on probability expressions

Note that this needs to be a macro, if written this way, since the keys may themselves be more
complex than just symbols (e.g., indexed variables). (Don’t hang yourselves up on that `@T` name
though, this is just a working draft.)

The idea here is to standardize the construction (and manipulation) of *abstract probability
expressions*, plus the interface for turning them into concrete traces for a specific model – like
[`@formula`](https://juliastats.org/StatsModels.jl/stable/formula/#Modeling-tabular-data) and
[`apply_schema`](https://juliastats.org/StatsModels.jl/stable/internals/#Semantics-time-(apply_schema))
from StatsModels.jl are doing.

Maybe the following would suffice to do that:

```julia
maketrace(m, t)::tracetype(m, t)
```

where `maketrace` produces a concrete trace corresponding to `t` for the model `m`, and `tracetype`
is the corresponding `eltype`-like function giving you the concrete trace type for a certain model
and probability expression combination.

Possible extensions of this idea:

- Pearl-style do-notation: `@T(Y = y | do(X = x))`
- Allowing free variables, to specify model transformations: `query(m, @T(X | Y))`
- “Graph queries”: `@T(X | Parents(X))`, `@T(Y | Not(X))` (a nice way to express Gibbs conditionals!)
- Predicate style for “measure queries”: `@T(X < Y + Z)`

The latter applications are the reason I originally liked the idea of the macro being called `@P`
(or even `@𝓅` or `@ℙ`), since then it would look like a “Bayesian probability expression”:
`@P(X < Y + Z)`. But this would not be so meaningful in the case of representing a trace instance.

Perhaps both `@T` and `@P` can coexist, and both produce different kinds of `ProbabilityExpression`
objects?

NB: the exact details of this kind of “schema application”, and what results from it, will need to
be specified in the interface of `AbstractModelTrace`, aka “the new `VarInfo`”.
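To see why a macro is needed at all, here is a toy version of the idea: left-hand sides like `Y[1]` are not valid keyword names, but a macro receives them as unevaluated expressions and can keep them as keys. This `@T` is a throwaway sketch, not a proposed implementation.

```julia
# Toy `@T` macro: collects `lhs = rhs` arguments into a Dict whose keys are
# the quoted left-hand sides (symbols or indexing expressions).

macro T(args...)
    pairs = map(args) do ex
        (ex isa Expr && ex.head == :(=)) ||
            error("@T expects assignments like `X = 1` or `Y[1] = 2`")
        :($(QuoteNode(ex.args[1])) => $(esc(ex.args[2])))
    end
    :(Dict($(pairs...)))
end

t = @T(Y[1] = 0.3, Z = 1.2)
t[:Z]          # == 1.2
t[:(Y[1])]     # == 0.3
```

A keyword-based function `T(Y[1] = 0.3)` would be a syntax error, which is precisely the limitation the macro works around.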