# `AbstractModelTrace`/`VarInfo` interface proposal

## Background

### Why do we do this?

As I have said before:

> There are many aspects that make VarInfo a very complex data structure.

Currently, there is an insane amount of complexity and implementation detail in `varinfo.jl`, which has been rewritten multiple times with different concerns in mind – usually to address concrete needs of Turing.jl, such as type stability, or requirements of specific samplers.

This unfortunately makes `VarInfo` extremely opaque: it is hard to refactor without breaking anything (nobody really dares to touch it), and a lot of knowledge about Turing.jl/DynamicPPL.jl internals is needed in order to judge the effects of changes.

### Design choices

Recently, @torfjelde [has shown](https://github.com/TuringLang/DynamicPPL.jl/pull/267/files) that a much simpler implementation is feasible – basically, just a wrapped `NamedTuple` with a minimal interface.

The purpose of this proposal is twofold: first, to think about what a sufficient interface for `AbstractModelTrace`, the abstract supertype of `VarInfo`, should be, to allow multiple specialized variants and refactor the existing ones (typed/untyped and simple). Second, to view the problem as the design of an abstract data type: the specification of construction and modification mechanisms for a dictionary-like structure.

Related previous discussions:

- [Discussion about `VarName`](https://github.com/TuringLang/AbstractPPL.jl/discussions/7)
- [`AbstractVarInfo` representation](https://github.com/TuringLang/AbstractPPL.jl/discussions/5)

Additionally (but closely related), the second part tries to formalize the “subsumption” mechanism of `VarName`s, and its interaction with using `VarName`s as keys/indices.

Our discussions take place in what is a bit of a fuzzy zone between the part that is really “abstract”, and meant for the wider purpose of AbstractPPL.jl – the implementation of probabilistic programming systems in general – and our concrete needs within DPPL. I hope to always stay abstract and reusable; and there are already a couple of candidates for APPL clients other than DPPL, which will hopefully keep us focused: simulation-based calibration, SimplePPL (a BUGS-like frontend), and ParetoSmoothing.jl.

### What is going to change?

- For the end user of Turing.jl: nothing. You usually don’t use `VarInfo`, or the raw evaluator interface, anyway. (Although if the newer data structures are more user-friendly, they might occur in more places in the future?)
- For people having a look into code using `VarInfo`, or starting to hack on Turing.jl/DPPL.jl: a huge reduction in cognitive complexity. `VarInfo` implementations should be readable on their own, and the implemented functions laid out somewhere. Its usages should look like those of any other nice, normal data structure.
- For core DPPL.jl implementors: same as the previous, plus: a standard against which to improve and test `VarInfo`, and a clearly defined design space for new data structures.
- For AbstractPPL.jl clients/PPL implementors: an interface to program against (as with the rest of APPL), and an existing set of well-specified, flexible trace data types with different characteristics.

And in terms of implementation work in DPPL.jl: once the interface is fixed (or even while fixing it), varinfo.jl will undergo a heavy refactoring – which should make it _simpler_! (No three different getter functions with slightly different semantics, etc.)

## Dictionary interface

The basic idea is for all `VarInfo`s to behave like ordered dictionaries with `VarName` keys – all common operations should just work. There are two things that make them more special, though:

1. “Fancy indexing”: since `VarName`s are structured themselves, the `VarInfo` should behave a bit like a trie, in the sense that all prefixes of stored keys should be retrievable. Also, subsumption of `VarName`s should be respected (see the end of this document):

   ```julia
   vi[@varname(x.a)] = [1,2,3]
   vi[@varname(x.b)] = [4,5,6]
   vi[@varname(x.a[2])] == 2
   vi[@varname(x)] == (; a = [1,2,3], b = [4,5,6])
   ```

   Generalizations that go beyond simple cases (those that you can imagine by storing individual `setfield!`s in a tree) need not be implemented in the beginning; e.g.,

   ```julia
   vi[@varname(x[1])] = 1
   vi[@varname(x[2])] = 2
   keys(vi) == [x[1], x[2]]

   vi[@varname(x)] = [1,2]
   keys(vi) == [x]
   ```

2. (_This has to be discussed further._) Information other than the sampled values, such as flags, metadata, pointwise likelihoods, etc., can in principle be stored in multiple of these “`VarInfo` dicts” with parallel structure. For efficiency, it is conceivable to devise a design such that multiple fields can be stored under the same indexing structure,

   ```julia
   vi[@varname(x[1])] == 1
   vi[@varname(x[1])].meta["bla"] == false
   ```

   or something in that direction.

   (This is logically equivalent to a dictionary with named tuple values. Maybe we can do what [`DictTable`](https://github.com/JuliaData/TypedTables.jl/blob/main/src/DictTable.jl) does?)

The old `order` field, indicating at which position in the evaluator function a variable has been added (essentially a counter of insertions), can actually be left out completely, since the dictionary is specified to be ordered by insertion.

The important question here is: should the “joint data structure” behave like a dictionary of `NamedTuple`s (`eltype(vi) == @NamedTuple{value::T, ℓ::Float64, meta}`), or like a struct of dicts with shared keys (`eltype(vi.value) <: T`, `eltype(vi.ℓ) <: Float64`, …)?
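
To make the question concrete, here is a purely hypothetical sketch of the two layouts (the type names `EntryDict` and `ColumnDict` are invented for illustration, and plain `Symbol` keys stand in for `VarName`s):

```julia
# Option A: a dictionary of named tuples – one entry per variable, with all fields bundled.
# Corresponds to eltype(vi) == @NamedTuple{value::T, ℓ::Float64, meta::M}.
struct EntryDict{T,M}
    entries::Dict{Symbol,NamedTuple{(:value, :ℓ, :meta),Tuple{T,Float64,M}}}
end

# Option B: a "struct of dicts" – one dictionary per field, all sharing the same keys.
# Corresponds to eltype(vi.value) <: T, eltype(vi.ℓ) <: Float64, …
struct ColumnDict{T,M}
    value::Dict{Symbol,T}
    ℓ::Dict{Symbol,Float64}
    meta::Dict{Symbol,M}
end
```

Option A makes per-variable access trivial (everything for one `VarName` comes back at once), while option B makes value-only operations like linearization or transformation cheaper, since one “column” can be traversed without touching the others.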

The required dictionary functions are roughly the following (a toy sketch of a possible implementation follows the list):

- Pure functions:
  - `iterate`, yielding pairs of `VarName` and the stored value
  - `IteratorEltype == HasEltype()`, `IteratorSize == HasLength()`
  - `keys`, `values`, `pairs`, `length` consistent with `iterate`
  - `eltype`, `keytype`, `valuetype`
  - `get`, `getindex`, `haskey` for indexing by `VarName`
  - `merge` to join two `VarInfo`s
- Mutating functions:
  - `insert!!`, `set!!`
  - `merge!!` to add and join elements (TODO: think about `merge`)
  - `setindex!!`
  - `empty!!`, `delete!!`, `unset!!` (_Are these really used anywhere? Not having them makes persistent implementations much easier!_)
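
As a toy sketch of what such an implementation could look like – assuming nothing beyond `Base`, with `Symbol` keys standing in for `VarName`s, a made-up `ListVarInfo` type, and `insert!!` defined from scratch rather than taken from any package:

```julia
# A deliberately naive, ordered, persistent "VarInfo": a plain vector of key–value pairs.
# Only a handful of the functions from the list above are sketched.
struct ListVarInfo{K,V}
    entries::Vector{Pair{K,V}}
end
ListVarInfo{K,V}() where {K,V} = ListVarInfo{K,V}(Pair{K,V}[])

# Pure functions: iteration and lookup.
Base.iterate(vi::ListVarInfo, state...) = iterate(vi.entries, state...)
Base.length(vi::ListVarInfo) = length(vi.entries)
Base.keys(vi::ListVarInfo) = first.(vi.entries)
Base.values(vi::ListVarInfo) = last.(vi.entries)
Base.haskey(vi::ListVarInfo{K}, k::K) where {K} = any(p -> first(p) == k, vi.entries)
function Base.getindex(vi::ListVarInfo{K}, k::K) where {K}
    i = findfirst(p -> first(p) == k, vi.entries)
    i === nothing && throw(KeyError(k))
    return last(vi.entries[i])
end

# A persistent, BangBang-style insert: returns a new trace instead of mutating.
insert!!(vi::ListVarInfo{K,V}, k::K, v) where {K,V} =
    ListVarInfo{K,V}(vcat(vi.entries, Pair{K,V}[k => v]))

vi = insert!!(ListVarInfo{Symbol,Any}(), :x, [1.0, 2.0])
vi = insert!!(vi, :y, 0.5)
collect(keys(vi)) == [:x, :y]  # insertion order is preserved
vi[:y] == 0.5
```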

I believe that adopting the interface of [Dictionaries.jl](https://github.com/andyferris/Dictionaries.jl), not `Base.AbstractDict`, would be ideal, since their approach makes key sharing and certain operations naturally easy (particularly “broadcast-style” operations, i.e., transformations on the values, but not the keys).
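
For illustration (assuming the Dictionaries.jl package; this is not part of the proposed interface itself), value transformations there keep the key structure fixed:

```julia
using Dictionaries

d = Dictionary([:a, :b], [1.0, 2.0])  # an ordered dictionary with an explicit index set
d2 = map(abs2, d)                     # transforms the values; the keys stay the same
d2[:b] == 4.0                         # true
collect(keys(d2)) == [:a, :b]         # true – same (ordered) keys as `d`
```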

Other `Base` functions, like `enumerate`, should follow from the above.

`length` might appear weird – but it should definitely be consistent with the iterator.

It would be really cool if `merge` supported the combination of distinct types of implementations, e.g., a dynamic and a tuple-based part.

To support both mutable and immutable/persistent implementations, let’s require consistent BangBang.jl-style mutators throughout.
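
What the `!!` convention means in practice (a small illustration, assuming BangBang.jl: the double-bang functions mutate their argument when that is possible and otherwise return an updated value, so callers always rebind the result):

```julia
using BangBang

xs = [1, 2]
ys = push!!(xs, 3)   # a Vector is mutable, so it is pushed to in place …
ys === xs            # … and the same object comes back (true)

t = (1, 2)
t2 = push!!(t, 3)    # a Tuple is immutable, so a new value is returned instead
t2 == (1, 2, 3)      # true; `t` itself is unchanged
```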

## Transformations/Bijectors

Transformations should ideally be handled explicitly and from the outside: automatically by the compiler macro, or at the places required by samplers.

Implementation-wise, they can probably be expressed as folds?

```julia
map(v -> link(v.dist, v.value), vi)
```

## Linearization

There are multiple possible approaches to handle this:

1. As a special case of conversion: `Vector(vi)`
2. `copy!(vals_array, vi)`
3. As a fold: `mapreduce(v -> vec(v.value), append!, vi, init=Float64[])`

Here, too, I think that the best implementation would be through a fold; variants (1) and (2) might additionally be provided as syntactic sugar.
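
A minimal stand-alone illustration of the fold variant, using a plain `Dict` of arrays as a stand-in for a real `VarInfo`:

```julia
# Hypothetical stand-in: a "trace" mapping names to stored array values.
trace = Dict("x" => [1.0 2.0; 3.0 4.0], "y" => [0.5])

# Variant (3): linearize by folding `vec` over the stored values.
flat = mapreduce(vec, append!, values(trace); init=Float64[])
length(flat) == 5  # 4 entries from `x` plus 1 from `y`
```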

---

# `VarName`-based axioms

What follows is mostly an attempt to formalize subsumption.

First, remember that in Turing.jl we can always work with _concretized_ `VarName`s: `begin`/`end`, `:`, and boolean indexing are all turned into some form of concrete cartesian or array indexing (assuming [this suggestion](https://github.com/TuringLang/AbstractPPL.jl/issues/35) is implemented). This makes all index comparisons static.

Now, `VarName`s have a compositional structure: they can be built by composing a root variable with more and more lenses (`VarName{v}()` starts off with an `IdentityLens`):

```julia
julia> vn = VarName{:x}() ∘ Setfield.IndexLens((1:10, 1)) ∘ Setfield.IndexLens((2,))
x[1:10,1][2]
```

(_Note that the composition function, `∘`, is really in the wrong order; but this is a heritage of Setfield.jl._)

By “subsumption”, we mean the notion of a `VarName` expressing a more nested path than another one:

```julia
subsumes(@varname(x.a), @varname(x.a[1]))
@varname(x.a) ⊒ @varname(x.a[1]) # \sqsupseteq
@varname(x.a) ⋢ @varname(x.a[1]) # \nsqsubseteq
```

Thus, we have the following axioms for `VarName`s (“variables” are `VarName{n}()`):

1. `x ⊑ x` for all variables `x`
2. `x ≍ y` for `x ≠ y` (i.e., distinct variables are incomparable; `x ⋢ y` and `y ⋢ x`) (`≍` is `\asymp`)
3. `x ∘ ℓ ⊑ x` for all variables `x` and lenses `ℓ`
4. `x ∘ ℓ₁ ⊑ x ∘ ℓ₂ ⇔ ℓ₁ ⊑ ℓ₂`

For the last axiom to work, we also have to define subsumption of individual, non-composed lenses:

1. `PropertyLens(a) == PropertyLens(b) ⇔ a == b`, for all symbols `a`, `b`
2. `FunctionLens(f) == FunctionLens(g) ⇔ f == g` (under extensional equality; I’m only mentioning this in case we ever generalize to Bijector-ed variables like `@varname(log(x))`)
3. `IndexLens(ι₁) ⊑ IndexLens(ι₂)` if the index tuple `ι₂` covers all indices in `ι₁`; for example, `_[1, 2:10] ⊑ _[1:10, 1:20]`. (_This is a bit fuzzy and not all corner cases have been considered yet!_)
4. `IdentityLens() == IdentityLens()`
5. `ℓ₁ ≍ ℓ₂`, otherwise

Together, this should make `VarName`s under subsumption a reflexive poset.
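
A rough, purely illustrative sketch of how the lens-level rules above might look in code – the `subsumes` methods below are defined from scratch (not taken from AbstractPPL.jl), and the index rule is restricted to the simple case where coverage can be checked with `issubset`:

```julia
using Setfield: PropertyLens, IndexLens, IdentityLens

# subsumes(a, b) is meant to express a ⊒ b, matching subsumes(@varname(x.a), @varname(x.a[1])).
subsumes(::PropertyLens{A}, ::PropertyLens{B}) where {A,B} = A == B
subsumes(::IdentityLens, ::IdentityLens) = true
function subsumes(a::IndexLens, b::IndexLens)
    # `a` covers `b` if every index in `b` is contained in the corresponding index of `a`.
    length(a.indices) == length(b.indices) &&
        all(issubset(j, i) for (i, j) in zip(a.indices, b.indices))
end
subsumes(a, b) = false  # otherwise: incomparable (≍)

subsumes(IndexLens((1:10, 1:20)), IndexLens((1, 2:10)))  # true
subsumes(PropertyLens{:a}(), PropertyLens{:b}())         # false
```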

The fundamental requirement for `VarInfo`s is then:

```
vi[x ∘ ℓ] == get(vi[x], ℓ)
```

So we always want the following to work, automatically:

```julia
vi = insert!!(vi, vn, x)
vi[vn] == x
```

(the trivial case), and

```julia
x = set!!(x, ℓ₁, a)
x = set!!(x, ℓ₂, b)
vi = insert!!(vi, vn, x)
vi[vn ∘ ℓ₁] == a
vi[vn ∘ ℓ₂] == b
```

since `vn` subsumes both `vn ∘ ℓ₁` and `vn ∘ ℓ₂`.

Whether the opposite case is supported may depend on the implementation. The most complicated part is “unification”:

```julia
vi = insert!!(vi, vn ∘ ℓ₁, a)
vi = insert!!(vi, vn ∘ ℓ₂, b)
get(vi[vn], ℓ₁) == a
get(vi[vn], ℓ₂) == b
```

where `vn ∘ ℓ₁` and `vn ∘ ℓ₂` need to be recognized as “children” of a common parent `vn`.
