# `AbstractModelTrace`/`VarInfo` interface proposal

## Background

### Why do we do this?

As I have said before:

> There are many aspects that make VarInfo a very complex data structure.

Currently, there is an insane amount of complexity and implementation detail in `varinfo.jl`, which
has been rewritten multiple times with different concerns in mind, mostly to address concrete
needs of Turing.jl such as type stability or the requirements of specific samplers.

This unfortunately makes `VarInfo` extremely opaque: it is hard to refactor without breaking
anything (nobody really dares to touch it), and a lot of knowledge about Turing.jl/DynamicPPL.jl
internals is needed in order to judge the effects of changes.

### Design choices

Recently, @torfjelde [has shown](https://github.com/TuringLang/DynamicPPL.jl/pull/267/files) that a
much simpler implementation is feasible – basically, just a wrapped `NamedTuple` with a minimal
interface.

The purpose of this proposal is twofold: first, to work out what a sufficient interface for
`AbstractModelTrace` (the abstract supertype of `VarInfo`) should look like, so that multiple
specialized variants can coexist and the existing ones (typed/untyped and simple) can be
refactored; second, to view the problem as the design of an abstract data type, i.e., the
specification of construction and modification mechanisms for a dictionary-like structure.

Related previous discussions:

- [Discussion about `VarName`](https://github.com/TuringLang/AbstractPPL.jl/discussions/7)
- [`AbstractVarInfo` representation](https://github.com/TuringLang/AbstractPPL.jl/discussions/5)

Additionally (but closely related), the second part tries to formalize the “subsumption” mechanism
of `VarName`s, and its interaction with using `VarName`s as keys/indices.

Our discussions take place in a somewhat fuzzy zone between the part that is really “abstract” and
meant for the wider purpose of AbstractPPL.jl (the implementation of probabilistic programming
systems in general), and our concrete needs within DPPL. I hope to always stay abstract and
reusable; there are already a couple of candidates for APPL clients other than DPPL, which will
hopefully keep us focused: simulation-based calibration, SimplePPL (a BUGS-like frontend), and
ParetoSmoothing.jl.

### What is going to change?

- For the end user of Turing.jl: nothing. You usually don’t use `VarInfo`, or the raw evaluator
interface, anyway. (Although if the newer data structures are more user-friendly, they might appear
in more places in the future?)
- For people looking into code that uses `VarInfo`, or starting to hack on Turing.jl/DPPL.jl: a
huge reduction in cognitive complexity. `VarInfo` implementations should be readable on their own,
and the implemented functions laid out somewhere. Using them should feel like using any other nice,
normal data structure.
- For core DPPL.jl implementors: the same as the previous point, plus a standard against which to
improve and test `VarInfo`, and a clearly defined design space for new data structures.
- For AbstractPPL.jl clients/PPL implementors: an interface to program against (as with the rest of
APPL), and an existing set of well-specified, flexible trace data types with different
characteristics.

And in terms of implementation work in DPPL.jl: once the interface is fixed (or even while fixing
it), varinfo.jl will undergo a heavy refactoring, which should make it _simpler_! (No three
different getter functions with slightly different semantics, etc.)


## Dictionary interface

The basic idea is for all `VarInfo`s to behave like ordered dictionaries with `VarName` keys: all
common operations should just work. There are two things that make them more special, though:

1. “Fancy indexing”: since `VarName`s are structured themselves, the `VarInfo` should behave a bit
   like a trie, in the sense that all prefixes of stored keys should be retrievable. Also,
   subsumption of `VarName`s should be respected (see end of this document):

   ```julia
   vi[@varname(x.a)] = [1,2,3]
   vi[@varname(x.b)] = [4,5,6]
   vi[@varname(x.a[2])] == 2
   vi[@varname(x)] == (; a = [1,2,3], b = [4,5,6])
   ```

   Generalizations that go beyond simple cases (those that you can imagine by storing individual
   `setfield!`s in a tree) need not be implemented in the beginning; e.g.,

   ```julia
   vi[@varname(x[1])] = 1
   vi[@varname(x[2])] = 2
   keys(vi) == [@varname(x[1]), @varname(x[2])]

   vi[@varname(x)] = [1,2]
   keys(vi) == [@varname(x)]
   ```

2. (_This has to be discussed further._) Information other than the sampled values, such as flags,
   metadata, pointwise likelihoods, etc., can in principle be stored in several of these “`VarInfo`
   dicts” with parallel structure. For efficiency, one could devise a design in which multiple
   fields are stored under the same indexing structure:

   ```julia
   vi[@varname(x[1])] == 1
   vi[@varname(x[1])].meta["bla"] == false
   ```

   or something in that direction.

   (This is logically equivalent to a dictionary with named tuple values. Maybe we can do what
   [`DictTable`](https://github.com/JuliaData/TypedTables.jl/blob/main/src/DictTable.jl) does?)

   The old `order` field, indicating at which position in the evaluator function a variable has
   been added (essentially a counter of insertions), can actually be left out completely, since the
   dictionary is specified to be ordered by insertion.

   The important question here is: should the “joint data structure” behave like a dictionary of
   `NamedTuple`s (`eltype(vi) == @NamedTuple{value::T, ℓ::Float64, meta}`), or like a struct of
   dicts with shared keys (`eltype(vi.value) <: T`, `eltype(vi.ℓ) <: Float64`, …)?
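
The following is a minimal sketch of the two options. It uses plain `NamedTuple`s as stand-ins for insertion-ordered dictionaries, which is an assumption made for illustration only. The variables `x`, `y` and the fields `value`, `ℓ` are hypothetical, and no `VarName`s are involved. Both layouts hold the same information; only their element types differ:

```julia
# Option 1 (hypothetical): a "dictionary of NamedTuples", one record per variable.
# A NamedTuple stands in for an insertion-ordered dictionary keyed by variable name.
rows = (x = (value = 1.0, ℓ = -0.5), y = (value = 2.0, ℓ = -1.2))

# Option 2 (hypothetical): a "struct of dictionaries" sharing the same keys,
# obtained here by transposing option 1 with `map` (which preserves keys and order).
cols = (value = map(r -> r.value, rows), ℓ = map(r -> r.ℓ, rows))

# The element types differ as described in the question above:
eltype(rows)        # == NamedTuple{(:value, :ℓ), Tuple{Float64, Float64}}
eltype(cols.value)  # == Float64
```

In this toy setting, the struct-of-dicts layout is just a transposition of the other one. This suggests that an implementation could offer both views over a single shared key structure.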

The required dictionary functions are roughly the following:

- Pure functions:
  - `iterate`, yielding pairs of `VarName` and the stored value
  - `IteratorEltype == HasEltype()`, `IteratorSize == HasLength()`
  - `keys`, `values`, `pairs`, `length` consistent with `iterate`
  - `eltype`, `keytype`, `valtype`
  - `get`, `getindex`, `haskey` for indexing by `VarName`
  - `merge` to join two `VarInfo`s
- Mutating functions:
  - `insert!!`, `set!!`
  - `merge!!` to add and join elements (TODO: think about `merge`)
  - `setindex!!`
  - `empty!!`, `delete!!`, `unset!!` (_Are these really used anywhere? Not having them makes persistent
    implementations much easier!_)

I believe that adopting the interface of
[Dictionaries.jl](https://github.com/andyferris/Dictionaries.jl), not `Base.AbstractDict`, would be
ideal, since its approach makes key sharing and certain operations naturally easy (particularly
“broadcast-style” ones, i.e., transformations on the values, but not the keys).

Other `Base` functions, like `enumerate`, should follow from the above.

`length` might seem odd given the fancy indexing above, since retrievable prefixes are not
themselves keys, but it should definitely be consistent with the iterator.
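
Below is a minimal sketch of how the pure functions can be kept consistent with `iterate`. It assumes a hypothetical `OrderedTrace` type backed by two vectors, with `Symbol` keys standing in for `VarName`s. Only `Base` is used, and this is an illustration rather than a proposed implementation:

```julia
# Hypothetical insertion-ordered trace; `Symbol` keys stand in for `VarName`s.
struct OrderedTrace{K,V}
    keys::Vector{K}
    vals::Vector{V}
end

# `iterate` yields `key => value` pairs in insertion order.
Base.iterate(vi::OrderedTrace, i::Int = 1) =
    i > length(vi.keys) ? nothing : (vi.keys[i] => vi.vals[i], i + 1)
Base.IteratorSize(::Type{<:OrderedTrace}) = Base.HasLength()
Base.IteratorEltype(::Type{<:OrderedTrace}) = Base.HasEltype()
Base.length(vi::OrderedTrace) = length(vi.keys)
Base.eltype(::Type{OrderedTrace{K,V}}) where {K,V} = Pair{K,V}
Base.keytype(::Type{OrderedTrace{K,V}}) where {K,V} = K
Base.valtype(::Type{OrderedTrace{K,V}}) where {K,V} = V
Base.keytype(vi::OrderedTrace) = keytype(typeof(vi))
Base.valtype(vi::OrderedTrace) = valtype(typeof(vi))

# `keys`/`values` read the same storage as `iterate`, so they agree with it by construction.
Base.keys(vi::OrderedTrace) = vi.keys
Base.values(vi::OrderedTrace) = vi.vals
Base.haskey(vi::OrderedTrace, k) = k in vi.keys

function Base.get(vi::OrderedTrace, k, default)
    i = findfirst(isequal(k), vi.keys)
    return i === nothing ? default : vi.vals[i]
end

function Base.getindex(vi::OrderedTrace, k)
    i = findfirst(isequal(k), vi.keys)
    i === nothing && throw(KeyError(k))
    return vi.vals[i]
end

vi = OrderedTrace([:b, :a], [2.0, 1.0])
collect(vi)           # [:b => 2.0, :a => 1.0], in insertion order
first(enumerate(vi))  # (1, :b => 2.0); `enumerate` only needs `iterate`
```

`collect`, `enumerate`, `first`, and so on rely only on the iteration protocol, so they come for free. `length` also matches the number of iterated pairs.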

It would be really cool if `merge` supported the combination of distinct types of implementations,
e.g., a dynamic and a tuple-based part.

To support both mutable and immutable/persistent implementations, let’s require consistent
BangBang.jl-style mutators throughout.
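
The sketch below shows why consistent `!!` mutators let client code ignore whether an implementation is mutable. `insert!!` and `set!!` are defined locally, with the semantics assumed in this proposal; they are not taken from BangBang.jl. `MutableTrace` and `TupleTrace` are hypothetical types:

```julia
# Hypothetical traces; `insert!!`/`set!!` are local definitions, not BangBang.jl's API.
struct MutableTrace{V}            # backed by mutable vectors, updated in place
    keys::Vector{Symbol}
    vals::Vector{V}
end
MutableTrace{V}() where {V} = MutableTrace{V}(Symbol[], V[])

struct TupleTrace{NT<:NamedTuple} # immutable, backed by a NamedTuple
    vals::NT
end
TupleTrace() = TupleTrace(NamedTuple())

# Mutable variant: update in place, then return the same object.
function insert!!(vi::MutableTrace, k::Symbol, v)
    k in vi.keys && throw(ArgumentError("key $k already present"))
    push!(vi.keys, k)
    push!(vi.vals, v)
    return vi
end
function set!!(vi::MutableTrace, k::Symbol, v)
    i = findfirst(isequal(k), vi.keys)
    i === nothing && throw(KeyError(k))
    vi.vals[i] = v
    return vi
end
function Base.getindex(vi::MutableTrace, k::Symbol)
    i = findfirst(isequal(k), vi.keys)
    i === nothing && throw(KeyError(k))
    return vi.vals[i]
end

# Immutable variant: never mutate, return a new object instead.
function insert!!(vi::TupleTrace, k::Symbol, v)
    haskey(vi.vals, k) && throw(ArgumentError("key $k already present"))
    return TupleTrace(merge(vi.vals, NamedTuple{(k,)}((v,))))
end
function set!!(vi::TupleTrace, k::Symbol, v)
    haskey(vi.vals, k) || throw(KeyError(k))
    return TupleTrace(merge(vi.vals, NamedTuple{(k,)}((v,))))
end
Base.getindex(vi::TupleTrace, k::Symbol) = vi.vals[k]

# Client code written against the `!!` convention works with both variants.
function record(vi)
    vi = insert!!(vi, :x, 1.0)
    vi = set!!(vi, :x, 2.0)
    return vi
end
m = record(MutableTrace{Float64}())
t = record(TupleTrace())
```

The only rule for callers is to always rebind the result (`vi = insert!!(vi, k, v)`). The mutable variant then returns itself, while the immutable one returns a new object.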


## Transformations/Bijectors

Transformations should ideally be handled explicitly and from outside: automatically by the
compiler macro, or at the places where samplers require them.

Implementation-wise, they can probably be expressed as maps (or, more generally, folds) over the
trace:

```julia
map(v -> link(v.dist, v.value), vi)
```
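
Here is a runnable toy version of the `map` above. Hypothetical `PositiveSupport`/`RealSupport` types replace distributions, and hypothetical `link`/`invlink` functions replace Bijectors.jl. Unlike the one-liner above, the mapped function returns the full record, so the distribution stays attached to the transformed value:

```julia
# Hypothetical stand-ins for distributions: only their support matters here.
struct PositiveSupport end
struct RealSupport end

# Hypothetical `link`/`invlink` (not Bijectors.jl's API): map a value to and from
# unconstrained space, depending on the support of its distribution.
link(::PositiveSupport, x) = log(x)
link(::RealSupport, x) = x
invlink(::PositiveSupport, y) = exp(y)
invlink(::RealSupport, y) = y

# A trace whose values are (dist, value) records; a NamedTuple stands in for an ordered dict.
vi = (σ = (dist = PositiveSupport(), value = 2.0), μ = (dist = RealSupport(), value = -1.0))

# The transformation is a `map` over the values: keys and their order are untouched.
linked = map(v -> (dist = v.dist, value = link(v.dist, v.value)), vi)
unlinked = map(v -> (dist = v.dist, value = invlink(v.dist, v.value)), linked)
```

Only values are transformed and keys are left alone, so this is exactly the “broadcast-style” kind of operation mentioned in the dictionary section.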


## Linearization

There are multiple possible approaches to flattening a trace into a single vector:

1. As a special case of conversion: `Vector(vi)`
2. `copy!(vals_array, vi)`
3. As a fold: `mapfoldl(v -> vec(v.value), append!, vi, init=Float64[])` (a left fold, since
   `append!` mutates its first argument and `mapreduce` does not guarantee the association order)

Here too, I think the best implementation would be through a fold. Variants (1) or (2) might
additionally be provided as syntactic sugar.
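
The following is a minimal sketch of variant (3) on a toy trace. It assumes a `NamedTuple` of records with array-valued `value` fields as a stand-in for a `VarInfo`; `linearize` is a hypothetical name. A left fold matters because `vec` of a `Vector` returns that very array, and `append!` must only ever mutate the fresh accumulator, never a stored value:

```julia
# Toy trace (hypothetical): a NamedTuple of records with array-valued `value` fields.
vi = (a = (value = [1.0, 2.0],), b = (value = [3.0 4.0; 5.0 6.0],))

# Linearization as a left fold: `append!` only ever receives the fresh `init` vector
# as its first argument, so the arrays stored in the trace are never modified.
linearize(vi) = mapfoldl(v -> vec(v.value), append!, vi; init = Float64[])

flat = linearize(vi)  # [1.0, 2.0, 3.0, 5.0, 4.0, 6.0]; matrices are flattened column-major
```

Variants (1) and (2) could then be thin wrappers around such a function.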


---

# `VarName`-based axioms

What follows is mostly an attempt to formalize subsumption.

First, remember that in Turing.jl we can always work with _concretized_ `VarName`s: `begin`/`end`,
`:`, and boolean indexing are all turned into some form of concrete cartesian or array indexing
(assuming [this suggestion](https://github.com/TuringLang/AbstractPPL.jl/issues/35) is
implemented). This makes all index comparisons static.

Now, `VarName`s have a compositional structure: they can be built by composing a root variable with
more and more lenses (`VarName{v}()` starts off with an `IdentityLens`):

```julia
julia> vn = VarName{:x}() ∘ Setfield.IndexLens((1:10, 1)) ∘ Setfield.IndexLens((2,))
x[1:10,1][2]
```

(_Note that the composition function, `∘`, is actually in the wrong order compared to function
composition; this is inherited from Setfield.jl._)
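
To see why the order looks reversed, here is the same access path written with plain Julia functions standing in for the two `IndexLens`es. This is an illustration only; no Setfield.jl types are involved. With `Base.∘`, the function applied first is written last:

```julia
# Plain functions standing in for the two index lenses of `x[1:10,1][2]` (illustration only).
first_lens  = a -> a[1:10, 1]  # plays the role of Setfield.IndexLens((1:10, 1))
second_lens = a -> a[2]        # plays the role of Setfield.IndexLens((2,))

A = reshape(1:200, 10, 20)

# Base's `∘` is function composition, (f ∘ g)(a) == f(g(a)), so the lens applied first
# is written last: the reverse of the Setfield-style chain above.
(second_lens ∘ first_lens)(A) == A[1:10, 1][2] == A[2, 1]  # true
```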

By “subsumption”, we mean the relation in which one `VarName` expresses a more deeply nested path
than another one (as `x.a[1]` does relative to `x.a`):

```julia
subsumes(@varname(x.a), @varname(x.a[1]))
@varname(x.a) ⊒ @varname(x.a[1]) # \sqsupseteq
@varname(x.a) ⋢ @varname(x.a[1]) # \nsqsubseteq
```

Thus, we have the following axioms for `VarName`s (“variables” are `VarName{n}()`):

1. `x ⊑ x` for all variables `x`
2. `x ≍ y` for `x ≠ y` (i.e., distinct variables are incomparable; `x ⋢ y` and `y ⋢ x`) (`≍` is `\asymp`)
3. `x ∘ ℓ ⊑ x` for all variables `x` and lenses `ℓ`
4. `x ∘ ℓ₁ ⊑ x ∘ ℓ₂ ⇔ ℓ₁ ⊑ ℓ₂`

For the last axiom to work, we also have to define subsumption of individual, non-composed lenses:

1. `PropertyLens(a) == PropertyLens(b) ⇔ a == b`, for all symbols `a`, `b`
2. `FunctionLens(f) == FunctionLens(g) ⇔ f == g` (under extensional equality; I’m only mentioning
   this in case we ever generalize to Bijector-ed variables like `@varname(log(x))`)
3. `IndexLens(ι₁) ⊑ IndexLens(ι₂)` if the index tuple `ι₂` covers all indices in `ι₁`; for example,
   `_[1, 2:10] ⊑ _[1:10, 1:20]`. (_This is a bit fuzzy and not all corner cases have been
   considered yet!_)
4. `IdentityLens() == IdentityLens()`
5. `ℓ₁ ≍ ℓ₂`, otherwise

Together, these rules should make subsumption a (non-strict, i.e., reflexive) partial order on
`VarName`s.
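
Below is a minimal executable reading of these rules, under simplifying assumptions. `Prop`, `Idx`, and `VN` are hypothetical stand-ins for `PropertyLens`, a concretized `IndexLens`, and `VarName`. A composed lens is a tuple of single lenses applied left to right. A longer chain is compared component-wise against a shorter one, which is one possible way of extending axioms 3 and 4 to chains. `FunctionLens` and `IdentityLens` are left out:

```julia
# Hypothetical simplified stand-ins (not AbstractPPL's `VarName` or Setfield's lenses).
struct Prop          # like `PropertyLens`: `_.name`
    name::Symbol
end
struct Idx           # like a concretized `IndexLens`: `_[indices...]`
    indices::Tuple
end
struct VN            # root variable plus a chain of lenses, applied left to right
    root::Symbol
    lenses::Tuple
end

# Treat a scalar index like a one-element collection.
as_indices(i::Integer) = (i,)
as_indices(r::AbstractVector) = r

# Subsumption of single lenses, ℓ₁ ⊑ ℓ₂ (lens rules 1, 3 and 5).
lens_subsumed(a::Prop, b::Prop) = a.name == b.name
lens_subsumed(a::Idx, b::Idx) =
    length(a.indices) == length(b.indices) &&
    all(all(i -> i in as_indices(bd), as_indices(ad)) for (ad, bd) in zip(a.indices, b.indices))
lens_subsumed(a, b) = false  # incomparable otherwise

# u ⊑ v (axioms 1 to 4): same root, v's chain is no longer than u's,
# and each of v's lenses subsumes the corresponding lens of u.
function issubsumed(u::VN, v::VN)
    u.root == v.root || return false
    length(v.lenses) <= length(u.lenses) || return false
    return all(lens_subsumed(a, b) for (a, b) in zip(u.lenses, v.lenses))
end
subsumes(v::VN, u::VN) = issubsumed(u, v)

x_a  = VN(:x, (Prop(:a),))
x_a1 = VN(:x, (Prop(:a), Idx((1,))))
subsumes(x_a, x_a1)  # true:  x.a ⊒ x.a[1]
subsumes(x_a1, x_a)  # false: x.a ⋢ x.a[1]
```

Index coverage is checked here as plain set inclusion per dimension, which glosses over the corner cases mentioned in lens rule 3.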

The fundamental requirement for `VarInfo`s is then:

```julia
vi[x ∘ ℓ] == get(vi[x], ℓ)
```

So we always want the following to work, automatically:

```julia
vi = insert!!(vi, vn, x)
vi[vn] == x
```

(the trivial case), and

```julia
x = set!!(x, ℓ₁, a)
x = set!!(x, ℓ₂, b)
vi = insert!!(vi, vn, x)
vi[vn ∘ ℓ₁] == a
vi[vn ∘ ℓ₂] == b
```

since `vn` subsumes both `vn ∘ ℓ₁` and `vn ∘ ℓ₂`.
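
The following toy illustrates the fundamental requirement. It assumes a hypothetical `PathTrace` whose keys are plain tuples such as `(:x,)` or `(:x, :a)`. These stand in for `VarName`s, with `Symbol` components playing the role of property lenses and `Int` components the role of index lenses. Lookup finds a stored key that subsumes the requested path and applies the remaining lenses to the stored value. `insert!!` is a local, hypothetical definition:

```julia
# Hypothetical toy trace: keys are tuple paths standing in for `VarName`s.
struct PathTrace
    keys::Vector{Tuple}
    vals::Vector{Any}
end
PathTrace() = PathTrace(Tuple[], Any[])

# Apply one path component (a "lens") to a value.
apply_lens(v, s::Symbol) = getproperty(v, s)
apply_lens(v, i::Int) = v[i]

# Is `p` a prefix of `q`, i.e., does key `p` subsume path `q`?
isprefix(p::Tuple, q::Tuple) = length(p) <= length(q) && all(p[i] == q[i] for i in eachindex(p))

# Hypothetical mutating insert that returns the trace, following the `!!` convention.
insert!!(vi::PathTrace, k::Tuple, v) = (push!(vi.keys, k); push!(vi.vals, v); vi)

# vi[x ∘ ℓ] == get(vi[x], ℓ): find a stored key subsuming `path`, then apply the rest of it.
function Base.getindex(vi::PathTrace, path::Tuple)
    for (k, v) in zip(vi.keys, vi.vals)
        isprefix(k, path) && return foldl(apply_lens, path[length(k)+1:end]; init = v)
    end
    throw(KeyError(path))
end

x = (a = 1.0, b = [2.0, 3.0])        # built up front, like the `set!!` calls above
vi = insert!!(PathTrace(), (:x,), x)
vi[(:x, :a)]     # 1.0
vi[(:x, :b, 2)]  # 3.0
```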

Whether the opposite case is supported may depend on the implementation. The most complicated part
is “unification”:

```julia
vi = insert!!(vi, vn ∘ ℓ₁, a)
vi = insert!!(vi, vn ∘ ℓ₂, b)
get(vi[vn], ℓ₁) == a
get(vi[vn], ℓ₂) == b
```

where `vn ∘ ℓ₁` and `vn ∘ ℓ₂` need to be recognized as “children” of a common parent `vn`.
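
Here is a sketch of the unification step for the simplest case. It assumes keys are tuple paths (hypothetical stand-ins for `VarName`s) and that the children of a parent differ only in a final property component. `unify` is a hypothetical helper that collects the direct children of the requested parent and assembles them into a `NamedTuple`:

```julia
# Children stored separately under keys like (:x, :a) and (:x, :b), in insertion order.
entries = [(:x, :a) => 1.0, (:y, :c) => 0.5, (:x, :b) => 2.0]

# Hypothetical helper: assemble the parent `parent` from its direct property children.
function unify(entries, parent::Tuple)
    n = length(parent)
    # direct children: keys one component longer than `parent` that start with it
    children = [k[n+1] => v for (k, v) in entries if length(k) == n + 1 && k[1:n] == parent]
    isempty(children) && throw(KeyError(parent))
    names = Tuple(first.(children))
    return NamedTuple{names}(Tuple(last.(children)))
end

xv = unify(entries, (:x,))  # (a = 1.0, b = 2.0)
xv.a == 1.0 && xv.b == 2.0  # get(vi[vn], ℓ₁) == a, get(vi[vn], ℓ₂) == b
```

Index children (like `x[1]`, `x[2]`) would additionally need to be assembled into an array, and that is where implementations may reasonably differ.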