## `AbstractProbabilisticProgram` interface

There are at least two somewhat incompatible conventions used for the term “model”. Neither is
particularly exact, but:

- In Turing.jl, if you write down a `@model` function and call it on arguments, you get a model
  object paired with (a possibly empty set of) observations. This can be treated as an instantiated,
  “conditioned” object with fixed values for parameters and observations.
- In Soss.jl, “model” is used for a symbolic, “generative” object from which concrete functions, such
  as densities and sampling functions, can be derived, _and_ which you can later condition on (and in
  turn get a conditional density etc.).

Relevant discussions:
[1](https://julialang.zulipchat.com/#narrow/stream/234072-probprog/topic/Naming.20the.20.22likelihood.22.20thingy),
[2](https://github.com/TuringLang/AbstractPPL.jl/discussions/10).


### TL;DR

There are three interrelated aspects that this interface intends to standardize:

- Density calculation
- Sampling
- “Conversions” between different conditionings of models

Accordingly, the interface consists of:

- `condition(::Model, ::Trace) -> ConditionedModel`
- `decondition(::ConditionedModel) -> GenerativeModel`
- `sample(::Model, ::Sampler = Exact(), [Int])` (from `AbstractMCMC.sample`)
- `logdensity(::Model, ::Trace)`

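To make the intended shapes concrete, here is a hypothetical, minimal sketch in plain Julia. Everything here – the toy types, the dictionary-based trace, and the hand-rolled Normal log-density – is made up for illustration and not part of the proposal:

```julia
# Hypothetical sketch only: toy types standing in for real model types.
struct ToyGenerativeModel
    μ::Float64                          # fixed parameter: X ~ Normal(0, μ)
end

struct ToyConditionedModel
    generative::ToyGenerativeModel
    observations::Dict{Symbol,Float64}  # stand-in for a proper trace type
end

# `condition` pairs a generative model with observations...
condition(g::ToyGenerativeModel, obs) = ToyConditionedModel(g, obs)

# ...and `decondition` recovers the generative part.
decondition(m::ToyConditionedModel) = m.generative

# (Unnormalized) log-density over the toy model's single variable X.
normal_logpdf(x, mean, std) = -0.5 * ((x - mean) / std)^2 - log(std) - 0.5 * log(2π)
logdensity(g::ToyGenerativeModel, trace) = normal_logpdf(trace[:X], 0.0, g.μ)
```

A round trip like `decondition(condition(g, obs))` should then give back `g`, up to equality in distribution.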

### Traces & probability expressions

First, an infrastructural requirement that we will need below to write things out.

The kinds of models we consider are, at least in a theoretical sense, distributions over *traces* –
types which carry collections of values together with their names. Existing realizations of these
are `VarInfo` in Turing.jl, choice maps in Gen.jl, and the use of named tuples in Soss.jl.

Traces solve the problem of having to name random variables in function calls, and in samples from
models. In essence, every concrete trace type will just be a fancy kind of dictionary from variable
names (ideally, `VarName`s) to values.
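
The “fancy dictionary” view might be sketched, purely hypothetically, as a thin wrapper; `SimpleTrace` and its string keys are made up here, and real trace types like `VarInfo` are far more elaborate:

```julia
# Hypothetical sketch: a trace as a dictionary from variable names to values.
struct SimpleTrace
    values::Dict{String,Any}   # string keys stand in for proper `VarName`s
end

Base.getindex(t::SimpleTrace, name) = t.values[name]
Base.keys(t::SimpleTrace) = keys(t.values)

t = SimpleTrace(Dict("Y[1]" => 0.3, "Z" => -1.2))
t["Y[1]"]   # looks up the value stored under the (possibly indexed) name
```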

Since we have to use this kind of mapping a lot in the specification of the interface, let’s for now
just choose some arbitrary macro-like syntax like the following:

```julia
@T(Y[1] = …, Z = …)
```

Some more ideas for this kind of object can be found at the end.


### “Conversions”

The purpose of this part is to provide common names for how we want a model instance to be
understood. As we have seen, in some modelling languages model instances are primarily generative,
with some parameters fixed, while in others they pair the model with the observations it is
conditioned on. What I call “conversions” here is just an interface to transform between these two
views and unify the involved objects under one language.

Let’s start from a generative model with parameter `μ`:

```julia
# (hypothetical) generative spec a la Soss
@generative_model function foo_gen(μ)
    X ~ Normal(0, μ)
    Y[1] ~ Normal(X)
    Y[2] ~ Normal(X + 1)
end
```

Applying the “constructor” `foo_gen` now means fixing the parameter, and should return a concrete
object of the generative type:

```julia
g = foo_gen(μ=…)::SomeGenerativeModel
```

With this kind of object, we should be able to sample from and calculate joint log-densities over
the combined trace space of `X`, `Y[1]`, and `Y[2]` – either directly, or by deriving the
respective functions (e.g., by converting from a symbolic representation).

For model types that contain enough structural information, it should then be possible to condition
on observed values and obtain a conditioned model:

```julia
condition(g, @T(Y = …))::SomeConditionedModel
```

For this operation, there will probably exist syntactic sugar in the form of

```julia
g | @T(Y = …)
```
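
As a purely hypothetical sketch (with made-up stand-in types – and note that overloading `Base.:|` like this is only acceptable on one's own types), the sugar would just forward to `condition`:

```julia
# Hypothetical stand-in types, not part of the proposal.
struct MyGenerativeModel end
struct MyConditionedModel
    base::MyGenerativeModel
    obs::Any
end

condition(g::MyGenerativeModel, obs) = MyConditionedModel(g, obs)

# The sugar: `g | obs` is nothing more than `condition(g, obs)`.
Base.:|(g::MyGenerativeModel, obs) = condition(g, obs)
```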

Now, if we start from a Turing.jl-like model instead, with the “observation part” already specified,
we have a situation like this, with the observations `Y` fixed at instantiation:

```julia
# conditioned spec a la DPPL
@model function foo(Y, μ)
    X ~ Normal(0, μ)
    Y[1] ~ Normal(X)
    Y[2] ~ Normal(X + 1)
end

m = foo(Y=…, μ=…)::SomeConditionedModel
```

From this we can, if supported, go back to the generative form via `decondition`, and return via
`condition`:

```julia
decondition(m) == g::SomeGenerativeModel
m == condition(g, @T(Y = …))
```

(with equality in distribution).

In the case of Turing.jl, the object `m` would contain the information about both the generative
and the posterior distribution; `condition` and `decondition` can then simply return different
kinds of “tagged” model types which put the model specification into a certain context.

Soss.jl pretty much already works like the examples above, with one model object being either a
`JointModel` or a `ConditionedModel`, and the `|` syntax just being sugar for the latter.

A hypothetical `DensityModel`, or something like the types from LogDensityProblems.jl, would be a
case of a model type that does not support the structural operations `condition` and
`decondition`.

The invariants between these operations should follow the normal rules of probability theory. Not
all methods or directions need to be supported for every modelling language; in such cases, a
`MethodError` or some other runtime error should be raised.

There is no strict requirement for generative models and conditioned models to have different types
or be tagged with variable names etc. This is a choice to be made by the concrete implementation.

Decomposing models into prior and observation distributions is not yet specified; the former is
rather easy, since it is only a marginal of the generative distribution, while the latter requires
more structural information. Perhaps both can be generalized under the `query` function discussed
at the end.


### Sampling

Sampling here refers to producing values from the distribution specified in a model instance,
either following the distribution exactly, or approximating it through a Monte Carlo algorithm.

All sampleable model instances are assumed to implement the `AbstractMCMC` interface – i.e., at
least [`step`](https://github.com/TuringLang/AbstractMCMC.jl#sampling-step), and accordingly
`sample`, `steps`, `Samples`. The most important aspect is `sample`, though, which plays the role
of `rand` for distributions.

The results of `sample` generalize `rand`: while `rand(d, N)` is assumed to give you iid samples,
`sample(m, sampler, N)` returns a sample from a sequence (known as a chain in the case of MCMC) of
length `N` approximating `m`’s distribution by a specific sampling algorithm (which of course
subsumes the case that `m` can be sampled from exactly, in which case the “chain” actually is iid).

Depending on which kind of sampling is supported, several methods may be available. In the case of
a (posterior) conditioned model with no known exact sampling procedure, we just have what is given
through `AbstractMCMC`:

```julia
sample([rng], m, N, sampler; [args…])  # chain of length N using `sampler`
```

In the case of a generative model, or a posterior model with an exact solution, we can have some
more methods without the need to specify a sampler:

```julia
sample([rng], m; [args…])     # one random sample
sample([rng], m, N; [args…])  # N iid samples; equivalent to `rand` in certain cases
```

It should be possible to implement this via a special sampler, say, `Exact` (name still to be
discussed), which can then also be reused for generative sampling:

```julia
step(g, spl = Exact(), state = nothing)  # iid sample from the exact distribution, trivial state
sample(g, Exact(), [N])
```

with dispatch failing for model types for which exact sampling is not possible (or not
implemented).
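
How such an `Exact` sampler might look can be sketched as follows; the `ToyJoint` type and the `(sample, state)` return convention are assumptions for illustration, not the actual `AbstractMCMC` signatures:

```julia
using Random

# Hypothetical sketch; `Exact` and `ToyJoint` are made up.
struct Exact end

struct ToyJoint
    μ::Float64   # X ~ Normal(0, μ), as in the example model
end

# A trivial, stateless step: every call returns a fresh iid draw.
function step(rng::AbstractRNG, g::ToyJoint, ::Exact, state = nothing)
    x = g.μ * randn(rng)       # exact sample from Normal(0, μ)
    return (X = x,), nothing   # (sample, next state)
end
```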

This could even be useful for Monte Carlo methods not based on Markov chains, e.g.,
particle-based sampling using a return type with weights, or rejection sampling.

Not all variants need to be supported – for example, a posterior model might not support
`sample(m)` when exact sampling is not possible, only `sample(m, N, alg)` for Markov chains.

`rand` is then just a special case for when “trivial” exact sampling works for a model, e.g. a
joint model.


### Density Calculation

Since the different “versions” of how a model is to be understood – as generative or conditioned –
are to be expressed in the types and the dispatches they support, there should be no need for
separate functions `logjoint`, `loglikelihood`, etc., which force these semantic distinctions on
the implementor; one `logdensity` should suffice for all, with the distinction being made by the
capabilities of the concrete model instance.

Note that this generalizes `logpdf`, too, since the posterior density will of course in general be
unnormalized and hence not a probability density.

The evaluation will usually work with the internal, concrete trace type, like `VarInfo` in
Turing.jl:

```julia
logdensity(m, vi)
```

But the user will more likely work on the interface using probability expressions:

```julia
logdensity(m, @T(X = ...))
```

(Note that this would replace the current `prob` string macro in Turing.jl.)

Densities need not be normalized.


#### Implementation notes

It should be possible to make this fall back on the internal method, given the right definition and
implementation of `maketrace`:

```julia
logdensity(m, t::ProbabilityExpression) = logdensity(m, maketrace(m, t))
```

There is one open question: should normalized and unnormalized densities be distinguishable? This
could be done by dispatch as well, e.g., if the caller wants to ensure normalization:

```julia
logdensity(g, @T(X = ..., Y = ..., Z = ...); normalized=Val{true})
```

Although there is probably a better way through traits; maybe like for arrays, with
`NormalizationStyle(g, t) = IsNormalized()`?
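
Such a trait might look like the following sketch, by analogy with `Base.IndexStyle` for arrays; all names here are made up:

```julia
# Hypothetical trait sketch, not part of the proposal.
abstract type NormalizationStyle end
struct IsNormalized <: NormalizationStyle end
struct IsUnnormalized <: NormalizationStyle end

# Default: make no normalization promise; specific model/trace
# combinations can opt in by overloading this method.
NormalizationStyle(g, t) = IsUnnormalized()
```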


## More on probability expressions

Note that this needs to be a macro, if written this way, since the keys may themselves be more
complex than just symbols (e.g., indexed variables). (Don’t get hung up on the `@T` name, though –
this is just a working draft.)

The idea here is to standardize the construction (and manipulation) of *abstract probability
expressions*, plus the interface for turning them into concrete traces for a specific model – like
[`@formula`](https://juliastats.org/StatsModels.jl/stable/formula/#Modeling-tabular-data) and
[`apply_schema`](https://juliastats.org/StatsModels.jl/stable/internals/#Semantics-time-(apply_schema))
from StatsModels.jl do.

Maybe the following would suffice to do that:

```julia
maketrace(m, t)::tracetype(m, t)
```

where `maketrace` produces a concrete trace corresponding to `t` for the model `m`, and `tracetype`
is the corresponding `eltype`-like function giving you the concrete trace type for a certain
combination of model and probability expression.
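
For a toy model, this pair of functions might be sketched as follows; the `ProbabilityExpression` representation and all names are made up:

```julia
# Hypothetical sketch, purely for illustration.
struct ProbabilityExpression
    assignments::Dict{Symbol,Float64}   # e.g., what @T(X = 1.0) might carry
end

struct ToyModel end

# The concrete trace type this model uses internally...
tracetype(::ToyModel, ::ProbabilityExpression) = Dict{Symbol,Float64}

# ...and the conversion of an abstract expression into such a trace.
maketrace(m::ToyModel, t::ProbabilityExpression) =
    convert(tracetype(m, t), t.assignments)
```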

Possible extensions of this idea:

- Pearl-style do-notation: `@T(Y = y | do(X = x))`
- Allowing free variables, to specify model transformations: `query(m, @T(X | Y))`
- “Graph queries”: `@T(X | Parents(X))`, `@T(Y | Not(X))` (a nice way to express Gibbs conditionals!)
- Predicate style for “measure queries”: `@T(X < Y + Z)`

The latter applications are the reason I originally liked the idea of the macro being called `@P`
(or even `@𝓅` or `@ℙ`), since then it would look like a “Bayesian probability expression”:
`@P(X < Y + Z)`. But this would not be so meaningful in the case of representing a trace instance.

Perhaps both `@T` and `@P` can coexist, and both produce different kinds of `ProbabilityExpression`
objects?

NB: the exact details of this kind of “schema application”, and what results from it, will need to
be specified in the interface of `AbstractModelTrace`, aka “the new `VarInfo`”.