
Commit b173841

checkpoint on documentation
1 parent 3fc5a0e commit b173841


5 files changed, +115 -52 lines changed


docs/src/ref/modeling.md

Lines changed: 12 additions & 12 deletions
@@ -254,6 +254,7 @@ See [Generative Function Interface](@ref) for more information about traces.
254254

255255
A `@gen` function may begin with an optional block of *trainable parameter declarations*.
256256
The block consists of a sequence of statements, beginning with `@param`, that declare the name and Julia type for each trainable parameter.
257+
The Julia type must be either a subtype of `Real` or a subtype of `Array{<:Real}`.
257258
The function below has a single trainable parameter `theta` with type `Float64`:
258259
```julia
259260
@gen function foo(prob::Float64)
@@ -264,23 +265,22 @@ The function below has a single trainable parameter `theta` with type `Float64`:
264265
end
265266
```
266267
Trainable parameters obey the same scoping rules as Julia local variables defined at the beginning of the function body.
267-
The value of a trainable parameter is undefined until it is initialized using [`init_param!`](@ref).
268+
After the definition of the generative function, you must register all of the parameters used by the generative function using [`register_parameters!`](@ref) (this is not required if you instead use the [Static Modeling Language](@ref)):
269+
```julia
270+
register_parameters!(foo, [:theta])
271+
```
272+
The value of a trainable parameter is undefined until it is initialized using [`init_parameter!`](@ref):
273+
```julia
274+
init_parameter!((foo, :theta), 0.0)
275+
```
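Putting these pieces together, here is a minimal end-to-end sketch; the body of `foo` below is only illustrative, and the constant used to initialize `theta` is arbitrary:
```julia
using Gen

# a model with one trainable parameter, declared with @param
# (the random choices below are illustrative)
@gen function foo(prob::Float64)
    @param theta::Float64
    z1 = @trace(bernoulli(prob), :a)
    z2 = @trace(bernoulli(theta), :b)
    return z1 || z2
end

# register the trainable parameters used by the DML function
register_parameters!(foo, [:theta])

# initialize the parameter (and its gradient accumulator) in the default store;
# theta parameterizes a bernoulli here, so it must lie in [0, 1]
init_parameter!((foo, :theta), 0.6)
```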
268276
In addition to the current value, each trainable parameter has a current **gradient accumulator** value.
269277
The gradient accumulator value has the same shape (e.g. array dimension) as the parameter value.
270-
It is initialized to all zeros, and is incremented by [`accumulate_param_gradients!`](@ref).
271-
272-
The following methods are exported for the trainable parameters of `@gen` functions:
278+
It is initialized to all zeros, and is incremented by calling [`accumulate_param_gradients!`](@ref) on a trace.
279+
Additional functions for retrieving and manipulating the values of trainable parameters and their gradient accumulators are described in [Optimizing Trainable Parameters](@ref).
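For example, assuming `foo` has been registered and initialized as above, gradient accumulation on a single trace looks roughly like this (the accessor used to read back the accumulated gradient is a sketch; see [Optimizing Trainable Parameters](@ref) for the actual API):
```julia
# generate a trace of foo (parameter values are read from the default store)
trace = simulate(foo, (0.3,))

# add the gradient of the trace's log probability with respect to theta
# into theta's gradient accumulator
accumulate_param_gradients!(trace)

# read back the accumulated gradient (accessor shown here is a sketch)
grad = get_gradient((foo, :theta))
```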
273280
```@docs
274-
init_param!
275-
get_param
276-
get_param_grad
277-
set_param!
278-
zero_param_grad!
281+
register_parameters!
279282
```
280283

281-
Trainable parameters are designed to be trained using gradient-based methods.
282-
This is discussed in the next section.
283-
284284
## Differentiable programming
285285

286286
Given a trace of a `@gen` function, Gen supports automatic differentiation of the log probability (density) of all of the random choices made in the trace with respect to the following types of inputs:

docs/src/ref/parameter_optimization.md

Lines changed: 46 additions & 0 deletions
@@ -1,6 +1,52 @@
11
# Optimizing Trainable Parameters
22

3+
## Parameter stores
4+
5+
Multiple traces of a generative function typically reference the same trainable parameters of the generative function, which are stored outside of the trace in a **parameter store**.
6+
Different types of generative functions may use different types of parameter stores.
7+
For example, the [`JuliaParameterStore`](@ref) (discussed below) stores parameters as Julia values in the memory of the Julia runtime process.
8+
Other types of parameter stores may store parameters in GPU memory, in a filesystem, or even remotely.
9+
10+
When generating a trace of a generative function with [`simulate`](@ref) or [`generate`](@ref), we may pass in an optional **parameter context**, which is a `Dict` that indicates which parameter store(s) to use when looking up the values of parameters.
11+
A generative function obtains a reference to a specific type of parameter store by looking up its key in the parameter context.
12+
13+
If you are just learning Gen and are only using the built-in modeling language to write generative functions, you can ignore this complexity: there is a default Julia parameter store, [`default_julia_parameter_store`](@ref), and a default parameter context, [`default_parameter_context`](@ref), that points to this store, and they are used whenever a parameter context is not provided in the call to `simulate` or `generate`.
14+
```@docs
15+
default_parameter_context
16+
default_julia_parameter_store
17+
```
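For example, for a generative function `foo` with a registered and initialized parameter (as in the modeling language documentation), and assuming that `simulate` accepts the parameter context as a keyword argument named `parameter_context` (the keyword name here is an assumption based on the prose above), the following two calls are equivalent:
```julia
# implicit: no parameter context given, so the default context
# (and hence the default Julia parameter store) is used
trace = simulate(foo, (0.3,))

# explicit equivalent (keyword name is an assumption)
trace = simulate(foo, (0.3,); parameter_context=default_parameter_context)
```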
18+
19+
## Julia parameter store
20+
21+
Parameters declared using the `@param` keyword in the built-in modeling language are stored in a type of parameter store called a [`JuliaParameterStore`](@ref).
22+
A generative function can obtain a reference to a `JuliaParameterStore` by looking up the key [`JULIA_PARAMETER_STORE_KEY`](@ref) in a parameter context.
23+
This is how the built-in modeling language implementation finds the parameter stores to use for `@param`-declared parameters.
24+
Note that if you are defining your own [custom generative functions](@ref #Custom-generative-functions), you can also use a [`JuliaParameterStore`](@ref) (including the same parameter store used to store parameters of built-in modeling language generative functions) to store and optimize your trainable parameters.
25+
26+
Different types of parameter stores provide different APIs for reading, writing, and updating the values of parameters and gradient accumulators for parameters.
27+
The `JuliaParameterStore` API is given below.
28+
(Note that most user code only needs to use [`init_parameter!`](@ref); the other functions are called by the [Optimizers](@ref) discussed below.)
29+
30+
```@docs
31+
JuliaParameterStore
32+
init_parameter!
33+
increment_gradient!
34+
reset_gradient!
35+
get_parameter_value
36+
get_gradient
37+
JULIA_PARAMETER_STORE_KEY
38+
```
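As a sketch of how these functions fit together, the snippet below initializes a parameter in a store other than the default one; the signature of `get_parameter_value` is assumed to follow the same id-then-store pattern as `init_parameter!`:
```julia
# a Julia parameter store that is separate from the default global store
my_store = JuliaParameterStore()

# a parameter context that directs built-in modeling language functions
# to look up @param values in my_store
my_context = Dict{Symbol,Any}(JULIA_PARAMETER_STORE_KEY => my_store)

# initialize theta in my_store (rather than the default store)
init_parameter!((foo, :theta), 0.6, my_store)

# read the value back out of my_store (signature is assumed)
value = get_parameter_value((foo, :theta), my_store)
```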
39+
40+
### Multi-threaded gradient accumulation
41+
42+
Note that the [`increment_gradient!`](@ref) call is thread-safe, so that multiple threads can concurrently increment the gradient for the same parameters. This is helpful for parallelizing gradient computation for a batch of traces within stochastic gradient descent learning algorithms.
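For example, a batch gradient computation might be parallelized roughly as follows (a sketch; `traces` is assumed to be a `Vector` of traces of the same generative function):
```julia
# each call to accumulate_param_gradients! increments the shared gradient
# accumulators via increment_gradient!, which is thread-safe
Threads.@threads for trace in traces
    accumulate_param_gradients!(trace)
end
```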
43+
44+
## Optimizers
45+
46+
TODO
47+
348
Trainable parameters of generative functions are initialized differently depending on the type of generative function.
49+
450
Trainable parameters of the built-in modeling language are initialized with [`init_param!`](@ref).
551

652
Gradient-based optimization of the trainable parameters of generative functions is based on interleaving two steps:

src/builtin_optimization.jl

Whitespace-only changes.

src/dynamic/dynamic.jl

Lines changed: 6 additions & 5 deletions
@@ -56,13 +56,14 @@ end
5656
"""
5757
register_parameters!(gen_fn::DynamicDSLFunction, parameters)
5858
59-
Register the altrainable parameters that are used by a DML generative function.
59+
Register the trainable parameters that are used by a DML generative function.
6060
61-
This includes all parameters used within any calls made by the generative function.
61+
This includes all parameters used within any calls made by the generative function, and any parameters that may be used by any possible trace (stochastic control flow may cause a parameter to be used by one trace but not another).
6262
63-
There are two variants:
64-
65-
# TODO document the variants
63+
The second argument is either a `Vector` or a `Function` that takes a parameter context and returns a `Dict` that maps parameter stores to `Vector`s of parameter IDs.
64+
When the second argument is a `Vector`, each element is either a `Symbol` that is the name of a parameter declared in the body of `gen_fn` using `@param`, or a tuple `(other_gen_fn::GenerativeFunction, name::Symbol)` where `@param <name>` was declared in the body of `other_gen_fn`.
65+
The `Function` input is used when `gen_fn` uses parameters that come from more than one parameter store, including parameters that are housed in parameter stores that are not `JuliaParameterStore`s (e.g. if `gen_fn` invokes a generative function that executes in a non-Julia runtime).
66+
See [Optimizing Trainable Parameters](@ref) for details on parameter contexts and parameter stores.
6667
"""
6768
function register_parameters!(gen_fn::DynamicDSLFunction, parameters)
6869
gen_fn.parameters = parameters
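For illustration, both variants might be used as follows (a sketch: `bar` and `:phi` are hypothetical names, and the parameter ids passed in the `Function` variant are assumed to be the `(gen_fn, name)` tuples used by `JuliaParameterStore`):
```julia
# Vector variant: a parameter declared in the body of foo itself, plus a
# parameter declared in the body of a callee bar (hypothetical names)
register_parameters!(foo, [:theta, (bar, :phi)])

# Function variant: map each parameter store to the ids of the parameters
# it holds, given a parameter context (a sketch)
register_parameters!(foo, function (parameter_context)
    store = parameter_context[JULIA_PARAMETER_STORE_KEY]
    return Dict(store => [(foo, :theta), (bar, :phi)])
end)
```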

src/optimization.jl

Lines changed: 51 additions & 35 deletions
@@ -1,15 +1,14 @@
11
import Parameters
22

3-
# we should modify the semantics of the log probability contribution to the gradient
4-
# so that everything is gradient descent instead of ascent. this will also fix
5-
# the misnomer names
3+
# TODO we should modify the semantics of the log probability contribution to
4+
# the gradient so that everything is gradient descent instead of ascent. this
5+
# will also fix the misnomer names
66
#
77
# TODO add tests specifically for JuliaParameterStore etc.
8-
#
9-
# TODO in all update and regenerate implementations, need to pass in the parameter context to inner calls to generate
108

119
export in_place_add!
1210

11+
1312
export FixedStepGradientDescent
1413
export DecayStepGradientDescent
1514
export init_optimizer
@@ -22,6 +21,10 @@ export increment_gradient!
2221
export reset_gradient!
2322
export get_parameter_value
2423
export get_gradient
24+
export JULIA_PARAMETER_STORE_KEY
25+
26+
export default_julia_parameter_store
27+
export default_parameter_context
2528

2629
#################
2730
# in_place_add! #
@@ -155,11 +158,9 @@ end
155158
# TODO create diagram and document the overall framework
156159
# including parameter contexts and parameter stores, and the default behaviors
157160

158-
abstract type ParameterStore end
159-
160161
"""
161162
optimizer = init_optimizer(
162-
conf, parameter_ids,
163+
conf, parameter_ids::Vector,
163164
store=default_julia_parameter_store)
164165
165166
Initialize an iterative gradient-based optimizer.
@@ -187,24 +188,10 @@ function apply_update!(optimizer)
187188
error("Not implemented")
188189
end
189190

190-
"""
191-
192-
optimizer = CompositeOptimizer(conf, parameter_stores_to_ids::Dict{Any,Vector})
193-
194-
Construct an optimizer that applies the given update to parameters in multiple parameter stores.
195-
196-
The first argument defines the mathematical behavior of the update;
197-
the second argument defines the set of parameters to which the update should be applied at each iteration,
198-
as a map from parameter stores to a vector of IDs of parameters within that parameter store.
199-
200-
optimizer = CompositeOptimizer(conf, gen_fn::GenerativeFunction; parameter_context=default_parameter_context)
201-
202-
Constructs a composite optimizer that applies the given update to all parameters used by the given generative function, even when the parameters exist in multiple parameter stores.
203-
"""
204191
struct CompositeOptimizer
205192
conf::Any
206193
optimizers::Dict{Any,Any}
207-
function CompositeOptimizer(conf, parameter_stores_to_ids::Dict{Any,Vector})
194+
function CompositeOptimizer(conf, parameter_stores_to_ids)
208195
optimizers = Dict{Any,Any}()
209196
for (store, parameter_ids) in parameter_stores_to_ids
210197
optimizers[store] = init_optimizer(conf, parameter_ids, store)
@@ -218,10 +205,23 @@ function CompositeOptimizer(conf, gen_fn::GenerativeFunction; parameter_context=
218205
end
219206

220207
"""
221-
apply_update!(composite_opt::ComposieOptimizer)
222208
223-
Perform one step of an update, possibly mutating the values of parameters in multiple parameter stores.
209+
optimizer = init_optimizer(conf, parameter_stores_to_ids::Dict{Any,Vector})
210+
211+
Construct an optimizer that updates parameters in multiple parameter stores.
212+
213+
The first argument configures the mathematical behavior of the update.
214+
The second argument defines the set of parameters to which the update should be applied at each iteration.
215+
The parameters are given as a map from parameter store to a vector of IDs of parameters within that parameter store.
216+
217+
optimizer = init_optimizer(conf, gen_fn::GenerativeFunction; parameter_context=default_parameter_context)
218+
219+
Construct a composite optimizer that updates all parameters used by the given generative function, even when the parameters exist in multiple parameter stores.
224220
"""
221+
function init_optimizer(conf, parameter_stores_to_ids::Dict)
222+
return CompositeOptimizer(conf, parameter_stores_to_ids)
223+
end
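For reference, a basic training loop built on this optimizer API might look as follows (a sketch: it assumes `foo` has been registered and initialized as in the documentation, and that `accumulate_param_gradients!` can be called with just a trace):
```julia
# configure the update and build an optimizer over all parameters used by foo
# (the optimizer may span multiple parameter stores)
conf = FixedStepGradientDescent(1e-3)
optimizer = init_optimizer(conf, foo)

for iter in 1:100
    # accumulate gradients from a small batch of traces
    for _ in 1:10
        trace = simulate(foo, (0.3,))
        accumulate_param_gradients!(trace)
    end
    # apply one update to the parameter values
    apply_update!(optimizer)
end
```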
224+
225225
function apply_update!(composite_opt::CompositeOptimizer)
226226
for opt in values(composite_opt.optimizers)
227227
apply_update!(opt)
@@ -247,7 +247,7 @@ Construct a parameter store that stores the state of parameters in the memory of the Julia runtime process.
247247
248248
There is a global Julia parameter store automatically created and named `Gen.default_julia_parameter_store`.
249249
250-
Incrementing the gradients can be safely multi-threaded (see [`increment_gradient!`](@ref)).
250+
Gradient accumulation is thread-safe (see [`increment_gradient!`](@ref)).
251251
"""
252252
function JuliaParameterStore()
253253
return JuliaParameterStore(
@@ -263,29 +263,45 @@ function get_local_parameters(store::JuliaParameterStore, gen_fn)
263263
end
264264
end
265265

266-
const default_parameter_context = Dict{Symbol,Any}()
267-
const default_julia_parameter_store = JuliaParameterStore()
268-
269266
# for looking up in a parameter context when tracing (simulate, generate)
270267
# once a trace is generated, it is bound to use a particular store
268+
"""
269+
JULIA_PARAMETER_STORE_KEY
270+
271+
If a parameter context contains a value for this key, then that value must be a `JuliaParameterStore`.
272+
"""
271273
const JULIA_PARAMETER_STORE_KEY = :julia_parameter_store
272274

273275
function get_julia_store(context::Dict)
274-
if haskey(context, JULIA_PARAMETER_STORE_KEY)
275-
return context[JULIA_PARAMETER_STORE_KEY]
276-
else
277-
return default_julia_parameter_store
278-
end
276+
return context[JULIA_PARAMETER_STORE_KEY]::JuliaParameterStore
279277
end
280278

279+
"""
280+
default_julia_parameter_store::JuliaParameterStore
281+
282+
The default global Julia parameter store.
283+
"""
284+
const default_julia_parameter_store = JuliaParameterStore()
285+
286+
"""
287+
default_parameter_context::Dict
288+
289+
The default global parameter context, which is initialized to contain the mapping:
290+
291+
JULIA_PARAMETER_STORE_KEY => Gen.default_julia_parameter_store
292+
"""
293+
const default_parameter_context = Dict{Symbol,Any}(
294+
JULIA_PARAMETER_STORE_KEY => default_julia_parameter_store)
295+
296+
281297
"""
282298
init_parameter!(
283299
id::Tuple{GenerativeFunction,Symbol}, value,
284300
store::JuliaParameterStore=default_julia_parameter_store)
285301
286302
Initialize the value of a named trainable parameter of a generative function.
287303
288-
Also generates the gradient accumulator for that parameter to `zero(value)`.
304+
Also initializes the gradient accumulator for that parameter to `zero(value)`.
289305
290306
Example:
291307
```julia
