
Commit b173841

checkpoint on documentation
1 parent 3fc5a0e commit b173841


5 files changed, +115 -52 lines changed


docs/src/ref/modeling.md

Lines changed: 12 additions & 12 deletions
@@ -254,6 +254,7 @@ See [Generative Function Interface](@ref) for more information about traces.
254254

255255
A `@gen` function may begin with an optional block of *trainable parameter declarations*.
256256
The block consists of a sequence of statements, beginning with `@param`, that declare the name and Julia type for each trainable parameter.
257+
The Julia type must be either a subtype of `Real` or a subtype of `Array{<:Real}`.
257258
The function below has a single trainable parameter `theta` with type `Float64`:
258259
```julia
259260
@gen function foo(prob::Float64)
@@ -264,23 +265,22 @@ The function below has a single trainable parameter `theta` with type `Float64`:
264265
end
265266
```
266267
Trainable parameters obey the same scoping rules as Julia local variables defined at the beginning of the function body.
267-
The value of a trainable parameter is undefined until it is initialized using [`init_param!`](@ref).
268+
After the definition of the generative function, you must register all of the parameters used by the generative function using [`register_parameters!`](@ref) (this is not required if you instead use the [Static Modeling Language](@ref)):
269+
```julia
270+
register_parameters!(foo, [:theta])
271+
```
272+
The value of a trainable parameter is undefined until it is initialized using [`init_parameter!`](@ref):
273+
```julia
274+
init_parameter!((foo, :theta), 0.0)
275+
```
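Putting these pieces together, here is a minimal end-to-end sketch; the body of `foo` below is only illustrative, and the constant used to initialize `theta` is arbitrary:
```julia
using Gen

# a model with one trainable parameter, declared with @param
# (the random choices below are illustrative)
@gen function foo(prob::Float64)
    @param theta::Float64
    z1 = @trace(bernoulli(prob), :a)
    z2 = @trace(bernoulli(theta), :b)
    return z1 || z2
end

# register the trainable parameters used by the DML function
register_parameters!(foo, [:theta])

# initialize the parameter (and its gradient accumulator) in the default store;
# theta parameterizes a bernoulli here, so it must lie in [0, 1]
init_parameter!((foo, :theta), 0.6)
```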
268276
In addition to the current value, each trainable parameter has a current **gradient accumulator** value.
269277
The gradient accumulator value has the same shape (e.g. array dimension) as the parameter value.
270-
It is initialized to all zeros, and is incremented by [`accumulate_param_gradients!`](@ref).
271-
272-
The following methods are exported for the trainable parameters of `@gen` functions:
278+
It is initialized to all zeros, and is incremented by calling [`accumulate_param_gradients!`](@ref) on a trace.
279+
Additional functions for retrieving and manipulating the values of trainable parameters and their gradient accumulators are described in [Optimizing Trainable Parameters](@ref).
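For example, assuming `foo` has been registered and initialized as above, gradient accumulation on a single trace looks roughly like this (the accessor used to read back the accumulated gradient is a sketch; see [Optimizing Trainable Parameters](@ref) for the actual API):
```julia
# generate a trace of foo (parameter values are read from the default store)
trace = simulate(foo, (0.3,))

# add the gradient of the trace's log probability with respect to theta
# into theta's gradient accumulator
accumulate_param_gradients!(trace)

# read back the accumulated gradient (accessor shown here is a sketch)
grad = get_gradient((foo, :theta))
```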
273280
```@docs
274-
init_param!
275-
get_param
276-
get_param_grad
277-
set_param!
278-
zero_param_grad!
281+
register_parameters!
279282
```
280283

281-
Trainable parameters are designed to be trained using gradient-based methods.
282-
This is discussed in the next section.
283-
284284
## Differentiable programming
285285

286286
Given a trace of a `@gen` function, Gen supports automatic differentiation of the log probability (density) of all of the random choices made in the trace with respect to the following types of inputs:

docs/src/ref/parameter_optimization.md

Lines changed: 46 additions & 0 deletions
@@ -1,6 +1,52 @@
11
# Optimizing Trainable Parameters
22

3+
## Parameter stores
4+
5+
Multiple traces of a generative function typically reference the same trainable parameters of the generative function, which are stored outside of the trace in a **parameter store**.
6+
Different types of generative functions may use different types of parameter stores.
7+
For example, the [`JuliaParameterStore`](@ref) (discussed below) stores parameters as Julia values in the memory of the Julia runtime process.
8+
Other types of parameter stores may store parameters in GPU memory, in a filesystem, or even remotely.
9+
10+
When generating a trace of a generative function with [`simulate`](@ref) or [`generate`](@ref), we may pass in an optional **parameter context**, which is a `Dict` that indicates which parameter store(s) to use when looking up the values of parameters.
11+
A generative function obtains a reference to a specific type of parameter store by looking up its key in the parameter context.
12+
13+
If you are just learning Gen and are only using the built-in modeling language to write generative functions, you can ignore this complexity: there is a default Julia parameter store, [`default_julia_parameter_store`](@ref), and a default parameter context, [`default_parameter_context`](@ref), that points to this store, and they are used whenever a parameter context is not provided in the call to `simulate` or `generate`.
14+
```@docs
15+
default_parameter_context
16+
default_julia_parameter_store
17+
```
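For example, for a generative function `foo` with a registered and initialized parameter (as in the modeling language documentation), and assuming that `simulate` accepts the parameter context as a keyword argument named `parameter_context` (the keyword name here is an assumption based on the prose above), the following two calls are equivalent:
```julia
# implicit: no parameter context given, so the default context
# (and hence the default Julia parameter store) is used
trace = simulate(foo, (0.3,))

# explicit equivalent (keyword name is an assumption)
trace = simulate(foo, (0.3,); parameter_context=default_parameter_context)
```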
18+
19+
## Julia parameter store
20+
21+
Parameters declared using the `@param` keyword in the built-in modeling language are stored in a type of parameter store called a [`JuliaParameterStore`](@ref).
22+
A generative function can obtain a reference to a `JuliaParameterStore` by looking up the key [`JULIA_PARAMETER_STORE_KEY`](@ref) in a parameter context.
23+
This is how the built-in modeling language implementation finds the parameter stores to use for `@param`-declared parameters.
24+
Note that if you are defining your own [custom generative functions](@ref #Custom-generative-functions), you can also use a [`JuliaParameterStore`](@ref) (including the same parameter store used to store parameters of built-in modeling language generative functions) to store and optimize your trainable parameters.
25+
26+
Different types of parameter stores provide different APIs for reading, writing, and updating the values of parameters and gradient accumulators for parameters.
27+
The `JuliaParameterStore` API is given below.
28+
(Note that most user code only needs to use [`init_parameter!`](@ref); the other functions are called by the [Optimizers](@ref) discussed below.)
29+
30+
```@docs
31+
JuliaParameterStore
32+
init_parameter!
33+
increment_gradient!
34+
reset_gradient!
35+
get_parameter_value
36+
get_gradient
37+
JULIA_PARAMETER_STORE_KEY
38+
```
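As a sketch of how these functions fit together, the snippet below initializes a parameter in a store other than the default one; the signature of `get_parameter_value` is assumed to follow the same id-then-store pattern as `init_parameter!`:
```julia
# a Julia parameter store that is separate from the default global store
my_store = JuliaParameterStore()

# a parameter context that directs built-in modeling language functions
# to look up @param values in my_store
my_context = Dict{Symbol,Any}(JULIA_PARAMETER_STORE_KEY => my_store)

# initialize theta in my_store (rather than the default store)
init_parameter!((foo, :theta), 0.6, my_store)

# read the value back out of my_store (signature is assumed)
value = get_parameter_value((foo, :theta), my_store)
```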
39+
40+
### Multi-threaded gradient accumulation
41+
42+
Note that the [`increment_gradient!`](@ref) call is thread-safe, so that multiple threads can concurrently increment the gradient for the same parameters. This is helpful for parallelizing gradient computation for a batch of traces within stochastic gradient descent learning algorithms.
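For example, a batch gradient computation might be parallelized roughly as follows (a sketch; `traces` is assumed to be a `Vector` of traces of the same generative function):
```julia
# each call to accumulate_param_gradients! increments the shared gradient
# accumulators via increment_gradient!, which is thread-safe
Threads.@threads for trace in traces
    accumulate_param_gradients!(trace)
end
```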
43+
44+
## Optimizers
45+
46+
TODO
47+
348
Trainable parameters of generative functions are initialized differently depending on the type of generative function.
49+
450
Trainable parameters of the built-in modeling language are initialized with [`init_param!`](@ref).
551

652
Gradient-based optimization of the trainable parameters of generative functions is based on interleaving two steps:

src/builtin_optimization.jl

Whitespace-only changes.

src/dynamic/dynamic.jl

Lines changed: 6 additions & 5 deletions
@@ -56,13 +56,14 @@ end
5656
"""
5757
register_parameters!(gen_fn::DynamicDSLFunction, parameters)
5858
59-
Register the altrainable parameters that are used by a DML generative function.
59+
Register the trainable parameters that are used by a DML generative function.
6060
61-
This includes all parameters used within any calls made by the generative function.
61+
This includes all parameters used within any calls made by the generative function, and any parameters that may be used by any possible trace (stochastic control flow may cause a parameter to be used by one trace but not another).
6262
63-
There are two variants:
64-
65-
# TODO document the variants
63+
The second argument is either a `Vector` or a `Function` that takes a parameter context and returns a `Dict` that maps parameter stores to `Vector`s of parameter IDs.
64+
When the second argument is a `Vector`, each element is either a `Symbol` that is the name of a parameter declared in the body of `gen_fn` using `@param`, or a tuple `(other_gen_fn::GenerativeFunction, name::Symbol)` where `@param <name>` was declared in the body of `other_gen_fn`.
65+
The `Function` input is used when `gen_fn` uses parameters that come from more than one parameter store, including parameters that are housed in parameter stores that are not `JuliaParameterStore`s (e.g. if `gen_fn` invokes a generative function that executes in a non-Julia runtime).
66+
See [Optimizing Trainable Parameters](@ref) for details on parameter contexts and parameter stores.
6667
"""
6768
function register_parameters!(gen_fn::DynamicDSLFunction, parameters)
6869
gen_fn.parameters = parameters
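For illustration, both variants might be used as follows (a sketch: `bar` and `:phi` are hypothetical names, and the parameter ids passed in the `Function` variant are assumed to be the `(gen_fn, name)` tuples used by `JuliaParameterStore`):
```julia
# Vector variant: a parameter declared in the body of foo itself, plus a
# parameter declared in the body of a callee bar (hypothetical names)
register_parameters!(foo, [:theta, (bar, :phi)])

# Function variant: map each parameter store to the ids of the parameters
# it holds, given a parameter context (a sketch)
register_parameters!(foo, function (parameter_context)
    store = parameter_context[JULIA_PARAMETER_STORE_KEY]
    return Dict(store => [(foo, :theta), (bar, :phi)])
end)
```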

src/optimization.jl

Lines changed: 51 additions & 35 deletions
@@ -1,15 +1,14 @@
11
import Parameters
22

3-
# we should modify the semantics of the log probability contribution to the gradient
4-
# so that everything is gradient descent instead of ascent. this will also fix
5-
# the misnomer names
3+
# TODO we should modify the semantics of the log probability contribution to
4+
# the gradient so that everything is gradient descent instead of ascent. this
5+
# will also fix the misnomer names
66
#
77
# TODO add tests specifically for JuliaParameterStore etc.
8-
#
9-
# TODO in all update and regenerate implementations, need to pass in the parameter context to inner calls to generate
108

119
export in_place_add!
1210

11+
1312
export FixedStepGradientDescent
1413
export DecayStepGradientDescent
1514
export init_optimizer
@@ -22,6 +21,10 @@ export increment_gradient!
2221
export reset_gradient!
2322
export get_parameter_value
2423
export get_gradient
24+
export JULIA_PARAMETER_STORE_KEY
25+
26+
export default_julia_parameter_store
27+
export default_parameter_context
2528

2629
#################
2730
# in_place_add! #
@@ -155,11 +158,9 @@ end
155158
# TODO create diagram and document the overall framework
156159
# including parameter contexts and parameter stores, and the default behaviors
157160

158-
abstract type ParameterStore end
159-
160161
"""
161162
optimizer = init_optimizer(
162-
conf, parameter_ids,
163+
conf, parameter_ids::Vector,
163164
store=default_julia_parameter_store)
164165
165166
Initialize an iterative gradient-based optimizer.
@@ -187,24 +188,10 @@ function apply_update!(optimizer)
187188
error("Not implemented")
188189
end
189190

190-
"""
191-
192-
optimizer = CompositeOptimizer(conf, parameter_stores_to_ids::Dict{Any,Vector})
193-
194-
Construct an optimizer that applies the given update to parameters in multiple parameter stores.
195-
196-
The first argument defines the mathematical behavior of the update;
197-
the second argument defines the set of parameters to which the update should be applied at each iteration,
198-
as a map from parameter stores to a vector of IDs of parameters within that parameter store.
199-
200-
optimizer = CompositeOptimizer(conf, gen_fn::GenerativeFunction; parameter_context=default_parameter_context)
201-
202-
Constructs a composite optimizer that applies the given update to all parameters used by the given generative function, even when the parameters exist in multiple parameter stores.
203-
"""
204191
struct CompositeOptimizer
205192
conf::Any
206193
optimizers::Dict{Any,Any}
207-
function CompositeOptimizer(conf, parameter_stores_to_ids::Dict{Any,Vector})
194+
function CompositeOptimizer(conf, parameter_stores_to_ids)
208195
optimizers = Dict{Any,Any}()
209196
for (store, parameter_ids) in parameter_stores_to_ids
210197
optimizers[store] = init_optimizer(conf, parameter_ids, store)
@@ -218,10 +205,23 @@ function CompositeOptimizer(conf, gen_fn::GenerativeFunction; parameter_context=
218205
end
219206

220207
"""
221-
apply_update!(composite_opt::ComposieOptimizer)
222208
223-
Perform one step of an update, possibly mutating the values of parameters in multiple parameter stores.
209+
optimizer = init_optimizer(conf, parameter_stores_to_ids::Dict{Any,Vector})
210+
211+
Construct an optimizer that updates parameters in multiple parameter stores.
212+
213+
The first argument configures the mathematical behavior of the update.
214+
The second argument defines the set of parameters to which the update should be applied at each iteration.
215+
The parameters are given as a map from parameter store to a vector of IDs of parameters within that parameter store.
216+
217+
optimizer = init_optimizer(conf, gen_fn::GenerativeFunction; parameter_context=default_parameter_context)
218+
219+
Construct a composite optimizer that updates all parameters used by the given generative function, even when the parameters exist in multiple parameter stores.
224220
"""
221+
function init_optimizer(conf, parameter_stores_to_ids::Dict)
222+
return CompositeOptimizer(conf, parameter_stores_to_ids)
223+
end
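For reference, a basic training loop built on this optimizer API might look as follows (a sketch: it assumes `foo` has been registered and initialized as in the documentation, and that `accumulate_param_gradients!` can be called with just a trace):
```julia
# configure the update and build an optimizer over all parameters used by foo
# (the optimizer may span multiple parameter stores)
conf = FixedStepGradientDescent(1e-3)
optimizer = init_optimizer(conf, foo)

for iter in 1:100
    # accumulate gradients from a small batch of traces
    for _ in 1:10
        trace = simulate(foo, (0.3,))
        accumulate_param_gradients!(trace)
    end
    # apply one update to the parameter values
    apply_update!(optimizer)
end
```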
224+
225225
function apply_update!(composite_opt::CompositeOptimizer)
226226
for opt in values(composite_opt.optimizers)
227227
apply_update!(opt)
@@ -247,7 +247,7 @@ Construct a parameter store that stores the state of parameters in the memory of the Julia runtime process.
247247
248248
There is a global Julia parameter store automatically created and named `Gen.default_julia_parameter_store`.
249249
250-
Incrementing the gradients can be safely multi-threaded (see [`increment_gradient!`](@ref)).
250+
Gradient accumulation is thread-safe (see [`increment_gradient!`](@ref)).
251251
"""
252252
function JuliaParameterStore()
253253
return JuliaParameterStore(
@@ -263,29 +263,45 @@ function get_local_parameters(store::JuliaParameterStore, gen_fn)
263263
end
264264
end
265265

266-
const default_parameter_context = Dict{Symbol,Any}()
267-
const default_julia_parameter_store = JuliaParameterStore()
268-
269266
# for looking up in a parameter context when tracing (simulate, generate)
270267
# once a trace is generated, it is bound to use a particular store
268+
"""
269+
JULIA_PARAMETER_STORE_KEY
270+
271+
If a parameter context contains a value for this key, then that value must be a `JuliaParameterStore`.
272+
"""
271273
const JULIA_PARAMETER_STORE_KEY = :julia_parameter_store
272274

273275
function get_julia_store(context::Dict)
274-
if haskey(context, JULIA_PARAMETER_STORE_KEY)
275-
return context[JULIA_PARAMETER_STORE_KEY]
276-
else
277-
return default_julia_parameter_store
278-
end
276+
return context[JULIA_PARAMETER_STORE_KEY]::JuliaParameterStore
279277
end
280278

279+
"""
280+
default_julia_parameter_store::JuliaParameterStore
281+
282+
The default global Julia parameter store.
283+
"""
284+
const default_julia_parameter_store = JuliaParameterStore()
285+
286+
"""
287+
default_parameter_context::Dict
288+
289+
The default global parameter context, which is initialized to contain the mapping:
290+
291+
JULIA_PARAMETER_STORE_KEY => Gen.default_julia_parameter_store
292+
"""
293+
const default_parameter_context = Dict{Symbol,Any}(
294+
JULIA_PARAMETER_STORE_KEY => default_julia_parameter_store)
295+
296+
281297
"""
282298
init_parameter!(
283299
id::Tuple{GenerativeFunction,Symbol}, value,
284300
store::JuliaParameterStore=default_julia_parameter_store)
285301
286302
Initialize the value of a named trainable parameter of a generative function.
287303
288-
Also generates the gradient accumulator for that parameter to `zero(value)`.
304+
Also initializes the gradient accumulator for that parameter to `zero(value)`.
289305
290306
Example:
291307
```julia
