# Optimisers.jl

## Defining an optimisation rule

A new optimiser must overload two functions, [`apply!`](@ref) and [`init`](@ref).
These act on one array of parameters:

```julia
# Define a container to hold any optimiser specific parameters (if any):
struct DecayDescent <: Optimisers.AbstractRule
  eta::Float64
end

# Define an `apply!` rule which encodes how the gradients will be used to
# update the parameters:
function Optimisers.apply!(o::DecayDescent, state, x, x̄)
  T = eltype(x)                  # keep the step size in the same element type as x
  newx̄ = T(o.eta / √state) .* x̄  # scale the gradient by a decaying step size
  nextstate = state + 1          # count the number of steps taken
  return nextstate, newx̄
end

# Define a function to set up the initial state (if any):
Optimisers.init(o::DecayDescent, x::AbstractArray) = 1
```

The parameters will be immediately updated to `x .- newx̄`, while `nextstate` is
carried to the next iteration.

Notice that the state is handled separately from the optimiser itself. This
is a key design principle and allows users to manage their own state explicitly.
It of course also makes it easier to store the state.
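
For a single array, these two functions can be called by hand; a minimal sketch, assuming the
`DecayDescent` rule above and a gradient `x̄` computed elsewhere:

```julia
rule = DecayDescent(0.1)
x = rand(3)   # one array of parameters
x̄ = ones(3)   # a gradient for it, computed elsewhere

state = Optimisers.init(rule, x)                    # initial state, here just 1
state, newx̄ = Optimisers.apply!(rule, state, x, x̄)  # new state, and the re-scaled gradient
x = x .- newx̄                                       # the new parameters which `update` would return
```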

## Usage with [Flux.jl](https://github.com/FluxML/Flux.jl)

To apply such an optimiser to a whole model, [`setup`](@ref) builds a tree containing any initial
state for every trainable array. Then at each step, [`update`](@ref) uses this and the gradient
to adjust the model:

```julia
using Flux, Metalhead, Optimisers, Zygote

model = Metalhead.ResNet(18) |> gpu  # define a model to train
image = rand(Float32, 224, 224, 3, 1) |> gpu;  # dummy data
@show sum(model(image));  # dummy loss function

rule = Optimisers.Adam()  # use the Adam optimiser with its default settings
state = Optimisers.setup(rule, model);  # initialise this optimiser's momentum etc.

∇model, _ = gradient(model, image) do m, x  # calculate the gradients
  sum(m(x))
end;

state, model = Optimisers.update(state, model, ∇model);
@show sum(model(image));
```

This `∇model` is another tree structure, rather than the dictionary-like object from
Zygote's "implicit" mode `gradient(() -> loss(...), Flux.params(model))` -- see
[Zygote's documentation](https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1) for more about this difference.

There is also [`Optimisers.update!`](@ref) which similarly returns a new model and new state,
but is free to mutate arrays within the old one for efficiency.
The method of `apply!` you write is likewise free to mutate arrays within its state;
they are defensively copied when this rule is used with `update`.
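
Continuing the example above, the in-place variant has the same calling convention,
and its return values should still be used:

```julia
# Arrays inside `model` and `state` may be overwritten, rather than copied:
state, model = Optimisers.update!(state, model, ∇model);
```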

## Usage with [Lux.jl](https://github.com/LuxDL/Lux.jl)
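
The same rules work with [Lux.jl](https://github.com/LuxDL/Lux.jl), whose models keep the
parameter tree separate from the layer definitions. Below is a minimal sketch using a small
stand-in model; the essential pieces are `Lux.setup`, `Lux.apply`, and the same
`Optimisers.setup` / `Optimisers.update` calls:

```julia
using Lux, Optimisers, Random, Zygote

lux_model = Lux.Chain(Lux.Dense(3 => 5, tanh), Lux.Dense(5 => 1))  # small stand-in model
params, lux_state = Lux.setup(Random.default_rng(), lux_model);    # parameters live outside the model
images = rand(Float32, 3, 4);                                      # dummy input batch

rule = Optimisers.Adam()
opt_state = Optimisers.setup(rule, params);  # optimiser state mirrors the parameter tree

∇params, _ = Zygote.gradient(params, images) do p, x
  y, _ = Lux.apply(lux_model, x, p, lux_state)  # run the model with explicit parameters
  sum(y)  # dummy loss
end;

opt_state, params = Optimisers.update(opt_state, params, ∇params);
```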
Besides the parameters stored in `params` and gradually optimised, any other model state
is stored in `lux_state`. For simplicity this example does not show how to propagate the
updated `lux_state` to the next iteration, see Lux's documentation.

## Obtaining a flat parameter vector

Instead of a nested tree-like structure, sometimes it is convenient to have all the
parameters as one simple vector. Optimisers.jl contains a function [`destructure`](@ref)
which creates this vector, and also creates a way to re-build the original structure
with new parameters. Both flattening and re-building may be used within `gradient` calls.

An example with Flux's `model`:

```julia
using ForwardDiff  # an example of a package which only likes one array

model = Chain(  # much smaller model example, as ForwardDiff is a slow algorithm here
          Conv((3, 3), 3 => 5, pad=1, bias=false),
          BatchNorm(5, relu),
          Conv((3, 3), 5 => 3, stride=16),
        )
image = rand(Float32, 224, 224, 3, 1);
@show sum(model(image));

flat, re = destructure(model)
st = Optimisers.setup(rule, flat)  # state is just one Leaf now

∇flat = ForwardDiff.gradient(flat) do v
  m = re(v)      # rebuild a new object like model
  sum(m(image))  # call that as before
end

st, flat = Optimisers.update(st, flat, ∇flat)
@show sum(re(flat)(image));
```
Here `flat` contains only the 283 trainable parameters, while the non-trainable
ones are preserved inside `re`.
When defining new layers, these can be specified if necessary by overloading [`trainable`](@ref).
By default, all numeric arrays visible to [Functors.jl](https://github.com/FluxML/Functors.jl)
are assumed to contain trainable parameters.
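
For example, a hypothetical layer whose `shift` field should stay fixed might be declared
like this (a sketch; `ScaleOnly` is not part of any package):

```julia
using Functors, Optimisers

struct ScaleOnly  # hypothetical layer: `scale` is trained, `shift` stays fixed
  scale::Vector{Float32}
  shift::Vector{Float32}
end
Functors.@functor ScaleOnly                               # make both fields visible to Functors.jl
Optimisers.trainable(s::ScaleOnly) = (; scale = s.scale)  # but report only `scale` as trainable
```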
Lux stores only the trainable parameters in `params`.
This can also be flattened to a plain `Vector` in the same way:

```julia
params, lux_state = Lux.setup(Random.default_rng(), lux_model);

flat, re = destructure(params)

∇flat = ForwardDiff.gradient(flat) do v
  p = re(v)  # rebuild an object like params
  y, _ = Lux.apply(lux_model, images, p, lux_state)
  sum(y)
end
```
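
The flat vector can then be fed to the same update step as before, and re-built into a
parameter tree for Lux; a sketch re-using `rule` from above:

```julia
st = Optimisers.setup(rule, flat)              # the state is again a single Leaf
st, flat = Optimisers.update(st, flat, ∇flat)
params = re(flat)                              # re-build the parameter tree for Lux.apply
```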