Optimisers.jl defines many standard gradient-based optimisation rules, and tools for applying them to deeply nested models.

This is the future of training for [Flux.jl](https://github.com/FluxML/Flux.jl) neural networks,
and the present for [Lux.jl](https://github.com/avik-pal/Lux.jl).
But it can be used separately on anything understood by [Functors.jl](https://github.com/FluxML/Functors.jl).

## Installation

```julia
] add Optimisers
```

## Usage

The core idea is that optimiser state (such as momentum) is handled explicitly.
It is initialised by `setup`, and then at each step `update` returns both the new
state and the model with its trainable parameters adjusted:

```julia
state = Optimisers.setup(Optimisers.ADAM(), model) # just once

state, model = Optimisers.update(state, model, grad) # at every step
```
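
To make this concrete, here is a hedged, self-contained sketch (not from the package docs): the "model" is just a NamedTuple of arrays, which Functors.jl understands, and the gradient is written by hand rather than coming from an AD package such as Zygote.

```julia
using Optimisers

# A toy "model": any Functors-compatible structure works, e.g. a NamedTuple of arrays.
model = (weight = [1.0 2.0; 3.0 4.0], bias = [0.0, 0.0])

# Gradients mirror the model's structure (hand-written here for illustration).
grad = (weight = [1.0 1.0; 1.0 1.0], bias = [1.0, 1.0])

state = Optimisers.setup(Optimisers.Descent(0.1), model)  # tree of per-array optimiser state

state, model = Optimisers.update(state, model, grad)      # each entry moves by -0.1 * gradient
```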

For models with deeply nested layers containing the parameters (like [Flux.jl](https://github.com/FluxML/Flux.jl) models),
this state is a similarly nested tree.
The function `destructure` collects all the trainable parameters into one vector,
and returns this along with a function to re-build a similar model:

```julia
vector, re = Optimisers.destructure(model)

model2 = re(2 .* vector)
```
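
Again as a hedged sketch (the names here are illustrative, not from the package docs), a small nested NamedTuple shows what `destructure` returns:

```julia
using Optimisers

# A toy nested "model": two layers, each holding arrays.
model = (layer1 = (W = [1.0 2.0; 3.0 4.0], b = [5.0, 6.0]), layer2 = (v = [7.0, 8.0],))

flat, re = Optimisers.destructure(model)  # flat is a Vector of all 8 parameters

model2 = re(2 .* flat)                    # same nesting as `model`, every parameter doubled
```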

[The documentation](https://fluxml.ai/Optimisers.jl/dev/) explains usage in more detail,
describes all the optimisation rules, and shows how to define new ones.
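
As a taste of the latter, here is a hedged sketch of a custom rule, assuming the `Optimisers.AbstractRule` interface with `init` and `apply!` methods described in the documentation (the rule itself is hypothetical, not part of the package):

```julia
using Optimisers

# Hypothetical rule for illustration: plain descent whose step size shrinks like 1/sqrt(t).
struct ShrinkingDescent <: Optimisers.AbstractRule
  eta::Float64
end

# Per-array state: the step counter t, starting at 1.
Optimisers.init(o::ShrinkingDescent, x::AbstractArray) = 1

# Return the new state and the step that `update` will subtract from the parameters.
function Optimisers.apply!(o::ShrinkingDescent, state, x, dx)
  step = (o.eta / sqrt(state)) .* dx
  return state + 1, step
end

# Then it is used like any built-in rule:
# state = Optimisers.setup(ShrinkingDescent(0.1), model)
```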