@@ -23,54 +23,6 @@ The API for defining an optimiser, and using it is simple.
]add Optimisers
```

- ## Define an Optimiser
-
- ```julia
- # Define a container to hold any optimiser specific parameters (if any)
- struct Descent{T}
-   η::T
- end
-
- # Define an `apply` rule with which to update the current params
- # using the gradients
- function Optimisers.apply(o::Descent, state, m, m̄)
-   o.η .* m̄, state
- end
-
- Optimisers.init(o, x::AbstractArray) = nothing
- ```
-
- Notice that the state is handled separately from the optimiser itself. This
- is a key design principle and allows users to manage their own state explicitly.
-
- It of course also makes it easier to store the state.
-
## Usage

- ```julia
-
- using Flux, Metalhead, Optimisers
-
- o = Optimisers.ADAM()   # define an ADAM optimiser with default settings
- st = Optimisers.state(o, m)   # initialize the optimiser before using it
-
- model = ResNet()   # define a model to train on
- ip = rand(Float32, 224, 224, 3, 1)   # dummy data
-
- m̄, _ = gradient(model, ip) do m, x   # calculate the gradients
-   sum(m(x))
- end
-
-
- st, mnew = Optimisers.update(o, st, m, m̄)
-
- # or
-
- st, mnew = o(m, m̄, st)
- ```
-
- Notice that a completely new instance of the model is returned. Internally, this
- is handled by [Functors.jl](https://fluxml.ai/Functors.jl), where we do a walk over the
- tree formed by the model and update the parameters using the gradients. Optimisers can
- work with different forms of gradients, but most likely use case are the gradients as
- returned by [Zygote.jl](https://fluxml.ai/Zygote.jl).
+ Find out more about using Optimisers.jl [in the docs](https://fluxml.ai/Optimisers.jl/dev/).
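
The design point above, keeping the state separate from the rule itself, still applies. As a minimal sketch only, assuming the custom-rule interface documented for present-day Optimisers.jl (`Optimisers.init` returning the per-array state and `Optimisers.apply!` returning the new state plus the update to subtract) and using the hypothetical name `MyDescent`:

```julia
using Optimisers

# Hypothetical re-implementation of plain gradient descent, for illustration only.
struct MyDescent{T} <: Optimisers.AbstractRule
  eta::T  # learning rate is the only rule-specific parameter
end

# The per-array state lives outside the rule; plain descent needs none.
Optimisers.init(o::MyDescent, x::AbstractArray) = nothing

# Return the (unchanged) state and the update that will be subtracted from the parameters.
Optimisers.apply!(o::MyDescent, state, x, dx) = state, o.eta .* dx
```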
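
Likewise, the functional usage pattern described above (initialise state for a model, take gradients, and get back a new model instance via a Functors-style tree walk) can be sketched with the current names, assuming `Optimisers.setup`/`Optimisers.update` and Zygote for gradients; the NamedTuple model here is only a stand-in:

```julia
using Optimisers, Zygote

model = (W = rand(Float32, 3, 2), b = zeros(Float32, 3))   # any Functors-compatible structure of arrays
state = Optimisers.setup(Optimisers.Descent(0.1f0), model) # optimiser state, held separately from the rule

x = rand(Float32, 2)
grads, = Zygote.gradient(m -> sum(m.W * x .+ m.b), model)  # gradient with the same tree shape as the model

state, model = Optimisers.update(state, model, grads)      # returns fresh state and a new model instance
```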