In practice, it is fairly common to schedule the learning rate of an optimiser to obtain faster convergence. There are a variety of popular scheduling policies, and you can find implementations of them in [ParameterSchedulers.jl](http://fluxml.ai/ParameterSchedulers.jl/stable). The documentation for ParameterSchedulers.jl provides a more detailed overview of the different scheduling policies, and how to use them with Flux optimisers. Below, we provide a brief snippet illustrating a [cosine annealing](https://arxiv.org/pdf/1608.03983.pdf) schedule with a momentum optimiser.
First, we import ParameterSchedulers.jl and initialize a cosine annealing schedule to vary the learning rate between `1e-4` and `1e-2` every 10 epochs. We also create a new [`Momentum`](@ref Optimisers.Momentum) optimiser.
```julia
using ParameterSchedulers

opt_state = Flux.setup(Momentum(), model)
schedule = Cos(λ0 = 1e-4, λ1 = 1e-2, period = 10)
for (eta, epoch) in zip(schedule, 1:100)
    Flux.adjust!(opt_state, eta)
    # your training code here
end
```
`schedule` can also be indexed (e.g. `schedule(100)`) or iterated like any iterator in Julia.
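For instance (a small illustrative check, assuming the `schedule` defined above):

```julia
schedule(1)                  # learning rate at the first step
[schedule(t) for t in 1:5]   # the first five learning rates
```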
ParameterSchedulers.jl schedules are stateless (they don't store their iteration state). If you want a _stateful_ schedule, you can use `ParameterSchedulers.Stateful`:
```julia
using ParameterSchedulers: Stateful, next!

schedule = Stateful(Cos(λ0 = 1e-4, λ1 = 1e-2, period = 10))
for epoch in 1:100
    Flux.adjust!(opt_state, next!(schedule))
    # your training code here
end
```
Finally, a scheduling function can be incorporated into the optimiser's state, advanced at each gradient update step, and possibly passed to the `train!` function. See [this section](https://fluxml.ai/ParameterSchedulers.jl/stable/tutorials/optimizers/#Working-with-Flux-optimizers) of the ParameterSchedulers.jl documentation for more details.
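As a rough sketch of the per-step variant, using only the calls shown above (`Stateful`, `next!`, and `Flux.adjust!`; the names `train_loader` and `loss` are placeholders, not defined on this page), the schedule can be advanced once per batch instead of once per epoch:

```julia
# Sketch: advance the learning rate at every gradient step rather than every epoch.
# Assumes `model`, `opt_state`, a `loss(ŷ, y)` function, and a `train_loader` of (x, y) batches.
schedule = Stateful(Cos(λ0 = 1e-4, λ1 = 1e-2, period = 10_000))
for epoch in 1:100
    for (x, y) in train_loader
        Flux.adjust!(opt_state, next!(schedule))   # new learning rate for this step
        grads = Flux.gradient(m -> loss(m(x), y), model)
        Flux.update!(opt_state, model, grads[1])
    end
end
```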
ParameterSchedulers.jl allows for many more scheduling policies including arbitrary functions, looping any function with a given period, or sequences of many schedules. See the [ParameterSchedulers.jl documentation](https://fluxml.ai/ParameterSchedulers.jl/stable) for more info.
## Freezing layer parameters
To completely disable training of some part of the model, use [`freeze!`](@ref Flux.freeze!).
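A minimal sketch (the two-layer model, the layer names `enc`/`dec`, and the sizes below are illustrative only):

```julia
using Flux

# A toy model with named sub-layers so that parts of it are easy to address.
model = Chain(enc = Dense(2 => 3, relu), dec = Dense(3 => 2))
opt_state = Flux.setup(Adam(0.01), model)

# Freeze the encoder: its parameters are skipped by subsequent update! calls.
Flux.freeze!(opt_state.layers.enc)

# ... train as usual; only the `dec` parameters change ...

# Re-enable training of the encoder later.
Flux.thaw!(opt_state.layers.enc)
```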