@@ -16,7 +16,7 @@ For each parameter `p` and its gradient `dp`, this runs `p -= η*dp`.
- Learning rate (`η`): Amount by which gradients are discounted before updating
  the weights.
"""
- struct Descent{T}
+ struct Descent{T} <: AbstractRule
  eta::T
end
Descent() = Descent(1f-1)
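To make the `p -= η*dp` rule above concrete, here is a minimal usage sketch. It relies on the `Optimisers.update` call shown in the `OptimiserChain` doctest further down, plus `Optimisers.setup`, which is assumed here to build the per-parameter optimiser state; the values are illustrative only.

```julia
using Optimisers

params = ([1.0, 2.0, 3.0],)   # toy "model": a tuple holding one parameter array
grads  = ([0.1, 0.1, 0.1],)   # matching gradient

rule  = Descent(0.1)                      # η = 0.1
state = Optimisers.setup(rule, params)    # assumed setup API: builds optimiser state
state, params = Optimisers.update(state, params, grads)
# each entry moved by -η*dp, so params[1] ≈ [0.99, 1.99, 2.99]
```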
@@ -40,7 +40,7 @@ Gradient descent optimizer with learning rate `η` and momentum `ρ`.
- Momentum (`ρ`): Controls the acceleration of gradient descent in the
  prominent direction, in effect dampening oscillations.
"""
- struct Momentum{T}
+ struct Momentum{T} <: AbstractRule
  eta::T
  rho::T
end
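For orientation, one common (heavy-ball) formulation of the momentum update is sketched below. It is not claimed to be the exact code behind this struct, only an illustration of how `eta` and `rho` interact with a velocity buffer kept as optimiser state.

```julia
# Heavy-ball momentum, reference sketch only.
function momentum_step(p, dp, v; eta = 0.01, rho = 0.9)
    v = rho .* v .+ dp    # geometrically decaying sum of past gradients
    p = p .- eta .* v     # step against the accumulated direction
    return p, v
end
```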
@@ -66,7 +66,7 @@ Gradient descent optimizer with learning rate `η` and Nesterov momentum `ρ`.
- Nesterov momentum (`ρ`): Controls the acceleration of gradient descent in the
  prominent direction, in effect dampening oscillations.
"""
- struct Nesterov{T}
+ struct Nesterov{T} <: AbstractRule
  eta::T
  rho::T
end
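Nesterov momentum differs from the plain version by a look-ahead correction. The usual reformulation in terms of the gradient at the current point is sketched below, again only as a reference for the two fields above, not the file's exact implementation.

```julia
# Nesterov momentum ("look-ahead" form), reference sketch only.
function nesterov_step(p, dp, v; eta = 0.01, rho = 0.9)
    v_new = rho .* v .- eta .* dp             # updated velocity
    p = p .+ v_new .+ rho .* (v_new .- v)     # velocity step plus look-ahead correction
    return p, v_new
end
```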
@@ -104,7 +104,7 @@ gradients by an estimate of their variance, instead of their second moment.
- Keyword `centred` (or `centered`): Indicates whether to use the centred variant
  of the algorithm.
"""
- struct RMSProp{T}
+ struct RMSProp{T} <: AbstractRule
  eta::T
  rho::T
  epsilon::T
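The uncentred RMSProp step is sketched below for reference; the centred variant mentioned in the header additionally tracks a moving mean of the gradients and divides by the resulting variance estimate instead of the raw second moment. This is an illustration, not the file's exact implementation.

```julia
# Uncentred RMSProp step, reference sketch only; `acc` is the second-moment state.
function rmsprop_step(p, dp, acc; eta = 0.001, rho = 0.9, eps = 1e-8)
    acc = rho .* acc .+ (1 - rho) .* dp .^ 2    # EMA of squared gradients
    p = p .- eta .* dp ./ (sqrt.(acc) .+ eps)   # scale the step by the gradient RMS
    return p, acc
end
```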
@@ -148,7 +148,7 @@
- Machine epsilon (`ϵ`): Constant to prevent division by zero
  (no need to change default)
"""
- struct Adam{T}
+ struct Adam{T} <: AbstractRule
  eta::T
  beta::Tuple{T, T}
  epsilon::T
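The three fields map onto the textbook Adam hyperparameters η, (β₁, β₂) and ϵ. For reference, the standard update (without the bias correction some implementations add) looks like the sketch below; it is not the code in this file.

```julia
# Textbook Adam step, reference sketch only; `mt`, `vt` are moment accumulators.
function adam_step(p, dp, mt, vt; eta = 0.001, beta = (0.9, 0.999), eps = 1e-8)
    mt = beta[1] .* mt .+ (1 - beta[1]) .* dp         # EMA of gradients
    vt = beta[2] .* vt .+ (1 - beta[2]) .* dp .^ 2    # EMA of squared gradients
    p  = p .- eta .* mt ./ (sqrt.(vt) .+ eps)
    return p, mt, vt
end
```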
@@ -183,7 +183,7 @@
- Machine epsilon (`ϵ`): Constant to prevent division by zero
  (no need to change default)
"""
- struct RAdam{T}
+ struct RAdam{T} <: AbstractRule
  eta::T
  beta::Tuple{T, T}
  epsilon::T
@@ -224,7 +224,7 @@
- Machine epsilon (`ϵ`): Constant to prevent division by zero
  (no need to change default)
"""
- struct AdaMax{T}
+ struct AdaMax{T} <: AbstractRule
  eta::T
  beta::Tuple{T, T}
  epsilon::T
@@ -258,7 +258,7 @@ is a variant of Adam adding an "optimistic" term suitable for adversarial training.
- Machine epsilon (`ϵ`): Constant to prevent division by zero
  (no need to change default)
"""
- struct OAdam{T}
+ struct OAdam{T} <: AbstractRule
  eta::T
  beta::Tuple{T, T}
  epsilon::T
@@ -293,7 +293,7 @@ Parameters don't need tuning.
- Machine epsilon (`ϵ`): Constant to prevent division by zero
  (no need to change default)
"""
- struct AdaGrad{T}
+ struct AdaGrad{T} <: AbstractRule
  eta::T
  epsilon::T
end
@@ -323,7 +323,7 @@ Parameters don't need tuning.
- Machine epsilon (`ϵ`): Constant to prevent division by zero
  (no need to change default)
"""
- struct AdaDelta{T}
+ struct AdaDelta{T} <: AbstractRule
  rho::T
  epsilon::T
end
@@ -357,7 +357,7 @@ optimiser. Parameters don't need tuning.
- Machine epsilon (`ϵ`): Constant to prevent division by zero
  (no need to change default)
"""
- struct AMSGrad{T}
+ struct AMSGrad{T} <: AbstractRule
  eta::T
  beta::Tuple{T, T}
  epsilon::T
@@ -393,7 +393,7 @@ Parameters don't need tuning.
- Machine epsilon (`ϵ`): Constant to prevent division by zero
  (no need to change default)
"""
- struct NAdam{T}
+ struct NAdam{T} <: AbstractRule
  eta::T
  beta::Tuple{T, T}
  epsilon::T
@@ -447,7 +447,7 @@ Adam optimiser.
- Machine epsilon (`ϵ::Float32`): Constant to prevent division by zero
  (no need to change default)
"""
- struct AdaBelief{T}
+ struct AdaBelief{T} <: AbstractRule
  eta::T
  beta::Tuple{T, T}
  epsilon::T
@@ -479,7 +479,7 @@ This is equivalent to adding ``L_2`` regularization with coefficient ``γ`` to the loss.
# Parameters
- Weight decay (`γ`): Decay applied to weights during optimisation.
"""
- struct WeightDecay{T}
+ struct WeightDecay{T} <: AbstractRule
  gamma::T
end
WeightDecay() = WeightDecay(5f-4)
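In gradient terms, weight decay simply adds `γ .* p` to each gradient before it reaches whatever rule actually takes the step, which is why it is usually composed with another optimiser. A small sketch, using only constructors visible in this diff:

```julia
# What weight decay does to one gradient, written out explicitly (sketch only).
decayed_grad(p, dp; gamma = 5f-4) = dp .+ gamma .* p

# Typical composition: decay the gradient first, then apply plain gradient descent.
rule = OptimiserChain(WeightDecay(1f-4), Descent(0.1))
```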
@@ -499,7 +499,7 @@ Restricts every gradient component to obey `-δ ≤ dx[i] ≤ δ`.

See also [`ClipNorm`](@ref).
"""
- struct ClipGrad{T<:Real}
+ struct ClipGrad{T<:Real} <: AbstractRule
  delta::T
end
ClipGrad() = ClipGrad(10f0)
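The restriction `-δ ≤ dx[i] ≤ δ` from the header is an element-wise clamp of the gradient; a one-line sketch, not necessarily the exact code in this file:

```julia
# Element-wise clipping of a gradient to the interval [-δ, δ], sketch only.
clip_grad(dx; delta = 10f0) = clamp.(dx, -delta, delta)
```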
@@ -524,7 +524,7 @@ which you can turn off with `throw = false`.

See also [`ClipGrad`](@ref).
"""
- struct ClipNorm{T<:Real}
+ struct ClipNorm{T<:Real} <: AbstractRule
  omega::T
  p::T
  throw::Bool
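Here `omega` is the ceiling on the gradient's `p`-norm and `throw` controls the error mentioned in the header when that norm is zero or not finite. A sketch of the rescaling, with the error handling omitted:

```julia
using LinearAlgebra: norm

# Rescale dx so that norm(dx, p) ≤ ω, leaving smaller gradients untouched (sketch only).
function clip_norm(dx; omega = 10f0, p = 2)
    n = norm(dx, p)
    # the real rule errors on zero / non-finite n unless constructed with throw = false
    return dx .* min(1, omega / n)
end
```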
@@ -566,7 +566,7 @@ julia> Optimisers.update(s, m, ([0.3, 1, 7],))[2] # clips before discounting
([-0.03, -0.1, -0.1],)
```
"""
- struct OptimiserChain{O<:Tuple}
+ struct OptimiserChain{O<:Tuple} <: AbstractRule
  opts::O
end
OptimiserChain(opts...) = OptimiserChain(opts)
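Thanks to the splatting constructor just above, a chain is built by listing rules in the order their gradient transformations should run, as in the doctest excerpted in this hunk:

```julia
# Clip each gradient component to [-1, 1] first, then take a plain descent step.
o = OptimiserChain(ClipGrad(1.0), Descent(0.1))
# o.opts holds the tuple (ClipGrad(1.0), Descent(0.1)); each rule transforms the
# gradient in turn, and the final result is what gets applied to the parameters.
```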