
Commit a34cc85

eps_root following comment

Author: cossio (committed)
1 parent: e9af20a

1 file changed (+6 -5 lines)

src/optimise/optimisers.jl

Lines changed: 6 additions & 5 deletions
```diff
@@ -541,14 +541,15 @@ function apply!(o::AdaBelief, x, Δ)
   second moment which is usually far enough from zero. This is problematic, since st
   can be slightly negative due to numerical error, and the square root below will fail.
   Also, if we want to differentiate through the optimizer, √0 is not differentiable.
-  To protect against this, we add a small number, st -> st + eps_root.
+  To protect against this, we add a small number, st -> st + eps2.
   The original implementation (https://github.com/juntang-zhuang/Adabelief-Optimizer)
-  uses the square of Adam's epsilon, which we do here. =#
-  eps_root = o.epsilon^2
+  uses the square of Adam's epsilon, which we do here.
+  See also: https://github.com/juntang-zhuang/Adabelief-Optimizer/issues/61 =#
+  eps2 = o.epsilon^2
 
   @. mt = β[1] * mt + (1 - β[1]) * Δ
-  @. st = β[2] * st + (1 - β[2]) * (Δ - mt) * conj(Δ - mt)
-  @. Δ = η * mt / (1 - βp[1]) / (√((st + eps_root) / (1 - βp[2])) + o.epsilon)
+  @. st = β[2] * st + (1 - β[2]) * (Δ - mt) * conj(Δ - mt) + eps2
+  @. Δ = η * mt / (1 - βp[1]) / (√(st / (1 - βp[2])) + eps2)
   βp .= βp .* β
 
   return Δ
```
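Written out as equations (a reading of the diff, not notation taken from the commit): let $g_t$ be the gradient `Δ` passed to `apply!` at step $t$, let $\varepsilon$ be `o.epsilon`, and note that `βp` holds $(\beta_1^t, \beta_2^t)$ when step $t$ is computed, since it is multiplied by `β` after every step. The first-moment update $m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$ is unchanged. Before this commit the variance and step were

$$
s_t = \beta_2 s_{t-1} + (1-\beta_2)\,|g_t - m_t|^2, \qquad
\Delta_t = \frac{\eta\, m_t/(1-\beta_1^t)}{\sqrt{(s_t + \varepsilon^2)/(1-\beta_2^t)} + \varepsilon},
$$

and after it they are

$$
s_t = \beta_2 s_{t-1} + (1-\beta_2)\,|g_t - m_t|^2 + \varepsilon^2, \qquad
\Delta_t = \frac{\eta\, m_t/(1-\beta_1^t)}{\sqrt{s_t/(1-\beta_2^t)} + \varepsilon^2}.
$$

Because $\varepsilon^2$ is now added to the stored state on every step and the residual term is non-negative, then, assuming `st` starts at zero (its initialization is not shown in this hunk), induction gives a lower bound in exact arithmetic:

$$
s_t \;\ge\; \varepsilon^2 \sum_{k=0}^{t-1} \beta_2^k \;=\; \varepsilon^2\,\frac{1-\beta_2^t}{1-\beta_2}
\quad\Longrightarrow\quad
\frac{s_t}{1-\beta_2^t} \;\ge\; \frac{\varepsilon^2}{1-\beta_2} \;>\; 0 .
$$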

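A minimal Julia sketch of the post-commit update, written as a free function so it runs without Flux. The name `adabelief_step!`, the explicit state arguments, and the default values of `η`, `β`, and `ϵ` are assumptions for illustration; in Flux the state lives in `o.state` and the hyperparameters in the `AdaBelief` struct.

```julia
# Hypothetical standalone version of the AdaBelief step shown in the diff (after the commit).
# State (mt, st, βp) is passed explicitly; the keyword defaults are assumed, not taken from Flux.
function adabelief_step!(Δ::AbstractArray, mt::AbstractArray, st::AbstractArray,
                         βp::Vector{Float64}; η = 0.001, β = (0.9, 0.999), ϵ = 1e-8)
    eps2 = ϵ^2                                    # square of Adam's epsilon, as in the commit
    @. mt = β[1] * mt + (1 - β[1]) * Δ            # EMA of gradients (first moment)
    # EMA of squared residuals; eps2 is added to the stored state on every step
    @. st = β[2] * st + (1 - β[2]) * (Δ - mt) * conj(Δ - mt) + eps2
    # bias-corrected step; eps2 (not ϵ) is also the additive term after the square root
    @. Δ = η * mt / (1 - βp[1]) / (√(st / (1 - βp[2])) + eps2)
    βp .= βp .* β                                 # βp holds (β₁ᵗ, β₂ᵗ) for the next step
    return Δ
end

# Usage: fresh state for a 2-element parameter, βp initialised to β as in step t = 1.
mt, st, βp = zeros(2), zeros(2), Float64[0.9, 0.999]
Δ1 = adabelief_step!([0.1, -0.2], mt, st, βp)
```

For a fresh state and a nonzero gradient $g$, the first step comes out close to $\eta\,\mathrm{sign}(g)/\beta_1$ (here about `0.001 / 0.9`): the bias-corrected first moment equals $g$, and the residual $g - m_1 = \beta_1 g$ makes the bias-corrected variance $(\beta_1 g)^2$ up to the tiny `eps2` terms. With an all-zero gradient, `st` becomes exactly `eps2` after one step, so the square root is taken of a strictly positive number and the returned step is zero.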