Commit e9af20a: eps_root

Author: cossio
Parent: 3816725

File tree: 1 file changed (+10, -1)


src/optimise/optimisers.jl

Lines changed: 10 additions & 1 deletion
@@ -536,10 +536,19 @@ function apply!(o::AdaBelief, x, Δ)
   mt, st, βp = get!(o.state, x) do
     (zero(x), zero(x), Float64[β[1], β[2]])
   end :: Tuple{typeof(x), typeof(x), Vector{Float64}}
+
+  #= st is a variance and can go to zero. This is in contrast to ADAM, which uses the
+  second moment which is usually far enough from zero. This is problematic, since st
+  can be slightly negative due to numerical error, and the square root below will fail.
+  Also, if we want to differentiate through the optimizer, √0 is not differentiable.
+  To protect against this, we add a small number, st -> st + eps_root.
+  The original implementation (https://github.com/juntang-zhuang/Adabelief-Optimizer)
+  uses the square of Adam's epsilon, which we do here. =#
+  eps_root = o.epsilon^2
 
   @. mt = β[1] * mt + (1 - β[1]) * Δ
   @. st = β[2] * st + (1 - β[2]) * (Δ - mt) * conj(Δ - mt)
-  @. Δ = η * mt / (1 - βp[1]) / (√(st / (1 - βp[2])) + o.epsilon)
+  @. Δ = η * mt / (1 - βp[1]) / (√((st + eps_root) / (1 - βp[2])) + o.epsilon)
   βp .= βp .* β
 
   return Δ
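
As a side note, here is a minimal standalone demonstration of the failure mode the comment describes and how the eps_root guard avoids it. The values are hypothetical and this is plain Julia, not Flux's API:

st = -1.0e-18          # variance estimate nudged slightly negative by rounding
ϵ  = 1.0e-8            # Adam-style epsilon
# sqrt(st)             # would throw DomainError: sqrt of a negative Float64
eps_root = ϵ^2         # square of Adam's epsilon, as in the commit above
sqrt(st + eps_root)    # ≈ 9.95e-9: finite, and differentiable since st + eps_root > 0

Note that sqrt(0.0) itself returns 0.0 but has an infinite derivative (d/dx √x = 1/(2√x)), which is why the guard also matters when differentiating through the optimizer.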
