Conversation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@devmotion can you fix the merge conflict before I review this PR?
yebai
left a comment
Thanks @devmotion -- I left a few questions below.
I'll have to take another closer look later this week.
```diff
  end

- computeμ(ϵ::AbstractScalarOrVec{<:AbstractFloat}) = log.(10 * ϵ)
+ computeμ(ϵ::AbstractFloat) = log(10 * ϵ)
```
Caution is required here: these support the vectorised version of HMC. Do you know how map would differ from broadcasting here?
The results of the calculations won't be affected by this change. Using the non-broadcasted formulation for scalars and map for vectors of floats removes the broadcasting overhead and reduces stress on the compiler, i.e., it generally reduces compilation time. Sometimes it also helps type inference (though this case is probably too simple for that effect).
In my experience, broadcasting is useful when one is actually broadcasting over values of different sizes and dimensions, but otherwise it is often a suboptimal choice.
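A minimal sketch of the distinction, using the simplified `computeμ` from this diff (the input values are illustrative):

```julia
# Scalar method: plain arithmetic, no broadcast machinery involved.
computeμ(ϵ::AbstractFloat) = log(10 * ϵ)

ϵ_scalar = 0.1
ϵ_vector = [0.1, 0.2, 0.4]

# Scalar call stays non-broadcasted; map applies the scalar method
# elementwise, avoiding the broadcast lowering of `log.(10 * ϵ)`.
μ_scalar = computeμ(ϵ_scalar)       # log(1.0) == 0.0
μ_vector = map(computeμ, ϵ_vector)  # same values as log.(10 .* ϵ_vector)
```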
```diff
  function finalize!(da::NesterovDualAveraging)
-     da.state.ϵ = exp.(da.state.x_bar)
-     return nothing
+     finalize!(da.state)
```
```diff
  η_H = one(T) / (m + t_0)
- H_bar = (one(T) - η_H) * H_bar .+ η_H * (δ .- α)
+ H_bar = (one(T) - η_H) .* H_bar .+ η_H .* (δ .- min.(one(T), α))
```
HG: I'll have to review these more carefully later this week.
EDIT: This looks good. I'm surprised the previous code didn't break any tests, as it didn't properly support vectorised adaptation.
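For illustration, a small standalone sketch of the corrected update, with variable names as in the diff (`δ` is the target acceptance rate, `α` the per-chain acceptance statistic; the values here are made up, and the point is that the acceptance ratio can exceed one, so the `min` clamp matters):

```julia
T = Float64
δ = 0.8                # target acceptance rate
α = [0.95, 1.3, 0.6]   # per-chain acceptance statistics; ratios can exceed 1
m, t_0 = 5, 10.0
H_bar = zeros(T, 3)

η_H = one(T) / (m + t_0)
# Clamping α at one matches the scalar adaptation; without min.(one(T), α),
# chains with ratios above 1 would bias H_bar downward too strongly.
H_bar = (one(T) - η_H) .* H_bar .+ η_H .* (δ .- min.(one(T), α))
```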
yebai
left a comment
Thanks, @devmotion. Nice improvements. I left a few comments below, mostly about whether we should refactor the vectorised HMC implementation separately, in a concerted effort, to avoid inconsistency.
```diff
  function DAState(ϵ::AbstractVector{T}) where {T}
      n = length(ϵ)
-     μ = computeμ(ϵ)
+     μ = map(computeμ, ϵ)
```
Suggested change:
```diff
- μ = map(computeμ, ϵ)
+ μ = computeμ(ϵ)
```
```diff
- das.μ .= computeμ(das.ϵ)
+ map!(computeμ, das.μ, das.ϵ)
  das.x_bar .= zero(T)
  return das.H_bar .= zero(T)
```
Let's keep this as-is for now. We could refactor the vectorised HMC interface, but it's better to do that separately in a concerted effort:
Suggested change:
```diff
- map!(computeμ, das.μ, das.ϵ)
+ das.μ .= computeμ(das.ϵ)
```
This suggestion would go against the main intention of the PR, reducing unnecessary allocations: with map! (or das.μ .= computeμ.(das.ϵ), though the broadcasting is more stressful for the compiler) no intermediate array is created on this line, whereas the suggestion on the right-hand side allocates a new array that is then copied into das.μ. (As a side remark, copyto! should be simpler for the compiler than broadcasting.)
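A minimal sketch of the allocation difference, using plain vectors rather than the actual DAState fields:

```julia
computeμ(ϵ) = log(10 * ϵ)

ϵ = [0.1, 0.2, 0.4]
μ = similar(ϵ)

# In-place: results are written directly into μ, no temporary array.
map!(computeμ, μ, ϵ)

# Allocating: computeμ.(ϵ) first builds a fresh result array, which the
# broadcasted assignment then copies into μ.
μ .= computeμ.(ϵ)
```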
Fair point!
I recently discovered the AcceleratedKernels package, which provides a unified interface for parallelisation on CPUs, clusters, and GPUs. We could consider switching to AcceleratedKernels.map! for the vectorised HMC implementation, hence the suggestion above.
EDIT: I opened an issue for this suggestion. #412
Feel free to merge once CI passes!
No description provided.