
Reduce allocations in stepsize.jl #390

Merged
yebai merged 8 commits into main from dw/adaptation_stepsize on Mar 27, 2025

Conversation

@devmotion
Member

No description provided.

devmotion and others added 4 commits February 18, 2025 23:15
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@yebai
Member

yebai commented Mar 17, 2025

@devmotion can you fix the merge conflict before I review this PR?

@yebai yebai self-requested a review March 17, 2025 12:07
devmotion and others added 2 commits March 17, 2025 14:23
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Member

@yebai yebai left a comment

Thanks @devmotion -- I left a few questions below.

I'll have to take another closer look later this week.

 end

-computeμ(ϵ::AbstractScalarOrVec{<:AbstractFloat}) = log.(10 * ϵ)
+computeμ(ϵ::AbstractFloat) = log(10 * ϵ)
Member

Caution is required here: these support the vectorised version of HMC. Do you know how map would differ from broadcasting here?

Member Author

The results of the calculations won't be affected by this change, but using the non-broadcasted formulation for scalars and map for vectors of floats removes the broadcasting overhead and reduces stress on the compiler, i.e., it generally reduces compilation time. Sometimes it also helps type inference (though this case is probably too simple for that effect).

In my experience, broadcasting is useful when one is actually broadcasting values of different sizes and dimensions, but otherwise it is often a suboptimal choice.
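To make the comparison concrete, here is a small standalone sketch; the `computeμ` definition matches the scalar method in the diff above, while the step-size values are made up for illustration:

```julia
# Scalar vs. vectorised step-size initialisation, as discussed above.
computeμ(ϵ::AbstractFloat) = log(10 * ϵ)

ϵ_scalar = 0.1
ϵ_vector = [0.1, 0.2, 0.4]

# Scalar path: a plain function call, no broadcasting machinery involved.
μ_scalar = computeμ(ϵ_scalar)

# Vector path: `map` yields exactly the same values as broadcasting,
# but asks less of the compiler.
μ_map = map(computeμ, ϵ_vector)
μ_broadcast = computeμ.(ϵ_vector)
```

Both vector formulations produce identical results; the difference is purely in compilation overhead.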

 function finalize!(da::NesterovDualAveraging)
-    da.state.ϵ = exp.(da.state.x_bar)
-    return nothing
+    finalize!(da.state)
Member

Nice improvement!


 η_H = one(T) / (m + t_0)
-H_bar = (one(T) - η_H) * H_bar .+ η_H * (δ .- α)
+H_bar = (one(T) - η_H) .* H_bar .+ η_H .* (δ .- min.(one(T), α))
Member

@yebai yebai Mar 17, 2025

HG: I'll have to review these more carefully later this week.

EDIT: This looks good. I am surprised the previous code didn't break any tests, as it didn't properly support vectorised adaptation.
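As a sanity check on the corrected line, here is a standalone sketch of the vectorised update; the names `η_H`, `H_bar`, `δ`, `α`, `m`, and `t_0` mirror the diff above, but all values are made up for illustration:

```julia
# Vectorised dual-averaging update with the per-chain clamp on α.
T = Float64
m, t_0 = 5, 10.0           # iteration counter and stabilisation offset (illustrative)
δ = 0.8                    # target acceptance rate
α = [0.6, 1.3, 0.9]        # per-chain acceptance statistics (may exceed one)
H_bar = zeros(T, 3)        # running average of the acceptance error

η_H = one(T) / (m + t_0)
# `min.(one(T), α)` clamps each chain's statistic at one; the old line applied
# no clamp, so it did not properly support vectorised adaptation.
H_bar = (one(T) - η_H) .* H_bar .+ η_H .* (δ .- min.(one(T), α))
```

With `H_bar` starting at zero, each entry reduces to `η_H * (δ - min(1, αᵢ))`, so the chain with `α = 1.3` is treated as having acceptance statistic one.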

Member

@yebai yebai left a comment

Thanks, @devmotion. Nice improvements. I left a few comments below, mostly about whether we should refactor the vectorised HMC implementation separately, in a concerted effort, to avoid inconsistency.

 function DAState(ϵ::AbstractVector{T}) where {T}
     n = length(ϵ)
-    μ = computeμ(ϵ)
+    μ = map(computeμ, ϵ)
Member

Suggested change
-μ = map(computeμ, ϵ)
+μ = computeμ(ϵ)

-das.μ .= computeμ(das.ϵ)
+map!(computeμ, das.μ, das.ϵ)
 das.x_bar .= zero(T)
 return das.H_bar .= zero(T)
Member

Let's keep this as-is for now. We could refactor the vectorised HMC interface, but it's better to do that separately in a concerted effort:

Suggested change
-map!(computeμ, das.μ, das.ϵ)
+das.μ .= computeμ(das.ϵ)

Member Author

This suggestion would go against the main intention of the PR, reducing unnecessary allocations: with map! (or das.μ .= computeμ.(das.ϵ), though the broadcasting is more stressful for the compiler) no intermediate array is created in this line, whereas with the suggested change a new array is allocated and then copied into das.μ (as a side remark, copyto! should be simpler for the compiler than broadcasting).
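The allocation difference is easy to verify directly; here is a hedged sketch, where the two-method `computeμ` and the array size are hypothetical, chosen only to make the temporary visible:

```julia
computeμ(ϵ::AbstractFloat) = log(10 * ϵ)
computeμ(ϵ::AbstractVector{<:AbstractFloat}) = log.(10 .* ϵ)  # returns a fresh array

inplace!(μ, ϵ) = map!(computeμ, μ, ϵ)  # writes results straight into μ
copying!(μ, ϵ) = (μ .= computeμ(ϵ))    # materialises a temporary, then copies into μ

function measure()
    μ, ϵ = zeros(1000), rand(1000)
    inplace!(μ, ϵ); copying!(μ, ϵ)     # warm up so compilation is not counted
    a = @allocated inplace!(μ, ϵ)
    b = @allocated copying!(μ, ϵ)
    return a, b
end

a, b = measure()  # a should be zero; b pays for the temporary array
```

Both variants leave the same values in μ; only the copying variant allocates per call.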

Member

@yebai yebai Mar 26, 2025

Fair point!

I recently discovered the AcceleratedKernels package, which provides a unified interface for parallelisation on CPUs, clusters, and GPUs. We could consider switching to AcceleratedKernels.map! for the vectorised HMC implementation, hence the suggestion above.

EDIT: I opened an issue for this suggestion. #412

yebai previously approved these changes Mar 26, 2025
penelopeysm previously approved these changes Mar 26, 2025
@yebai yebai dismissed stale reviews from penelopeysm and themself via 1dc6ccc March 26, 2025 14:29
@yebai
Member

yebai commented Mar 26, 2025

Feel free to merge once CI passes!

@yebai yebai merged commit a96ab41 into main Mar 27, 2025
17 checks passed
@yebai yebai deleted the dw/adaptation_stepsize branch March 27, 2025 15:31

3 participants