Use Adapt.jl to change storage and element type #2212
Conversation
Review checklist
This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.
- Purpose and scope
- Code quality
- Documentation
- Testing
- Performance
- Verification
Created with ❤️ by the Trixi.jl community.
ranocha
left a comment
Should be close to final when tests are added
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2212 +/- ##
==========================================
- Coverage 96.83% 96.67% -0.16%
==========================================
Files 504 507 +3
Lines 41816 42010 +194
==========================================
+ Hits 40490 40610 +120
- Misses 1326 1400 +74
ranocha
left a comment
Thanks! Could you please adapt the t8code stuff as well? It causes the remaining failing CI tests like
LoadError: MethodError: Cannot `convert` an object of type Vector{Vector{Int64}} to an object of type Trixi.VecOfArrays{Vector{Int64}}
Closest candidates are:
convert(::Type{T}, ::T) where T
@ Base Base.jl:84
(::Type{Trixi.VecOfArrays{T}} where T<:AbstractArray)(::Any)
@ Trixi ~/work/Trixi.jl/Trixi.jl/src/auxiliary/vector_of_arrays.jl:12
Stacktrace:
[1] setproperty!(x::Trixi.P4estMPICache{Vector{Float64}, Vector{Int64}}, f::Symbol, v::Vector{Vector{Int64}})
@ Base ./Base.jl:40
[2] init_mpi_cache!(mpi_cache::Trixi.P4estMPICache{Vector{Float64}, Vector{Int64}}, mesh::Trixi.T8codeMesh{2, Float64, Static.True, 4, 4}, mpi_mesh_info::@NamedTuple{mpi_mortars::Trixi.P4estMPIMortarContainer{2, Float64, Float64, 3, 4, 5, Array{Float64, 5}, Vector{Float64}, Array}, mpi_interfaces::Trixi.P4estMPIInterfaceContainer{2, Float64, 4, Array{Float64, 4}, Vector{Int64}, Vector{Tuple{Symbol, Symbol}}, Vector{Float64}, Array}, global_mortar_ids::Vector{UInt128}, global_interface_ids::Vector{UInt128}, neighbor_ranks_mortar::Vector{Vector{Int64}}, neighbor_ranks_interface::Vector{Int64}}, nvars::Int64, nnodes::Int64, uEltype::Type)
@ Trixi ~/work/Trixi.jl/Trixi.jl/src/solvers/dgsem_t8code/dg_parallel.jl:93
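For reference, here is a minimal, self-contained illustration of the failure mode; the types below are stripped-down stand-ins, not the actual Trixi code. Assigning a plain Vector{Vector{Int64}} to a field concretely typed as a VecOfArrays goes through setproperty!, which calls convert and throws the MethodError above unless the value is wrapped explicitly (or a suitable convert method is in place).

struct VecOfArrays{T <: AbstractArray}    # stand-in for Trixi.VecOfArrays
    arrays::Vector{T}
end

mutable struct MPICache                   # stand-in for the t8code/p4est MPI cache
    neighbor_ranks_mortar::VecOfArrays{Vector{Int}}
end

cache = MPICache(VecOfArrays([Int[]]))
# cache.neighbor_ranks_mortar = [[1, 2], [3]]              # MethodError, as in the CI log above
cache.neighbor_ranks_mortar = VecOfArrays([[1, 2], [3]])   # explicit wrapping works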
sloede
left a comment
Just a few minor comments/suggestions, then this should be ready to merge (once the other convo by Hendrik is resolved as well)
struct VecOfArrays{T <: AbstractArray}
    arrays::Vector{T}
end
Base.getindex(v::VecOfArrays, i::Int) = Base.getindex(v.arrays, i)
Base.IndexStyle(v::VecOfArrays) = Base.IndexStyle(v.arrays)
Base.size(v::VecOfArrays) = Base.size(v.arrays)
Base.length(v::VecOfArrays) = Base.length(v.arrays)
Base.eltype(v::VecOfArrays{T}) where {T} = T
function Adapt.adapt_structure(to, v::VecOfArrays)
    return VecOfArrays([Adapt.adapt(to, arr) for arr in v.arrays])
end
function Base.convert(::Type{<:VecOfArrays}, v::Vector{<:AbstractArray})
    VecOfArrays(v)
end
Suggested change (rename VecOfArrays to VectorOfArrays throughout):

struct VectorOfArrays{T <: AbstractArray}
    arrays::Vector{T}
end
Base.getindex(v::VectorOfArrays, i::Int) = Base.getindex(v.arrays, i)
Base.IndexStyle(v::VectorOfArrays) = Base.IndexStyle(v.arrays)
Base.size(v::VectorOfArrays) = Base.size(v.arrays)
Base.length(v::VectorOfArrays) = Base.length(v.arrays)
Base.eltype(v::VectorOfArrays{T}) where {T} = T
function Adapt.adapt_structure(to, v::VectorOfArrays)
    return VectorOfArrays([Adapt.adapt(to, arr) for arr in v.arrays])
end
function Base.convert(::Type{<:VectorOfArrays}, v::Vector{<:AbstractArray})
    VectorOfArrays(v)
end
We try to avoid abbreviations unless really necessary for sanity 😬 Also, it would be consistent with the file name.
Note, however, that I do not have a super strong opinion here, so if you deem VecOfArrays to be better, we can keep it as is.
I just saw that RecursiveArrayTools.jl seems to already have a VectorOfArray type (https://github.com/trixi-framework/Trixi.jl/pull/2150/files#diff-6058d487915aa254cfe1a5f8da6315b8c1fef4da04df38b0089a599803e28e5aR63). Should we use that one instead?
If not, then of course we should use a different name for our own data structure, and keeping VecOfArrays as the name seems reasonable to me. In that case, it might be good to leave a short note in the comment why we do not want to use the SciML package's type for future reference.
using RecursiveArrayTools: VectorOfArray is added in #2150 for DGMulti support
Yeah, I saw that. I was just wondering whether their data type could (or should) be used instead of rolling our own.
RecursiveArrayTools' VectorOfArray is not fit for purpose... They define:
function Adapt.adapt_structure(to, VA::AbstractVectorOfArray)
    Adapt.adapt(to, Array(VA))
end
which is not what we want.
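(Purely as an illustration of this point, a hedged sketch based on the adapt_structure definition quoted above; newer RecursiveArrayTools versions may behave differently, see the follow-up comment below.)

using Adapt, RecursiveArrayTools, Trixi

# With the quoted definition, adapting a VectorOfArray goes through Array(VA),
# so the vector-of-arrays structure is flattened into one dense array:
va = VectorOfArray([rand(3), rand(3)])
Adapt.adapt(Array{Float32}, va)    # a 3x2 Matrix{Float32}

# What this PR needs instead (see VecOfArrays above): adapt each inner array
# and keep the outer wrapper, so ragged MPI neighbor lists survive:
v = Trixi.VecOfArrays([rand(3), rand(4)])
Adapt.adapt(Array{Float32}, v)     # still a VecOfArrays, now of Vector{Float32}s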
There have been some recent updates and it seems we can now actually use RecursiveArrayTools's VectorOfArray instead of our own. This still needs testing.
uArray <: DenseArray{uEltype, NDIMSP2},
IdsMatrix <: DenseMatrix{Int},
Why the restriction to dense types? (just curious)
I actually don't know; this is a left-over from the original work.
What's the status of this PR?

Okay, current status:

Open questions:
The branch was force-pushed from db49594 to e632e4c.
That's great news, excellent!
If it only adds a reasonable overhead, I'd prefer to have a separate PR to make it easier to understand and review the code.
Phew. Do you have a rough idea about how much overhead (downloading + precompilation) this will add to the non-CUDA CI runs? That is, are we talking about tens of seconds or rather minutes? For the time being, my take would be to hold off on adding it as a test dependency, since it will also impact people testing locally.
It should be OK. The mesh in Trixi.jl is only used as an "infrastructure provider", i.e., it allows the solvers to figure out which cells exist, who their neighbors are, where their neighbors live (when running with MPI parallelism), etc. during the initialization phase. Thus, in a usual simulation run there are no performance-critical loops working directly on mesh data.

One thought I had: it might make sense to create a meta issue for upcoming (and deferred) changes for GPU/heterogeneous computing support, i.e., a single issue to keep track of the various PRs, other issues, and maybe open questions/TODOs that do not live in their own issue.
The most annoying thing is likely downloading the CUDA_Runtime_jll; other than that, we are talking seconds. The biggest issue is that without it, testing the
That's somewhat related to the issue above: it is a bit of code that doesn't impact anything until you actually try to move the model to the GPU. I could only do the
Yeah, it's more a question of whether we are going to access data within it from the GPU, but I guess we don't know until we actually have a model running. As far as I can tell right now, Lars didn't move it either.
OK, then I'd say let's give it a try and see where it takes us.
Hm, ok. I understand you're saying that we'd be able to split it, but then not really review the impact it will have in the subsequent PRs? In that case, it might be best to keep it in one PR; please just try to keep it as focused as possible (such that it does not blow up in size even more).
Right. No, there shouldn't be any access to mesh data from the GPU during the regular
The branch was force-pushed from 97877e6 to 15a898b.
This looks like
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hendrik Ranocha <[email protected]>
Co-authored-by: Hendrik Ranocha <[email protected]>
ranocha
left a comment
Thanks a lot!
@sloede Do you want to review this PR as well?
Co-authored-by: Hendrik Ranocha <[email protected]>
ranocha
left a comment
Thanks a lot to everybody involved!
In the future, should I squash the PR beforehand? I normally adjust the commit message before I squash-merge so that it is useful (instead of my messy development history).

We usually don't do that.
In order to eventually support GPU computation, we need to use Adapt.jl to allow GPU backend packages to swap out host-side array types like CuArray for device-side types like CuDeviceArray. Additionally, this will allow us to change the element type of a simulation by using adapt(Array{Float32}, ...).
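As a hedged sketch of the mechanism this enables (the concrete Trixi objects that become adapt-able are those touched by this PR; the commented GPU part assumes CUDA.jl):

using Adapt

x = rand(Float64, 4)
adapt(Array{Float32}, x)    # Vector{Float32}: element type changed on the host

# With a GPU backend such as CUDA.jl loaded, the same call swaps the storage type:
# using CUDA
# adapt(CuArray, x)         # a CuArray on the host, which becomes a CuDeviceArray
#                           # when passed into a GPU kernel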