
Conversation

@vchuravy
Member

@vchuravy vchuravy commented Dec 17, 2024

In order to eventually support GPU computation we need
to use Adapt.jl to allow GPU backend packages to swap
out host-array types like CuArray with device-side types
like CuDeviceArray.

Additionally, this will allow us to change the element type
of a simulation by using adapt(Array{Float32}, ...).
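The pattern described above can be sketched with a minimal example (MyContainer is a hypothetical type for illustration, not part of Trixi.jl):

```julia
using Adapt

# A container wrapping an array; Adapt.jl lets backend packages
# rebuild it around a different array type.
struct MyContainer{A <: AbstractArray}
    u::A
end

# Teach Adapt how to reconstruct the container after adapting its field.
Adapt.adapt_structure(to, c::MyContainer) = MyContainer(adapt(to, c.u))

c = MyContainer(zeros(Float64, 4))

# Change the element type on the host ...
c32 = adapt(Array{Float32}, c)

# ... or, with CUDA.jl loaded, swap Array for CuArray:
# cgpu = adapt(CuArray, c)
```

Here adapt(Array{Float32}, c) returns a MyContainer{Vector{Float32}}; the same call with a device array type such as CuArray would instead move the data to the GPU.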

@github-actions
Contributor

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

Member

@ranocha ranocha left a comment

Should be close to final when tests are added

@vchuravy vchuravy marked this pull request as ready for review February 3, 2025 15:38
@vchuravy vchuravy requested review from benegee and ranocha February 3, 2025 15:38
@codecov

codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 68.44262% with 77 lines in your changes missing coverage. Please review.

Project coverage is 96.67%. Comparing base (43cd98d) to head (1acf007).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
src/solvers/dgsem_p4est/containers_parallel.jl 26.92% 19 Missing ⚠️
src/auxiliary/containers.jl 63.64% 16 Missing ⚠️
src/solvers/dgsem_p4est/containers.jl 83.53% 14 Missing ⚠️
src/solvers/dg.jl 60.00% 10 Missing ⚠️
src/auxiliary/vector_of_arrays.jl 35.71% 9 Missing ⚠️
ext/TrixiAMDGPUExt.jl 0.00% 6 Missing ⚠️
ext/TrixiCUDAExt.jl 0.00% 2 Missing ⚠️
src/semidiscretization/semidiscretization.jl 90.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2212      +/-   ##
==========================================
- Coverage   96.83%   96.67%   -0.16%     
==========================================
  Files         504      507       +3     
  Lines       41816    42010     +194     
==========================================
+ Hits        40490    40610     +120     
- Misses       1326     1400      +74     
Flag Coverage Δ
unittests 96.67% <68.44%> (-0.16%) ⬇️


Member

@ranocha ranocha left a comment

Thanks! Could you please adapt the t8code stuff as well? It causes the remaining failing CI tests like

  LoadError: MethodError: Cannot `convert` an object of type Vector{Vector{Int64}} to an object of type Trixi.VecOfArrays{Vector{Int64}}
  
  Closest candidates are:
    convert(::Type{T}, ::T) where T
     @ Base Base.jl:84
    (::Type{Trixi.VecOfArrays{T}} where T<:AbstractArray)(::Any)
     @ Trixi ~/work/Trixi.jl/Trixi.jl/src/auxiliary/vector_of_arrays.jl:12
  
  Stacktrace:
    [1] setproperty!(x::Trixi.P4estMPICache{Vector{Float64}, Vector{Int64}}, f::Symbol, v::Vector{Vector{Int64}})
      @ Base ./Base.jl:40
    [2] init_mpi_cache!(mpi_cache::Trixi.P4estMPICache{Vector{Float64}, Vector{Int64}}, mesh::Trixi.T8codeMesh{2, Float64, Static.True, 4, 4}, mpi_mesh_info::@NamedTuple{mpi_mortars::Trixi.P4estMPIMortarContainer{2, Float64, Float64, 3, 4, 5, Array{Float64, 5}, Vector{Float64}, Array}, mpi_interfaces::Trixi.P4estMPIInterfaceContainer{2, Float64, 4, Array{Float64, 4}, Vector{Int64}, Vector{Tuple{Symbol, Symbol}}, Vector{Float64}, Array}, global_mortar_ids::Vector{UInt128}, global_interface_ids::Vector{UInt128}, neighbor_ranks_mortar::Vector{Vector{Int64}}, neighbor_ranks_interface::Vector{Int64}}, nvars::Int64, nnodes::Int64, uEltype::Type)
      @ Trixi ~/work/Trixi.jl/Trixi.jl/src/solvers/dgsem_t8code/dg_parallel.jl:93

Member

@sloede sloede left a comment

Just a few minor comments/suggestions, then this should be ready to merge (once the other convo by Hendrik is resolved as well)

Comment on lines 11 to 30
struct VecOfArrays{T <: AbstractArray}
arrays::Vector{T}
end
Base.getindex(v::VecOfArrays, i::Int) = Base.getindex(v.arrays, i)
Base.IndexStyle(v::VecOfArrays) = Base.IndexStyle(v.arrays)
Base.size(v::VecOfArrays) = Base.size(v.arrays)
Base.length(v::VecOfArrays) = Base.length(v.arrays)
Base.eltype(v::VecOfArrays{T}) where {T} = T
function Adapt.adapt_structure(to, v::VecOfArrays)
return VecOfArrays([Adapt.adapt(to, arr) for arr in v.arrays])
end
function Base.convert(::Type{<:VecOfArrays}, v::Vector{<:AbstractArray})
VecOfArrays(v)
end
Member

Suggested change
struct VecOfArrays{T <: AbstractArray}
arrays::Vector{T}
end
Base.getindex(v::VecOfArrays, i::Int) = Base.getindex(v.arrays, i)
Base.IndexStyle(v::VecOfArrays) = Base.IndexStyle(v.arrays)
Base.size(v::VecOfArrays) = Base.size(v.arrays)
Base.length(v::VecOfArrays) = Base.length(v.arrays)
Base.eltype(v::VecOfArrays{T}) where {T} = T
function Adapt.adapt_structure(to, v::VecOfArrays)
return VecOfArrays([Adapt.adapt(to, arr) for arr in v.arrays])
end
function Base.convert(::Type{<:VecOfArrays}, v::Vector{<:AbstractArray})
VecOfArrays(v)
end
struct VectorOfArrays{T <: AbstractArray}
arrays::Vector{T}
end
Base.getindex(v::VectorOfArrays, i::Int) = Base.getindex(v.arrays, i)
Base.IndexStyle(v::VectorOfArrays) = Base.IndexStyle(v.arrays)
Base.size(v::VectorOfArrays) = Base.size(v.arrays)
Base.length(v::VectorOfArrays) = Base.length(v.arrays)
Base.eltype(v::VectorOfArrays{T}) where {T} = T
function Adapt.adapt_structure(to, v::VectorOfArrays)
return VectorOfArrays([Adapt.adapt(to, arr) for arr in v.arrays])
end
function Base.convert(::Type{<:VectorOfArrays}, v::Vector{<:AbstractArray})
VectorOfArrays(v)
end

We try to avoid abbreviations unless really necessary for sanity 😬 Also, it would be consistent with the file name.

Note, however, that I do not have a super strong opinion here, thus if you deem VecOfArrays to be better, we can stay like this.

Member

I just saw that RecursiveArrayTools.jl seems to already have a VectorOfArrays type (https://github.com/trixi-framework/Trixi.jl/pull/2150/files#diff-6058d487915aa254cfe1a5f8da6315b8c1fef4da04df38b0089a599803e28e5aR63). Should we use that one instead?

If not, then of course we should use a different name for our own data structure, and keeping VecOfArrays as the name seems reasonable to me. In that case, it might be good to leave a short note in the comment why we do not want to use the SciML package's type for future reference.

Member

using RecursiveArrayTools: VectorOfArray is added in #2150 for DGMulti support

Member

Yeah, I saw that. I was just wondering whether their data type could (or should) be used instead of rolling our own.

Member Author

RecursiveArrayTools VectorOfArray is not fit for purpose... They define:

function Adapt.adapt_structure(to, VA::AbstractVectorOfArray)
    Adapt.adapt(to, Array(VA))
end

Which is not what we want.
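The difference can be sketched with a minimal, hypothetical comparison (DesiredVoA stands in for this PR's VecOfArrays): the SciML definition above collapses the nested structure into one dense Array before adapting, while the behavior wanted here adapts each inner array and preserves the vector-of-arrays layout:

```julia
using Adapt

struct DesiredVoA{T <: AbstractArray}
    arrays::Vector{T}
end

# Wanted: adapt each inner array, keep the outer structure intact.
function Adapt.adapt_structure(to, v::DesiredVoA)
    return DesiredVoA([Adapt.adapt(to, arr) for arr in v.arrays])
end

v = DesiredVoA([rand(3), rand(5)])   # ragged: inner lengths differ
v32 = adapt(Array{Float32}, v)

# v32 is still a DesiredVoA with Float32 inner arrays. The SciML
# definition would instead call Array(VA) first, which flattens the
# nested layout (and cannot even represent ragged inner arrays).
```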

Contributor

There have been some recent updates and it seems we can now actually use RecursiveArrayTools's VectorOfArray instead of our own. This still needs testing.

Comment on lines +197 to +208
uArray <: DenseArray{uEltype, NDIMSP2},
IdsMatrix <: DenseMatrix{Int},
Member

Why the restriction to dense types? (just curious)

Member Author

I actually don't know; this is a left-over from the original work.

@sloede
Member

sloede commented Mar 20, 2025

What's the status of this PR?

@vchuravy
Member Author

vchuravy commented Apr 21, 2025

Okay, current status:

  1. Removed AbstractHeterogenousContainer and replaced it with a storage_type function that uses Adapt infrastructure for wrapped arrays.
  2. Started writing some docs on heterogeneous computing & Trixi
  3. Added a minimal extension on CUDA.jl
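As a rough illustration of the storage_type idea (a hypothetical sketch, not the actual Trixi.jl implementation): a trait-like function that reports the underlying array type, unwrapping wrapped arrays such as views by recursing to their parent:

```julia
# Hypothetical sketch of a storage_type-style trait function.
storage_type(x::AbstractArray) = typeof(x)

# Unwrap wrapped arrays by asking their parent for its storage.
storage_type(x::SubArray) = storage_type(parent(x))
storage_type(x::Base.ReshapedArray) = storage_type(parent(x))

a = zeros(2, 3)
v = view(a, :, 1)

storage_type(a)  # Matrix{Float64}
storage_type(v)  # also Matrix{Float64}: the parent's storage
```

With such a function, swapping the storage of a whole simulation reduces to an adapt call with the desired array type, while storage_type lets code query what a container currently holds.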

Open questions:

  • Should I untangle the whole VecOfArrays business into a separate PR? I am not all that happy with that implementation, but it is something we need to address for full CUDA support.
  • Can I add CUDA.jl as a test dependency, or should I pursue more the strategy in add minimal CUDA pipeline #2367 where I add it on-demand?
  • Right now we are not adapting the mesh, similar to remake, is that okay?

@vchuravy vchuravy force-pushed the vc/adapt branch 2 times, most recently from db49594 to e632e4c Compare April 22, 2025 07:30
@sloede
Member

sloede commented Apr 22, 2025

Started writing some docs on heterogeneous computing & Trixi

That's great news, excellent!

Should I untangle the whole VecOfArrays business into a separate PR? I am not all that happy with that implementation, but it is something we need to address for full CUDA support.

If it's only with a reasonable overhead, I'd prefer to have a separate PR to make it easier to understand and review the code.

Can I add CUDA.jl as a test dependency, or should I pursue more the strategy in add minimal CUDA pipeline #2367 where I add it on-demand?

Puh. Do you have a rough idea about how much overhead (downloading + precompilation) this will add to the non-CUDA CI runs? That is, are we talking about tens of seconds or rather minutes?

For the time being, my take would be to hold off adding it as a test dependency for now, since it will also impact people testing locally.

Right now we are not adapting the mesh, similar to remake, is that okay?

It should be OK. The mesh in Trixi.jl is only used as an "infrastructure provider", i.e., it allows the solvers to figure out which cells exist, who their neighbors are, where their neighbors live (when running with MPI parallelism) etc. during the initialization phase. Thus in a usual simulation run there are no performance-critical loops working directly on mesh data.

One thought I had: It might make sense to create a meta issue for upcoming (and deferred) changes for GPU/heterogeneous computing support. That is, a single issue to keep track of the various PRs, other issues, and maybe open questions/TODOs that are not living in their own issue.

@vchuravy
Member Author

Puh. Do you have a rough idea about how much overhead (downloading + precompilation) this will add to the non-CUDA CI runs? That is, are we talking about tens of seconds or rather minutes?

The most annoying thing is likely downloading the CUDA_Runtime_jll; other than that, we are talking seconds.

The biggest issue is that without it testing the storage_type=X change is almost impossible and would require significant mocking.

If it's only with a reasonable overhead, I'd prefer to have a separate PR to make it easier to understand and review the code.

That's somewhat related to the issue above, it is a bit of code that doesn't impact anything until you are actually trying to move the model to the GPU.

I could only do the real_type part of this PR, but that excludes quite a bit of complexity.

Thus in a usual simulation run there are no performance-critical loops working directly on mesh data.

Yeah, it's more a question if we are going to access data within it from the GPU, but I guess we don't know until we actually have a model running. As far as I can tell right now Lars didn't move it either.

@sloede
Member

sloede commented Apr 22, 2025

Puh. Do you have a rough idea about how much overhead (downloading + precompilation) this will add to the non-CUDA CI runs? That is, are we talking about tens of seconds or rather minutes?

The most annoying thing is likely downloading the CUDA_Runtime_jll; other than that, we are talking seconds.

The biggest issue is that without it testing the storage_type=X change is almost impossible and would require significant mocking.

OK, then I'd say let's give it a try and see where it takes us.

If it's only with a reasonable overhead, I'd prefer to have a separate PR to make it easier to understand and review the code.

That's somewhat related to the issue above, it is a bit of code that doesn't impact anything until you are actually trying to move the model to the GPU.

I could only do the real_type part of this PR , but that excludes quite a bit of complexity.

Hm, ok. I understand you're saying that we'd be able to split it, but then not really review the impact that it will have later on in the subsequent PRs? In that case, it might be best to keep it in one PR. In that case, please try to keep the PR as focused as possible (such that it does not blow up in size even more).

Thus in a usual simulation run there are no performance-critical loops working directly on mesh data.

Yeah, it's more a question if we are going to access data within it from the GPU, but I guess we don't know until we actually have a model running. As far as I can tell right now Lars didn't move it either.

Right. No, there shouldn't be any access of mesh data from the GPU during the regular rhs! invocation. All possible mesh accesses would occur in one of the callbacks.

@vchuravy vchuravy force-pushed the vc/adapt branch 2 times, most recently from 97877e6 to 15a898b Compare May 14, 2025 07:39
@vchuravy
Member Author

This looks like test_trixi_include is passing through an expression and is not getting it from the parent scope.

/var/lib/buildkite-agent/builds/gpuci-4/julialang/trixi-dot-jl/examples/p4est_2d_dgsem/elixir_advection_basic.jl
elixir_advection_basic.jl (Float32): Error During Test at /var/lib/buildkite-agent/builds/gpuci-4/julialang/trixi-dot-jl/test/test_trixi.jl:186
  Got exception outside of a @test
  LoadError: UndefVarError: `CuArray` not defined

ranocha
ranocha previously approved these changes Jul 16, 2025
Member

@ranocha ranocha left a comment

Thanks a lot!

@sloede Do you want to review this PR as well?

Member

@ranocha ranocha left a comment

Thanks a lot to everybody involved!

@ranocha ranocha merged commit 6b8d05d into main Jul 17, 2025
35 of 39 checks passed
@ranocha ranocha deleted the vc/adapt branch July 17, 2025 20:08
@vchuravy
Member Author

In the future, should I squash the PR beforehand? I normally adjust the commit message before I squash-merge so that it is useful (instead of my messy development history).

@JoshuaLampert
Member

We usually don't do that.
