
Commit 6b8d05d

Authored by vchuravy, with lchristm, benegee, github-actions[bot], and ranocha
Use Adapt.jl to change storage and element type (#2212)
Use Adapt.jl to change storage and element type (#2212)

* Use Adapt.jl to change storage and element type

  In order to eventually support GPU computation we need to use Adapt.jl to
  allow GPU backend packages to swap out host-array types like `CuArray` with
  device-side types like `CuDeviceArray`. Additionally, this will allow us to
  change the element type of a simulation by using `adapt(Array{Float32})`.

  Co-authored-by: Lars Christmann <[email protected]>
  Co-authored-by: Benedict Geihe <[email protected]>

* restore elixir
* offload compute_coefficients
* fmt
* test native version as well
* adapt 1D and 3D version
* Downgrade compat with Adapt
* update requires to 1.3
* add support for AMDGPU
* fix doctest
* Use `u_ode` to determine the computational backend
  Co-authored-by: Benedict Geihe <[email protected]>
* Use KA 0.9.31
* handle VectorOfArray in trixi_backend
* fixup: runtests
* format
* fix trixi_backend for RecursiveArrayTools{StaticArray}
* fixup: amdgpu test
* use unsafe_wrap with lock=false for AMDGPU
* Update src/solvers/dg.jl
* Update ext/TrixiAMDGPUExt.jl
  Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update src/auxiliary/containers.jl
  Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* use elixir_advection_basic_gpu.jl for adapt test
* Update examples/p4est_2d_dgsem/elixir_advection_basic_gpu.jl
  Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* something on KA
* KA imports
* fix typo
* fix CUDA compat in test/
* add stepsize callback
* fixup! fix CUDA compat in test/
* upgrade minimal diffeqbase
* upgrade StructArrays
* Apply suggestions from code review
  Co-authored-by: Hendrik Ranocha <[email protected]>
* add Adapt compat in test/Project.toml
* Apply suggestions from code review
  Co-authored-by: Hendrik Ranocha <[email protected]>
* address feedback from in-person conversation
* add news
* Apply suggestions from code review
  Co-authored-by: Hendrik Ranocha <[email protected]>

---------

Co-authored-by: Lars Christmann <[email protected]>
Co-authored-by: Benedict Geihe <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hendrik Ranocha <[email protected]>
1 parent 43cd98d · commit 6b8d05d

31 files changed: +1195 −263 lines

.buildkite/pipeline.yml

Lines changed: 6 additions & 3 deletions
```diff
@@ -1,3 +1,5 @@
+env:
+
 steps:
   - label: "CUDA Julia {{matrix.version}}"
     matrix:
@@ -7,12 +9,13 @@ steps:
     plugins:
       - JuliaCI/julia#v1:
           version: "{{matrix.version}}"
-    command: |
-      true
+      - JuliaCI/julia-test#v1: ~
+    env:
+      TRIXI_TEST: "CUDA"
     agents:
       queue: "juliagpu"
       cuda: "*"
     if: build.message !~ /\[skip ci\]/
     timeout_in_minutes: 60
     soft_fail:
-      - exit_status: 3
+      - exit_status: 3
```

.github/workflows/GPUCompat.yml

Lines changed: 0 additions & 86 deletions
This file was deleted.

NEWS.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -5,6 +5,13 @@ Trixi.jl follows the interpretation of
 used in the Julia ecosystem. Notable changes will be documented in this file
 for human readability.
 
+## Changes in the v0.12 lifecycle
+
+#### Added
+- Initial support for adapting data structures between different storage arrays was added. This enables future work on GPU support in Trixi.jl ([#2212]).
+
+#### Deprecated
+
 ## Changes when updating to v0.12 from v0.11.x
 
 #### Added
```

Project.toml

Lines changed: 13 additions & 3 deletions
```diff
@@ -5,6 +5,7 @@ version = "0.12.7-DEV"
 
 [deps]
 Accessors = "7d9f7c33-5ae7-4f3b-8dc6-eff91059b697"
+Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
 CodeTracking = "da1fd8a2-8d9e-5ec2-8556-3022fb5608a2"
 ConstructionBase = "187b0558-2788-49d3-abe0-74a17ed4e7c9"
 DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
@@ -16,6 +17,7 @@ EllipsisNotation = "da5c29d0-fa7d-589e-88eb-ea29b0a81949"
 FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
 ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
 HDF5 = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
+KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 LinearMaps = "7a12625a-238d-50fd-b39a-03d52299707e"
 LoopVectorization = "bdcacae8-1622-11e9-2a5c-532679323890"
@@ -52,31 +54,39 @@ TrixiBase = "9a0f1c46-06d5-4909-a5a3-ce25d3fa3284"
 UUIDs = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
 
 [weakdeps]
+AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
+CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
 Convex = "f65535da-76fb-5f13-bab9-19810c17039a"
 ECOS = "e2685f51-7e38-5353-a97d-a921fd2c8199"
 Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
 NLsolve = "2774e3e8-f4cf-5e23-947b-6d7e65073b56"
 
 [extensions]
+TrixiAMDGPUExt = "AMDGPU"
+TrixiCUDAExt = "CUDA"
 TrixiConvexECOSExt = ["Convex", "ECOS"]
 TrixiMakieExt = "Makie"
 TrixiNLsolveExt = "NLsolve"
 
 [compat]
+AMDGPU = "1.3.5"
 Accessors = "0.1.36"
+Adapt = "4"
+CUDA = "5.8"
 CodeTracking = "1.0.5"
 ConstructionBase = "1.5"
 Convex = "0.16"
 DataStructures = "0.18.15"
 DelimitedFiles = "1"
-DiffEqBase = "6.154"
+DiffEqBase = "6.155.2"
 DiffEqCallbacks = "2.35, 3, 4"
 Downloads = "1.6"
 ECOS = "1.1.2"
 EllipsisNotation = "1.0"
 FillArrays = "1.9"
 ForwardDiff = "0.10.36, 1"
 HDF5 = "0.16.10, 0.17"
+KernelAbstractions = "0.9.36"
 LinearAlgebra = "1"
 LinearMaps = "2.7, 3.0"
 LoopVectorization = "0.12.171"
@@ -94,7 +104,7 @@ Printf = "1"
 RecipesBase = "1.3.4"
 RecursiveArrayTools = "3.31.1"
 Reexport = "1.2"
-Requires = "1.1"
+Requires = "1.3"
 SciMLBase = "2.67.0"
 SimpleUnPack = "1.1"
 SparseArrays = "1"
@@ -104,7 +114,7 @@ Static = "1.1.1"
 StaticArrayInterface = "1.5.1"
 StaticArrays = "1.9"
 StrideArrays = "0.1.29"
-StructArrays = "0.6.18, 0.7"
+StructArrays = "0.6.20, 0.7"
 SummationByPartsOperators = "0.5.52"
 T8code = "0.7.4"
 TimerOutputs = "0.5.23"
```

docs/Project.toml

Lines changed: 5 additions & 0 deletions
```diff
@@ -1,4 +1,5 @@
 [deps]
+Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
 CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
 Changelog = "5217a498-cd5d-4ec6-b8c2-9b85a09b6e3e"
 Convex = "f65535da-76fb-5f13-bab9-19810c17039a"
@@ -16,9 +17,13 @@ OrdinaryDiffEqSSPRK = "669c94d9-1f4b-4b64-b377-1aa079aa2388"
 OrdinaryDiffEqTsit5 = "b1df2697-797e-41e3-8120-5422d3b24e4a"
 Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
+Trixi = "a7f1ee26-1774-49b1-8366-f1abc58fbfcb"
 Trixi2Vtk = "bc1476a1-1ca6-4cc3-950b-c312b255ff95"
 TrixiBase = "9a0f1c46-06d5-4909-a5a3-ce25d3fa3284"
 
+[sources]
+Trixi = {path = ".."}
+
 [compat]
 CairoMakie = "0.12, 0.13, 0.14, 0.15"
 Changelog = "1.1"
```

docs/make.jl

Lines changed: 2 additions & 1 deletion
```diff
@@ -163,7 +163,8 @@ makedocs(
         "Style guide" => "styleguide.md",
         "Testing" => "testing.md",
         "Performance" => "performance.md",
-        "Parallelization" => "parallelization.md"
+        "Parallelization" => "parallelization.md",
+        "Heterogeneous" => "heterogeneous.md"
     ],
     "Troubleshooting and FAQ" => "troubleshooting.md",
     "Reference" => [
```

docs/src/heterogeneous.md

Lines changed: 163 additions & 0 deletions
(New file; full contents shown below.)

# Heterogeneous computing

Support for heterogeneous computing is currently being worked on.

## The use of Adapt.jl

[Adapt.jl](https://github.com/JuliaGPU/Adapt.jl) is a package in the
[JuliaGPU](https://github.com/JuliaGPU) family that allows for
the translation of nested data structures. The primary goal is to allow the substitution of `Array`
at the storage level with a GPU array like `CuArray` from [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl).

To facilitate this, data structures must be parameterized. Instead of

```julia
struct Container <: Trixi.AbstractContainer
    data::Array{Float64, 2}
end
```

they must be written as

```jldoctest adapt; output = false, setup=:(import Trixi)
struct Container{D<:AbstractArray} <: Trixi.AbstractContainer
    data::D
end

# output

```

Furthermore, we need to define a function that allows for the conversion of the
storage of our types:

```jldoctest adapt; output = false
using Adapt

function Adapt.adapt_structure(to, C::Container)
    return Container(adapt(to, C.data))
end

# output

```

or simply

```julia
Adapt.@adapt_structure(Container)
```

Additionally, we must define `Adapt.parent_type`:

```jldoctest adapt; output = false
function Adapt.parent_type(::Type{<:Container{D}}) where D
    return D
end

# output

```

Altogether, we can use this machinery to convert a container:

```jldoctest adapt
julia> C = Container(zeros(3))
Container{Vector{Float64}}([0.0, 0.0, 0.0])

julia> Trixi.storage_type(C)
Array
```
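For intuition, here is a minimal sketch of how such a query could be built on top of `Adapt.parent_type`. The name `my_storage_type` is ours for illustration, this is not necessarily how `Trixi.storage_type` is actually implemented, and we assume `Adapt.parent_type` returns the type unchanged once there is nothing left to unwrap:

```julia
using Adapt

# Follow the `parent_type` chain to the innermost array type, then strip the
# type parameters to obtain the array family (e.g. `Array` or `CuArray`).
function my_storage_type(::Type{T}) where {T}
    P = Adapt.parent_type(T)
    # Fixed point reached: `T` does not wrap anything else.
    return P === T ? Base.typename(T).wrapper : my_storage_type(P)
end
my_storage_type(x) = my_storage_type(typeof(x))

# With the `Container` from above: my_storage_type(Container(zeros(3))) == Array
```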
With CUDA.jl loaded, the same container can be moved to the GPU, and its
storage type changes accordingly:

```julia-repl
julia> using CUDA

julia> GPU_C = adapt(CuArray, C)
Container{CuArray{Float64, 1, CUDA.DeviceMemory}}([0.0, 0.0, 0.0])

julia> Trixi.storage_type(GPU_C)
CuArray
```
## Element-type conversion with `Trixi.trixi_adapt`

We can use [`Trixi.trixi_adapt`](@ref) to perform both an element-type and a storage-type adaptation:

```jldoctest adapt
julia> C = Container(zeros(3))
Container{Vector{Float64}}([0.0, 0.0, 0.0])

julia> Trixi.trixi_adapt(Array, Float32, C)
Container{Vector{Float32}}(Float32[0.0, 0.0, 0.0])
```

```julia-repl
julia> Trixi.trixi_adapt(CuArray, Float32, C)
Container{CuArray{Float32, 1, CUDA.DeviceMemory}}(Float32[0.0, 0.0, 0.0])
```

!!! note
    `adapt(Array{Float32}, C)` is tempting, but it will do the wrong thing
    in the presence of `SVector`s and similar arrays from StaticArrays.jl.
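For intuition, consider storage that holds static vectors. The following sketch is our own illustration and assumes StaticArrays.jl is available:

```julia
using Adapt, StaticArrays

v = [SVector(1.0, 2.0), SVector(3.0, 4.0)]  # a Vector{SVector{2, Float64}}

# An element-type change should yield a Vector{SVector{2, Float32}}: only the
# scalar type changes while the SVector shape is preserved.
# `Trixi.trixi_adapt(Array, Float32, v)` is designed with this in mind, whereas
# `adapt(Array{Float32}, v)` requests a plain `Array{Float32}` and hence does
# not preserve the `SVector` element structure.
```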
## Writing GPU kernels

Offloading computations to the GPU is done with
[KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl),
allowing for vendor-agnostic GPU code.
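As a standalone primer (independent of Trixi.jl; the kernel and variable names below are ours for illustration), a KernelAbstractions.jl kernel looks like this:

```julia
using KernelAbstractions

# Each work item copies a single element from `src` to `dest`.
@kernel function copy_kernel!(dest, @Const(src))
    i = @index(Global)
    @inbounds dest[i] = src[i]
end

a = rand(Float32, 64)
b = zeros(Float32, 64)

backend = get_backend(b)            # CPU() here; a GPU backend for device arrays
kernel! = copy_kernel!(backend)     # instantiate the kernel for this backend
kernel!(b, a, ndrange = length(b))  # launch over 64 work items
KernelAbstractions.synchronize(backend)
@assert b == a
```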
### Example

Given the following Trixi.jl code, which would typically be called from within `rhs!`:

```julia
function trixi_rhs_fct(mesh, equations, solver, cache, args)
    @threaded for element in eachelement(solver, cache)
        # code
    end
end
```

1. Put the inner code in a new function `rhs_fct_per_element`. Besides the index
   `element`, pass all required fields as arguments, but make sure to `@unpack` them from
   their structs in advance. (A schematic sketch of this function follows after these steps.)

2. Where `trixi_rhs_fct` is called, get the backend, i.e., the hardware we are currently
   running on, via `trixi_backend(x)`.
   This will, e.g., work with `u_ode`. Internally, KernelAbstractions.jl's `get_backend`
   will be called, i.e., KernelAbstractions.jl has to know the type of `x`.

   ```julia
   backend = trixi_backend(u_ode)
   ```

3. Add a new argument `backend` to `trixi_rhs_fct` used for dispatch.
   When `backend` is `nothing`, the legacy implementation should be used:

   ```julia
   function trixi_rhs_fct(backend::Nothing, mesh, equations, solver, cache, args)
       @unpack unpacked_args = cache
       @threaded for element in eachelement(solver, cache)
           rhs_fct_per_element(element, unpacked_args, args)
       end
   end
   ```

4. When `backend` is a `Backend` (a type defined by KernelAbstractions.jl), write a
   KernelAbstractions.jl kernel:

   ```julia
   function trixi_rhs_fct(backend::Backend, mesh, equations, solver, cache, args)
       nelements(solver, cache) == 0 && return nothing # return early when there are no elements
       @unpack unpacked_args = cache
       kernel! = rhs_fct_kernel!(backend)
       kernel!(unpacked_args, args,
               ndrange = nelements(solver, cache))
       return nothing
   end

   @kernel function rhs_fct_kernel!(unpacked_args, args)
       element = @index(Global)
       rhs_fct_per_element(element, unpacked_args, args)
   end
   ```
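Putting the steps together, a schematic call site might look as follows; `rhs_fct_per_element` and `unpacked_args` are the illustrative names from the steps above, not an actual Trixi.jl API:

```julia
# Step 1's extracted per-element function: the former loop body.
function rhs_fct_per_element(element, unpacked_args, args)
    # code that was previously inside the `@threaded` loop
end

# One entry point serves both paths: `backend` is `nothing` on the legacy CPU
# path and a KernelAbstractions.jl `Backend` when offloading.
backend = trixi_backend(u_ode)
trixi_rhs_fct(backend, mesh, equations, solver, cache, args)
```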
