
Commit b529a25

document kernel abstractions backend and runtime reselection system
1 parent be98cec commit b529a25

File tree

README.md

1 file changed: +6 -6 lines changed


README.md

Lines changed: 6 additions & 6 deletions
@@ -258,28 +258,28 @@ Here is the resulting movie when running the application on 8 GPUs, solving 3-D
The corresponding file can be found [here](/examples/diffusion3D_multigpucpu_hidecomm.jl).

## Interactive prototyping with runtime hardware selection
-The KernelAbstractions backend keeps the familiar parse-time `@init_parallel_stencil` workflow while enabling runtime hardware switches through the `select_hardware` and `current_hardware` functions; the runtime hardware target defaults to CPU and can be switched as many times as desired during a session without requiring redefinition of kernels or reinitialization of the backend. The following copy-pasteable example outlines this workflow with a simple SAXPY kernel, demonstrating initial execution on CPU followed by a switch to CUDA GPU and a second execution there:
+The KernelAbstractions backend keeps the familiar parse-time `@init_parallel_stencil` workflow while enabling runtime hardware switches through the `select_hardware` and `current_hardware` functions; the runtime hardware target defaults to CPU and can be switched as many times as desired during a session without requiring redefinition of kernels or reinitialization of the backend. The following copy-pasteable example outlines this workflow with a simple SAXPY kernel, demonstrating initial execution on CPU followed by a switch to a CUDA-capable GPU and a second execution there:

```julia
-# --- Session setup -------------------------------------------------------
+# --- Session setup -----------------------------------------------------
using ParallelStencil
@init_parallel_stencil(package=KernelAbstractions, numbertype=Float32) # 1 Initialize KernelAbstractions backend at parse time
const N = 1024
const α = 2.5

-# --- Kernel definition ---------------------------------------------------
+# --- Kernel definition -------------------------------------------------
@parallel_indices (i) function saxpy!(Y, α, X) # 2 Define a single time a hardware-agnostic SAXPY kernel
    Y[i] = α * X[i] + Y[i]
    return
end

-# --- First run on default runtime hardware (CPU) -------------------------
+# --- First run on default runtime hardware (CPU) -----------------------
println("Current runtime hardware target: ", current_hardware()) # 3 Query current (default) runtime hardware target
X = @rand(N) # 4 Allocate data on the current target
Y = @rand(N) # 4 Allocate data on the current target
@parallel saxpy!(Y, α, X) # 5 Launch kernel on the current target

-# --- Reselect runtime hardware to CUDA GPU and run again --------------------------------
+# --- Reselect runtime hardware to CUDA-capable GPU and run again -------
select_hardware(:gpu_cuda) # 6 Switch runtime hardware target to CUDA-capable GPU
println("Current runtime hardware target: ", current_hardware()) # 7 Confirm the CUDA-capable GPU runtime hardware target
X = @rand(N) # 8 Allocate data on the new target
@@ -472,7 +472,7 @@ Using simple array broadcasting capabilities both with GPU and CPU arrays within
* [Hydro-mechanical porosity waves 2-D app](#hydro-mechanical-porosity-waves-2-d-app)
* More to come, stay tuned...

-All miniapp codes follow a similar structure and permit serial and threaded CPU as well as Nvidia GPU execution. The first line of each miniapp code permits to enable the CUDA GPU backend upon setting the `USE_GPU` flag to `true`.
+All miniapp codes follow a similar structure and permit serial and threaded CPU as well as Nvidia GPU execution. The first line of each miniapp code enables the CUDA.jl GPU backend when the `USE_GPU` flag is set to `true`.

All the miniapps can be interactively executed within the [Julia REPL] (this includes the multi-xPU versions when using a single CPU or GPU). Note that for optimal performance the miniapp script of interest `<miniapp_code>` should be launched from the shell using the project's dependencies `--project`, disabling array bound checking `--check-bounds=no`, and using optimization level 3 `-O3`.
```sh
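
For orientation beyond the hunk above: the `USE_GPU` switch that the modified sentence describes typically sits at the very top of a miniapp, roughly as in the following sketch. The finite-differences module, number type, and dimensionality here are assumptions for illustration and vary between miniapps; only the flag-plus-initialization pattern itself is what the changed sentence refers to.

```julia
# Sketch of a typical miniapp header (assumed pattern for illustration, not taken from this commit):
# the USE_GPU flag on the first line decides which backend @init_parallel_stencil activates.
const USE_GPU = true                        # set to false for serial or multi-threaded CPU execution
using ParallelStencil
using ParallelStencil.FiniteDifferences2D   # assumption: module and dimensionality differ per miniapp
@static if USE_GPU
    @init_parallel_stencil(CUDA, Float64, 2)      # CUDA.jl GPU backend
else
    @init_parallel_stencil(Threads, Float64, 2)   # multi-threaded CPU backend
end
```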
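
The `sh` block on which the second hunk ends is cut off by the hunk boundary; given the three options named in the preceding sentence, the launch line it contains is presumably of the form `julia --project --check-bounds=no -O3 <miniapp_code>.jl`.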
