Commit 8079666

document the abstract execution interface
1 parent 048af42 commit 8079666

2 files changed: +76 -20 lines changed
docs/src/index.md

Lines changed: 41 additions & 0 deletions
@@ -1,5 +1,46 @@
 # GPUArrays Documentation
 
+
+# Abstract GPU interface
+
+GPUArrays supports different platforms like CUDA and OpenCL, which all have different
+names for functions that offer the same functionality on the hardware,
+e.g. how to call a function on the GPU, how to get the thread index, etc.
+GPUArrays offers an abstract interface for these functions, which is overloaded
+by packages like [CLArrays](https://github.com/JuliaGPU/CLArrays.jl) and [CuArrays](https://github.com/JuliaGPU/CuArrays.jl).
+This makes it possible to write generic code that can be run on all hardware.
+GPUArrays itself even contains a pure Julia implementation of this interface.
+The Julia reference implementation is also a great way to debug your GPU code, since it
+offers many more errors and debugging information compared to the GPU backends, which
+mostly silently error or give cryptic errors (so far).
+You can use the reference implementation by using the `GPUArrays.JLArray` type.
+
+The functions that are currently part of the interface:
+
+The low-level dim + idx functions, named similarly to CUDA (with `*` indicating `(x, y, z)`):
+```Julia
+blockidx_*(state), blockdim_*(state), threadidx_*(state), griddim_*(state)
+# Known in OpenCL as:
+get_group_id, get_local_size, get_local_id, get_num_groups
+```
+
 ```@docs
 gpu_call(f, A::GPUArray, args::Tuple, configuration = length(A))
+
+
+linear_index(state)
+
+global_size(state)
+
+@linearidx(A, statesym = :state)
+
+@cartesianidx(A, statesym = :state)
+
+
+synchronize_threads(state)
+
+
+device(A::AbstractArray)
+synchronize(A::AbstractArray)
+
 ```
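
The index arithmetic behind `linear_index` and `global_size` in `src/abstract_gpu_interface.jl` can be tried on the CPU with a small sketch. The `FakeState` struct and the accessor methods defined on it are hypothetical stand-ins for illustration only; they are not GPUArrays types, and a real backend supplies its own state object.

```julia
# Hypothetical stand-in for a backend's kernel state; not a GPUArrays type.
struct FakeState
    blockidx::UInt32   # 1-based index of the current block
    blockdim::UInt32   # number of threads per block
    threadidx::UInt32  # 1-based index of the thread within its block
    griddim::UInt32    # number of blocks in the launch
end

# Accessors named after the abstract interface (x dimension only).
blockidx_x(s::FakeState)  = s.blockidx
blockdim_x(s::FakeState)  = s.blockdim
threadidx_x(s::FakeState) = s.threadidx
griddim_x(s::FakeState)   = s.griddim

# Same arithmetic as the functions in src/abstract_gpu_interface.jl.
linear_index(state) =
    UInt32((blockidx_x(state) - UInt32(1)) * blockdim_x(state) + threadidx_x(state))
global_size(state) = griddim_x(state) * blockdim_x(state)

# Thread 3 of block 2, with 4 threads per block and 8 blocks in total.
state = FakeState(2, 4, 3, 8)
idx   = linear_index(state)   # (2 - 1) * 4 + 3 == 7
total = global_size(state)    # 8 * 4 == 32
```

The subtraction of `UInt32(1)` reflects the 1-based block index: the first block contributes an offset of zero, so the first thread of the first block gets linear index 1.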

src/abstract_gpu_interface.jl

Lines changed: 35 additions & 20 deletions
@@ -25,45 +25,60 @@ linear index in a GPU kernel (equal to OpenCL.get_global_id)
 @inline function linear_index(state)
     UInt32((blockidx_x(state) - UInt32(1)) * blockdim_x(state) + threadidx_x(state))
 end
+
+"""
+Macro form of `linear_index`, which returns when out of bounds
+"""
+macro linearidx(A, statesym = :state)
+    quote
+        x1 = $(esc(A))
+        i1 = linear_index($(esc(statesym)))
+        i1 > length(x1) && return
+        i1
+    end
+end
+
+
+"""
+Like `@linearidx`, but returns an N-dimensional `NTuple{ndims(A), Cuint}` as index
+"""
+macro cartesianidx(A, statesym = :state)
+    quote
+        x = $(esc(A))
+        i2 = @linearidx(x, $(esc(statesym)))
+        gpu_ind2sub(x, i2)
+    end
+end
+
+"""
+Global size == blockdim * griddim == total number of kernel executions
+"""
 @inline function global_size(state)
+    # TODO nd version
     griddim_x(state) * blockdim_x(state)
 end
 
+
 """
-Blocks until all operations are finished on `A`
+Gets the device associated to the Array `A`
 """
-function synchronize(A::AbstractArray)
+function device(A::AbstractArray)
     # fallback is a noop, for backends not needing synchronization. This
     # makes it easier to write generic code that also works for AbstractArrays
 end
 """
-Gets the device associated to the Array `A`
+Blocks until all operations are finished on `A`
 """
-function device(A::AbstractArray)
+function synchronize(A::AbstractArray)
     # fallback is a noop, for backends not needing synchronization. This
     # makes it easier to write generic code that also works for AbstractArrays
 end
-
 #
 # @inline function synchronize_threads(state)
 # CUDAnative.__syncthreads()
 # end
 
-macro linearidx(A, statesym = :state)
-    quote
-        x1 = $(esc(A))
-        i1 = linear_index($(esc(statesym)))
-        i1 > length(x1) && return
-        i1
-    end
-end
-macro cartesianidx(A, statesym = :state)
-    quote
-        x = $(esc(A))
-        i2 = @linearidx(x, $(esc(statesym)))
-        gpu_ind2sub(x, i2)
-    end
-end
+
 
 
 """

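The early-return behaviour of `@linearidx` can be illustrated with a CPU-only sketch. `DemoState`, `double_kernel!`, and the loop that emulates a kernel launch are hypothetical names introduced for this example. The definition of `gpu_ind2sub` is not shown in this commit, so the last lines substitute Base's `CartesianIndices` as an assumed equivalent of the column-major mapping that `@cartesianidx` relies on.

```julia
# Hypothetical state for illustration: it only carries a precomputed linear index.
struct DemoState
    index::Int
end
linear_index(state::DemoState) = state.index

# Same shape as the macro in this commit: leave the enclosing function
# when the index is past the end of the array.
macro linearidx(A, statesym = :state)
    quote
        x1 = $(esc(A))
        i1 = linear_index($(esc(statesym)))
        i1 > length(x1) && return
        i1
    end
end

# Hypothetical kernel: each "thread" doubles one element.
function double_kernel!(state, dst, src)
    i = @linearidx(dst, state)  # returns from double_kernel! if out of bounds
    dst[i] = 2 * src[i]
    return
end

src = [1, 2, 3, 4, 5]
dst = zeros(Int, 5)
# Emulate launching 8 "threads" for 5 elements; threads 6 to 8 return early.
for t in 1:8
    double_kernel!(DemoState(t), dst, src)
end

# Assumed stand-in for gpu_ind2sub: Base's column-major conversion.
A = zeros(Int, 2, 3)
sub = Tuple(CartesianIndices(A)[4])  # linear index 4 in a 2×3 array
```

Because the `return` sits inside the macro expansion, a kernel does not need its own bounds check when the launch configuration is rounded up past `length(A)`.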