
Commit 7e330b9: Update docs.
1 parent: b841f66

8 files changed: +72 −86 lines changed

docs/src/interface.md

Lines changed: 33 additions & 55 deletions

@@ -5,83 +5,61 @@ implement the interfaces listed on this page. GPUArrays is designed around having
 different array types to represent a GPU array: one that only ever lives on the host, and
 one that actually can be instantiated on the device (i.e. in kernels).
 
-## Host-side
 
-Your host-side array type should build on the `AbstractGPUArray` supertype:
+## Device functionality
 
-```@docs
-AbstractGPUArray
-```
-
-First of all, you should implement operations that are expected to be defined for any
-`AbstractArray` type. Refer to the Julia manual for more details, or look at the `JLArray`
-reference implementation.
-
-To be able to actually use the functionality that is defined for `AbstractGPUArray`s, you
-should provide implementations of the following interfaces:
+Several types and interfaces are related to the device and execution of code on it. First of
+all, you need to provide a type that represents your device and exposes some properties of
+it:
 
 ```@docs
-GPUArrays.unsafe_reinterpret
-```
-
-### Devices
-
-```@docs
-GPUArrays.device
-GPUArrays.synchronize
+GPUArrays.AbstractGPUDevice
+GPUArrays.threads
 ```
 
-### Execution
+Another important set of interfaces relates to executing code on the device:
 
 ```@docs
 GPUArrays.AbstractGPUBackend
-GPUArrays.backend
-```
-
-```@docs
-GPUArrays._gpu_call
+GPUArrays.AbstractKernelContext
+GPUArrays.gpu_call
+GPUArrays.synchronize
+GPUArrays.thread_block_heuristic
 ```
 
-### Linear algebra
+Finally, you need to provide implementations of certain methods that will be executed on the
+device itself:
 
 ```@docs
-GPUArrays.blas_module
-GPUArrays.blasbuffer
+GPUArrays.AbstractDeviceArray
+GPUArrays.LocalMemory
+GPUArrays.synchronize_threads
+GPUArrays.blockidx
+GPUArrays.blockdim
+GPUArrays.threadidx
+GPUArrays.griddim
 ```
 
 
-## Device-side
+## Host abstractions
 
-To work with GPU memory on the device itself, e.g. within a kernel, we need a different
-type: Most functionality will behave differently when running on the GPU, e.g., accessing
-memory directly instead of copying it to the host. We should also take care not to call into
-any host library, such as the Julia runtime or the system's math library.
+You should provide an array type that builds on the `AbstractGPUArray` supertype:
 
 ```@docs
-AbstractDeviceArray
+AbstractGPUArray
```
 
-Your device array type should again implement the core elements of the `AbstractArray`
-interface, such as indexing and certain getters. Refer to the Julia manual for more details,
-or look at the `JLDeviceArray` reference implementation.
+First of all, you should implement operations that are expected to be defined for any
+`AbstractArray` type. Refer to the Julia manual for more details, or look at the `JLArray`
+reference implementation.
 
-You should also provide implementations of several "GPU intrinsics". To make sure the
-correct implementation is called, the first argument to these intrinsics will be the kernel
-state object from before.
+To be able to actually use the functionality that is defined for `AbstractGPUArray`s, you
+should provide implementations of the following interfaces:
 
 ```@docs
-GPUArrays.LocalMemory
-GPUArrays.synchronize_threads
-GPUArrays.blockidx_x
-GPUArrays.blockidx_y
-GPUArrays.blockidx_z
-GPUArrays.blockdim_x
-GPUArrays.blockdim_y
-GPUArrays.blockdim_z
-GPUArrays.threadidx_x
-GPUArrays.threadidx_y
-GPUArrays.threadidx_z
-GPUArrays.griddim_x
-GPUArrays.griddim_y
-GPUArrays.griddim_z
+GPUArrays.backend
+GPUArrays.device
+GPUArrays.unsafe_reinterpret
+GPUArrays.blas_module
+GPUArrays.blasbuffer
 ```
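
Taken together, the restructured page asks a back-end for one device type, one backend type, and one host array type. The following is a minimal sketch of what a downstream package might define against the renamed interface, loosely modeled on the `JLArray` reference implementation; `MyDevice`, `MyBackend`, `MyArray` and all method bodies are hypothetical placeholders, not part of this commit.

```julia
using GPUArrays

# hypothetical back-end types; only the abstract supertypes come from GPUArrays
struct MyDevice <: GPUArrays.AbstractGPUDevice end
struct MyBackend <: GPUArrays.AbstractGPUBackend end

# host array type building on AbstractGPUArray ("Host abstractions" above)
struct MyArray{T, N} <: AbstractGPUArray{T, N}
    data::Array{T, N}   # stand-in for a real device allocation
end
Base.size(A::MyArray) = size(A.data)   # part of the basic AbstractArray interface

# "Device functionality": expose the device and a property of it
GPUArrays.device(A::MyArray) = MyDevice()
GPUArrays.threads(dev::MyDevice) = 256   # hardware threads, per the docstring

# execution: route gpu_call for this array type to this back-end
GPUArrays.backend(::Type{<:MyArray}) = MyBackend()
```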

src/GPUArrays.jl

Lines changed: 2 additions & 2 deletions

@@ -15,13 +15,13 @@ using Adapt
 # device functionality
 include("device/device.jl")
 include("device/execution.jl")
-## on-device
+## executed on-device
 include("device/abstractarray.jl")
 include("device/indexing.jl")
 include("device/memory.jl")
 include("device/synchronization.jl")
 
-# host array abstraction
+# host abstractions
 include("host/abstractarray.jl")
 include("host/construction.jl")
 ## integrations and specialized methods

src/device/device.jl

Lines changed: 7 additions & 0 deletions

@@ -4,6 +4,13 @@ export AbstractGPUDevice
 
 abstract type AbstractGPUDevice end
 
+"""
+    device(A::AbstractArray)
+
+Gets the device associated with the array `A`.
+"""
+device(A::AbstractArray) = error("This array is not a GPU array") # COV_EXCL_LINE
+
 """
 Hardware threads of device
 """

src/device/execution.jl

Lines changed: 6 additions & 1 deletion

@@ -6,7 +6,12 @@ abstract type AbstractGPUBackend end
 
 abstract type AbstractKernelContext end
 
-backend(::Type{T}) where T = error("Can't choose GPU backend for $T")
+"""
+    backend(T::Type{<:AbstractArray})
+
+Gets the GPUArrays back-end responsible for managing arrays of type `T`.
+"""
+backend(::Type{<:AbstractArray}) = error("This array is not a GPU array") # COV_EXCL_LINE
 
 """
     gpu_call(kernel::Function, A::AbstractGPUArray, args::Tuple, configuration = length(A))
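
With `backend` now documented as the lookup from array type to back-end, the launch path reads naturally: `gpu_call(kernel, A, args, configuration)` presumably resolves `backend(typeof(A))` and forwards to that back-end's `_gpu_call`, whose signature appears in `src/reference.jl` below. A hedged sketch of the back-end half, continuing the hypothetical `MyBackend`; the body is illustrative only:

```julia
# signature modeled on the JLBackend method in src/reference.jl below
function GPUArrays._gpu_call(::MyBackend, f, A, args::Tuple, blocks_threads::Tuple)
    blocks, threads = blocks_threads   # e.g. ((5,), (2,)) as in the tests
    # a real back-end would launch f once per thread here, passing its own
    # AbstractKernelContext subtype as the first (ctx) argument
end
```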

src/host/abstractarray.jl

Lines changed: 2 additions & 6 deletions

@@ -15,12 +15,8 @@ const AbstractGPUVector{T} = AbstractGPUArray{T, 1}
 const AbstractGPUMatrix{T} = AbstractGPUArray{T, 2}
 const AbstractGPUVecOrMat{T} = Union{AbstractGPUArray{T, 1}, AbstractGPUArray{T, 2}}
 
-"""
-    device(A::AbstractArray)
-
-Gets the device associated to the Array `A`
-"""
-device(A::AbstractArray) = error("Not implemented") # COV_EXCL_LINE
+device(::AbstractGPUDevice) = error("Not implemented") # COV_EXCL_LINE
+backend(::Type{<:AbstractGPUDevice}) = error("Not implemented") # COV_EXCL_LINE
 
 
 # input/output

src/reference.jl

Lines changed: 4 additions & 4 deletions

@@ -60,8 +60,8 @@ function GPUArrays._gpu_call(::JLBackend, f, A, args::Tuple, blocks_threads::Tup
     for blockidx in 1:blocks
         ctx.blockidx = blockidx
         for threadidx in 1:threads
-            thread_state = JLKernelContext(ctx, threadidx)
-            tasks[threadidx] = @async @allowscalar f(thread_state, device_args...)
+            thread_ctx = JLKernelContext(ctx, threadidx)
+            tasks[threadidx] = @async @allowscalar f(thread_ctx, device_args...)
             # TODO: require 1.3 and use Base.Threads.@spawn for actual multithreading
             # (this would require a different synchronization mechanism)
         end
@@ -73,7 +73,7 @@ function GPUArrays._gpu_call(::JLBackend, f, A, args::Tuple, blocks_threads::Tup
 end
 
 
-## on-device
+## executed on-device
 
 # array type
 
@@ -128,7 +128,7 @@ end
 
 
 #
-# Host array abstraction
+# Host abstractions
 #
 
 struct JLArray{T, N} <: AbstractGPUArray{T, N}
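
The reference back-end emulates a launch with one `@async` task per thread, which is what lets the renamed `ctx` argument be exercised on the CPU. A short usage sketch against `JLArray`, assuming the constructor pattern and unqualified intrinsics used in this commit's test suite:

```julia
using GPUArrays

x = GPUArrays.JLArray(Vector{Int}(undef, 10))
x .= 0
gpu_call(x, (x,), ((5,), (2,))) do ctx, x   # 5 blocks of 2 threads
    x[linear_index(ctx)] = blockidx(ctx)    # ctx replaces the old `state`
    return
end
Array(x)   # == [1, 1, 2, 2, 3, 3, 4, 4, 5, 5], matching the test expectation
```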

test/testsuite/base.jl

Lines changed: 4 additions & 4 deletions

@@ -6,20 +6,20 @@ function cartesian_iter(state, A, res, Asize)
     return
 end
 
-function clmap!(state, f, out, b)
-    i = linear_index(state) # get the kernel index it gets scheduled on
+function clmap!(ctx, f, out, b)
+    i = linear_index(ctx) # get the kernel index it gets scheduled on
     out[i] = f(b[i])
     return
 end
 
-function ntuple_test(state, result, ::Val{N}) where N
+function ntuple_test(ctx, result, ::Val{N}) where N
     result[1] = ntuple(Val(N)) do i
         Float32(i) * 77f0
     end
     return
 end
 
-function ntuple_closure(state, result, ::Val{N}, testval) where N
+function ntuple_closure(ctx, result, ::Val{N}, testval) where N
     result[1] = ntuple(Val(N)) do i
         Float32(i) * testval
     end

test/testsuite/gpuinterface.jl

Lines changed: 14 additions & 14 deletions

@@ -3,44 +3,44 @@ function test_gpuinterface(AT)
     N = 10
     x = AT(Vector{Int}(undef, N))
     x .= 0
-    gpu_call(x, (x,)) do state, x
-        x[linear_index(state)] = 2
+    gpu_call(x, (x,)) do ctx, x
+        x[linear_index(ctx)] = 2
         return
     end
     @test all(x-> x == 2, Array(x))
 
-    gpu_call(x, (x,), N) do state, x
-        x[linear_index(state)] = 2
+    gpu_call(x, (x,), N) do ctx, x
+        x[linear_index(ctx)] = 2
         return
     end
     @test all(x-> x == 2, Array(x))
     configuration = ((N ÷ 2,), (2,))
-    gpu_call(x, (x,), configuration) do state, x
-        x[linear_index(state)] = threadidx(state)
+    gpu_call(x, (x,), configuration) do ctx, x
+        x[linear_index(ctx)] = threadidx(ctx)
         return
     end
     @test Array(x) == [1,2,1,2,1,2,1,2,1,2]
 
-    gpu_call(x, (x,), configuration) do state, x
-        x[linear_index(state)] = blockidx(state)
+    gpu_call(x, (x,), configuration) do ctx, x
+        x[linear_index(ctx)] = blockidx(ctx)
        return
     end
     @test Array(x) == [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
     x2 = AT([0])
-    gpu_call(x, (x2,), configuration) do state, x
-        x[1] = blockdim(state)
+    gpu_call(x, (x2,), configuration) do ctx, x
+        x[1] = blockdim(ctx)
        return
     end
    @test Array(x2) == [2]
 
-    gpu_call(x, (x2,), configuration) do state, x
-        x[1] = griddim(state)
+    gpu_call(x, (x2,), configuration) do ctx, x
+        x[1] = griddim(ctx)
        return
     end
    @test Array(x2) == [5]
 
-    gpu_call(x, (x2,), configuration) do state, x
-        x[1] = global_size(state)
+    gpu_call(x, (x2,), configuration) do ctx, x
+        x[1] = global_size(ctx)
        return
     end
    @test Array(x2) == [10]
