
Commit 59e4bc8 (parent c3815cf)

improve docs

2 files changed: 89 additions & 17 deletions

docs/src/index.md

Lines changed: 55 additions & 9 deletions
@@ -1,18 +1,30 @@
# GPUArrays Documentation

+GPUArrays is an abstract interface for GPU computations.
+Think of it as the AbstractArray interface in Julia Base, but for GPUs.
+It allows you to write generic Julia code for all GPU platforms and implements common algorithms for the GPU.
+Like Julia Base, this includes BLAS wrappers, FFTs, maps, broadcasts and mapreduces.
+So when you inherit from GPUArrays and overload the interface correctly, you get a lot
+of functionality for free.
+This allows multiple GPUArray implementations for different purposes, while
+maximizing the ability to share code.
+Currently there are two packages implementing the interface, namely [CLArrays](https://github.com/JuliaGPU/CLArrays.jl) and [CuArrays](https://github.com/JuliaGPU/CuArrays.jl).
+As the name suggests, the first implements the interface using OpenCL and the latter uses CUDA.
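To make the "inherit and overload" idea concrete, here is a hedged sketch of what a backend package would declare; the type name and fields are hypothetical, and `GPUArray` is assumed to be the exported abstract array type (as the `gpu_call` signature further down suggests):

```julia
using GPUArrays

# Hypothetical backend type: subtype the abstract GPUArray and implement the
# interface functions documented below to get broadcasts, mapreduce, etc. for free.
struct MyDeviceArray{T, N} <: GPUArray{T, N}
    data::Vector{T}          # stand-in for a real device buffer
    dims::NTuple{N, Int}
end

Base.size(A::MyDeviceArray) = A.dims
```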

-# Abstract GPU interface

-GPUArrays supports different platforms like CUDA and OpenCL, which all have different
-names for function that offer the same functionality on the hardware.
-E.g. how to call a function on the GPU, how to get the thread index etc.
-GPUArrays offers an abstract interface for these functions which are overloaded
-by the packages like [CLArrays](https://github.com/JuliaGPU/CLArrays.jl) and [CuArrays](https://github.com/JuliaGPU/CuArrays.jl).
+
+# The Abstract GPU interface
+
+Different GPU computation frameworks like CUDA and OpenCL have different
+names for accessing the same hardware functionality.
+E.g. how to launch a GPU kernel, how to get the thread index and so forth.
+GPUArrays offers a unified abstract interface for these functions.
This makes it possible to write generic code that can be run on all hardware.
-GPUArrays itself even contains a pure Julia implementation of this interface.
-The julia reference implementation is also a great way to debug your GPU code, since it
-offers many more errors and debugging information compared to the GPU backends - which
+GPUArrays itself even contains a pure [Julia implementation](https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/jlbackend.jl) of this interface.
+The Julia reference implementation is a great way to debug your GPU code, since it
+offers more informative errors and debugging information compared to the GPU backends - which
mostly silently error or give cryptic errors (so far).
+
You can use the reference implementation by using the `GPUArrays.JLArray` type.
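For instance, a minimal hedged sketch of exercising the reference backend (array sizes and values are arbitrary, and `JLArray` is assumed to be constructible from a host `Array`):

```julia
using GPUArrays

a = GPUArrays.JLArray(rand(Float32, 1024))   # wrap host data in the pure-Julia backend
b = a .+ 1f0                                 # broadcasting runs through the GPUArrays machinery
s = sum(b)                                   # and so does mapreduce
```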

The functions that are currently part of the interface:
@@ -25,6 +37,8 @@ blockidx_*(state), blockdim_*(state), threadidx_*(state), griddim_*(state)
get_group_id, get_local_size, get_local_id, get_num_groups
```

+Higher level functionality:
+
```@docs
gpu_call(f, A::GPUArray, args::Tuple, configuration = length(A))
@@ -42,3 +56,35 @@ device(A::AbstractArray)

synchronize(A::AbstractArray)
```
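To make the entries above more concrete, here is a hedged sketch of a hand-written kernel launched through `gpu_call`; the kernel name and arguments are illustrative, `gpu_call`, `linear_index` and `synchronize` are assumed to be exported (qualify with `GPUArrays.` otherwise), and the bounds check follows the `linear_index` contract described in the interface docstrings:

```julia
using GPUArrays

# y .= a .* x .+ y, written as an explicit kernel for illustration
function axpy_kernel!(state, y, x, a)
    i = linear_index(state)      # linear index of this work item
    if i <= length(y)            # more work items than elements may be launched
        @inbounds y[i] = a * x[i] + y[i]
    end
    return
end

x = GPUArrays.JLArray(rand(Float32, 256))
y = GPUArrays.JLArray(zeros(Float32, 256))

gpu_call(axpy_kernel!, y, (y, x, 2f0))   # configuration defaults to length(y)
synchronize(y)                           # block until the kernel has finished
```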
+
+
+# The abstract TestSuite
+
+Since all array packages inheriting from GPUArrays need to offer the same functionality
+and interface, it makes sense to test them in the same way.
+This is why GPUArrays contains a test suite which can be called with the array type
+you want to test.
+
+You can run the test suite like this:
+
+```@example
+using GPUArrays, GPUArrays.TestSuite
+TestSuite.run_tests(MyGPUArrayType)
+```
+If you don't want to run the whole suite, you can also run parts of it:
+
+
+```@example
+Typ = JLArray
+GPUArrays.allowslow(false) # fail tests when slow indexing path into Array type is used.
+
+TestSuite.run_gpuinterface(Typ) # interface functions like gpu_call, threadidx, etc
+TestSuite.run_base(Typ) # basic functionality like launching a kernel on the GPU and Base operations
+TestSuite.run_blas(Typ) # tests the blas interface
+TestSuite.run_broadcasting(Typ) # tests the broadcasting implementation
+TestSuite.run_construction(Typ) # tests all kinds of different ways of constructing the array
+TestSuite.run_fft(Typ) # fft tests
+TestSuite.run_linalg(Typ) # linalg function tests
+TestSuite.run_mapreduce(Typ) # mapreduce sum, etc
+TestSuite.run_indexing(Typ) # indexing tests
+```

src/abstract_gpu_interface.jl

Lines changed: 34 additions & 8 deletions
@@ -13,24 +13,40 @@ end


"""
+    synchronize_threads(state)
+
in CUDA terms `__synchronize`
+in OpenCL terms: `barrier(CLK_LOCAL_MEM_FENCE)`
"""
function synchronize_threads(state)
    error("Not implemented")
end


"""
-inear_index(state)
+    linear_index(state)
+
+linear index of the current work item within a kernel launch (in OpenCL equal to get_global_id).

-linear index in a GPU kernel (equal to OpenCL.get_global_id)
"""
@inline function linear_index(state)
    UInt32((blockidx_x(state) - UInt32(1)) * blockdim_x(state) + threadidx_x(state))
end

"""
-Macro form of `linear_index`, which returns when out of bounds
+    linearidx(A, statesym = :state)
+
+Macro form of `linear_index`, which returns from the kernel when the index is out of bounds.
+So it can be used like this:
+```
+function kernel(state, A)
+    idx = @linearidx A state
+    # from here on it's safe to index into A with idx
+    @inbounds begin
+        A[idx] = ...
+    end
+end
+```
"""
macro linearidx(A, statesym = :state)
    quote
@@ -43,6 +59,8 @@ end


"""
+    cartesianidx(A, statesym = :state)
+
Like `@linearidx`, but returns an N-dimensional `NTuple{ndim(A), Cuint}` as index
"""
macro cartesianidx(A, statesym = :state)
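A hedged usage sketch for `@cartesianidx`, based only on the docstring above; the kernel and value are hypothetical, and the returned tuple is assumed to be splatted into the indexing expression:

```julia
# Fill an N-dimensional GPU array element-wise, one work item per element.
function fill_kernel!(state, A, val)
    idx = @cartesianidx A state      # NTuple index; returns early when out of bounds
    @inbounds A[idx...] = val
    return
end
```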
@@ -54,22 +72,28 @@ macro cartesianidx(A, statesym = :state)
end

"""
+    global_size(state)
+
Global size == blockdim * griddim == total number of kernel execution
"""
@inline function global_size(state)
    # TODO nd version
    griddim_x(state) * blockdim_x(state)
end
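A hedged sketch of how `global_size` is typically used, assuming the interface functions above are in scope (e.g. via `using GPUArrays`); the kernel is illustrative only:

```julia
# Grid-stride loop: global_size(state) is the total number of work items,
# so each work item strides by that amount until the whole array is covered.
function add_one_kernel!(state, A)
    i = linear_index(state)
    while i <= length(A)
        @inbounds A[i] += 1f0
        i += global_size(state)
    end
    return
end
```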

-
"""
+    device(A::AbstractArray)
+
Gets the device associated to the Array `A`
"""
function device(A::AbstractArray)
    # fallback is a noop, for backends not needing synchronization. This
    # makes it easier to write generic code that also works for AbstractArrays
end

+
"""
+    synchronize(A::AbstractArray)
+
Blocks until all operations are finished on `A`
"""
function synchronize(A::AbstractArray)
@@ -85,15 +109,17 @@ end


"""
+    gpu_call(f, A::GPUArray, args::Tuple, configuration = length(A))
+
Calls function `f` on the GPU.
`A` must be an GPUArray and will help to dispatch to the correct GPU backend
and supplies queues and contexts.
-Calls kernel with `kernel(state, args...)`, where state is dependant on the backend
-and can be used for e.g getting an index into A with `linear_index(state)`.
-Optionally, launch configuration can be supplied in the following way:
+Calls the kernel function with `kernel(state, args...)`, where state is dependent on the backend
+and can be used for getting an index into `A` with `linear_index(state)`.
+Optionally, a launch configuration can be supplied in the following way:

1) A single integer, indicating how many work items (total number of threads) you want to launch.
-in this case `linear_index(state)` will be a number in the range 1:configuration
+in this case `linear_index(state)` will be a number in the range `1:configuration`
2) Pass a tuple of integer tuples to define blocks and threads per blocks!

"""
