
Commit 6156725

add documentation

1 parent 1b564e4 commit 6156725

File tree

README.md
src/backends/backends.jl
src/backends/cudanative/cudanative.jl
src/backends/opencl/opencl.jl

4 files changed: +107 −34 lines

README.md

Lines changed: 31 additions & 15 deletions
@@ -14,7 +14,9 @@ CUDAnative.jl is using (via LLVM + SPIR-V).
 
 # Why another GPU array package in yet another language?
 
-Julia offers countless advantages for a GPU array package.
+Julia offers great advantages for programming the GPU.
+This [blog post](http://mikeinnes.github.io/2017/08/24/cudanative.html) outlines a few of those.
+
 E.g., we can use Julia's JIT to generate optimized kernels for map/broadcast operations.
 
 This works even for things like complex arithmetic, since we can compile what's already in Julia Base.
@@ -45,15 +47,6 @@ Checkout the examples, to see how this can be used to emit specialized code while
 In theory, we could go as far as inspecting user defined callbacks (we can get the complete AST), count operations and estimate register usage and use those numbers to optimize our kernels!
 
 
-### Automatic Differentiation
-
-Because of neural networks, automatic differentiation is super hyped right now!
-Julia offers a couple of packages for that, e.g. [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl).
-It heavily relies on Julia's strength to specialize generic code and dispatch to different implementations depending on the Array type, allowing an almost overheadless automatic differentiation.
-Making this work with GPUArrays will be a bit more involved, but the
-first [prototype](https://github.com/JuliaGPU/GPUArrays.jl/blob/master/examples/logreg.jl) looks already promising!
-There is also [ReverseDiffSource](https://github.com/JuliaDiff/ReverseDiffSource.jl), which should already work for simple functions.
-
 # Scope
 
 Current backends: OpenCL, CUDA, Julia Threaded
@@ -80,10 +73,33 @@ gemm!, scal!, gemv! and the high level functions that are implemented with these
 
 # Usage
 
+A backend will be initialized by default,
+but can be explicitly set with opencl(), cudanative(), threaded().
+There is also GPUArrays.init(device_symbol, filterfuncs...), which can be used to programmatically
+initialize a backend.
+Filterfuncs can be used to select a device like this (`opencl()`, etc. also support those):
+```Julia
+GPUArrays.init(:cudanative, is_gpu, dev-> has_atleast(dev, threads, 512))
+```
+You can also temporarily create a context on the currently selected backend with this construct:
+```Julia
+on_device([device = GPUArrays.current_device()]) do context
+    A = GPUArray(rand(Float32, 32, 32))
+    c = A .+ A
+end
+```
+Or you can run some code on all currently available devices like this:
+
+```Julia
+forall_devices(filterfuncs...) do context
+    A = GPUArray(rand(Float32, 32, 32))
+    c = A .+ A
+end
+```
+
+
 ```Julia
 using GPUArrays
-# A backend will be initialized by default on first call to the GPUArray constructor
-# But can be explicitely called like e.g.: CLBackend.init(), CUBackend.init(), JLBackend.init()
 
 a = GPUArray(rand(Float32, 32, 32)) # can be constructed from any Julia Array
 b = similar(a) # similar and other Julia.Base operations are defined
@@ -103,12 +119,12 @@ gpu_call(kernel::Function, DispatchDummy::GPUArray, args::Tuple, global_size = l
 with kernel looking like this:
 
 function kernel(state, arg1, arg2, arg3) # args get splatted into the kernel call
-# state gets always passed as the first argument and is needed to offer the same
+    # state always gets passed as the first argument and is needed to offer the same
     # functionality across backends, even though they have very different ways of getting e.g. the thread index
    # arg1 can be any gpu array - this is needed to dispatch to the correct intrinsics.
-# if you call gpu_call without any further modifications to global/local size, this should give you a linear index into
+    # if you call gpu_call without any further modifications to global/local size, this should give you a linear index into
    # DispatchDummy
-idx = linear_index(state, arg1::GPUArray)
+    idx = linear_index(state, arg1::GPUArray)
    arg1[idx] = arg2[idx] + arg3[idx]
    return #kernel must return void
 end
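Putting the documented pieces together, a minimal end-to-end sketch of the `gpu_call` API described above might look like the following. This is illustrative only: `add_kernel` is a made-up name, and the conversion back to a CPU array at the end is assumed.

```Julia
using GPUArrays

a = GPUArray(rand(Float32, 1024))
b = GPUArray(rand(Float32, 1024))
out = similar(a)

# kernel contract from the README: `state` comes first, then the splatted
# args, and the kernel must return nothing
function add_kernel(state, out, x, y)
    idx = linear_index(state, out)  # linear index into the dispatch dummy
    out[idx] = x[idx] + y[idx]
    return
end

# `out` serves as the DispatchDummy, so the default global size covers it
gpu_call(add_kernel, out, (out, a, b))
Array(out) ≈ Array(a) .+ Array(b)  # assuming the usual `Array` conversion back to the CPU
```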

src/backends/backends.jl

Lines changed: 74 additions & 17 deletions
@@ -13,6 +13,7 @@ end
 
 #interface
 function create_buffer(ctx, array) end
+
 """
 Blocks until all operations are finished on `A`
 """
@@ -35,14 +36,39 @@ free(x::AbstractArray) = nothing
 #=
 Functions to select contexts
 =#
-
+"""
+Hardware threads of device
+"""
 threads(device) = 0
-blocks(device) = 0
+
+"""
+Blocks that group together hardware threads
+"""
+blocks(device) = 1
+"""
+Global memory, e.g. VRAM or RAM of device
+"""
 global_memory(device) = 0
+
+"""
+Free global memory. Isn't supported for AMD cards right now, in which case it returns NaN,
+so don't rely on the output of this function.
+"""
 free_global_memory(device) = NaN
+
+"""
+Block local memory
+"""
 local_memory(device) = 0
+
+"""
+Hardware name of a device
+"""
 name(device) = "Undefined"
 
+"""
+Summarizes all features of a device and prints them to `io`
+"""
 function device_summary(io::IO, device)
     println(io, "Device: ", name(device))
     for (n, f) in (:threads => threads, :blocks => blocks)
@@ -56,8 +82,20 @@ end
 
 ################################
 # Device selection functions for e.g. devices(filterfuncs)
+"""
+Returns true if `device` is a gpu
+"""
 is_gpu(device) = false
+
+"""
+Returns true if `device` is a cpu
+"""
 is_cpu(device) = false
+
+"""
+Checks a device for a certain attribute and returns true if it has at least `value`.
+Can be used with e.g. `threads`, `blocks`, `global_memory`, `local_memory`
+"""
 has_atleast(device, attribute, value) = attribute(device) >= value
@@ -74,12 +112,29 @@ is_cudanative(ctx) = false
 is_julia(ctx) = false
 is_opengl(ctx) = false
 
+const filterfuncs = """
+Devices can be filtered by passing `filter_funcs`, e.g.:
+`is_gpu`, `is_cpu`, `(dev)-> has_atleast(dev, threads, 512)`
+"""
 
+"""
+Initializes the opencl backend with a default device.
+$filterfuncs
+"""
 opencl(filterfuncs...) = init(:opencl, filterfuncs...)
+
+"""
+Initializes the cudanative backend with a default device.
+$filterfuncs
+"""
 cudanative(filterfuncs...) = init(:cudanative, filterfuncs...)
+"""
+Initializes the threaded backend with a default device.
+$filterfuncs
+"""
 threaded(filterfuncs...) = init(:threaded, filterfuncs...)
 
-export opencl, cudanative, threaded
+
 
 """
 Creates a new context from `device` without caching the resulting context.
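The `$filterfuncs` lines above rely on plain string interpolation into docstrings, so one shared fragment documents several functions. A standalone sketch of that technique (the names here are hypothetical, not part of the package):

```Julia
# shared docstring fragment, interpolated into several docstrings below
const shared_note = """
Accepts zero or more filter functions; a device is used only if all of them return true.
"""

"""
Picks a default OpenCL-like device. (hypothetical example function)
$shared_note
"""
pick_opencl(filters...) = :opencl

"""
Picks a default CUDA-like device. (hypothetical example function)
$shared_note
"""
pick_cuda(filters...) = :cuda
```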
@@ -88,6 +143,13 @@ function new_context(device)
     error("Device $device not supported")
 end
 
+"""
+Destroys a context, freeing all its resources.
+"""
+function destroy!(context)
+    error("Context $context not supported")
+end
+
 """
 Resets a context freeing all resources and creating a new context.
 """
@@ -162,14 +224,17 @@ Context gets destroyed afterwards. Note that creating a temporary context is expensive.
 """
 function on_device(f, device = current_device())
     ctx = new_context(device)
-    f(ctx)
-    destroy!(ctx)
+    try
+        f(ctx)
+    finally
+        destroy!(ctx)
+    end
     return
 end
 
 """
 Returns all devices for the current backend.
-Can be filtered by passing `filter_funcs`, e.g. `is_gpu`, `is_cpu`, `(dev)-> has_atleast(dev, threads, 512)`
+$filterfuncs
 """
 function available_devices(filter_funcs...)
     result = []
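The switch to `try`/`finally` in `on_device` is the standard Julia pattern for guaranteed cleanup: the context is destroyed even when the user callback throws. A self-contained sketch of the same pattern (the names are stand-ins, not the package API):

```Julia
function with_temporary(f, create, destroy)
    res = create()
    try
        f(res)          # user code may throw
    finally
        destroy(res)    # runs on normal return and on error alike
    end
    return
end

# the "context" is cleaned up even though the body errors
try
    with_temporary(ctx -> error("kernel failed"), () -> :ctx, ctx -> println("destroyed ", ctx))
catch err
    println("caught: ", err)
end
```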
@@ -184,7 +249,7 @@ end
 
 """
 Returns all devices from `backends = active_backends()`.
-Can be filtered by passing `filter_funcs`, e.g. `is_gpu`, `is_cpu`, `dev-> has_atleast(dev, threads, 512)`
+$filterfuncs
 """
 function all_devices(filter_funcs...; backends = active_backends())
     result = []
@@ -198,15 +263,6 @@ function all_devices(filter_funcs...; backends = active_backends())
     result
 end
 
-"""
-Iterates through all backends and calls `f` after initializing the current one!
-"""
-function perbackend(f)
-    for backend in supported_backends()
-        ctx = GPUArrays.init(backend)
-        f(ctx)
-    end
-end
 
 """
 Iterates through all available devices and calls `f(context)` after initializing the standard context for that device.
@@ -219,4 +275,5 @@ function forall_devices(f, filterfuncs...)
 end
 
 
-export is_cudanative, is_julia, is_opencl
+export is_cudanative, is_julia, is_opencl, on_device
+export opencl, cudanative, threaded
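The filter-function convention used by `available_devices`, `all_devices` and the backend initializers composes simple predicates. A toy re-implementation to illustrate the selection logic (the `ToyDevice` type and `select` helper are illustrative only):

```Julia
struct ToyDevice          # stand-in for a backend device handle
    name::String
    threads::Int
    gpu::Bool
end
threads(d::ToyDevice) = d.threads
is_gpu(d::ToyDevice) = d.gpu
has_atleast(dev, attribute, value) = attribute(dev) >= value

# keep a device only if every filter function accepts it,
# mirroring the loops in available_devices/all_devices
select(devices, filter_funcs...) =
    [dev for dev in devices if all(f -> f(dev), filter_funcs)]

devices = [ToyDevice("threaded cpu", 8, false), ToyDevice("discrete gpu", 1024, true)]
select(devices, is_gpu, dev -> has_atleast(dev, threads, 512))  # only the gpu remains
```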

src/backends/cudanative/cudanative.jl

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ let contexts = Dict{CUDAdrv.CuDevice, CUContext}(), active_device = CUDAdrv.CuDevice[]
     ctx
 end
 
-function destroy!(context::CUContext)
+function GPUArrays.destroy!(context::CUContext)
     # don't destroy primary device context
     dev = context.device
     if haskey(contexts, dev) && contexts[dev] == context

src/backends/opencl/opencl.jl

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ let contexts = Dict{cl.Device, CLContext}(), active_device = cl.Device[]
     ctx
 end
 
-function destroy!(context::CLContext)
+function GPUArrays.destroy!(context::CLContext)
     # don't destroy primary device context
     dev = context.device
     if haskey(contexts, dev) && contexts[dev] == context
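Both backend changes qualify `destroy!` with the `GPUArrays.` prefix for the same reason: an unqualified definition inside the backend module would create a new local function instead of adding a method to the generic one defined in backends.jl. A minimal sketch of the difference (module and type names are made up):

```Julia
module CoreSketch
abstract type AbstractCtx end
destroy!(ctx::AbstractCtx) = error("destroy! not implemented for $(typeof(ctx))")
end

module BackendSketch
using ..CoreSketch
struct Ctx <: CoreSketch.AbstractCtx end
# the qualified name extends the generic function from the parent module;
# a plain `destroy!(ctx::Ctx) = ...` would define an unrelated local function
CoreSketch.destroy!(ctx::Ctx) = println("backend context destroyed")
end

CoreSketch.destroy!(BackendSketch.Ctx())  # dispatches to the backend method
```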
