
Commit 26b32bf

Merge pull request #55 from JuliaGPU/sd/devices
better device api + fixes
2 parents: 4f35ef0 + 8671151

21 files changed: 555 additions & 340 deletions

README.md

Lines changed: 31 additions & 15 deletions
@@ -14,7 +14,9 @@ CUDAnative.jl is using (via LLVM + SPIR-V).
 
 # Why another GPU array package in yet another language?
 
-Julia offers countless advantages for a GPU array package.
+Julia offers great advantages for programming the GPU.
+This [blog post](http://mikeinnes.github.io/2017/08/24/cudanative.html) outlines a few of those.
+
 E.g., we can use Julia's JIT to generate optimized kernels for map/broadcast operations.
 
 This works even for things like complex arithmetic, since we can compile what's already in Julia Base.
@@ -45,15 +47,6 @@ Checkout the examples, to see how this can be used to emit specialized code whil
 In theory, we could go as far as inspecting user defined callbacks (we can get the complete AST), count operations and estimate register usage and use those numbers to optimize our kernels!
 
 
-### Automatic Differentiation
-
-Because of neural networks, automatic differentiation is super hyped right now!
-Julia offers a couple of packages for that, e.g. [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl).
-It heavily relies on Julia's strength to specialize generic code and dispatch to different implementations depending on the Array type, allowing an almost overheadless automatic differentiation.
-Making this work with GPUArrays will be a bit more involved, but the
-first [prototype](https://github.com/JuliaGPU/GPUArrays.jl/blob/master/examples/logreg.jl) looks already promising!
-There is also [ReverseDiffSource](https://github.com/JuliaDiff/ReverseDiffSource.jl), which should already work for simple functions.
-
 # Scope
 
 Current backends: OpenCL, CUDA, Julia Threaded
@@ -80,10 +73,33 @@ gemm!, scal!, gemv! and the high level functions that are implemented with these
 
 # Usage
 
+A backend will be initialized by default,
+but can be explicitly set with `opencl()`, `cudanative()`, `threaded()`.
+There is also `GPUArrays.init(device_symbol, filterfuncs...)`, which can be used to programmatically
+initialize a backend.
+Filter functions can be used to select a device like this (`opencl()`, etc. also support them):
+```Julia
+GPUArrays.init(:cudanative, is_gpu, dev -> has_atleast(dev, threads, 512))
+```
+You can also temporarily create a context on the currently selected backend with this construct:
+```Julia
+on_device([device = GPUArrays.current_device()]) do context
+    A = GPUArray(rand(Float32, 32, 32))
+    c = A .+ A
+end
+```
+Or you can run some code on all currently available devices like this:
+
+```Julia
+forall_devices(filterfuncs...) do context
+    A = GPUArray(rand(Float32, 32, 32))
+    c = A .+ A
+end
+```
+
+
 ```Julia
 using GPUArrays
-# A backend will be initialized by default on first call to the GPUArray constructor
-# But can be explicitely called like e.g.: CLBackend.init(), CUBackend.init(), JLBackend.init()
 
 a = GPUArray(rand(Float32, 32, 32)) # can be constructed from any Julia Array
 b = similar(a) # similar and other Julia.Base operations are defined
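
The device-selection calls documented in the hunk above compose. A minimal sketch, assuming only what the README states (that `opencl()` accepts the same filter functions as `GPUArrays.init`, and that `on_device` takes the selected device as its optional argument):

```Julia
using GPUArrays

# select an OpenCL GPU device via the is_gpu filter function
opencl(is_gpu)

# run a snippet against whatever device was just selected
dev = GPUArrays.current_device()
on_device(dev) do context
    A = GPUArray(rand(Float32, 32, 32))
    c = A .+ A
end
```
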
@@ -103,12 +119,12 @@ gpu_call(kernel::Function, DispatchDummy::GPUArray, args::Tuple, global_size = l
 with kernel looking like this:
 
 function kernel(state, arg1, arg2, arg3) # args get splatted into the kernel call
-    # state gets always passed as the first argument and is needed to offer the same
+    # state gets always passed as the first argument and is needed to offer the same
     # functionality across backends, even though they have very different ways of getting e.g. the thread index
     # arg1 can be any gpu array - this is needed to dispatch to the correct intrinsics.
-    # if you call gpu_call without any further modifications to global/local size, this should give you a linear index into
+    # if you call gpu_call without any further modifications to global/local size, this should give you a linear index into
     # DispatchDummy
-    idx = linear_index(state, arg1::GPUArray)
+    idx = linear_index(state, arg1::GPUArray)
     arg1[idx] = arg2[idx] + arg3[idx]
     return #kernel must return void
 end
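
For reference, a kernel like the one above would be launched through `gpu_call`. A minimal sketch, assuming only the splatting behaviour described in the comments and the `gpu_call(kernel, DispatchDummy, args, ...)` signature referenced in the hunk header; the names `vadd_kernel`, `A`, `B`, `C` are illustrative:

```Julia
using GPUArrays

A = GPUArray(rand(Float32, 1024))
B = GPUArray(rand(Float32, 1024))
C = GPUArray(zeros(Float32, 1024)) # holds the result and acts as the DispatchDummy

function vadd_kernel(state, out, x, y)
    idx = linear_index(state, out) # linear index into out, as described above
    out[idx] = x[idx] + y[idx]
    return
end

# args are splatted after state, so the kernel is called as vadd_kernel(state, C, A, B);
# the global size is left at its default here
gpu_call(vadd_kernel, C, (C, A, B))
```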

REQUIRE

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Matcha 0.0.2
 
 CUDAnative 0.4.1 # llvm codegen fix
 CUDAdrv 0.5.1
-CUDArt
+CUDArt 0.4.0 # for cuda c compiler support
 CUBLAS 0.2.0 # for CUDAdrv support
 CUFFT
 

deps/build.jl

Lines changed: 5 additions & 4 deletions
@@ -2,9 +2,9 @@ info("""
 This process will figure out which acceleration Packages you have installed
 and therefore which backends GPUArrays can offer.
 Theoretically available:
-:cudanative, :julia, :opencl
+:cudanative, :threaded, :opencl
 
-:julia is the default backend, which should always work.
+:threaded is the default backend, which should always work.
 Just start Julia with:
 `JULIA_NUM_THREADS=8 julia -O3` to get it some threads.
 8 is just an example and should be chosen depending on the processor you have.
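
As a quick sanity check for the threaded backend (plain Julia, not specific to this commit), you can confirm the threads actually arrived:

```Julia
# after starting Julia with e.g. `JULIA_NUM_THREADS=8 julia -O3`
julia> Threads.nthreads()
8
```
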
@@ -13,7 +13,7 @@ acceleration, you might as well want optimization level 3!
 In the future, OpenCL, CUDA and OpenGL will be added as additional backends.
 """)
 
-supported_backends = [:julia]
+supported_backends = [:threaded]
 
 cudanative_dir = get(ENV, "CUDANATIVE_PATH", Pkg.dir("CUDAnative"))
 install_cudanative = true
@@ -41,7 +41,8 @@ if !isdir(cudanative_dir)
 end
 
 # Julia will always be available
-info("julia added as a backend.")
+info("threaded backend added.")
+
 test_kernel() = nothing
 try
 using CUDAnative, CUDAdrv
src/abstractarray.jl

Lines changed: 0 additions & 5 deletions
@@ -20,7 +20,6 @@ end
 #=
 Interface for accessing the lower level
 =#
-
 buffer(A::AbstractAccArray) = A.buffer
 context(A::AbstractAccArray) = A.context
 default_buffer_type(typ, context) = error("Found unsupported context: $context")
6665
end
6766

6867

69-
using Compat.TypeUtils
7068
function Base.similar{T <: GPUArray, ET, N}(
7169
::Type{T}, ::Type{ET}, sz::NTuple{N, Int};
7270
context::Context = current_context(), kw_args...
@@ -77,9 +75,6 @@ end
7775

7876

7977

80-
81-
82-
8378
#=
8479
Host to Device data transfers
8580
=#

0 commit comments
