### Automatic Differentiation
Because of neural networks, automatic differentiation is super hyped right now!
Julia offers a couple of packages for that, e.g. [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl).
It relies heavily on Julia's strength of specializing generic code and dispatching to different implementations depending on the array type, allowing automatic differentiation with almost no overhead.
Making this work with GPUArrays will be a bit more involved.
There is also [ReverseDiffSource](https://github.com/JuliaDiff/ReverseDiffSource.jl).
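As a minimal CPU-side sketch of what that dispatch-based approach looks like with ReverseDiff (the function and data here are purely illustrative and not part of GPUArrays):

```Julia
using ReverseDiff

# a generic Julia function - nothing in here is tied to a concrete array type
loss(x) = sum((x .- 1f0) .^ 2)

x  = rand(Float32, 10)
dx = ReverseDiff.gradient(loss, x) # reverse-mode gradient of `loss` at `x`
```

The hope is that the very same generic `loss` can eventually be handed a `GPUArray`, with both the forward and the reverse pass running on the GPU.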
Current backends: OpenCL, CUDA, Julia Threaded
Planned backends: OpenGL, Vulkan
Implemented for all backends:
```Julia
map(f, ::GPUArray...)
map!(f, dest::GPUArray, ::GPUArray...)

# maps `f` over every linear index of `A`; `f` gets called with `(idx, A, args...)`
mapidx(f, A::GPUArray, args...)
# e.g.
mapidx(A) do idx, a
    if idx < length(a)
        a[idx+1] = a[idx]
    end
end

broadcast(f, ::GPUArray...)
broadcast!(f, dest::GPUArray, ::GPUArray...)

# calls `f` on `args`, with queues, block heuristics and context taken from `array`;
# `f` can be a Julia function or a tuple `(String, Symbol)`,
# i.e. a C kernel source string plus the name of the kernel function
gpu_call(array::GPUArray, f, args::Tuple)

mapreduce(f, op, ::GPUArray...) # so support for sum/mean/minimum etc. comes for free
```
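A hedged usage sketch of the functions listed above; the `GPUArray(...)` constructor call is an assumption about how data gets uploaded and may differ between backends:

```Julia
using GPUArrays

# upload some data (constructor assumed; see the backend documentation)
a = GPUArray(rand(Float32, 1024))
b = GPUArray(rand(Float32, 1024))

c = broadcast(+, a, b)       # element-wise addition on the device
d = map(x -> 2f0 * x, c)     # map with an anonymous Julia function
total = sum(c)               # comes for free, since `sum` builds on `mapreduce`
```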
When you program with GPUArrays, you can just write normal Julia functions and feed them to `gpu_call`; depending on which backend you choose, it will use Transpiler.jl or CUDAnative.
```Julia
# signature: global_size == CUDA blocks, local_size == CUDA threads
function kernel(state, arg1, arg2, arg3) # args get splatted into the kernel call
    # `state` always gets passed as the first argument and is needed to offer the same
    # functionality across backends, even though they have very different ways of getting e.g. the thread index
    # `arg1` can be any GPU array - this is needed to dispatch to the correct intrinsics.
    # If you call `gpu_call` without any further modifications to global/local size,
    # this should give you a linear index into `arg1` (the dispatch dummy):
    idx = linear_index(state, arg1::GPUArray)
    arg1[idx] = arg2[idx] + arg3[idx]
    return # a kernel must return nothing (void)
end
```
A full example of using `gpu_call` can be found in [examples/custom_kernels.jl](https://github.com/JuliaGPU/GPUArrays.jl/blob/master/examples/custom_kernels.jl).
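For a quick impression, here is a sketch of launching such a kernel with the `gpu_call(array, f, args)` form from the listing above; the kernel is the element-wise addition from the previous snippet (renamed here), and the array construction is again an assumption:

```Julia
using GPUArrays

# element-wise addition kernel, as in the snippet above
function add_kernel(state, arg1, arg2, arg3)
    idx = linear_index(state, arg1)
    arg1[idx] = arg2[idx] + arg3[idx]
    return
end

a = GPUArray(zeros(Float32, 1024))
b = GPUArray(rand(Float32, 1024))
c = GPUArray(rand(Float32, 1024))

# context, queue and launch heuristics are taken from `a`;
# with the default global/local size, `linear_index` covers every element of `a`
gpu_call(a, add_kernel, (a, b, c)) # afterwards a == b .+ c
```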
CLFFT, CUFFT, CLBLAS and CUBLAS will soon be supported.
A prototype of generic support of these libraries can be found in [blas.jl](https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/backends/blas.jl).
The OpenCL backend already supports matrix multiplication via `CLBLAS.gemm!` and `fft!`/`ifft!`.
CUDAnative could support these easily as well, but we currently run into problems with the interactions of `CUDAdrv` and `CUDArt`.
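As a hedged sketch of the existing OpenCL support: whether plain `*` suffices or an explicit `gemm!` call is needed depends on how the backend wires things up, the constructors are again assumptions, and `Complex64` is the Julia 0.5/0.6 name for complex Float32.

```Julia
A = GPUArray(rand(Float32, 512, 512))
B = GPUArray(rand(Float32, 512, 512))
C = A * B            # matrix multiplication, backed by CLBLAS.gemm! on the OpenCL backend

F = GPUArray(rand(Complex64, 1024))
fft!(F)              # in-place FFT, backed by CLFFT
ifft!(F)             # and back
```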
# Benchmarks
So far we have only benchmarked Black-Scholes, and not much time has been spent on optimizing our kernels yet.
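For reference, here is a sketch of the kind of scalar kernel such a benchmark runs. This is not the repository's exact benchmark code, just an illustrative Black-Scholes call pricer broadcast over GPU arrays (`erf` lives in Base on Julia 0.5/0.6, and the `GPUArray` constructor is again assumed):

```Julia
# cumulative normal distribution via the error function
cndf(x) = 0.5f0 * (1f0 + erf(x / sqrt(2f0)))

# scalar Black-Scholes call price; broadcasting it over GPU arrays prices many options at once
function blackscholes(S, K, r, sigma, T)
    d1 = (log(S / K) + (r + 0.5f0 * sigma * sigma) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * cndf(d1) - K * exp(-r * T) * cndf(d2)
end

S = GPUArray(rand(Float32, 10^6) .* 100f0 .+ 50f0)  # spot prices
K = GPUArray(fill(100f0, 10^6))                     # strikes
prices = blackscholes.(S, K, 0.01f0, 0.2f0, 1f0)    # scalar args get broadcast alongside the arrays
```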
What currently works inside GPU kernels, and what doesn't:

* Working with immutable isbits types (i.e. types not containing pointers) should be completely supported.
* Only non-allocating code, so no constructs like `x = [1, 2, 3]`. Note that tuples are isbits, so `x = (1, 2, 3)` works (see the small illustration after this list).
* Transpiler/OpenCL currently has problems with putting GPU arrays into a struct on the GPU, so no views and in fact no multidimensional indexing yet; for that, `size` would need to be part of the array struct. A fix for this is in sight, though.
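A small, hypothetical illustration of the allocation rule (the kernel name and arguments are made up for this example):

```Julia
# OK: tuples are isbits, so they can be passed into and indexed inside a kernel
function saxpy_kernel(state, y, x, coeffs) # e.g. coeffs::Tuple{Float32, Float32}
    i = linear_index(state, y)
    y[i] = coeffs[1] * x[i] + coeffs[2]
    return
end

# NOT OK inside a kernel: `tmp = [1f0, 2f0]` allocates a heap Array,
# which the transpiler / GPU code generation cannot handle
```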
# TODO / up for grabs
* stencil operations, convolutions
* more tests and benchmarks
* tests that only switch the backend but use the same code
* interop between OpenCL, CUDA and OpenGL exists as a prototype, but needs proper hooking up via `Base.copy!` / `convert`
* share the implementation of broadcast etc. between backends. Currently they don't, since there are still subtle differences, which should be eliminated over time!
# Installation
I recently added a lot of features and bug fixes to the master branch, so you might want to check that out (`Pkg.checkout("GPUArrays")`).

For the cudanative backend, you need to install [CUDAnative.jl manually](https://github.com/JuliaGPU/CUDAnative.jl/#installation); it works only on OSX and Linux with a Julia source build (Julia 0.6), while the other backends also support Julia 0.5.
Make sure to have CUDA and/or OpenCL drivers installed correctly.
`Pkg.build("GPUArrays")` will pick those up and should include the working backends.
So if your system configuration changes, make sure to run `Pkg.build("GPUArrays")` again.
The rest should work automatically:
```Julia
Pkg.add("GPUArrays")
Pkg.checkout("GPUArrays") # optional, but recommended to get the latest fixes from the master branch
Pkg.build("GPUArrays") # should print out information about what backends are added
# Test it!
Pkg.test("GPUArrays")
```
If a backend is not supported by the hardware, you will see build errors while running `Pkg.add("GPUArrays")`; GPUArrays then selects only the working backends when running `Pkg.build("GPUArrays")`.
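Once the build succeeds, a quick smoke test might look like the following; the `GPUArray` constructor and the `Array` conversion back to the host are assumptions based on the API listed earlier:

```Julia
using GPUArrays

a = GPUArray(rand(Float32, 32, 32))
b = map(x -> x + 1f0, a)   # runs on whichever backend was picked up at build time
sum(b)                     # reductions come for free via mapreduce
Array(b)                   # copy the result back to a plain Array (assumed conversion)
```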