Commit 2b184f8: Update README.md (1 parent: a22a89c)
README.md

Lines changed: 23 additions & 34 deletions
````diff
@@ -102,43 +102,33 @@ function test(a, b)
     Complex64(sin(a / b))
 end
 complex_c = test.(c, b)
-fft!(complex_c) # fft!/ifft! is currently implemented for JLBackend and CLBackend
-
+fft!(complex_c) # fft!/ifft!/plan_fft, plan_ifft, plan_fft!, plan_ifft!
+
+"""
+When you program with GPUArrays, you can just write normal Julia functions, feed them to `gpu_call`, and depending on which backend you choose it will use Transpiler.jl or CUDAnative.
+"""
+# Signature: global_size == CUDA blocks, local_size == CUDA threads
+gpu_call(kernel::Function, DispatchDummy::GPUArray, args::Tuple, global_size = length(DispatchDummy), local_size = nothing)
+with the kernel looking like this:
+
+function kernel(state, arg1, arg2, arg3) # args get splatted into the kernel call
+    # state is always passed as the first argument and is needed to offer the same
+    # functionality across backends, even though they have very different ways of getting e.g. the thread index
+    # arg1 can be any GPU array - this is needed to dispatch to the correct intrinsics
+    # if you call gpu_call without further modifications to global/local size, this gives you a linear index into
+    # DispatchDummy
+    idx = linear_index(state, arg1::GPUArray)
+    arg1[idx] = arg2[idx] + arg3[idx]
+    return # the kernel must return nothing
+end
 ```
 
-CLFFT, CUFFT, CLBLAS and CUBLAS will soon be supported.
-A prototype of generic support for these libraries can be found in [blas.jl](https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/backends/blas.jl).
-The OpenCL backend already supports matrix multiplication via `CLBLAS.gemm!` and `fft!`/`ifft!`.
-CUDAnative could support these easily as well, but we currently run into problems with the interactions of `CUDAdrv` and `CUDArt`.
-
-# Benchmarks
-
-We have only benchmarked Blackscholes, and not much time has been spent on optimizing our kernels yet,
-so please treat these numbers with care!
-
-[source](https://github.com/JuliaGPU/GPUArrays.jl/blob/master/examples/blackscholes.jl)
-
-![blackscholes](https://cdn.rawgit.com/JuliaGPU/GPUArrays.jl/91678a36/examples/blackscholes.svg)
-
-Interestingly, on the GTX 950 the CUDAnative backend outperforms the OpenCL backend by a factor of 10.
-This is most likely because LLVM is great at unrolling and vectorizing loops,
-while the NVIDIA OpenCL compiler seems not to be, so with our current primitive kernel
-quite a bit of performance is missed with OpenCL right now!
-This can be fixed by putting more effort into emitting specialized kernels, which should
-be straightforward with Julia's great metaprogramming and `@generated` functions.
-
+# Currently supported subset of Julia code
 
-Times in a table:
+Working with immutable isbits types (not containing pointers) should be completely supported.
+Non-allocating code only (so no constructs like `x = [1, 2, 3]`). Note that tuples are isbits, so `x = (1, 2, 3)` works.
+Transpiler/OpenCL has problems with putting GPU arrays into a struct on the GPU - so no views and no multidimensional indexing, since that needs `size`, which would have to be part of the array struct. A fix for this is in sight, though.
 
-| Backend | Time (s) for N = 10^7 | OP/s in million | Speedup |
-| ---- | ---- | ---- | ---- |
-| JLContext i3-4130 CPU @ 3.40GHz 1 thread | 1.0085 s | 10 | 1.0 |
-| JLContext i7-6700 CPU @ 3.40GHz 1 thread | 0.8773 s | 11 | 1.1 |
-| CLContext: i7-6700 CPU @ 3.40GHz 8 threads | 0.2093 s | 48 | 4.8 |
-| JLContext i7-6700 CPU @ 3.40GHz 8 threads | 0.1981 s | 50 | 5.1 |
-| CLContext: GeForce GTX 950 | 0.0301 s | 332 | 33.5 |
-| CUContext: GeForce GTX 950 | 0.0032 s | 3124 | 315.0 |
 | CLContext: FirePro w9100 | 0.0013 s | 7831 | 789.8 |
 
 # TODO / up for grabs
````
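The commit documents the `gpu_call` signature and the kernel convention but shows no complete call site. A minimal usage sketch, assuming a GPUArrays version from this era where `GPUArray`, `gpu_call`, and `linear_index(state, arg)` exist as shown in the diff (the array names `a`, `b`, `c` and their sizes are made up for illustration):

```julia
using GPUArrays

# hypothetical input/output arrays on the default backend
a = GPUArray(zeros(Float32, 1024))
b = GPUArray(rand(Float32, 1024))
c = GPUArray(rand(Float32, 1024))

# kernel convention from the diff: `state` is passed first, the tuple of
# args gets splatted after it, and the kernel must return nothing
function add_kernel(state, a, b, c)
    idx = linear_index(state, a) # linear index into the dispatch dummy `a`
    a[idx] = b[idx] + c[idx]
    return
end

# global_size defaults to length(a), so one thread per element
gpu_call(add_kernel, a, (a, b, c))
```

Because `a` is both the `DispatchDummy` and the output, the default `global_size = length(a)` covers every element exactly once.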
```diff
@@ -155,7 +145,6 @@ Times in a table:
 # Installation
 
 I recently added a lot of features and bug fixes to the master branch.
-Please check that out first and see [pull #37](https://github.com/JuliaGPU/GPUArrays.jl/pull/37) for a list of new features.
 
 For the cudanative backend, you need to install [CUDAnative.jl manually](https://github.com/JuliaGPU/CUDAnative.jl/#installation), and it works only on macOS + Linux with a Julia source build.
 Make sure to have either CUDA and/or OpenCL drivers installed correctly.
```
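The "currently supported subset" rules described in this commit (immutable isbits values yes, allocating constructs no) can be checked in plain Julia, without any GPU backend; the `Params` struct below is a made-up example:

```julia
# tuples of plain numbers are immutable isbits values,
# so they are fine inside kernel code
x = (1, 2, 3)
@assert isbits(x)

# an Array literal allocates on the heap and carries a pointer,
# so it is not isbits and not allowed in a kernel
y = [1, 2, 3]
@assert !isbits(y)

# an immutable struct whose fields are all isbits is itself isbits
struct Params
    scale::Float32
    offsets::NTuple{3, Int}
end
@assert isbits(Params(0.5f0, (1, 2, 3)))
```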
