
Commit f1b46d2

Added first OpenCL benchmarks
1 parent 63ec68e commit f1b46d2

3 files changed, 45 additions and 1 deletion

README.md

Lines changed: 1 addition & 1 deletion
@@ -194,7 +194,7 @@ Again, this is only possible because of the unique Julia compilation model, the
 
 
 ## 2. Status
-The AcceleratedKernels.jl sorters were adopted as the official [AMDGPU algorithms](https://github.com/JuliaGPU/AMDGPU.jl/pull/688)! The API is starting to stabilise; it follows the Julia standard library fairly closely - and additionally exposing all temporary arrays for memory reuse. For any new ideas / requests, please join the conversation on [Julia Discourse](https://discourse.julialang.org/t/ann-acceleratedkernels-jl-cross-architecture-parallel-algorithms-for-julias-gpu-backends/119698/16) or post [an issue](https://github.com/juliagpu/AcceleratedKernels.jl/issues).
+The AcceleratedKernels.jl GPU `sort` and `accumulate` implementations were adopted as the official [AMDGPU algorithms](https://github.com/JuliaGPU/AMDGPU.jl/pull/688)! The API is starting to stabilise; it follows the Julia standard library fairly closely - and additionally exposing all temporary arrays for memory reuse. For any new ideas / requests, please join the conversation on [Julia Discourse](https://discourse.julialang.org/t/ann-acceleratedkernels-jl-cross-architecture-parallel-algorithms-for-julias-gpu-backends/119698/16) or post [an issue](https://github.com/juliagpu/AcceleratedKernels.jl/issues).
 
 We have an extensive randomised test suite that we run on the CPU (single- and multi-threaded) backend on Windows, Ubuntu and MacOS for Julia LTS, Stable, and Pre-Release, plus the CUDA, AMDGPU, oneAPI and Metal backends on the [JuliaGPU buildkite](https://github.com/JuliaGPU/buildkite) - the exact same tests are run on all architectures to ensure uniform interfaces.
 

prototype/opencl/Project.toml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+[deps]
+AcceleratedKernels = "6a4ca0a5-0e36-4168-a932-d9be78d558f1"
+BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
+Cthulhu = "f68482b8-f384-11e8-15f7-abe071a5a75f"
+OpenCL = "08131aa3-fb12-5dee-8b74-c09406e224a2"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+pocl_jll = "627d6b7a-bbe6-5189-83e7-98cc0a5aeadd"

prototype/opencl/sort_benchmark.jl

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
using BenchmarkTools
2+
using Random
3+
4+
using OpenCL, pocl_jll
5+
import AcceleratedKernels as AK
6+
7+
8+
Random.seed!(0)
9+
OpenCL.versioninfo()
10+
11+
12+
# Generate random numbers
13+
n = 10_000_000
14+
d = CLArray{Int64}(undef, n);
15+
16+
17+
function aksort!(d, temp)
18+
AK.sort!(d, temp=temp, block_size=512)
19+
AK.synchronize(AK.get_backend(d))
20+
d
21+
end
22+
23+
24+
println("AcceleratedKernels Sort:")
25+
temp = similar(d)
26+
display(@benchmark aksort!($d, temp) setup=(rand!(d)))
27+
28+
29+
println("Base Sort:")
30+
dh = Array(d)
31+
temph = Array(temp)
32+
display(@benchmark aksort!($dh, temph) setup=(rand!(dh)))
33+
34+
35+
# println("BUC / CUDA Thrust Sort:")
36+
# display(@benchmark buc_sort!($d) setup=(rand!(d)))
37+
