Added first OpenCL benchmarks

anicusan · anicusan · commit f1b46d255f2f · 2025-02-07T12:16:29.000Z
diff --git a/README.md b/README.md
@@ -194,7 +194,7 @@ Again, this is only possible because of the unique Julia compilation model, the
 
 
 ## 2. Status
-The AcceleratedKernels.jl sorters were adopted as the official [AMDGPU algorithms](https://github.com/JuliaGPU/AMDGPU.jl/pull/688)! The API is starting to stabilise; it follows the Julia standard library fairly closely - and additionally exposing all temporary arrays for memory reuse. For any new ideas / requests, please join the conversation on [Julia Discourse](https://discourse.julialang.org/t/ann-acceleratedkernels-jl-cross-architecture-parallel-algorithms-for-julias-gpu-backends/119698/16) or post [an issue](https://github.com/juliagpu/AcceleratedKernels.jl/issues).
+The AcceleratedKernels.jl GPU `sort` and `accumulate` implementations were adopted as the official [AMDGPU algorithms](https://github.com/JuliaGPU/AMDGPU.jl/pull/688)! The API is starting to stabilise; it follows the Julia standard library fairly closely, and additionally exposes all temporary arrays for memory reuse. For new ideas or requests, please join the conversation on [Julia Discourse](https://discourse.julialang.org/t/ann-acceleratedkernels-jl-cross-architecture-parallel-algorithms-for-julias-gpu-backends/119698/16) or post [an issue](https://github.com/juliagpu/AcceleratedKernels.jl/issues).
 
 We have an extensive randomised test suite that we run on the CPU (single- and multi-threaded) backend on Windows, Ubuntu and MacOS for Julia LTS, Stable, and Pre-Release, plus the CUDA, AMDGPU, oneAPI and Metal backends on the [JuliaGPU buildkite](https://github.com/JuliaGPU/buildkite) - the exact same tests are run on all architectures to ensure uniform interfaces.
 
diff --git a/prototype/opencl/Project.toml b/prototype/opencl/Project.toml
@@ -0,0 +1,7 @@
+[deps]
+AcceleratedKernels = "6a4ca0a5-0e36-4168-a932-d9be78d558f1"
+BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
+Cthulhu = "f68482b8-f384-11e8-15f7-abe071a5a75f"
+OpenCL = "08131aa3-fb12-5dee-8b74-c09406e224a2"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+pocl_jll = "627d6b7a-bbe6-5189-83e7-98cc0a5aeadd"
diff --git a/prototype/opencl/sort_benchmark.jl b/prototype/opencl/sort_benchmark.jl
@@ -0,0 +1,37 @@
+using BenchmarkTools
+using Random
+
+using OpenCL, pocl_jll
+import AcceleratedKernels as AK
+
+
+Random.seed!(0)
+OpenCL.versioninfo()
+
+
+# Allocate the device array; each benchmark's setup refills it with random numbers
+n = 10_000_000
+d = CLArray{Int64}(undef, n);
+
+# Sort in place reusing `temp`, then synchronize so the timing covers the whole sort
+function aksort!(d, temp)
+    AK.sort!(d, temp=temp, block_size=512)
+    AK.synchronize(AK.get_backend(d))
+    d
+end
+
+
+println("AcceleratedKernels Sort:")
+temp = similar(d)
+display(@benchmark aksort!($d, $temp) setup=(rand!($d)))
+
+
+println("AcceleratedKernels Sort (host Array):")
+dh = Array(d)
+temph = Array(temp)
+display(@benchmark aksort!($dh, $temph) setup=(rand!($dh)))
+
+
+# println("BUC / CUDA Thrust Sort:")
+# display(@benchmark buc_sort!($d) setup=(rand!(d)))
+