Added multithreaded implementations of 1D and ND accumulate, mapreduce. Removed Polyester and OhMyThreads dependencies. #40

anicusan · 2025-05-23T16:20:13Z

Now all algorithms are parallel on CPUs too.

I removed the Polyester stack and OhMyThreads from the dependencies: Julia Base threads are now very performant and allocate very little, and a 5-line AK implementation of mapreduce matches the speed of OhMyThreads (OMT).
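For readers unfamiliar with the pattern, a chunked mapreduce on Julia Base threads can be sketched roughly as below. This is an illustration only, not the AcceleratedKernels implementation: the function name `threaded_mapreduce` is hypothetical, the `max_tasks` keyword borrows its name from the commit list but its semantics here are an assumption, and `init` is assumed to be a neutral element for `op` because it is applied once per chunk and once more in the final combine.

```julia
# Hypothetical sketch of a chunked, task-based mapreduce using only Base.Threads.
# Assumes `op` is associative and `init` is its neutral element.
function threaded_mapreduce(f, op, xs::AbstractVector; init, max_tasks::Int=Threads.nthreads())
    n = length(xs)
    n == 0 && return init

    # Never spawn more tasks than elements, and always at least one.
    ntasks = clamp(max_tasks, 1, n)
    chunk_len = cld(n, ntasks)

    # One contiguous index range per task.
    chunks = collect(Iterators.partition(eachindex(xs), chunk_len))

    # Each task reduces its own chunk independently.
    tasks = map(chunks) do idxs
        Threads.@spawn mapreduce(f, op, view(xs, idxs); init=init)
    end

    # Combine the per-chunk partial results on the calling task.
    return mapreduce(fetch, op, tasks; init=init)
end

total = threaded_mapreduce(x -> x^2, +, collect(1:100); init=0)
```

Start Julia with more than one thread (for example `julia -t 4`) for the chunks to actually run in parallel; with a single thread the sketch still returns the same result.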

The dependencies are now minimal: ArgCheck, GPUArraysCore, KernelAbstractions - with none actually bringing in a backend stack.

I had to remove the scheduler kwarg from foreachindex and mapreduce, which is a breaking change - hence the version bump to 0.4.0.

anicusan · 2025-05-23T16:21:41Z

Are there any other non-backwards-compatible changes we might need from the GPUArrays.jl side before tagging 0.4.0? @christiangnrd @maleadt @vchuravy

christiangnrd · 2025-05-24T03:07:16Z

I believe this can be done in a backwards-compatible manner, but the idea I had for dealing with the concept of CPU and GPU backends going away in future versions of KA would introduce a new keyword argument.

Essentially the new kwarg would be a bool that defaults to whatever doesn’t cause breaking changes. For my example I’ll call it ka_with_cpu and it would default to false, and then in the function you would replace

- if backend isa GPU
+ if ka_with_cpu || dst isa AbstractGPUArray # or AnyGPUArray?

Open to feedback.
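A self-contained sketch of how such a boolean keyword could select the code path is shown below. Everything in it is hypothetical: `StandInGPUArray` and `FakeDeviceVector` stand in for `AbstractGPUArray` and a real device array so the example runs without a GPU stack, and `pick_path` is an illustrative name, not a function in AcceleratedKernels or KernelAbstractions.

```julia
# Stand-in for GPUArraysCore.AbstractGPUArray (assumption, for illustration only).
abstract type StandInGPUArray{T,N} <: AbstractArray{T,N} end

# Minimal fake "device" vector so the branch can be exercised on a CPU-only machine.
struct FakeDeviceVector{T} <: StandInGPUArray{T,1}
    data::Vector{T}
end
Base.size(v::FakeDeviceVector) = size(v.data)
Base.getindex(v::FakeDeviceVector, i::Int) = v.data[i]

# Hypothetical dispatcher: `ka_with_cpu` defaults to false, so plain Arrays keep
# using the Base-threads path unless the caller opts in explicitly.
function pick_path(dst::AbstractArray; ka_with_cpu::Bool=false)
    if ka_with_cpu || dst isa StandInGPUArray
        return :kernel_abstractions
    else
        return :base_threads
    end
end
```

Because the keyword defaults to `false`, existing calls on plain `Array`s would behave as before, which is what makes the proposal non-breaking.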

anicusan · 2025-05-24T23:06:27Z

Hmm, even if CPU and GPU backends are removed from KernelAbstractions (I assume there'd just be the GPUCompiler backends for CUDA, AMDGPU, oneAPI, Metal, OpenCL, regardless of the type of hardware it targets - e.g. OpenCL on CPUs), I think just making a BaseBackend or ThreadsBackend within AcceleratedKernels for the Julia Base multithreaded implementations would be most natural - the same interface remains. In the end it does satisfy the KA Backend definition - any reason to special-case it with a kwarg?

anicusan · 2025-05-24T23:09:09Z

Or would this be too type-pirate-y?

struct ThreadsBackend <: Backend end
get_backend(::Array) = ThreadsBackend()

christiangnrd · 2025-05-25T00:38:37Z

Yeah unfortunately that's definitely too type-piratey:

https://github.com/JuliaGPU/KernelAbstractions.jl/blob/110d78475861281298ec930c7292af903e8360d0/src/pocl/backend.jl#L58

It's not just defining a foreign function on foreign argument types; it would also redefine a method that already exists.
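The problem can be reproduced in isolation with two stand-in modules. This is a sketch under stated assumptions: `KAStandIn` and `AKStandIn` are hypothetical placeholders, not the real KernelAbstractions or AcceleratedKernels packages, and only the `get_backend(::Array)` signature mirrors the linked line.

```julia
module KAStandIn                       # placeholder for the upstream package
    abstract type Backend end
    struct POCLBackend <: Backend end
    # Mirrors the shape of the linked upstream definition.
    get_backend(::Array) = POCLBackend()
end

module AKStandIn                       # placeholder for a downstream package
    import ..KAStandIn
    struct ThreadsBackend <: KAStandIn.Backend end
    # Type piracy: this module owns neither `get_backend` nor `Array`, and the
    # signature already exists upstream, so this silently replaces that method
    # for every user of KAStandIn.
    KAStandIn.get_backend(::Array) = ThreadsBackend()
end

# After both modules are loaded, the upstream behaviour is gone.
backend = KAStandIn.get_backend([1, 2, 3])
```

When run as a script, there is still only one `get_backend(::Array)` method afterwards, and it now returns the downstream type; inside precompiled packages Julia refuses this kind of overwrite outright.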

anicusan · 2025-05-25T00:46:32Z

Oh:

KA.get_backend(::Array) = POCLBackend()

I wasn't expecting that - PoCL also targets hardware accelerators, right? Won't there be a CLArray anymore?

anicusan · 2025-05-25T14:49:03Z

@christiangnrd is right, even if we want to keep the base threaded algorithms, this can be added in a non-breaking way with a kwarg like prefer_threads depending on how KernelAbstractions / PoCL progress - in the end, all I care about is maximum performance on both CPUs and GPUs with a uniform API.

I'll merge this now

Added multithreaded implementations of 1D and ND accumulate, mapreduc…

5e2b06b

…e; now all algorithms are parallel on CPUs too. Removed Polyester and OhMyThreads dependencies (and the `scheduler` kwarg, needing a version bump)

Add max_tasks and min_elems kwargs to Metal ext specialisations

cf6e08d

anicusan added 2 commits May 25, 2025 01:12

sample_sortperm uses by/rev/ord in the right order now. Removed Abstr…

a8e7d77

…actGPUVector from imports. Full sample_sort benchmark suite

Add simple AK example to README

32e7dbe

anicusan merged commit 14de3f2 into main May 25, 2025
37 of 38 checks passed
