@anicusan
Member

Now all algorithms are parallel on CPUs too.

I removed the Polyester stack and OhMyThreads from the dependencies - Julia Base threads are now extremely performant and allocate very little (and a 5-line AK implementation of mapreduce has the same speed as OMT).
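For readers wondering what a short Base-threads mapreduce can look like, here is a minimal sketch. It is not the AcceleratedKernels implementation; the name `threaded_mapreduce` and the `ntasks` keyword are hypothetical, and it assumes `init` is a neutral element of `op`, since `init` is applied once per chunk and once more in the final combine.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper, NOT the AcceleratedKernels code: split the input into
# contiguous chunks, reduce each chunk on its own task, then combine.
function threaded_mapreduce(f, op, xs::AbstractVector; init, ntasks::Int=nthreads())
    n = length(xs)
    n == 0 && return init
    ntasks = clamp(ntasks, 1, n)          # never more tasks than elements
    i0 = firstindex(xs)
    tasks = map(1:ntasks) do t
        lo = i0 + div((t - 1) * n, ntasks)
        hi = i0 + div(t * n, ntasks) - 1
        @spawn mapreduce(f, op, view(xs, lo:hi); init=init)
    end
    # `init` must be neutral for `op` (e.g. 0 for +), as it is reused here.
    return mapreduce(fetch, op, tasks; init=init)
end
```

Called as `threaded_mapreduce(x -> x^2, +, collect(1:100); init=0)`, it gives the same result as `mapreduce(x -> x^2, +, 1:100)` for an associative `op`.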

The dependencies are now minimal: ArgCheck, GPUArraysCore, KernelAbstractions - with none actually bringing in a backend stack.

I had to remove the scheduler kwarg from foreachindex and mapreduce, which is a breaking change - hence the version bump to 0.4.0.

…e; now all algorithms are parallel on CPUs too. Removed Polyester and OhMyThreads dependencies (and the `scheduler` kwarg, needing a version bump)
@anicusan
Member Author

Are there any other non-backwards-compatible changes we might need from the GPUArrays.jl side before tagging 0.4.0? @christiangnrd @maleadt @vchuravy

@christiangnrd
Member

christiangnrd commented May 24, 2025

I believe this can be done in a backwards-compatible manner, but the idea I had for dealing with the concept of CPU and GPU backends going away in future versions of KA would introduce a new keyword argument.

Essentially the new kwarg would be a bool that defaults to whatever doesn’t cause breaking changes. For my example I’ll call it ka_with_cpu and it would default to false, and then in the function you would replace

- if backend isa GPU
+ if ka_with_cpu || dst isa AbstractGPUArray # or AnyGPUArray?

Open to feedback.
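To make the proposed branch concrete, here is a rough, self-contained sketch. Everything in it is a stand-in: `FakeGPUArray` and `MockDeviceArray` mimic a GPU array hierarchy so the code runs without GPU packages, and `choose_path` is a hypothetical name. Only the `ka_with_cpu` keyword and its `false` default come from the comment above.

```julia
# Stand-in types (assumptions, not the GPUArraysCore hierarchy).
abstract type FakeGPUArray{T,N} <: AbstractArray{T,N} end

struct MockDeviceArray{T,N} <: FakeGPUArray{T,N}
    data::Array{T,N}
end
Base.size(a::MockDeviceArray) = size(a.data)
Base.getindex(a::MockDeviceArray, i::Int...) = a.data[i...]

# Hypothetical dispatcher: take the kernel path when the caller opts in,
# or when the destination is a (mock) GPU array; otherwise use Base threads.
choose_path(dst::AbstractArray; ka_with_cpu::Bool=false) =
    (ka_with_cpu || dst isa FakeGPUArray) ? :kernel : :threads
```

With the default `ka_with_cpu=false`, plain `Array`s keep taking the threaded path, which is what would keep the change non-breaking.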

@anicusan
Member Author

Hmm, even if CPU and GPU backends are removed from KernelAbstractions (I assume there'd just be the GPUCompiler backends for CUDA, AMDGPU, oneAPI, Metal, OpenCL, regardless of the type of hardware they target - e.g. OpenCL on CPUs), I think just making a BaseBackend or ThreadsBackend within AcceleratedKernels for the Julia Base multithreaded implementations would be most natural - the same interface remains. In the end it does satisfy the KA Backend definition - is there any reason to special-case it with a kwarg?

@anicusan
Member Author

Or would this be too type-pirate-y?

struct ThreadsBackend <: Backend end
get_backend(::Array) = ThreadsBackend()

@christiangnrd
Member

christiangnrd commented May 25, 2025

Yeah unfortunately that's definitely too type-piratey:

https://github.com/JuliaGPU/KernelAbstractions.jl/blob/110d78475861281298ec930c7292af903e8360d0/src/pocl/backend.jl#L58

It's not just defining a foreign function on foreign argument types, it's also redefining an already-existing method.

@anicusan
Member Author

Oh:

KA.get_backend(::Array) = POCLBackend()

I wasn't expecting that - PoCL also targets hardware accelerators, right? Will there no longer be a CLArray?

@anicusan
Member Author

@christiangnrd is right, even if we want to keep the base threaded algorithms, this can be added in a non-breaking way with a kwarg like prefer_threads depending on how KernelAbstractions / PoCL progress - in the end, all I care about is maximum performance on both CPUs and GPUs with a uniform API.
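A minimal sketch of how such a kwarg could avoid the type piracy discussed above, under loud assumptions: `MiniKA` and `MiniAK` are toy stand-in modules (not KernelAbstractions or AcceleratedKernels), and `pick_backend` is a hypothetical function. Only `ThreadsBackend`, `POCLBackend`, `get_backend`, and the `prefer_threads` name come from this thread.

```julia
module MiniKA                       # toy stand-in for KernelAbstractions
    abstract type Backend end
    struct POCLBackend <: Backend end
    get_backend(::Array) = POCLBackend()    # the method that already exists upstream
end

module MiniAK                       # toy stand-in for AcceleratedKernels
    import ..MiniKA
    struct ThreadsBackend <: MiniKA.Backend end
    # No piracy: MiniAK owns both `pick_backend` and `ThreadsBackend`, and
    # MiniKA.get_backend(::Array) is left untouched.
    pick_backend(x; prefer_threads::Bool=false) =
        prefer_threads ? ThreadsBackend() : MiniKA.get_backend(x)
end
```

In this sketch `MiniAK.pick_backend(x; prefer_threads=true)` selects the threaded backend, while the default defers to whatever the upstream package reports for `x`.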

I'll merge this now

@anicusan anicusan merged commit 14de3f2 into main May 25, 2025
37 of 38 checks passed
