Added multithreaded implementations of 1D and ND accumulate, mapreduce. Removed Polyester and OhMyThreads dependencies. #40
…e; now all algorithms are parallel on CPUs too. Removed Polyester and OhMyThreads dependencies (and the `scheduler` kwarg, needing a version bump)
Are there any other non-backwards-compatible changes we might need from the GPUArrays.jl side before tagging 0.4.0? @christiangnrd @maleadt @vchuravy
I believe this can be done in a backwards-compatible manner, but the idea I had for dealing with the concept of … Essentially, the new kwarg would be a bool that defaults to whatever doesn't cause breaking changes. For my example I'll call it `ka_with_cpu`:
- if backend isa GPU
+ if ka_with_cpu || dst <: AbstractGPUArray # or AnyGPUArray?
Open to feedback.
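A minimal sketch of how a boolean kwarg like `ka_with_cpu` could select between the two code paths. Everything here is a stand-in: `MockGPUArray`, `MockDeviceArray` and `choose_path` are hypothetical names used so the example runs without any GPU package, and the `false` default is only an assumption about which choice would be non-breaking.

```julia
# Hypothetical stand-ins for a GPU array hierarchy; the real check in the
# discussion would be against AbstractGPUArray (or AnyGPUArray).
abstract type MockGPUArray end

struct MockDeviceArray <: MockGPUArray
    data::Vector{Float64}
end

# `ka_with_cpu` is assumed to default to `false`, so callers that never pass
# it keep getting the same path they got before the kwarg existed.
function choose_path(dst; ka_with_cpu::Bool=false)
    if ka_with_cpu || dst isa MockGPUArray
        return :kernel_path    # where a KernelAbstractions kernel would be launched
    else
        return :threads_path   # where the Base.Threads implementation would run
    end
end

choose_path([1.0, 2.0])                      # plain Array, default kwarg
choose_path([1.0, 2.0]; ka_with_cpu=true)    # plain Array, kernels forced
choose_path(MockDeviceArray([1.0, 2.0]))     # device array always takes the kernel path
```

Because the kwarg only widens the condition under which the kernel path is taken, adding it with a default that reproduces the old behaviour should not require a version bump.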
Hmm, even if
Or would this be too type-pirate-y?
struct ThreadsBackend <: Backend end
get_backend(::Array) = ThreadsBackend()
…actGPUVector from imports. Full sample_sort benchmark suite
Yeah, unfortunately that's definitely too type-piratey: it's not just defining a foreign function on foreign arguments, it's also redefining an already-existing function
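To make the piracy concern concrete, here is a small sketch under stated assumptions. `Upstream` stands in for a package we do not own that already defines `get_backend(::Array)`; `Downstream` and `preferred_backend` are hypothetical names, not part of any real package. The non-pirating route is for the downstream package to own the function it extends.

```julia
# Placeholder for a package we do not own.
module Upstream
    abstract type Backend end
    struct CPU <: Backend end
    get_backend(::Array) = CPU()   # method that already exists upstream
end

module Downstream
    import ..Upstream

    # Defining our own type is fine: we own `ThreadsBackend`.
    struct ThreadsBackend <: Upstream.Backend end

    # Piracy would be writing
    #     Upstream.get_backend(::Array) = ThreadsBackend()
    # because we own neither the function nor `Array`, and it would overwrite
    # the upstream method for every other user of `Upstream`.

    # Instead, own the function: fall back to upstream, specialise for Array.
    preferred_backend(x) = Upstream.get_backend(x)
    preferred_backend(::Array) = ThreadsBackend()
end

Downstream.preferred_backend([1, 2, 3])   # a ThreadsBackend, chosen locally
Upstream.get_backend([1, 2, 3])           # still the upstream CPU backend
```

With this layout, code outside `Downstream` that calls `Upstream.get_backend` on an `Array` sees no change in behaviour.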
Oh:
I wasn't expecting that - PoCL also targets hardware accelerators, right? Won't there be a
@christiangnrd is right: even if we want to keep the base threaded algorithms, this can be added in a non-breaking way with a kwarg like … I'll merge this now.
Now all algorithms are parallel on CPUs too.
I removed the Polyester stack and OhMyThreads from the dependencies - Julia base threads are now extremely performant, and allocate very little (and a 5-line AK implementation of mapreduce has the same speed as OMT).
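For illustration only, a chunked `mapreduce` on Julia base threads might look roughly like the sketch below. This is not the AK implementation; `threaded_mapreduce` and its `max_tasks` keyword are names assumed for this example. It also assumes that `op` is associative and that `init` is a neutral element for `op`, since `init` is applied once per chunk and once more when combining.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper: split `xs` into contiguous chunks, reduce each chunk in
# its own task, then combine the per-task partial results on the calling task.
function threaded_mapreduce(f, op, xs::AbstractVector; init, max_tasks::Int=nthreads())
    n = length(xs)
    ntasks = max(1, min(max_tasks, n))
    # Small or empty inputs: no point spawning tasks.
    ntasks == 1 && return mapreduce(f, op, xs; init=init)

    chunk = cld(n, ntasks)                       # elements per task, rounded up
    starts = firstindex(xs):chunk:lastindex(xs)  # first index of every chunk
    tasks = map(starts) do lo
        hi = min(lo + chunk - 1, lastindex(xs))
        @spawn mapreduce(f, op, view(xs, lo:hi); init=init)
    end
    # Combine partial results in chunk order.
    return mapreduce(fetch, op, tasks; init=init)
end

threaded_mapreduce(x -> x^2, +, collect(1:1000); init=0)
```

For floating-point inputs the chunked order of operations can give results that differ slightly from a sequential reduction.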
The dependencies are now minimal: ArgCheck, GPUArraysCore, KernelAbstractions - with none actually bringing in a backend stack.
I had to remove the `scheduler` kwarg from `foreachindex` and `mapreduce`, which is a breaking change - hence the version bump to 0.4.0.