Merged
5 changes: 2 additions & 3 deletions README.md
@@ -12,7 +12,7 @@ Parallel algorithm building blocks for the Julia ecosystem, targeting multithrea


### A Uniform API, Everywhere
-Offering standard library functions (e.g., `sort`, `mapreduce`, `accumulate`), higher-order functions (e.g., `sum`, `cumprod`, `any`), and cross-architecture custom loops (`foreachindex`, `foraxes`), AcceleratedKernels.jl lets you write high-performance code once and run it on any supported architecture — no separate or special-cased kernels needed. It’s the classic “write once, run everywhere” principle, but supercharged for modern parallel CPU and GPU computing.
+Offering standard library algorithms (e.g., `sort`, `mapreduce`, `accumulate`), higher-order functions (e.g., `sum`, `cumprod`, `any`), and cross-architecture custom loops (`foreachindex`, `foraxes`), AcceleratedKernels.jl lets you write high-performance code once and run it on all supported architectures — no separate or special-cased kernels needed. It’s the classic “write once, run everywhere” principle, but supercharged for modern parallel CPU and GPU computing.


<table>
@@ -320,8 +320,7 @@ Help is very welcome for any of the below:
switch_below=(1, 10, 100, 1000, 10000)
end
```
-- Add performant multithreaded Julia implementations to all algorithms; e.g. `foreachindex` has one, `any` does not.
-- EDIT: as of v0.2.0, only `sort` needs a multithreaded implementation.
+- We need multithreaded implementations of `sort`, N-dimensional `mapreduce` (in `OhMyThreads.tmapreduce`) and `accumulate` (again, probably in `OhMyThreads`).
- Any way to expose the warp-size from the backends? Would be useful in reductions.
- Add a performance regressions runner.
- **Other ideas?** Post an issue, or open a discussion on the Julia Discourse.
12 changes: 6 additions & 6 deletions src/predicates.jl
@@ -57,9 +57,9 @@ it in your application. When only one thread is needed, there is no overhead.

## GPU
There are two possible `alg` choices:
-- `ConcurrentWrite()`: the default algorithm, using concurrent writing to a global flag;
-and uses a global flag to write the result; this is only one platform we are aware of (Intel UHD
-620 integrated graphics cards) where such writes hang.
+- `ConcurrentWrite()`: the default algorithm, using concurrent writing to a global flag; there is
+only one platform we are aware of (Intel UHD 620 integrated graphics cards) where multiple
+threads writing to the same memory location - even if writing the same value - hang the device.
- `MapReduce(; temp=nothing, switch_below=0)`: a conservative [`mapreduce`](@ref)-based
implementation which can be used on all platforms, but does not use shortcircuiting
optimisations. You can set the `temp` and `switch_below` keyword arguments to be forwarded to
@@ -201,9 +201,9 @@ it in your application. When only one thread is needed, there is no overhead.

## GPU
There are two possible `alg` choices:
-- `ConcurrentWrite()`: the default algorithm, using concurrent writing to a global flag;
-and uses a global flag to write the result; this is only one platform we are aware of (Intel UHD
-620 integrated graphics cards) where such writes hang.
+- `ConcurrentWrite()`: the default algorithm, using concurrent writing to a global flag; there is
+only one platform we are aware of (Intel UHD 620 integrated graphics cards) where multiple
+threads writing to the same memory location - even if writing the same value - hang the device.
- `MapReduce(; temp=nothing, switch_below=0)`: a conservative [`mapreduce`](@ref)-based
implementation which can be used on all platforms, but does not use shortcircuiting
optimisations. You can set the `temp` and `switch_below` keyword arguments to be forwarded to