
Commit 63ec68e

Merge pull request #21 from JuliaGPU/reduce-init-use
typos
2 parents 6661069 + 0091efb commit 63ec68e

File tree: 2 files changed, +8 −9 lines


README.md

Lines changed: 2 additions & 3 deletions
@@ -12,7 +12,7 @@ Parallel algorithm building blocks for the Julia ecosystem, targeting multithrea
 
 
 ### A Uniform API, Everywhere
-Offering standard library functions (e.g., `sort`, `mapreduce`, `accumulate`), higher-order functions (e.g., `sum`, `cumprod`, `any`), and cross-architecture custom loops (`foreachindex`, `foraxes`), AcceleratedKernels.jl lets you write high-performance code once and run it on any supported architecture — no separate or special-cased kernels needed. It’s the classic “write once, run everywhere” principle, but supercharged for modern parallel CPU and GPU computing.
+Offering standard library algorithms (e.g., `sort`, `mapreduce`, `accumulate`), higher-order functions (e.g., `sum`, `cumprod`, `any`), and cross-architecture custom loops (`foreachindex`, `foraxes`), AcceleratedKernels.jl lets you write high-performance code once and run it on all supported architectures — no separate or special-cased kernels needed. It’s the classic “write once, run everywhere” principle, but supercharged for modern parallel CPU and GPU computing.
 
 
 <table>
@@ -320,8 +320,7 @@ Help is very welcome for any of the below:
     switch_below=(1, 10, 100, 1000, 10000)
 end
 ```
-- Add performant multithreaded Julia implementations to all algorithms; e.g. `foreachindex` has one, `any` does not.
-- EDIT: as of v0.2.0, only `sort` needs a multithreaded implementation.
+- We need multithreaded implementations of `sort`, N-dimensional `mapreduce` (in `OhMyThreads.tmapreduce`) and `accumulate` (again, probably in `OhMyThreads`).
 - Any way to expose the warp-size from the backends? Would be useful in reductions.
 - Add a performance regressions runner.
 - **Other ideas?** Post an issue, or open a discussion on the Julia Discourse.
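The rewritten TODO item asks for a multithreaded `mapreduce`. As a rough illustration of the pattern being requested (chunk the input, reduce each chunk on its own thread, then combine the per-thread partials), here is a minimal sketch in Python; it is not AcceleratedKernels.jl code, and the name `tmapreduce` is borrowed from `OhMyThreads` purely for flavour:

```python
# Hypothetical sketch, NOT the package's implementation: a chunked,
# multithreaded mapreduce. Requires `op` to be associative and `init`
# to be an identity element for `op` (it is folded in once per chunk).
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def tmapreduce(f, op, xs, init, nthreads=4):
    n = len(xs)
    if n == 0:
        return init
    # Split the input into one contiguous chunk per thread.
    chunk = max(1, (n + nthreads - 1) // nthreads)
    chunks = [xs[i:i + chunk] for i in range(0, n, chunk)]

    def reduce_chunk(c):
        # Map `f` over the chunk and fold with `op`, seeded with `init`.
        return reduce(lambda acc, x: op(acc, f(x)), c, init)

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        partials = list(pool.map(reduce_chunk, chunks))
    # Combine the per-thread partial results serially.
    return reduce(op, partials, init)
```

For example, `tmapreduce(lambda x: x * x, lambda a, b: a + b, list(range(10)), 0)` computes the sum of squares of 0..9.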

src/predicates.jl

Lines changed: 6 additions & 6 deletions
@@ -57,9 +57,9 @@ it in your application. When only one thread is needed, there is no overhead.
 
 ## GPU
 There are two possible `alg` choices:
-- `ConcurrentWrite()`: the default algorithm, using concurrent writing to a global flag;
-  and uses a global flag to write the result; this is only one platform we are aware of (Intel UHD
-  620 integrated graphics cards) where such writes hang.
+- `ConcurrentWrite()`: the default algorithm, using concurrent writing to a global flag; there is
+  only one platform we are aware of (Intel UHD 620 integrated graphics cards) where multiple
+  threads writing to the same memory location - even if writing the same value - hang the device.
 - `MapReduce(; temp=nothing, switch_below=0)`: a conservative [`mapreduce`](@ref)-based
   implementation which can be used on all platforms, but does not use shortcircuiting
   optimisations. You can set the `temp` and `switch_below` keyword arguments to be forwarded to
@@ -201,9 +201,9 @@ it in your application. When only one thread is needed, there is no overhead.
 
 ## GPU
 There are two possible `alg` choices:
-- `ConcurrentWrite()`: the default algorithm, using concurrent writing to a global flag;
-  and uses a global flag to write the result; this is only one platform we are aware of (Intel UHD
-  620 integrated graphics cards) where such writes hang.
+- `ConcurrentWrite()`: the default algorithm, using concurrent writing to a global flag; there is
+  only one platform we are aware of (Intel UHD 620 integrated graphics cards) where multiple
+  threads writing to the same memory location - even if writing the same value - hang the device.
 - `MapReduce(; temp=nothing, switch_below=0)`: a conservative [`mapreduce`](@ref)-based
   implementation which can be used on all platforms, but does not use shortcircuiting
   optimisations. You can set the `temp` and `switch_below` keyword arguments to be forwarded to
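The two `alg` strategies in this docstring can be illustrated outside the package. Below is a hedged Python sketch (not the package's Julia implementation, and function names are invented): `ConcurrentWrite` corresponds to every matching thread storing `True` to one shared flag, a benign race since all writers store the same value; `MapReduce` corresponds to OR-ing per-element predicate results with no short-circuiting:

```python
# Hypothetical sketch of the two strategies described in the docstring,
# NOT AcceleratedKernels.jl code.
import threading

def any_concurrent_write(pred, xs, nthreads=4):
    flag = [False]  # shared "global flag"
    def worker(chunk):
        for x in chunk:
            if pred(x):
                flag[0] = True  # benign race: every writer stores True
                return          # short-circuit this thread
    chunk = max(1, len(xs) // nthreads + 1)
    threads = [threading.Thread(target=worker, args=(xs[i:i + chunk],))
               for i in range(0, len(xs), chunk)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return flag[0]

def any_mapreduce(pred, xs):
    # Conservative fallback: OR-reduce every element, no short-circuiting.
    acc = False
    for x in xs:
        acc = acc | bool(pred(x))
    return acc
```

In the real package the conservative variant also takes `temp` and `switch_below` keywords forwarded to `mapreduce`; they are omitted here for brevity.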
