Commit b6523d1

Merge pull request #24 from pxl-th/pxl-th/accumulate-t
Compute accumulate destination type based on the `op`
2 parents f1b46d2 + 1b02e6e commit b6523d1

2 files changed: 5 additions and 5 deletions

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 name = "AcceleratedKernels"
 uuid = "6a4ca0a5-0e36-4168-a932-d9be78d558f1"
 authors = ["Andrei-Leonard Nicusan <leonard@evophase.co.uk> and contributors"]
-version = "0.3.0"
+version = "0.3.1"
 
 [deps]
 ArgCheck = "dce04be8-c92d-5529-be00-80e4d2c0e197"

src/accumulate/accumulate.jl

Lines changed: 4 additions & 4 deletions
@@ -71,11 +71,11 @@ For compatibility with the `Base.accumulate!` function, we provide the two-array
 we do not need the constraint of `dst` and `src` being different; to minimise memory use, we
 recommend using the single-array interface (the first one above).
 
-## CPU
+## CPU
 The CPU implementation is currently single-threaded; we are waiting on a multithreaded
 implementation in OhMyThreads.jl ([issue](https://github.com/JuliaFolds2/OhMyThreads.jl/issues/129)).
 
-## GPU
+## GPU
 For the 1D case (`dims=nothing`), the `alg` can be one of the following:
 - `DecoupledLookback()`: the default algorithm, using opportunistic lookback to reuse earlier
   blocks' results; requires device-level memory consistency guarantees, which Apple Metal does not
@@ -241,7 +241,7 @@ function accumulate(
     temp::Union{Nothing, AbstractArray}=nothing,
     temp_flags::Union{Nothing, AbstractArray}=nothing,
 )
-    dst_type = promote_type(eltype(v), typeof(init))
+    dst_type = Base.promote_op(op, eltype(v), typeof(init))
     vcopy = similar(v, dst_type)
     copyto!(vcopy, v)
     accumulate!(
@@ -252,7 +252,7 @@ function accumulate(
         inclusive=inclusive,
 
         alg=alg,
-
+
         block_size=block_size,
         temp=temp,
         temp_flags=temp_flags,
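
The change to `dst_type` above can be illustrated with a minimal sketch that uses only Julia's `Base`. The helper names `old_dst_type`, `new_dst_type` and the operator `halfsum` are hypothetical, introduced here for illustration; they are not part of AcceleratedKernels. The sketch assumes that inference resolves the operator's return type to a concrete type, which should hold for simple arithmetic like this.

```julia
# Hypothetical helpers mirroring the two lines in the diff above.

# Before: the destination element type ignores the operator entirely.
old_dst_type(op, v, init) = promote_type(eltype(v), typeof(init))

# After: ask Julia what `op` is inferred to return for these argument types.
new_dst_type(op, v, init) = Base.promote_op(op, eltype(v), typeof(init))

v = Int32[1, 2, 3]
init = Int32(0)

# For `+` on matching integer types, both rules pick the same type.
plus_old = old_dst_type(+, v, init)
plus_new = new_dst_type(+, v, init)

# Hypothetical operator whose result type differs from its input types:
# dividing by 2 turns integer inputs into a floating-point result.
halfsum(a, b) = (a + b) / 2

half_old = old_dst_type(halfsum, v, init)  # based on inputs only
half_new = new_dst_type(halfsum, v, init)  # based on what `halfsum` returns

@show plus_old plus_new half_old half_new
```

With the old rule, an operator like `halfsum` would be given an integer destination array that cannot hold its results; with the new rule the destination is sized for what `op` produces. Note that the Julia documentation describes `Base.promote_op` as relying on type inference, so its result may be less specific for operators that are not type-stable.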
