Skip to content

Commit f9bf7de

Browse files
committed
datadeps: Add at-stencil helper
1 parent b3b7153 commit f9bf7de

File tree

13 files changed

+924
-21
lines changed

13 files changed

+924
-21
lines changed

docs/make.jl

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,10 @@ makedocs(;
2626
"Scopes" => "scopes.md",
2727
"Processors" => "processors.md",
2828
"Task Queues" => "task-queues.md",
29-
"Datadeps" => "datadeps.md",
29+
"Datadeps" => [
30+
"Basics" => "datadeps.md",
31+
"Stencils" => "stencils.md",
32+
],
3033
"GPUs" => "gpu.md",
3134
"Option Propagation" => "propagation.md",
3235
"Logging and Visualization" => [

docs/src/index.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -361,6 +361,39 @@ DA = rand(Blocks(32, 32), 256, 128)
361361
collect(DA) # returns a `Matrix{Float64}`
362362
```
363363

364+
-----
365+
366+
## Quickstart: Stencil Operations
367+
368+
Dagger's `@stencil` macro allows for easy specification of stencil operations on `DArray`s, often used in simulations and image processing. These operations typically involve updating an element based on the values of its neighbors.
369+
370+
For more details: [Stencil Operations](@ref)
371+
372+
### Applying a Simple Stencil
373+
374+
Here's how to apply a stencil that averages each element with its immediate neighbors, using a `Wrap` boundary condition (where neighbor access at the array edges wrap around).
375+
376+
```julia
377+
using Dagger
378+
import Dagger: @stencil, Wrap
379+
380+
# Create a 5x5 DArray, partitioned into 2x2 blocks
381+
A = rand(Blocks(2, 2), 5, 5)
382+
B = zeros(Blocks(2,2), 5, 5)
383+
384+
Dagger.spawn_datadeps() do
385+
@stencil begin
386+
# For each element in A, calculate the sum of its 3x3 neighborhood
387+
# (including itself) and store the average in B.
388+
# Values outside the array bounds are determined by Wrap().
389+
B[idx] = sum(@neighbors(A[idx], 1, Wrap())) / 9.0
390+
end
391+
end
392+
393+
# B now contains the averaged values.
394+
```
395+
In this example, `idx` refers to the coordinates of each element being processed. `@neighbors(A[idx], 1, Wrap())` fetches the 3x3 neighborhood around `A[idx]`. The `1` indicates a neighborhood distance of 1 from the central element, and `Wrap()` specifies the boundary behavior.
396+
364397
## Quickstart: Datadeps
365398

366399
Datadeps is a feature in Dagger.jl that facilitates parallelism control within designated regions, allowing tasks to write to their arguments while ensuring dependencies are respected.

docs/src/stencils.md

Lines changed: 183 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,183 @@
1+
# Stencil Operations
2+
3+
The `@stencil` macro in Dagger.jl provides a convenient way to perform stencil computations on `DArray`s. It operates within a `Dagger.spawn_datadeps()` block and allows you to define operations that apply to each element of a `DArray`, potentially accessing values from each element's neighbors.
4+
5+
## Basic Usage
6+
7+
The fundamental structure of a `@stencil` block involves iterating over an implicit index, named `idx` in the following example , which represents the coordinates of an element in the processed `DArray`s.
8+
9+
```julia
10+
using Dagger
11+
import Dagger: @stencil, Wrap, Pad
12+
13+
# Initialize a DArray
14+
A = zeros(Blocks(2, 2), Int, 4, 4)
15+
16+
Dagger.spawn_datadeps() do
17+
@stencil begin
18+
A[idx] = 1 # Assign 1 to every element of A
19+
end
20+
end
21+
22+
@assert all(collect(A) .== 1)
23+
```
24+
25+
In this example, `A[idx] = 1` is executed for each chunk of `A`. The `idx` variable corresponds to the indices within each chunk.
26+
27+
## Neighborhood Access with `@neighbors`
28+
29+
The true power of stencils comes from accessing neighboring elements. The `@neighbors` macro facilitates this.
30+
31+
`@neighbors(array[idx], distance, boundary_condition)`
32+
33+
- `array[idx]`: The array and current index from which to find neighbors.
34+
- `distance`: An integer specifying the extent of the neighborhood (e.g., `1` for a 3x3 neighborhood in 2D).
35+
- `boundary_condition`: Defines how to handle accesses beyond the array boundaries. Available conditions are:
36+
- `Wrap()`: Wraps around to the other side of the array.
37+
- `Pad(value)`: Pads with a specified `value`.
38+
39+
### Example: Averaging Neighbors with `Wrap`
40+
41+
```julia
42+
import Dagger: Wrap
43+
44+
# Initialize a DArray
45+
A = ones(Blocks(1, 1), Int, 3, 3)
46+
A[2,2] = 10 # Central element has a different value
47+
B = zeros(Blocks(1, 1), Float64, 3, 3)
48+
49+
Dagger.spawn_datadeps() do
50+
@stencil begin
51+
# Calculate the average of the 3x3 neighborhood (including the center)
52+
B[idx] = sum(@neighbors(A[idx], 1, Wrap())) / 9.0
53+
end
54+
end
55+
56+
# Manually calculate expected B for verification
57+
expected_B = zeros(Float64, 3, 3)
58+
A_collected = collect(A)
59+
for r in 1:3, c in 1:3
60+
local_sum = 0.0
61+
for dr in -1:1, dc in -1:1
62+
nr, nc = mod1(r+dr, 3), mod1(c+dc, 3)
63+
local_sum += A_collected[nr, nc]
64+
end
65+
expected_B[r,c] = local_sum / 9.0
66+
end
67+
68+
@assert collect(B) expected_B
69+
```
70+
71+
### Example: Convolution with `Pad`
72+
73+
```julia
74+
import Pad
75+
76+
# Initialize a DArray
77+
A = ones(Blocks(2, 2), Int, 4, 4)
78+
B = zeros(Blocks(2, 2), Int, 4, 4)
79+
80+
Dagger.spawn_datadeps() do
81+
@stencil begin
82+
B[idx] = sum(@neighbors(A[idx], 1, Pad(0))) # Pad with 0
83+
end
84+
end
85+
86+
# Expected result for a 3x3 sum filter with zero padding
87+
expected_B_padded = [
88+
4 6 6 4;
89+
6 9 9 6;
90+
6 9 9 6;
91+
4 6 6 4
92+
]
93+
@assert collect(B) == expected_B_padded
94+
```
95+
96+
## Sequential Semantics
97+
98+
Expressions within a `@stencil` block are executed sequentially in terms of their effect on the data. This means that the result of one statement is visible to the subsequent statements, as if they were applied "all at once" across all indices before the next statement begins.
99+
100+
```julia
101+
A = zeros(Blocks(2, 2), Int, 4, 4)
102+
B = zeros(Blocks(2, 2), Int, 4, 4)
103+
104+
Dagger.spawn_datadeps() do
105+
@stencil begin
106+
A[idx] = 1 # First, A is initialized
107+
B[idx] = A[idx] * 2 # Then, B is computed using the new values of A
108+
end
109+
end
110+
111+
expected_A = [1 for r in 1:4, c in 1:4]
112+
expected_B_seq = expected_A .* 2
113+
114+
@assert collect(A) == expected_A
115+
@assert collect(B) == expected_B_seq
116+
```
117+
118+
## Operations on Multiple `DArray`s
119+
120+
You can read from and write to multiple `DArray`s within a single `@stencil` block, provided they have compatible chunk structures.
121+
122+
```julia
123+
A = ones(Blocks(1, 1), Int, 2, 2)
124+
B = DArray(fill(3, 2, 2), Blocks(1, 1))
125+
C = zeros(Blocks(1, 1), Int, 2, 2)
126+
127+
Dagger.spawn_datadeps() do
128+
@stencil begin
129+
C[idx] = A[idx] + B[idx]
130+
end
131+
end
132+
@assert all(collect(C) .== 4)
133+
```
134+
135+
## Example: Game of Life
136+
137+
The following demonstrates a more complex example: Conway's Game of Life.
138+
139+
```julia
140+
# Ensure Plots and other necessary packages are available for the example
141+
using Plots
142+
143+
N = 27 # Size of one dimension of a tile
144+
nt = 3 # Number of tiles in each dimension (results in nt x nt grid of tiles)
145+
niters = 10 # Number of iterations for the animation
146+
147+
tiles = zeros(Blocks(N, N), Bool, N*nt, N*nt)
148+
outputs = zeros(Blocks(N, N), Bool, N*nt, N*nt)
149+
150+
# Create a fun initial state (e.g., a glider and some random noise)
151+
tiles[13, 14] = true
152+
tiles[14, 14] = true
153+
tiles[15, 14] = true
154+
tiles[15, 15] = true
155+
tiles[14, 16] = true
156+
# Add some random noise in one of the tiles
157+
@view(tiles[(2N+1):3N, (2N+1):3N]) .= rand(Bool, N, N)
158+
159+
160+
161+
anim = @animate for _ in 1:niters
162+
Dagger.spawn_datadeps() do
163+
@stencil begin
164+
outputs[idx] = begin
165+
nhood = @neighbors(tiles[idx], 1, Wrap())
166+
neighs = sum(nhood) - tiles[idx] # Sum neighborhood, but subtract own value
167+
if tiles[idx] && neighs < 2
168+
0 # Dies of underpopulation
169+
elseif tiles[idx] && neighs > 3
170+
0 # Dies of overpopulation
171+
elseif !tiles[idx] && neighs == 3
172+
1 # Becomes alive by reproduction
173+
else
174+
tiles[idx] # Keeps its prior value
175+
end
176+
end
177+
tiles[idx] = outputs[idx] # Update tiles for the next iteration
178+
end
179+
end
180+
heatmap(Int.(collect(outputs))) # Generate a heatmap visualization
181+
end
182+
path = mp4(anim; fps=5, show_msg=true).filename # Create an animation of the heatmaps over time
183+
```

ext/CUDAExt.jl

Lines changed: 38 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -252,24 +252,6 @@ Dagger.move(from_proc::CPUProc, to_proc::CuArrayDeviceProc, x::Function) = x
252252
Dagger.move(from_proc::CPUProc, to_proc::CuArrayDeviceProc, x::Chunk{T}) where {T<:Function} =
253253
Dagger.move(from_proc, to_proc, fetch(x))
254254

255-
# Adapt BLAS/LAPACK functions
256-
import LinearAlgebra: BLAS, LAPACK
257-
for lib in [BLAS, LAPACK]
258-
for name in names(lib; all=true)
259-
name == nameof(lib) && continue
260-
startswith(string(name), '#') && continue
261-
endswith(string(name), '!') || continue
262-
263-
for culib in [CUBLAS, CUSOLVER]
264-
if name in names(culib; all=true)
265-
fn = getproperty(lib, name)
266-
cufn = getproperty(culib, name)
267-
@eval Dagger.move(from_proc::CPUProc, to_proc::CuArrayDeviceProc, ::$(typeof(fn))) = $cufn
268-
end
269-
end
270-
end
271-
end
272-
273255
# Task execution
274256
function Dagger.execute!(proc::CuArrayDeviceProc, f, args...; kwargs...)
275257
@nospecialize f args kwargs
@@ -291,6 +273,44 @@ function Dagger.execute!(proc::CuArrayDeviceProc, f, args...; kwargs...)
291273
end
292274
end
293275

276+
# Adapt BLAS/LAPACK functions
277+
import LinearAlgebra: BLAS, LAPACK
278+
for lib in [BLAS, LAPACK]
279+
for name in names(lib; all=true)
280+
name == nameof(lib) && continue
281+
startswith(string(name), '#') && continue
282+
endswith(string(name), '!') || continue
283+
284+
for culib in [CUBLAS, CUSOLVER]
285+
if name in names(culib; all=true)
286+
fn = getproperty(lib, name)
287+
cufn = getproperty(culib, name)
288+
@eval Dagger.move(from_proc::CPUProc, to_proc::CuArrayDeviceProc, ::$(typeof(fn))) = $cufn
289+
end
290+
end
291+
end
292+
end
293+
294+
CuArray(H::Dagger.HaloArray) = convert(CuArray, H)
295+
Base.convert(::Type{C}, H::Dagger.HaloArray) where {C<:CuArray} =
296+
Dagger.HaloArray(C(H.center),
297+
C.(H.edges),
298+
C.(H.corners),
299+
H.halo_width)
300+
Adapt.adapt_structure(to::CUDA.KernelAdaptor, H::Dagger.HaloArray) =
301+
Dagger.HaloArray(adapt(to, H.center),
302+
adapt.(Ref(to), H.edges),
303+
adapt.(Ref(to), H.corners),
304+
H.halo_width)
305+
function Dagger.inner_stencil_proc!(::CuArrayDeviceProc, f, output, read_vars)
306+
Dagger.Kernel(_inner_stencil!)(f, output, read_vars; ndrange=size(output))
307+
return
308+
end
309+
@kernel function _inner_stencil!(f, output, read_vars)
310+
idx = @index(Global, Cartesian)
311+
f(idx, output, read_vars)
312+
end
313+
294314
Dagger.gpu_processor(::Val{:CUDA}) = CuArrayDeviceProc
295315
Dagger.gpu_can_compute(::Val{:CUDA}) = CUDA.has_cuda()
296316
Dagger.gpu_kernel_backend(::CuArrayDeviceProc) = CUDABackend()

ext/IntelExt.jl

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -259,6 +259,26 @@ function Dagger.execute!(proc::oneArrayDeviceProc, f, args...; kwargs...)
259259
end
260260
end
261261

262+
oneArray(H::Dagger.HaloArray) = convert(oneArray, H)
263+
Base.convert(::Type{C}, H::Dagger.HaloArray) where {C<:oneArray} =
264+
Dagger.HaloArray(C(H.center),
265+
C.(H.edges),
266+
C.(H.corners),
267+
H.halo_width)
268+
Adapt.adapt_structure(to::oneAPI.KernelAdaptor, H::Dagger.HaloArray) =
269+
Dagger.HaloArray(adapt(to, H.center),
270+
adapt.(Ref(to), H.edges),
271+
adapt.(Ref(to), H.corners),
272+
H.halo_width)
273+
function Dagger.inner_stencil_proc!(::oneArrayDeviceProc, f, output, read_vars)
274+
Dagger.Kernel(_inner_stencil!)(f, output, read_vars; ndrange=size(output))
275+
return
276+
end
277+
@kernel function _inner_stencil!(f, output, read_vars)
278+
idx = @index(Global, Cartesian)
279+
f(idx, output, read_vars)
280+
end
281+
262282
Dagger.gpu_processor(::Val{:oneAPI}) = oneArrayDeviceProc
263283
Dagger.gpu_can_compute(::Val{:oneAPI}) = oneAPI.functional()
264284
Dagger.gpu_kernel_backend(::oneArrayDeviceProc) = oneAPIBackend()

ext/MetalExt.jl

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -274,6 +274,21 @@ function Dagger.execute!(proc::MtlArrayDeviceProc, f, args...; kwargs...)
274274
end
275275
end
276276

277+
MtlArray(H::Dagger.HaloArray) = convert(MtlArray, H)
278+
Base.convert(::Type{C}, H::Dagger.HaloArray) where {C<:MtlArray} =
279+
Dagger.HaloArray(C(H.center),
280+
C.(H.edges),
281+
C.(H.corners),
282+
H.halo_width)
283+
function Dagger.inner_stencil_proc!(::MtlArrayDeviceProc, f, output, read_vars)
284+
Dagger.Kernel(_inner_stencil!)(f, output, read_vars; ndrange=size(output))
285+
return
286+
end
287+
@kernel function _inner_stencil!(f, output, read_vars)
288+
idx = @index(Global, Cartesian)
289+
f(idx, output, read_vars)
290+
end
291+
277292
function Base.show(io::IO, proc::MtlArrayDeviceProc)
278293
print(io, "MtlArrayDeviceProc(worker $(proc.owner), device $(something(_get_metal_device(proc)).name))")
279294
end

ext/OpenCLExt.jl

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -242,6 +242,26 @@ function Dagger.execute!(proc::CLArrayDeviceProc, f, args...; kwargs...)
242242
end
243243
end
244244

245+
CLArray(H::Dagger.HaloArray) = convert(CLArray, H)
246+
Base.convert(::Type{C}, H::Dagger.HaloArray) where {C<:CLArray} =
247+
Dagger.HaloArray(C(H.center),
248+
C.(H.edges),
249+
C.(H.corners),
250+
H.halo_width)
251+
Adapt.adapt_structure(to::OpenCL.KernelAdaptor, H::Dagger.HaloArray) =
252+
Dagger.HaloArray(adapt(to, H.center),
253+
adapt.(Ref(to), H.edges),
254+
adapt.(Ref(to), H.corners),
255+
H.halo_width)
256+
function Dagger.inner_stencil_proc!(::CLArrayDeviceProc, f, output, read_vars)
257+
Dagger.Kernel(_inner_stencil!)(f, output, read_vars; ndrange=size(output))
258+
return
259+
end
260+
@kernel function _inner_stencil!(f, output, read_vars)
261+
idx = @index(Global, Cartesian)
262+
f(idx, output, read_vars)
263+
end
264+
245265
Dagger.gpu_processor(::Val{:OpenCL}) = CLArrayDeviceProc
246266
Dagger.gpu_can_compute(::Val{:OpenCL}) = length(cl.platforms()) > 0
247267
Dagger.gpu_kernel_backend(::CLArrayDeviceProc) = OpenCLBackend()

0 commit comments

Comments
 (0)