Commit ac2f258

Merge pull request #2 from JuliaGPU/vc/docs
Cleanup docs and remove ScalarCPU
2 parents b9d6d02 + 6ba503f commit ac2f258

13 files changed, +168 -122 lines changed

.travis.yml

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 ## Documentation: http://docs.travis-ci.com/user/languages/julia/
 language: julia
+branches:
+  only:
+  - master
 os:
   - linux
   - osx

docs/make.jl

Lines changed: 10 additions & 4 deletions
@@ -2,14 +2,20 @@ using Documenter, KernelAbstractions

 makedocs(
     modules = [KernelAbstractions],
-    sitename = "KernelAbstractions.jl",
+    sitename = "KernelAbstractions",
     format = Documenter.HTML(
         prettyurls = get(ENV, "CI", nothing) == "true"
     ),
     pages = [
-        "Home" => "index.md",
-        "Kernel Language" => "kernels.md",
-        "Design" => "design.md"
+        "Home" => "index.md",
+        "Writing kernels" => "kernels.md",
+        "Examples" => [
+            "examples/memcopy.md"
+        ],
+        "API" => "api.md",
+        "Extras" => [
+            "extras/unrolling.md"
+        ]
     ],
     doctest = true
 )

docs/src/api.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# API
+
+## [Kernel language](@id api_kernel_language)
+
+```@docs
+@kernel
+@Const
+@index
+@localmem
+@private
+@synchronize
+```
+
+## Host interface
+
+## Internal
+
+```@docs
+KernelAbstractions.Kernel
+KernelAbstractions.partition
+```

docs/src/examples/memcopy.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# Memcopy
+
+The first example simply copies memory from `A` to `B`.
+
+````@eval
+using Markdown
+using KernelAbstractions
+path = joinpath(dirname(pathof(KernelAbstractions)), "..", "examples/memcopy.jl")
+Markdown.parse("""
+```julia
+$(read(path, String))
+```
+""")
+````

docs/src/extras/unrolling.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# Unroll macro
+
+```@meta
+CurrentModule = KernelAbstractions.Extras
+```
+
+```@docs
+@unroll
+```

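The new `extras/unrolling.md` page only renders the docstring of `KernelAbstractions.Extras.@unroll`, whose implementation is not part of this diff. As a neutral illustration of what loop unrolling means (not of how `@unroll` itself works), Base Julia's `Base.Cartesian.@nexprs` can generate the straight-line statements that a loop with a fixed trip count would otherwise execute. The function names `dot4_loop` and `dot4_unrolled` are hypothetical.

```julia
using Base.Cartesian: @nexprs

# Rolled version: a loop counter and a branch are evaluated at run time.
function dot4_loop(a, b)
    acc = zero(eltype(a))
    for j in 1:4
        acc += a[j] * b[j]
    end
    return acc
end

# Unrolled version: @nexprs expands at macro-expansion time into
#   acc += a[1] * b[1]; acc += a[2] * b[2]; acc += a[3] * b[3]; acc += a[4] * b[4]
function dot4_unrolled(a, b)
    acc = zero(eltype(a))
    @nexprs 4 j -> (acc += a[j] * b[j])
    return acc
end

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
println(dot4_loop(a, b))       # 70.0
println(dot4_unrolled(a, b))   # 70.0
```

Both functions accumulate in the same order, so the results are bit-identical; only the control flow differs.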
docs/src/index.md

Lines changed: 54 additions & 24 deletions
@@ -1,40 +1,70 @@
-# KernelAbstractions.jl
+# KernelAbstractions

-```@contents
-```
+`KernelAbstractions.jl` is a package that allows you to write GPU-like kernels that
+target different execution backends. It is intended to be a minimal and performant
+library that explores ways to best write heterogeneous code.

-## Kernel functions
+!!! note
+    While `KernelAbstractions.jl` is focused on performance portability, it is GPU-biased
+    and therefore the kernel language has several constructs that are necessary for good
+    performance on the GPU, but may hurt performance on the CPU.

-```@docs
-@kernel
-```
+## Quickstart

-### Marking input arguments as constant
+### Writing your first kernel

-```@docs
-@Const
+Kernel functions have to be marked with the [`@kernel`](@ref) macro. Inside the `@kernel` macro
+you can use the [kernel language](@ref api_kernel_language). As an example, the `mul2` kernel
+below multiplies each element of the array `A` by `2`. It uses the [`@index`](@ref) macro
+to obtain the global linear index of the current workitem.
+
+```julia
+@kernel function mul2(A)
+    I = @index(Global)
+    A[I] = 2 * A[I]
+end
 ```

-## Important difference to Julia
+### Launching your first kernel
+
+You can construct a kernel for a specific backend by calling the kernel function
+with the device kind as the first argument, the workgroup size as the second
+argument, and a static `ndrange` as the third argument. The second and third
+arguments are optional. After instantiating the kernel you can launch it by
+calling the kernel object with the kernel's arguments plus keyword arguments that
+configure the specific launch. The example below creates a kernel with a static
+workgroup size of `16` and a dynamic `ndrange`. Since the `ndrange` is dynamic, it
+has to be provided at launch time as a keyword argument.
+
+```julia
+A = ones(1024, 1024)
+kernel = mul2(CPU(), 16)
+event = kernel(A, ndrange=size(A))
+wait(event)
+all(A .== 2.0)
+```

-- Functions inside kernels are forcefully inlined, except when marked with `@noinline`
-- Floating-point multiplication, addition, subtraction are marked contractable
+!!! danger
+    All kernel launches are asynchronous: each launch produces an event token that
+    has to be waited upon before reading or writing memory that was passed as an
+    argument to the kernel. See [dependencies](@ref dependencies) for a full
+    explanation.

-## Examples
-### Memory copy
+## Important differences to Julia

-The first example simple copies memory from `A` to `B`
+1. Functions inside kernels are forcefully inlined, except when marked with `@noinline`.
+2. Floating-point multiplication, addition, and subtraction are marked contractable.

+## Important differences to CUDAnative

-````@eval
-using Markdown
-Markdown.parse("""
-```julia
-$(read("../../examples/memcopy.jl", String))
-```
-""")
-````
+1. The kernels are automatically bounds-checked against either the dynamic or statically
+   provided `ndrange`.
+2. Functions like `Base.sin` are mapped to `CUDAnative.sin`.

 ## How to debug kernels

+*TODO*
+
 ## How to profile kernels
+
+*TODO*

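The list of differences to Julia added to `docs/src/index.md` states that floating-point multiplication, addition, and subtraction are marked contractable. Contraction lets the compiler fuse `a * b + c` into a single fused multiply-add that rounds once instead of twice, so a kernel can differ in the last bits from the same expression evaluated as ordinary Julia. Base Julia exposes both forms through `fma` and `muladd`, which gives a small illustration; this is an analogy for the compiler behavior, not code that runs inside a kernel.

```julia
# Plain Julia, no KernelAbstractions needed.
a, b, c = 0.1, 10.0, -1.0

separate = a * b + c       # a * b rounds to exactly 1.0 first, so the sum is 0.0
fused    = fma(a, b, c)    # exact product minus 1.0, rounded once: 2^-54 (about 5.55e-17)
either   = muladd(a, b, c) # the compiler may choose the fused or the unfused form

println(separate == fused)                      # false
println(either == separate || either == fused)  # true
```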
docs/src/kernels.md

Lines changed: 16 additions & 11 deletions
@@ -1,15 +1,20 @@
-# Kernel language operations
+# Writing kernels

-These kernel language constructs are intendend to be used as part
+These kernel language constructs are intended to be used as part
 of [`@kernel`](@ref) functions and not outside that context.

-```@docs
-@shmem
-@scratchpad
-@index
-@synchronize
-```
+## Constant arguments

-## Memory kinds
-### Shared memory:
-### Scratch memory
+[`@Const`](@ref)
+
+## Indexing
+
+There are several [`@index`](@ref) variants.
+
+## Local memory, variable lifetime and private memory
+
+[`@localmem`](@ref), [`@synchronize`](@ref), [`@private`](@ref)
+
+# Launching kernels
+
+## [Kernel dependencies](@id dependencies)

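The new `Indexing` and `Launching kernels` sections of `kernels.md` are still stubs. As a mental model only, the sequential plain-Julia sketch below shows how a 1-D workgroup size partitions an `ndrange`, how a global linear index relates to the local index within a workgroup, and why the automatic bounds check listed under the differences to CUDAnative matters when the `ndrange` is not a multiple of the workgroup size. `emulate_launch!` is hypothetical and assumes a 1-D workgroup; it is not how the package schedules work.

```julia
# Hypothetical sequential model of a launch with a 1-D workgroup size.
function emulate_launch!(body, ndrange::Tuple, groupsize::Int)
    n = prod(ndrange)                    # workitems that map to real elements
    ngroups = cld(n, groupsize)          # round up so every element is covered
    cart = CartesianIndices(ndrange)     # maps a linear index to a Cartesian one
    for group in 1:ngroups, lidx in 1:groupsize
        gidx = (group - 1) * groupsize + lidx  # plays the role of @index(Global)
        gidx > n && continue                   # implicit bounds check against ndrange
        body(gidx, lidx, cart[gidx])           # global linear, local, global Cartesian
    end
    return nothing
end

# The quickstart's mul2 body, run through the model.
A = ones(10, 3)              # 30 elements: 2 groups of 16, so 2 workitems stay idle
emulate_launch!(size(A), 16) do I, lidx, J
    A[I] = 2 * A[I]
end
println(all(A .== 2.0))      # true

# Count how many workitems actually executed the body.
calls = Ref(0)
emulate_launch!((I, lidx, J) -> (calls[] += 1), (10, 3), 16)
println(calls[])             # 30, not 32
```

Without the `gidx > n` check the last group would index past the end of `A`, which is the situation the bounds check against `ndrange` guards against.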
examples/memcopy.jl

Lines changed: 3 additions & 26 deletions
@@ -7,43 +7,20 @@ using Test
     @inbounds A[I] = B[I]
 end

-A = zeros(128, 128)
-B = ones(128, 128)
-
 function mycopy!(A::Array, B::Array)
     @assert size(A) == size(B)
-    kernel = copy_kernel!(ScalarCPU(), 32)
+    kernel = copy_kernel!(CPU(), 8)
     kernel(A, B, ndrange=length(A))
 end

 function mycopy_static!(A::Array, B::Array)
     @assert size(A) == size(B)
-    kernel = copy_kernel!(ScalarCPU(), 32, size(A)) # if size(A) varies this will cause recompilation
-    kernel(A, B, ndrange=size(A))
-end
-
-event = mycopy!(A, B)
-wait(event)
-@test A == B
-
-A = zeros(128, 128)
-event = mycopy_static!(A, B)
-wait(event)
-@test A == B
-
-function mycopy!(A::Array, B::Array)
-    @assert size(A) == size(B)
-    kernel = copy_kernel!(ThreadedCPU(), 8)
-    kernel(A, B, ndrange=length(A))
-end
-
-function mycopy_static!(A::Array, B::Array)
-    @assert size(A) == size(B)
-    kernel = copy_kernel!(ThreadedCPU(), 32, size(A)) # if size(A) varies this will cause recompilation
+    kernel = copy_kernel!(CPU(), 32, size(A)) # if size(A) varies this will cause recompilation
     kernel(A, B, ndrange=size(A))
 end

 A = zeros(128, 128)
+B = ones(128, 128)
 event = mycopy!(A, B)
 wait(event)
 @test A == B

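The comment kept in `mycopy_static!` warns that a varying `size(A)` causes recompilation when the size is passed as the static `ndrange`. Assuming the static size ends up as a type parameter (the same way `@localmem` wraps `dims` in `Val` in `src/KernelAbstractions.jl`), the effect can be sketched in plain Julia with a hypothetical `static_length` helper:

```julia
# Hypothetical helper: dims is part of the argument's type, so inside the
# method it is a compile-time constant.
static_length(::Val{dims}) where {dims} = prod(dims)

v1 = Val((128, 128))
v2 = Val((64, 64))

println(typeof(v1) == typeof(v2))   # false: each size is a distinct type
println(static_length(v1))          # 16384
println(static_length(v2))          # 4096, computed by a separate specialization
```

Julia compiles one specialization per concrete argument type, so every new size carried in the type domain implies another compilation. `mycopy!`, which supplies the size only through `ndrange=length(A)` at launch, presumably avoids this at the cost of not knowing the size at compile time.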
src/KernelAbstractions.jl

Lines changed: 30 additions & 11 deletions
@@ -1,16 +1,23 @@
 module KernelAbstractions

 export @kernel
-export @shmem, @scratchpad, @synchronize, @index
-export Device, GPU, CPU, CUDA, ScalarCPU, ThreadedCPU
+export @Const, @localmem, @private, @synchronize, @index
+export Device, GPU, CPU, CUDA

 using StaticArrays
 using Cassette
 using Requires

-###
+"""
+    @kernel function f(args) end
+"""
 macro kernel end

+"""
+    @Const(A)
+"""
+macro Const end
+
 abstract type Event end
 import Base.wait

@@ -21,31 +28,46 @@ function async_copy! end

 ###
 # Kernel language
-# - @shmem
-# - @scratchpad
+# - @localmem
+# - @private
 # - @synchronize
 # - @index
 ###

 const shmem_id = Ref(0)
-macro shmem(T, dims)
+
+"""
+    @localmem T dims
+"""
+macro localmem(T, dims)
     id = (shmem_id[]+= 1)

     quote
         $SharedMemory($(esc(T)), Val($(esc(dims))), Val($id))
     end
 end

-macro scratchpad(T, dims)
+"""
+    @private T dims
+"""
+macro private(T, dims)
     quote
         $Scratchpad($(esc(T)), Val($(esc(dims))))
     end
 end

+"""
+    @synchronize()
+"""
 macro synchronize()
     @error "@synchronize not captured or used outside @kernel"
 end

+"""
+    @index(Global)
+    @index(Local)
+    @index(Global, Cartesian)
+"""
 macro index(locale, args...)
     if !(locale === :Global || locale === :Local)
         error("@index requires as first argument either :Global or :Local")
@@ -85,11 +107,8 @@ function __index_Global_Cartesian end

 abstract type Device end
 abstract type GPU <: Device end
-abstract type CPU <: Device end
-
-struct ScalarCPU <: CPU end
-struct ThreadedCPU <: CPU end

+struct CPU <: Device end
 struct CUDA <: GPU end
 # struct AMD <: GPU end
 # struct Intel <: GPU end

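`@localmem` keeps the `shmem_id` counter that `@shmem` used: every expansion bumps `shmem_id[]` and passes `Val($id)` to `SharedMemory`, so each allocation site presumably gets its own compile-time identity. A reduced sketch of that expansion-time counter pattern follows; the macro `@unique_site` and the counter `site_counter` are hypothetical, and a plain `Array` stands in for the package's `SharedMemory`.

```julia
# Hypothetical reduction of the expansion-time counter pattern behind @localmem.
const site_counter = Ref(0)

macro unique_site(T, dims)
    id = (site_counter[] += 1)   # runs once per textual call site, at expansion time
    quote
        # A plain Array stands in for SharedMemory; the id is baked in as a Val.
        (Array{$(esc(T))}(undef, $(esc(dims))...), Val($id))
    end
end

function two_buffers()
    buf_a, id_a = @unique_site(Float32, (4,))
    buf_b, id_b = @unique_site(Float32, (4,))
    return id_a, id_b
end

ids = two_buffers()
println(ids[1] != ids[2])       # true: each call site received its own id
println(two_buffers() == ids)   # true: macros are expanded once, not per call
```

Because the counter is read while the macro expands, the ids depend only on where `@unique_site` appears in the source, not on how often the enclosing function runs.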