Clarify use of the word "codegen", and use sentence case for headings.
[I accidentally squashed two commits, and can't be bothered separating
them.]
The existing text uses "codegen" frequently as a shorthand for "codegen
backend". I found this confusing and distracting. ("Codegens" is even
worse.) This commit replaces these uses with "codegen backend" (or
occasionally something else more appropriate).
The commit preserves the use of "codegen" for the act of code generation,
e.g. "during codegen we do XYZ", because that's more standard.
Also, currently headings are a mix of sentence case ("The quick brown
fox") and title case ("The Quick Brown Fox"). Title case is extremely
formal, so sentence case feels more natural here.
`guide/src/faq.md` (4 additions, 4 deletions)
````diff
@@ -1,4 +1,4 @@
-# Frequently Asked Questions
+# Frequently asked questions
 
 This page will cover a lot of the questions people often have when they encounter this project,
 so they are addressed all at once.
````
````diff
@@ -14,8 +14,8 @@ This can be circumvented by building LLVM in a special way, but this is far beyo
 which yield considerable performance differences (especially on more complex kernels with more information in the IR).
 - For some reason (either rustc giving weird LLVM IR or the LLVM PTX backend being broken) the LLVM PTX backend often
 generates completely invalid PTX for trivial programs, so it is not an acceptable workflow for a production pipeline.
-- GPU and CPU codegen is fundamentally different, creating a codegen that is only for the GPU allows us to
-seamlessly implement features which would have been impossible or very difficult to implement in the existing codegen, such as:
+- GPU and CPU codegen is fundamentally different, creating a codegen backend that is only for the GPU allows us to
+seamlessly implement features which would have been impossible or very difficult to implement in the existing codegen backend, such as:
 - Shared memory, this requires some special generation of globals with custom addrspaces, its just not possible to do without backend explicit handling.
 - Custom linking logic to do dead code elimination so as to not end up with large PTX files full of dead functions/globals.
 - Stripping away everything we do not need, no complex ABI handling, no shared lib handling, control over how function calls are generated, etc.
````
````diff
@@ -33,7 +33,7 @@ Long answer, there are a couple of things that make this impossible:
 - NVVM IR is a __subset__ of LLVM IR, there are tons of things that NVVM will not accept. Such as a lot of function attrs not being allowed.
 This is well documented and you can find the spec [here](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html). Not to mention
 many bugs in libNVVM that I have found along the way, the most infuriating of which is nvvm not accepting integer types that arent `i1, i8, i16, i32, or i64`.
-This required special handling in the codegen to convert these "irregular" types into vector types.
+This required special handling in the codegen backend to convert these "irregular" types into vector types.
 
 ## What is the point of using Rust if a lot of things in kernels are unsafe?
````
`guide/src/guide/compute_capabilities.md` (16 additions, 16 deletions)
````diff
@@ -1,9 +1,9 @@
-# Compute Capability Gating
+# Compute capability gating
 
 This section covers how to write code that adapts to different CUDA compute capabilities
 using conditional compilation.
 
-## What are Compute Capabilities?
+## What are compute capabilities?
 
 CUDA GPUs have different "compute capabilities" that determine which features they
 support. Each capability is identified by a version number like `3.5`, `5.0`, `6.1`,
````
````diff
@@ -17,7 +17,7 @@ For example:
 
 For comprehensive details, see [NVIDIA's CUDA documentation on GPU architectures](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-compilation).
 
-## Virtual vs Real Architectures
+## Virtual vs real Architectures
 
 In CUDA terminology:
````
````diff
@@ -28,7 +28,7 @@ In CUDA terminology:
 Rust CUDA works exclusively with virtual architectures since it only generates PTX. The
 `NvvmArch::ComputeXX` enum values correspond to CUDA's virtual architectures.
 
-## Using Target Features
+## Using target features
 
 When building your kernel, the `NvvmArch::ComputeXX` variant you choose enables specific
 `target_feature` flags. These can be used with `#[cfg(...)]` to conditionally compile
````
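The gating this hunk documents can be sketched as follows. This is an illustrative, host-runnable example, not code from the guide: the function name is invented, and when compiled on an ordinary host target none of the `compute_*` features are set, so the fallback path is what compiles and runs.

```rust
#![allow(unexpected_cfgs)] // host toolchains don't know the compute_* features

// Hypothetical capability-gated function: a build targeting compute_70 or
// newer would get the first body, everything else gets the fallback.
#[cfg(target_feature = "compute_70")]
fn reduce_hint() -> &'static str {
    "warp-sync path"
}

#[cfg(not(target_feature = "compute_70"))]
fn reduce_hint() -> &'static str {
    // Older capabilities (and host builds) fall back to a portable path.
    "fallback path"
}

fn main() {
    println!("{}", reduce_hint());
}
```

On a host build this prints `fallback path`, since no `compute_*` target feature is enabled outside a GPU build.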
````diff
@@ -51,12 +51,12 @@ which `NvvmArch::ComputeXX` is used to build the kernel, there is a different an
 These features let you write optimized code paths for specific GPU generations while
 still supporting older ones.
 
-## Specifying Compute Capabilites
+## Specifying compute capabilites
 
 Starting with CUDA 12.9, NVIDIA introduced architecture suffixes that affect
 compatibility.
 
-### Base Architecture (No Suffix)
+### Base architecture (no suffix)
 
 Example: `NvvmArch::Compute70`
````
````diff
@@ -79,7 +79,7 @@ CudaBuilder::new("kernels")
 #[cfg(target_feature = "compute_80")] // ✗ Fail (higher base variant)
 ```
 
-### Family Suffix ('f')
+### Family suffix ('f')
 
 Example: `NvvmArch::Compute101f`
````
````diff
@@ -108,7 +108,7 @@ CudaBuilder::new("kernels")
 #[cfg(target_feature = "compute_110")] // ✗ Fail (higher base variant)
 ```
 
-### Architecture Suffix ('a')
+### Architecture suffix ('a')
 
 Example: `NvvmArch::Compute100a`
````
````diff
@@ -142,7 +142,7 @@ Note: While the 'a' variant enables all these features during compilation (allow
 
 For more details on suffixes, see [NVIDIA's blog post on family-specific architecture features](https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/).
 
-### Manual Compilation (Without `cuda_builder`)
+### Manual compilation (without `cuda_builder`)
 
 If you're invoking `rustc` directly instead of using `cuda_builder`, you only need to specify the architecture through LLVM args:
````
`guide/src/guide/getting_started.md` (6 additions, 6 deletions)
````diff
@@ -1,16 +1,16 @@
-# Getting Started
+# Getting started
 
 This section covers how to get started writing GPU crates with `cuda_std` and `cuda_builder`.
 
-## Required Libraries
+## Required libraries
 
 Before you can use the project to write GPU crates, you will need a couple of prerequisites:
 
 - [The CUDA SDK](https://developer.nvidia.com/cuda-downloads), version 11.2 or later (and the appropriate driver - [see CUDA release notes](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html)).
 
 This is only for building GPU crates, to execute built PTX you only need CUDA `9+`.
 
-- LLVM 7.x (7.0 to 7.4), The codegen searches multiple places for LLVM:
+- LLVM 7.x (7.0 to 7.4), The codegen backend searches multiple places for LLVM:
 
 - If `LLVM_CONFIG` is present, it will use that path as `llvm-config`.
 - Or, if `llvm-config` is present as a binary, it will use that, assuming that `llvm-config --version` returns `7.x.x`.
````
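The search order this hunk describes can be mirrored with a small probe. Only the `LLVM_CONFIG` environment variable, the `llvm-config` binary name, and the 7.x requirement come from the text above; the probe program itself is an illustrative sketch, not part of the toolchain.

```rust
use std::env;
use std::process::Command;

// Check LLVM visibility the way the guide describes: $LLVM_CONFIG first,
// then `llvm-config` on PATH, expecting a 7.x version string.
fn probe_llvm() -> String {
    let probe = env::var("LLVM_CONFIG").unwrap_or_else(|_| "llvm-config".to_string());
    match Command::new(&probe).arg("--version").output() {
        Ok(out) => {
            let version = String::from_utf8_lossy(&out.stdout).trim().to_string();
            if version.starts_with("7.") {
                format!("found LLVM 7.x via {probe}")
            } else {
                format!("{probe} reports '{version}', but 7.0 to 7.4 is required")
            }
        }
        Err(_) => format!("no {probe} found; set LLVM_CONFIG to point at one"),
    }
}

fn main() {
    println!("{}", probe_llvm());
}
```

The output depends on what is installed, so treat this only as a way to see which of the documented search steps succeeds on a given machine.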
````diff
@@ -60,7 +60,7 @@ We changed our crate's crate types to `cdylib` and `rlib`. We specified `cdylib`
 
 ## `lib.rs`
 
-Before we can write any GPU kernels, we must add a few directives to our `lib.rs` which are required by the codegen:
+Before we can write any GPU kernels, we must add a few directives to our `lib.rs` which are required by the codegen backend:
 
 ```rs
 #![cfg_attr(
````
````diff
@@ -76,7 +76,7 @@ This does a couple of things:
 
 - It only applies the attributes if we are compiling the crate for the GPU (target_os = "cuda").
 - It declares the crate to be `no_std` on CUDA targets.
-- It registers a special attribute required by the codegen for things like figuring out
+- It registers a special attribute required by the codegen backend for things like figuring out
 what functions are GPU kernels.
 - It explicitly includes `kernel` macro and `thread`
````
````diff
@@ -156,7 +156,7 @@ Internally what this does is it first checks that a couple of things are right i
 - The function is `unsafe`.
 - The function does not return anything.
 
-Then it declares this kernel to the codegen so that the codegen can tell CUDA this is a GPU kernel.
+Then it declares this kernel to the codegen backend so it can tell CUDA this is a GPU kernel.
 It also applies `#[no_mangle]` so the name of the kernel is the same as it is declared in the code.
````
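The shape the macro is described as enforcing — `unsafe`, `extern "C"`, no return value, `#[no_mangle]` — can be hand-written to see what a kernel signature roughly looks like. This is an illustrative sketch, not the macro's real expansion; the function name, parameters, and host-side loop body are invented so the example runs on the CPU.

```rust
// Hand-written illustration of the properties `#[kernel]` checks for:
// unsafe, extern "C", no return value, and an unmangled symbol name.
#[no_mangle]
pub unsafe extern "C" fn scale(data: *mut f32, len: usize, factor: f32) {
    // On the GPU each thread would handle one element; a plain loop
    // keeps this sketch runnable on the host.
    for i in 0..len {
        *data.add(i) *= factor;
    }
}

fn main() {
    let mut v = [1.0f32, 2.0, 3.0];
    unsafe { scale(v.as_mut_ptr(), v.len(), 2.0) };
    println!("{:?}", v); // prints [2.0, 4.0, 6.0]
}
```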
`guide/src/guide/kernel_abi.md` (5 additions, 5 deletions)
````diff
@@ -1,7 +1,7 @@
 # Kernel ABI
 
-This section details how parameters are passed to GPU kernels by the Codegen at the current time.
-In other words, how the codegen expects you to pass different types to GPU kernels from the CPU.
+This section details how parameters are passed to GPU kernels by the codegen backend. In other
+words, how the codegen backend expects you to pass different types to GPU kernels from the CPU.
 
 ⚠️ If you find any bugs in the ABI please report them. ⚠️
````
````diff
@@ -15,7 +15,7 @@ other ABI we override purely to avoid footguns.
 
 Functions marked as `#[kernel]` are enforced to be `extern "C"` by the kernel macro, and it is expected
 that __all__ GPU kernels be `extern "C"`, not that you should be declaring any kernels without the `#[kernel]` macro,
-because the codegen/`cuda_std` is allowed to rely on the behavior of `#[kernel]` for correctness.
+because the codegen backend/`cuda_std` is allowed to rely on the behavior of `#[kernel]` for correctness.
 
 ## Structs
````
````diff
@@ -119,7 +119,7 @@ unsafe {
 }
 ```
 
-You may get warnings about slices being an improper C-type, but the warnings are safe to ignore, the codegen guarantees
+You may get warnings about slices being an improper C-type, but the warnings are safe to ignore, the codegen backend guarantees
 that slices are passed as pairs of params.
 
 You cannot however pass mutable slices, this is because it would violate aliasing rules, each thread receiving a copy of the mutable
````
````diff
@@ -135,7 +135,7 @@ ZSTs (zero-sized types) are ignored and become nothing in the final PTX.
 Primitive types are passed directly by value, same as structs. They map to the special PTX types `.s8`, `.s16`, `.s32`, `.s64`, `.u8`, `.u16`, `.u32`, `.u64`, `.f32`, and `.f64`.
 With the exception that `u128` and `i128` are passed as byte arrays (but this has no impact on how they are passed from the CPU).
 
-## References And Pointers
+## References And pointers
 
 References and Pointers are both passed as expected, as pointers. It is therefore expected that you pass such parameters using device memory:
````