- Fix various typos, badly expressed sentences, etc.
- Streamline the "The CUDA Pipeline" section, which is repetitive and
contains a broken link to a non-existent image.
- Remove reference to LLVM 12/13, which are now very old.
- Tweak text about supporting CUDA versions; 12.x support is no longer
experimental.
- It's now `rust-toolchain.toml`, not `rust-toolchain`. And no need to
include an out-of-date copy of it in the docs.
- Remove a reference to `spirv_builder` which isn't that helpful.
- Fix broken `Dockerfile` link.
**`guide/src/faq.md`** (+1 −1)

```diff
@@ -29,7 +29,7 @@ over CUDA C/C++ with the same (or better!) performance and features, therefore,
 Short answer, no.
 
 Long answer, there are a couple of things that make this impossible:
 
-- At the time of writing, libnvvm expects LLVM 7 bitcode, giving it LLVM 12/13 bitcode (which is what rustc uses) does not work.
+- At the time of writing, libnvvm expects LLVM 7 bitcode, which is a very old format. Giving it bitcode from later LLVM versions (which is what rustc uses) does not work.
 - NVVM IR is a __subset__ of LLVM IR, there are tons of things that nvvm will not accept. Such as a lot of function attrs not being allowed.
 This is well documented and you can find the spec [here](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html). Not to mention
 many bugs in libnvvm that I have found along the way, the most infuriating of which is nvvm not accepting integer types that aren't `i1`, `i8`, `i16`, `i32`, or `i64`.
```
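As a hypothetical illustration of that last limitation (this sketch is not from the guide): Rust's 128-bit integers lower to LLVM `i128`, which is outside the widths libnvvm accepts, so code like this compiles fine as ordinary host-side Rust but could not go through the NVVM codegen inside a kernel:

```rust
// Plain host-side Rust: u128 arithmetic lowers to LLVM `i128`.
// Used inside a GPU kernel, libnvvm would reject this type, since per the
// NVVM IR spec it only accepts the i1, i8, i16, i32 and i64 integer widths.
fn mul_wide(a: u64, b: u64) -> u128 {
    (a as u128) * (b as u128)
}

fn main() {
    println!("{}", mul_wide(u64::MAX, 2)); // 36893488147419103230
}
```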
**`guide/src/guide/getting_started.md`** (+15 −31)
```diff
@@ -6,11 +6,7 @@ This section covers how to get started writing GPU crates with `cuda_std` and `c
 
 Before you can use the project to write GPU crates, you will need a couple of prerequisites:
 
-- [The CUDA SDK](https://developer.nvidia.com/cuda-downloads), version `11.2-11.8` (and the appropriate driver - [see cuda release notes](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html)).
-
-  We recently [added experimental support for the `12.x`
-  SDK](https://github.com/Rust-GPU/rust-cuda/issues/100), please file any issues you
-  see
+- [The CUDA SDK](https://developer.nvidia.com/cuda-downloads), version 11.2 or later (and the appropriate driver - [see CUDA release notes](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html)).
 
 This is only for building GPU crates, to execute built PTX you only need CUDA `9+`.
```
```diff
@@ -27,13 +23,14 @@ Before you can use the project to write GPU crates, you will need a couple of pr
 - You may wish to use or consult the bundled [Dockerfile](#docker) to assist in your local config
 
-## rust-toolchain
+## rust-toolchain.toml
 
-Currently, the Codegen only works on nightly (because it uses rustc internals), and it only works on a specific version of nightly.
-This is why you must copy the `rust-toolchain` file in the project repository to your own project. This will ensure
-you are on the correct nightly version so the codegen builds.
+NVVM codegen currently requires a specific version of Rust nightly, because it uses rustc internals
+that are subject to change. Therefore, you must copy the `rust-toolchain.toml` file in the project
+repository so that your own project uses the correct nightly version.
 
-Only the codegen requires nightly, `cust` and other CPU-side libraries work perfectly fine on stable.
+Note: `cust` and other CPU-side libraries work with stable Rust, but they will end up being
+compiled with the version of nightly specified in `rust-toolchain.toml`.
 
 ## Cargo.toml
```
```diff
@@ -111,9 +108,9 @@ thread, with the number of threads being decided by the caller (the CPU).
 We call these parameters the launch dimensions of the kernel. Launch dimensions are split
 up into two basic concepts:
 
-- Threads, a single thread executes the GPU kernel **once**, and it makes the index
+- **Threads:** A single thread executes the GPU kernel **once**, and it makes the index
 of itself available to the kernel through special registers (functions in our case).
-- Blocks, Blocks house multiple threads that they execute on their own. Thread indices
+- **Blocks:** A single block houses multiple threads that it executes on its own. Thread indices
 are only unique across the thread's block, therefore CUDA also exposes the index
 of the current block.
```
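The relationship between the two concepts in this hunk can be sketched with plain host-side arithmetic (a hypothetical helper for illustration, not part of `cuda_std`): because thread indices are only unique within a block, a globally unique index combines the block index, the block size, and the thread's index within its block:

```rust
// How a globally unique thread index is derived from launch dimensions.
// `block_idx` and `thread_idx` stand in for what the GPU-side index
// intrinsics would return inside a kernel.
fn global_index(block_idx: u32, block_dim: u32, thread_idx: u32) -> u32 {
    block_idx * block_dim + thread_idx
}

fn main() {
    // Thread 3 of block 2, with 256 threads per block:
    println!("{}", global_index(2, 256, 3)); // 515
}
```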
```diff
@@ -150,8 +147,8 @@ If you have used CUDA C++ before, this should seem fairly familiar, with a few o
 is unsound. The reason being that rustc assumes `&mut` does not alias. However, because every thread gets a copy of the arguments, this would cause it to alias, thereby violating
 this invariant and yielding technically unsound code. Pointers do not have such an invariant on the other hand. Therefore, we use a pointer and only make a mutable reference once we
 are sure the elements are disjoint: `let elem = &mut *c.add(idx);`.
-- We check that the index is not out of bounds before doing anything, this is because it is
-common to launch kernels with thread amounts that are not exactly divisible by the length for optimization.
+- We check that the index is not out of bounds before doing anything, because it is common to
+launch kernels with thread counts that are not exactly divisible by the length for optimization.
 
 Internally what this does is it first checks that a couple of things are right in the kernel:
```
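The bounds-check pattern discussed in this hunk can be simulated on the CPU (a hedged sketch with illustrative names; the real kernel would use `cuda_std` index intrinsics and raw pointers rather than a loop over slices):

```rust
// CPU-side simulation of a bounds-checked elementwise-add kernel.
// Each loop iteration plays the role of one GPU thread writing
// c[idx] = a[idx] + b[idx]; threads whose index is past the end do nothing,
// mirroring the kernel's early-return bounds check.
fn add_kernel_sim(a: &[f32], b: &[f32], c: &mut [f32], num_threads: usize) {
    for idx in 0..num_threads {
        // Bounds check: launches commonly round the thread count up past
        // the data length, so extra "threads" must be skipped.
        if idx < c.len() {
            let elem = &mut c[idx];
            *elem = a[idx] + b[idx];
        }
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [4.0f32, 5.0, 6.0];
    let mut c = [0.0f32; 3];
    // "Launch" with 4 threads even though there are only 3 elements.
    add_kernel_sim(&a, &b, &mut c, 4);
    println!("{:?}", c); // [5.0, 7.0, 9.0]
}
```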
```diff
@@ -165,8 +162,7 @@ It also applies `#[no_mangle]` so the name of the kernel is the same as it is de
 ## Building the GPU crate
 
 Now that you have some kernels defined in a crate, you can build them easily using `cuda_builder`,
-`cuda_builder` is a helper crate similar to `spirv_builder` (if you have used rust-gpu before), it builds
-GPU crates while passing everything needed by rustc.
+which builds GPU crates while passing everything needed by rustc.
 
 To use it you can simply add it as a build dependency in your CPU crate (the crate running the GPU kernels):
```
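The guide's actual snippet sits in the unchanged lines that follow this hunk; as a rough sketch of what such a build dependency looks like (the version number here is illustrative, not taken from the guide):

```toml
# In the CPU crate's Cargo.toml -- the version shown is a placeholder;
# use whatever version of cuda_builder matches your checkout.
[build-dependencies]
cuda_builder = "0.3"
```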
```diff
 Don't forget to include the current `rust-toolchain.toml` at the top of your project.
 
 ## Docker
 
-There is also a [Dockerfile](Dockerfile) prepared as a quickstart with all the necessary libraries for base cuda development.
+There are also some [Dockerfiles](https://github.com/Rust-GPU/rust-cuda/tree/main/container) prepared as a quickstart with all the necessary libraries for base CUDA development.
 
 You can use it as follows (assuming your clone of Rust CUDA is at the absolute path `RUST_CUDA`):
```
```diff
@@ -244,7 +228,7 @@ You can use it as follows (assuming your clone of Rust CUDA is at the absolute p
 **Notes:**
 
-1. refer to [rust-toolchain](#rust-toolchain) to ensure you are using the correct toolchain in your project.
+1. refer to [rust-toolchain.toml](#rust-toolchaintoml) to ensure you are using the correct toolchain in your project.
 2. despite using Docker, your machine will still need to be running a compatible driver, in this case for Cuda 11.4.1 it is >=470.57.02
 3. if you have issues within the container, it can help to start ensuring your gpu is recognized
    - ensure `nvidia-smi` provides meaningful output in the container
```