4 changes: 2 additions & 2 deletions guide/book.toml
@@ -2,5 +2,5 @@
authors = ["Riccardo D'Ambrosio<[email protected]>"]
language = "en"
src = "src"
-title = "GPU Computing with Rust using CUDA"
-description = "Writing extremely fast GPU Computing code with rust using rustc_codegen_nvvm and CUDA"
+title = "The Rust CUDA Guide"
+description = "How to write GPU compute code with Rust using rustc_codegen_nvvm and CUDA"
2 changes: 1 addition & 1 deletion guide/src/README.md
@@ -1,3 +1,3 @@
# Introduction

-Welcome to the rust-cuda guide! Let's dive right in.
+Welcome to the Rust CUDA guide! Let's dive right in.
6 changes: 3 additions & 3 deletions guide/src/SUMMARY.md
@@ -12,9 +12,9 @@
- [The CUDA Toolkit](cuda/README.md)
- [GPU Computing](cuda/gpu_computing.md)
- [The CUDA Pipeline](cuda/pipeline.md)
-- [rustc_codegen_nvvm](nvvm/README.md)
-- [Custom Rustc Backends](nvvm/backends.md)
-- [rustc_codegen_nvvm](nvvm/nvvm.md)
+- [`rustc_codegen_nvvm`](nvvm/README.md)
+- [Custom rustc Backends](nvvm/backends.md)
+- [`rustc_codegen_nvvm`](nvvm/nvvm.md)
- [Types](nvvm/types.md)
- [PTX Generation](nvvm/ptxgen.md)
- [Debugging](nvvm/debugging.md)
2 changes: 1 addition & 1 deletion guide/src/cuda/README.md
@@ -2,7 +2,7 @@

The CUDA Toolkit is an ecosystem for executing extremely fast code on NVIDIA GPUs for the purpose of general computing.

-CUDA includes many libraries for this purpose, including the Driver API, Runtime API, the PTX ISA, libnvvm, etc. CUDA
+CUDA includes many libraries for this purpose, including the Driver API, Runtime API, the PTX ISA, libNVVM, etc. CUDA
is currently the best option for computing in terms of libraries and control available, however, it unfortunately only works
on NVIDIA GPUs.

28 changes: 14 additions & 14 deletions guide/src/cuda/gpu_computing.md
@@ -1,4 +1,4 @@
-# GPU Computing
+# GPU computing

You probably already know what GPU computing is, but if you don't, it is utilizing the extremely parallel
nature of GPUs for purposes other than rendering. It is widely used in many scientific and consumer fields.
@@ -13,41 +13,41 @@ of time and/or take different code paths.

CUDA is currently one of the best choices for fast GPU computing for multiple reasons:
- It offers deep control over how kernels are dispatched and how memory is managed.
-- It has a rich ecosystem of tutorials, guides, and libraries such as cuRand, cuBlas, libnvvm, optix, the PTX ISA, etc.
+- It has a rich ecosystem of tutorials, guides, and libraries such as cuRAND, cuBLAS, libNVVM, OptiX, the PTX ISA, etc.
- It is mostly unmatched in performance because it is solely meant for computing and offers rich control.
And more...

-However, CUDA can only run on NVIDIA GPUs, which precludes AMD gpus from tools that use it. However, this is a drawback that
-is acceptable by many because of the significant developer cost of supporting both NVIDIA gpus with CUDA and
-AMD gpus with OpenCL, since OpenCL is generally slower, clunkier, and lacks libraries and docs on par with CUDA.
+However, CUDA can only run on NVIDIA GPUs, which precludes AMD GPUs from tools that use it. However, this is a drawback that
+is acceptable by many because of the significant developer cost of supporting both NVIDIA GPUs with CUDA and
+AMD GPUs with OpenCL, since OpenCL is generally slower, clunkier, and lacks libraries and docs on par with CUDA.

# Why Rust?

-Rust is a great choice for GPU programming, however, it has needed a kickstart, which is what rustc_codegen_nvvm tries to
+Rust is a great choice for GPU programming, however, it has needed a kickstart, which is what `rustc_codegen_nvvm` tries to
accomplish; The initial hurdle of getting Rust to compile to something CUDA can run is over, now comes the design and
polish part.

On top of its rich language features (macros, enums, traits, proc macros, great errors, etc), Rust's safety guarantees
-can be applied in gpu programming too; A field that has historically been full of implied invariants and unsafety, such
+can be applied in GPU programming too; A field that has historically been full of implied invariants and unsafety, such
as (but not limited to):
- Expecting some amount of dynamic shared memory from the caller.
- Expecting a certain layout for thread blocks/threads.
- Manually handling the indexing of data, leaving code prone to data races if not managed correctly.
- Forgetting to free memory, using uninitialized memory, etc.

-Not to mention the standardized tooling that makes the building, documentation, sharing, and linting of gpu kernel libraries easily possible.
-Most of the reasons for using rust on the CPU apply to using Rust for the GPU, these reasons have been stated countless times so
-i will not repeat them here.
+Not to mention the standardized tooling that makes the building, documentation, sharing, and linting of GPU kernel libraries easily possible.
+Most of the reasons for using Rust on the CPU apply to using Rust for the GPU, these reasons have been stated countless times so
+I will not repeat them here.

-A couple of particular rust features make writing CUDA code much easier: RAII and Results.
+A couple of particular Rust features make writing CUDA code much easier: RAII and Results.
In `cust` everything uses RAII (through `Drop` impls) to manage freeing memory and returning handles, which
frees users from having to think about that, which yields safer, more reliable code.

-Results are particularly helpful, almost every single call in every CUDA library returns a status code in the form of a cuda result.
+Results are particularly helpful, almost every single call in every CUDA library returns a status code in the form of a CUDA result.
Ignoring these statuses is very dangerous and can often lead to random segfaults and overall unreliable code. For this purpose,
both the CUDA SDK, and other libraries provide macros to handle such statuses. This handling is not very reliable and causes
dependency issues down the line.

-Instead of an unreliable system of macros, we can leverage rust results for this. In cust we return special `CudaResult<T>`
-results that can be bubbled up using rust's `?` operator, or, similar to `CUDA_SAFE_CALL` can be unwrapped or expected if
+Instead of an unreliable system of macros, we can leverage Rust results for this. In cust we return special `CudaResult<T>`
Review comment (Contributor), suggested change:
-Instead of an unreliable system of macros, we can leverage Rust results for this. In cust we return special `CudaResult<T>`
+Instead of an unreliable system of macros, we can leverage Rust results for this. In `cust` we return special `CudaResult<T>`
+results that can be bubbled up using Rust's `?` operator, or, similar to `CUDA_SAFE_CALL` can be unwrapped or expected if
proper error handling is not needed.
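To illustrate the RAII and Results points made in the hunk above, here is a minimal, hypothetical sketch of both patterns. `CudaError`, `DeviceBuffer`, and the functions below are illustrative stand-ins, not cust's actual API; in cust the fallible calls return its real `CudaResult<T>` and the buffer wraps an actual device allocation.

```rust
// Illustrative sketch only: names below are NOT cust's real API.
#[derive(Debug, PartialEq)]
enum CudaError {
    InvalidValue,
}

// Every fallible call returns a Result instead of a raw status code.
type CudaResult<T> = Result<T, CudaError>;

// RAII: the allocation is released in Drop, so callers never free it
// manually (in cust this would be a real GPU allocation).
struct DeviceBuffer {
    data: Vec<f32>,
}

impl DeviceBuffer {
    fn zeroed(len: usize) -> CudaResult<DeviceBuffer> {
        if len == 0 {
            return Err(CudaError::InvalidValue);
        }
        Ok(DeviceBuffer { data: vec![0.0; len] })
    }
}

impl Drop for DeviceBuffer {
    fn drop(&mut self) {
        // In cust, the device memory would be freed here.
    }
}

// Results: `?` bubbles any CudaError up to the caller, replacing
// C-style CUDA_SAFE_CALL status-check macros.
fn run() -> CudaResult<usize> {
    let buf = DeviceBuffer::zeroed(1024)?;
    Ok(buf.data.len())
} // `buf` is dropped here; its memory is released automatically.

fn main() {
    match run() {
        Ok(n) => println!("allocated {n} elements"),
        Err(e) => eprintln!("CUDA error: {e:?}"),
    }
}
```

The same `?` chain works through arbitrarily deep call stacks, which is what makes it a drop-in replacement for the macro-based status checking described above.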
6 changes: 3 additions & 3 deletions guide/src/cuda/pipeline.md
@@ -1,4 +1,4 @@
-# The CUDA Pipeline
+# The CUDA pipeline

CUDA is traditionally used via CUDA C/C++ files which have a `.cu` extension. These files can be
compiled using NVCC (NVIDIA CUDA Compiler) into an executable.
@@ -19,13 +19,13 @@ with additional restrictions including the following.
- Some linkage types are not supported.
- Function ABIs are ignored; everything uses the PTX calling convention.

-libnvvm is a closed source library which takes NVVM IR, optimizes it further, then converts it to
+libNVVM is a closed source library which takes NVVM IR, optimizes it further, then converts it to
PTX. PTX is a low level, assembly-like format with an open specification which can be targeted by
any language. For an assembly format, PTX is fairly user-friendly.
- It is well formatted.
- It is mostly fully specified (other than the iffy grammar specification).
- It uses named registers/parameters.
-- It uses virtual registers. (Because gpus have thousands of registers, listing all of them out
+- It uses virtual registers. (Because GPUs have thousands of registers, listing all of them out
would be unrealistic.)
- It uses ASCII as a file encoding.
