Clarify use of the word "codegen", and use sentence case for headings.
[I accidentally squashed two commits, and can't be bothered separating
them.]
The existing text uses "codegen" frequently as a shorthand for "codegen
backend". I found this confusing and distracting. ("Codegens" is even
worse.) This commit replaces these uses with "codegen backend" (or
occasionally something else more appropriate).
The commit preserves the use of "codegen" for the act of code generation,
e.g. "during codegen we do XYZ", because that's more standard.
Also, currently headings are a mix of sentence case ("The quick brown
fox") and title case ("The Quick Brown Fox"). Title case is extremely
formal, so sentence case feels more natural here.
`guide/src/faq.md` (4 additions, 4 deletions)
````diff
@@ -1,4 +1,4 @@
-# Frequently Asked Questions
+# Frequently asked questions
 
 This page will cover a lot of the questions people often have when they encounter this project,
 so they are addressed all at once.
````
````diff
@@ -14,8 +14,8 @@ This can be circumvented by building LLVM in a special way, but this is far beyo
 which yield considerable performance differences (especially on more complex kernels with more information in the IR).
 - For some reason (either rustc giving weird LLVM IR or the LLVM PTX backend being broken) the LLVM PTX backend often
 generates completely invalid PTX for trivial programs, so it is not an acceptable workflow for a production pipeline.
-- GPU and CPU codegen is fundamentally different, creating a codegen that is only for the GPU allows us to
-seamlessly implement features which would have been impossible or very difficult to implement in the existing codegen, such as:
+- GPU and CPU codegen is fundamentally different, creating a codegen backend that is only for the GPU allows us to
+seamlessly implement features which would have been impossible or very difficult to implement in the existing codegen backend, such as:
 - Shared memory, this requires some special generation of globals with custom addrspaces, its just not possible to do without backend explicit handling.
 - Custom linking logic to do dead code elimination so as to not end up with large PTX files full of dead functions/globals.
 - Stripping away everything we do not need, no complex ABI handling, no shared lib handling, control over how function calls are generated, etc.
````
````diff
@@ -33,7 +33,7 @@ Long answer, there are a couple of things that make this impossible:
 - NVVM IR is a __subset__ of LLVM IR, there are tons of things that NVVM will not accept. Such as a lot of function attrs not being allowed.
 This is well documented and you can find the spec [here](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html). Not to mention
 many bugs in libNVVM that I have found along the way, the most infuriating of which is nvvm not accepting integer types that arent `i1, i8, i16, i32, or i64`.
-This required special handling in the codegen to convert these "irregular" types into vector types.
+This required special handling in the codegen backend to convert these "irregular" types into vector types.
 
 ## What is the point of using Rust if a lot of things in kernels are unsafe?
````
`guide/src/guide/compute_capabilities.md` (16 additions, 16 deletions)
````diff
@@ -1,9 +1,9 @@
-# Compute Capability Gating
+# Compute capability gating
 
 This section covers how to write code that adapts to different CUDA compute capabilities
 using conditional compilation.
 
-## What are Compute Capabilities?
+## What are compute capabilities?
 
 CUDA GPUs have different "compute capabilities" that determine which features they
 support. Each capability is identified by a version number like `3.5`, `5.0`, `6.1`,
````
````diff
@@ -17,7 +17,7 @@ For example:
 
 For comprehensive details, see [NVIDIA's CUDA documentation on GPU architectures](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-compilation).
 
-## Virtual vs Real Architectures
+## Virtual vs real Architectures
 
 In CUDA terminology:
````
````diff
@@ -28,7 +28,7 @@ In CUDA terminology:
 Rust CUDA works exclusively with virtual architectures since it only generates PTX. The
 `NvvmArch::ComputeXX` enum values correspond to CUDA's virtual architectures.
 
-## Using Target Features
+## Using target features
 
 When building your kernel, the `NvvmArch::ComputeXX` variant you choose enables specific
 `target_feature` flags. These can be used with `#[cfg(...)]` to conditionally compile
````
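The gating this hunk documents can be sketched as follows. This is an illustrative, host-runnable example, not code from the guide: the function name is invented, and when compiled on an ordinary host target none of the `compute_*` features are set, so the fallback path is what compiles and runs.

```rust
#![allow(unexpected_cfgs)] // host toolchains don't know the compute_* features

// Hypothetical capability-gated function: a build targeting compute_70 or
// newer would get the first body, everything else gets the fallback.
#[cfg(target_feature = "compute_70")]
fn reduce_hint() -> &'static str {
    "warp-sync path"
}

#[cfg(not(target_feature = "compute_70"))]
fn reduce_hint() -> &'static str {
    // Older capabilities (and host builds) fall back to a portable path.
    "fallback path"
}

fn main() {
    println!("{}", reduce_hint());
}
```

On a host build this prints `fallback path`, since no `compute_*` target feature is enabled outside a GPU build.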
````diff
@@ -51,12 +51,12 @@ which `NvvmArch::ComputeXX` is used to build the kernel, there is a different an
 These features let you write optimized code paths for specific GPU generations while
 still supporting older ones.
 
-## Specifying Compute Capabilites
+## Specifying compute capabilites
 
 Starting with CUDA 12.9, NVIDIA introduced architecture suffixes that affect
 compatibility.
 
-### Base Architecture (No Suffix)
+### Base architecture (no suffix)
 
 Example: `NvvmArch::Compute70`
````
````diff
@@ -79,7 +79,7 @@ CudaBuilder::new("kernels")
 #[cfg(target_feature = "compute_80")] // ✗ Fail (higher base variant)
 ```
 
-### Family Suffix ('f')
+### Family suffix ('f')
 
 Example: `NvvmArch::Compute101f`
````
````diff
@@ -108,7 +108,7 @@ CudaBuilder::new("kernels")
 #[cfg(target_feature = "compute_110")] // ✗ Fail (higher base variant)
 ```
 
-### Architecture Suffix ('a')
+### Architecture suffix ('a')
 
 Example: `NvvmArch::Compute100a`
````
````diff
@@ -142,7 +142,7 @@ Note: While the 'a' variant enables all these features during compilation (allow
 
 For more details on suffixes, see [NVIDIA's blog post on family-specific architecture features](https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/).
 
-### Manual Compilation (Without `cuda_builder`)
+### Manual compilation (without `cuda_builder`)
 
 If you're invoking `rustc` directly instead of using `cuda_builder`, you only need to specify the architecture through LLVM args:
````
`guide/src/guide/getting_started.md` (6 additions, 6 deletions)
````diff
@@ -1,16 +1,16 @@
-# Getting Started
+# Getting started
 
 This section covers how to get started writing GPU crates with `cuda_std` and `cuda_builder`.
 
-## Required Libraries
+## Required libraries
 
 Before you can use the project to write GPU crates, you will need a couple of prerequisites:
 
 - [The CUDA SDK](https://developer.nvidia.com/cuda-downloads), version 11.2 or later (and the appropriate driver - [see CUDA release notes](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html)).
 
 This is only for building GPU crates, to execute built PTX you only need CUDA `9+`.
 
-- LLVM 7.x (7.0 to 7.4), The codegen searches multiple places for LLVM:
+- LLVM 7.x (7.0 to 7.4), The codegen backend searches multiple places for LLVM:
 
 - If `LLVM_CONFIG` is present, it will use that path as `llvm-config`.
 - Or, if `llvm-config` is present as a binary, it will use that, assuming that `llvm-config --version` returns `7.x.x`.
````
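The search order this hunk describes can be mirrored with a small probe. Only the `LLVM_CONFIG` environment variable, the `llvm-config` binary name, and the 7.x requirement come from the text above; the probe program itself is an illustrative sketch, not part of the toolchain.

```rust
use std::env;
use std::process::Command;

// Check LLVM visibility the way the guide describes: $LLVM_CONFIG first,
// then `llvm-config` on PATH, expecting a 7.x version string.
fn probe_llvm() -> String {
    let probe = env::var("LLVM_CONFIG").unwrap_or_else(|_| "llvm-config".to_string());
    match Command::new(&probe).arg("--version").output() {
        Ok(out) => {
            let version = String::from_utf8_lossy(&out.stdout).trim().to_string();
            if version.starts_with("7.") {
                format!("found LLVM 7.x via {probe}")
            } else {
                format!("{probe} reports '{version}', but 7.0 to 7.4 is required")
            }
        }
        Err(_) => format!("no {probe} found; set LLVM_CONFIG to point at one"),
    }
}

fn main() {
    println!("{}", probe_llvm());
}
```

The output depends on what is installed, so treat this only as a way to see which of the documented search steps succeeds on a given machine.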
````diff
@@ -60,7 +60,7 @@ We changed our crate's crate types to `cdylib` and `rlib`. We specified `cdylib`
 
 ## `lib.rs`
 
-Before we can write any GPU kernels, we must add a few directives to our `lib.rs` which are required by the codegen:
+Before we can write any GPU kernels, we must add a few directives to our `lib.rs` which are required by the codegen backend:
 
 ```rs
 #![cfg_attr(
````
````diff
@@ -76,7 +76,7 @@ This does a couple of things:
 
 - It only applies the attributes if we are compiling the crate for the GPU (target_os = "cuda").
 - It declares the crate to be `no_std` on CUDA targets.
-- It registers a special attribute required by the codegen for things like figuring out
+- It registers a special attribute required by the codegen backend for things like figuring out
 what functions are GPU kernels.
 - It explicitly includes `kernel` macro and `thread`
````
````diff
@@ -156,7 +156,7 @@ Internally what this does is it first checks that a couple of things are right i
 - The function is `unsafe`.
 - The function does not return anything.
 
-Then it declares this kernel to the codegen so that the codegen can tell CUDA this is a GPU kernel.
+Then it declares this kernel to the codegen backend so it can tell CUDA this is a GPU kernel.
 It also applies `#[no_mangle]` so the name of the kernel is the same as it is declared in the code.
````
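The shape the macro is described as enforcing — `unsafe`, `extern "C"`, no return value, `#[no_mangle]` — can be hand-written to see what a kernel signature roughly looks like. This is an illustrative sketch, not the macro's real expansion; the function name, parameters, and host-side loop body are invented so the example runs on the CPU.

```rust
// Hand-written illustration of the properties `#[kernel]` checks for:
// unsafe, extern "C", no return value, and an unmangled symbol name.
#[no_mangle]
pub unsafe extern "C" fn scale(data: *mut f32, len: usize, factor: f32) {
    // On the GPU each thread would handle one element; a plain loop
    // keeps this sketch runnable on the host.
    for i in 0..len {
        *data.add(i) *= factor;
    }
}

fn main() {
    let mut v = [1.0f32, 2.0, 3.0];
    unsafe { scale(v.as_mut_ptr(), v.len(), 2.0) };
    println!("{:?}", v); // prints [2.0, 4.0, 6.0]
}
```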
`guide/src/guide/kernel_abi.md` (5 additions, 5 deletions)
````diff
@@ -1,7 +1,7 @@
 # Kernel ABI
 
-This section details how parameters are passed to GPU kernels by the Codegen at the current time.
-In other words, how the codegen expects you to pass different types to GPU kernels from the CPU.
+This section details how parameters are passed to GPU kernels by the codegen backend. In other
+words, how the codegen backend expects you to pass different types to GPU kernels from the CPU.
 
 ⚠️ If you find any bugs in the ABI please report them. ⚠️
````
````diff
@@ -15,7 +15,7 @@ other ABI we override purely to avoid footguns.
 
 Functions marked as `#[kernel]` are enforced to be `extern "C"` by the kernel macro, and it is expected
 that __all__ GPU kernels be `extern "C"`, not that you should be declaring any kernels without the `#[kernel]` macro,
-because the codegen/`cuda_std` is allowed to rely on the behavior of `#[kernel]` for correctness.
+because the codegen backend/`cuda_std` is allowed to rely on the behavior of `#[kernel]` for correctness.
 
 ## Structs
````
````diff
@@ -119,7 +119,7 @@ unsafe {
 }
 ```
 
-You may get warnings about slices being an improper C-type, but the warnings are safe to ignore, the codegen guarantees
+You may get warnings about slices being an improper C-type, but the warnings are safe to ignore, the codegen backend guarantees
 that slices are passed as pairs of params.
 
 You cannot however pass mutable slices, this is because it would violate aliasing rules, each thread receiving a copy of the mutable
````
````diff
@@ -135,7 +135,7 @@ ZSTs (zero-sized types) are ignored and become nothing in the final PTX.
 Primitive types are passed directly by value, same as structs. They map to the special PTX types `.s8`, `.s16`, `.s32`, `.s64`, `.u8`, `.u16`, `.u32`, `.u64`, `.f32`, and `.f64`.
 With the exception that `u128` and `i128` are passed as byte arrays (but this has no impact on how they are passed from the CPU).
 
-## References And Pointers
+## References And pointers
 
 References and Pointers are both passed as expected, as pointers. It is therefore expected that you pass such parameters using device memory:
````