Use sentence case for headings.

nnethercote · nnethercote · commit dab15bd22e0e · 2025-11-19T17:06:13.000+11:00
Currently headings are a mix of sentence case ("The quick brown fox")
and title case ("The Quick Brown Fox"). Title case is extremely formal,
so sentence case feels more natural here.
diff --git a/guide/src/cuda/gpu_computing.md b/guide/src/cuda/gpu_computing.md
@@ -1,4 +1,4 @@
-# GPU Computing
+# GPU computing
 
 You probably already know what GPU computing is, but if you don't, it is utilizing the extremely parallel
 nature of GPUs for purposes other than rendering. It is widely used in many scientific and consumer fields.
diff --git a/guide/src/cuda/pipeline.md b/guide/src/cuda/pipeline.md
@@ -1,4 +1,4 @@
-# The CUDA Pipeline
+# The CUDA pipeline
 
 CUDA is traditionally used via CUDA C/C++ files which have a `.cu` extension. These files can be
 compiled using NVCC (NVIDIA CUDA Compiler) into an executable.
diff --git a/guide/src/faq.md b/guide/src/faq.md
@@ -1,4 +1,4 @@
-# Frequently Asked Questions 
+# Frequently asked questions 
 
 This page will cover a lot of the questions people often have when they encounter this project,
 so they are addressed all at once.
diff --git a/guide/src/features.md b/guide/src/features.md
@@ -1,4 +1,4 @@
-# Supported Features 
+# Supported features 
 
 This page is used for tracking Cargo/Rust and CUDA features that are currently supported 
 or planned to be supported in the future. As well as tracking some information about how they could 
@@ -14,7 +14,7 @@ around to adding it yet.
 | ✔️ | Fully Supported |
 | 🟨 | Partially Supported |
 
-# Rust Features
+# Rust features
 
 | Feature Name | Support Level | Notes |
 | ------------ | ------------- | ----- |
@@ -40,7 +40,7 @@ around to adding it yet.
 | Float Ops | ✔️ | Maps to libdevice intrinsics, calls to libm are not intercepted though, which we may want to do in the future |
 | Atomics | ❌ | 
 
-# CUDA Libraries
+# CUDA libraries
 
 | Library Name | Support Level | Notes |
 | ------------ | ------------- | ----- |
@@ -56,7 +56,7 @@ around to adding it yet.
 | cuTENSOR | ❌ |
 | OptiX | 🟨 | CPU OptiX is mostly complete, GPU OptiX is still heavily in-progress because it needs support from the codegen backend | 
 
-# GPU-side Features
+# GPU-side features
 
 Note: Most of these categories are used __very__ rarely in CUDA code, therefore
 do not be alarmed that it seems like many things are not supported. We just focus
diff --git a/guide/src/guide/compute_capabilities.md b/guide/src/guide/compute_capabilities.md
@@ -1,9 +1,9 @@
-# Compute Capability Gating
+# Compute capability gating
 
 This section covers how to write code that adapts to different CUDA compute capabilities
 using conditional compilation.
 
-## What are Compute Capabilities?
+## What are compute capabilities?
 
 CUDA GPUs have different "compute capabilities" that determine which features they
 support. Each capability is identified by a version number like `3.5`, `5.0`, `6.1`,
@@ -17,7 +17,7 @@ For example:
 
 For comprehensive details, see [NVIDIA's CUDA documentation on GPU architectures](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-compilation).
 
-## Virtual vs Real Architectures
+## Virtual vs real architectures
 
 In CUDA terminology:
 
@@ -28,7 +28,7 @@ In CUDA terminology:
 Rust CUDA works exclusively with virtual architectures since it only generates PTX. The
 `NvvmArch::ComputeXX` enum values correspond to CUDA's virtual architectures.
 
-## Using Target Features
+## Using target features
 
 When building your kernel, the `NvvmArch::ComputeXX` variant you choose enables specific
 `target_feature` flags. These can be used with `#[cfg(...)]` to conditionally compile
@@ -51,12 +51,12 @@ which `NvvmArch::ComputeXX` is used to build the kernel, there is a different an
 These features let you write optimized code paths for specific GPU generations while
 still supporting older ones.
 
-## Specifying Compute Capabilites
+## Specifying compute capabilities
 
 Starting with CUDA 12.9, NVIDIA introduced architecture suffixes that affect
 compatibility.
 
-### Base Architecture (No Suffix)
+### Base architecture (no suffix)
 
 Example: `NvvmArch::Compute70`
 
@@ -79,7 +79,7 @@ CudaBuilder::new("kernels")
 #[cfg(target_feature = "compute_80")]  // ✗ Fail (newer compute capability)
 ```
 
-### Family Suffix ('f')
+### Family suffix ('f')
 
 Example: `NvvmArch::Compute101f`
 
@@ -108,7 +108,7 @@ CudaBuilder::new("kernels")
 #[cfg(target_feature = "compute_110")]   // ✗ Fail (different major)
 ```
 
-### Architecture Suffix ('a')
+### Architecture suffix ('a')
 
 Example: `NvvmArch::Compute100a`
 
@@ -142,7 +142,7 @@ Note: While the 'a' variant enables all these features during compilation (allow
 
 For more details on suffixes, see [NVIDIA's blog post on family-specific architecture features](https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/).
 
-### Manual Compilation (Without cuda_builder)
+### Manual compilation (without cuda_builder)
 
 If you're invoking rustc directly instead of using cuda_builder, you only need to specify the architecture through LLVM args:
 
@@ -162,11 +162,11 @@ cargo build --target nvptx64-nvidia-cuda
 
 The codegen backend automatically synthesizes target features based on the architecture type as described above.
 
-### Common Patterns for Base Architectures
+### Common patterns for base architectures
 
 These patterns work when using base architectures (no suffix), which enable all lower capabilities:
 
-#### At Least a Capability (Default)
+#### At least a capability (default)
 
 ```rust,no_run
 // Code that requires compute 6.0 or higher
@@ -176,7 +176,7 @@ These patterns work when using base architectures (no suffix), which enable all
 }
 ```
 
-#### Exactly One Capability
+#### Exactly one capability
 
 ```rust,no_run
 // Code that targets exactly compute 6.1 (not 6.2+)
@@ -186,7 +186,7 @@ These patterns work when using base architectures (no suffix), which enable all
 }
 ```
 
-#### Up To a Maximum Capability
+#### Up to a maximum capability
 
 ```rust,no_run
 // Code that works up to compute 6.0 (not 6.1+)
@@ -196,7 +196,7 @@ These patterns work when using base architectures (no suffix), which enable all
 }
 ```
 
-#### Targeting Specific Architecture Ranges
+#### Targeting specific architecture ranges
 
 ```rust,no_run
 // This block compiles when building for architectures >= 6.0 but < 8.0
@@ -206,7 +206,7 @@ These patterns work when using base architectures (no suffix), which enable all
 }
 ```
 
-## Debugging Capability Issues
+## Debugging capability issues
 
 If you encounter errors about missing functions or features:
 
@@ -215,7 +215,7 @@ If you encounter errors about missing functions or features:
 3. Use `nvidia-smi` to check your GPU's compute capability
 4. Add appropriate `#[cfg]` guards or increase the target architecture
 
-## Runtime Behavior
+## Runtime behavior
 
 Again, Rust CUDA **only generates PTX**, not pre-compiled GPU binaries
 ("[fatbinaries](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#fatbinaries)").
diff --git a/guide/src/guide/getting_started.md b/guide/src/guide/getting_started.md
@@ -1,8 +1,8 @@
-# Getting Started
+# Getting started
 
 This section covers how to get started writing GPU crates with cuda_std and cuda_builder.
 
-## Required Libraries
+## Required libraries
 
 Before you can use the project to write GPU crates, you will need a couple of prerequisites:
 
@@ -60,7 +60,7 @@ We changed our crate's crate types to `cdylib` and `rlib`. We specified `cdylib`
 
 ## `lib.rs`
 
-Before we can write any GPU kernels, we must add a few directives to our `lib.rs` which are required by the codegen:
+Before we can write any GPU kernels, we must add a few directives to our `lib.rs` which are required by the codegen backend:
 
 ```rs
 #![cfg_attr(
@@ -76,7 +76,7 @@ This does a couple of things:
 
 - It only applies the attributes if we are compiling the crate for the GPU (target_os = "cuda").
 - It declares the crate to be `no_std` on CUDA targets.
-- It registers a special attribute required by the codegen for things like figuring out
+- It registers a special attribute required by the codegen backend for things like figuring out
   what functions are GPU kernels.
 - It explicitly includes `kernel` macro and `thread`
 
@@ -156,7 +156,7 @@ Internally what this does is it first checks that a couple of things are right i
 - The function is `unsafe`.
 - The function does not return anything.
 
-Then it declares this kernel to the codegen so that the codegen can tell CUDA this is a GPU kernel.
+Then it declares this kernel to the codegen backend so it can tell CUDA this is a GPU kernel.
 It also applies `#[no_mangle]` so the name of the kernel is the same as it is declared in the code.
 
 ## Building the GPU crate
diff --git a/guide/src/guide/kernel_abi.md b/guide/src/guide/kernel_abi.md
@@ -1,7 +1,7 @@
 # Kernel ABI
 
-This section details how parameters are passed to GPU kernels by the Codegen at the current time. 
-In other words, how the codegen expects you to pass different types to GPU kernels from the CPU.
+This section details how parameters are passed to GPU kernels by the codegen backend. In other
+words, how the codegen backend expects you to pass different types to GPU kernels from the CPU.
 
 ⚠️ If you find any bugs in the ABI please report them. ⚠️
 
@@ -15,7 +15,7 @@ other ABI we override purely to avoid footguns.
 
 Functions marked as `#[kernel]` are enforced to be `extern "C"` by the kernel macro, and it is expected
 that __all__ GPU kernels be `extern "C"`, not that you should be declaring any kernels without the `#[kernel]` macro,
-because the codegen/cuda_std is allowed to rely on the behavior of `#[kernel]` for correctness.
+because the codegen backend/cuda_std is allowed to rely on the behavior of `#[kernel]` for correctness.
 
 ## Structs 
 
@@ -119,7 +119,7 @@ unsafe {
 }
 ```
 
-You may get warnings about slices being an improper C-type, but the warnings are safe to ignore, the codegen guarantees 
+You may get warnings about slices being an improper C-type, but the warnings are safe to ignore, the codegen backend guarantees 
 that slices are passed as pairs of params.
 
 You cannot however pass mutable slices, this is because it would violate aliasing rules, each thread receiving a copy of the mutable
@@ -135,7 +135,7 @@ ZSTs (zero-sized types) are ignored and become nothing in the final PTX.
 Primitive types are passed directly by value, same as structs. They map to the special PTX types `.s8`, `.s16`, `.s32`, `.s64`, `.u8`, `.u16`, `.u32`, `.u64`, `.f32`, and `.f64`.
 With the exception that `u128` and `i128` are passed as byte arrays (but this has no impact on how they are passed from the CPU).
 
-## References And Pointers
+## References and pointers
 
 References and Pointers are both passed as expected, as pointers. It is therefore expected that you pass such parameters using device memory:
 
diff --git a/guide/src/guide/safety.md b/guide/src/guide/safety.md
@@ -90,7 +90,7 @@ Note however, that unified memory can be accessed by multiple GPUs and multiple
 takes care of copying and moving data automatically from GPUs/CPU when a page fault occurs. For this reason
 as well as general ease of use, we suggest that unified memory generally be used over regular device memory.
 
-### Kernel Launches
+### Kernel launches
 
 Kernel Launches are the most unsafe part of CUDA, many things must be checked by the developer to soundly launch a kernel.
 It is fundamentally impossible for us to verify a large portion of the invariants expected by the kernel/CUDA.
diff --git a/guide/src/guide/tips.md b/guide/src/guide/tips.md
@@ -4,7 +4,7 @@ This section contains some tips on what to do and what not to do using the proje
 
 ## GPU kernels
 
-- Generally don't derive `Debug` for structs in GPU crates. The codegen currently does not do much global
+- Generally don't derive `Debug` for structs in GPU crates. The codegen backend currently does not do much global
 DCE (dead code elimination) so debug can really slow down compile times and make the PTX gigantic. This
 will get much better in the future but currently it will cause some undesirable effects.
 
diff --git a/guide/src/nvvm/backends.md b/guide/src/nvvm/backends.md
@@ -1,25 +1,25 @@
-# Custom rustc Backends
+# Custom rustc backends
 
-Before we get into the details of rustc_codegen_nvvm, we obviously need to explain what a codegen is!
+Before we get into the details of rustc_codegen_nvvm, we obviously need to explain what a codegen backend is!
 
-Custom codegens are rustc's answer to "well what if I want Rust to compile to X?". This is a problem
+Custom codegen backends are rustc's answer to "well what if I want Rust to compile to X?". This is a problem
 that comes up in many situations, especially conversations of "well LLVM cannot target this, so we are screwed".
 To solve this problem, rustc decided to incrementally decouple itself from being attached/reliant on LLVM exclusively.
 
-Previously, rustc only had a single codegen, the LLVM codegen. The LLVM codegen translated MIR directly to LLVM IR.
+Previously, rustc only had a single codegen backend, the LLVM codegen backend. This translated MIR directly to LLVM IR.
 This is great if you just want to support LLVM, but LLVM is not perfect, and inevitably you will hit limits to what LLVM
 is able to do. Or, you may just want to stop using LLVM, LLVM is not without problems (it is often slow, clunky to deal with, 
 and does not support a lot of targets). 
 
-Nowadays, rustc is almost fully decoupled from LLVM and it is instead generic over the "codegen" backend used.
+Nowadays, rustc is almost fully decoupled from LLVM and it is instead generic over the codegen backend used.
 rustc instead uses a system of codegen backends that implement traits and then get loaded as dynamically linked libraries.
 This allows Rust to compile to virtually anything with a surprisingly small amount of work. At the time of writing, there are
-five publicly known codegens that exist:
+five publicly known codegen backends that exist:
 - rustc_codegen_cranelift
 - rustc_codegen_llvm
 - rustc_codegen_gcc
 - rustc_codegen_spirv
-- rustc_codegen_nvvm, obviously the best codegen ;)
+- rustc_codegen_nvvm, obviously the best backend ;)
 
 rustc_codegen_cranelift targets the cranelift backend, which is a codegen backend written in Rust that is faster than LLVM but does not have many optimizations
 compared to LLVM. rustc_codegen_llvm is obvious, it is the backend almost everybody uses which targets LLVM. rustc_codegen_gcc targets GCC (GNU Compiler Collection)
@@ -32,9 +32,9 @@ What NVVM IR/libNVVM are has been covered in the [CUDA section](../../cuda/pipel
 
 # rustc_codegen_ssa
 
-rustc_codegen_ssa is the central crate behind every single codegen and does much of the hard work.
-It abstracts away the MIR lowering logic so that custom codegens only have to implement some
-traits and the SSA codegen does everything else. For example:
+rustc_codegen_ssa is the central crate behind every single codegen backend and does much of the
+hard work. It abstracts away the MIR lowering logic so that custom codegen backends only have to
+implement some traits and the SSA codegen does everything else. For example:
 - A trait for getting a type like an integer type.
 - A trait for optimizing a module.
 - A trait for linking everything.
diff --git a/guide/src/nvvm/debugging.md b/guide/src/nvvm/debugging.md
@@ -1,4 +1,4 @@
-# Debugging The Codegen 
+# Debugging the codegen backend
 
 When you try to compile an entire language for a completely different type of hardware, stuff is bound to
 break. In this section we will cover how to debug 🧊, segfaults, and more.
@@ -10,10 +10,10 @@ Segfaults are usually caused in one of two ways:
 - From NVVM when linking (generating PTX). (more common)
 
 The first case can be debugged in two ways:
-- Building the codegen in debug mode and using `RUSTC_LOG="rustc_codegen_nvvm=trace"` (`$env:RUSTC_LOG = "rustc_codegen_nvvm=trace";` if using powershell).
+- Building the codegen backend in debug mode and using `RUSTC_LOG="rustc_codegen_nvvm=trace"` (`$env:RUSTC_LOG = "rustc_codegen_nvvm=trace";` if using powershell).
 Note that this will dump a LOT of output, and when I say a LOT, i am not joking, so please, pipe this to a file.
-This will give you a detailed summary of almost every action the codegen has done, you can examine the final few logs to 
-check what the last action the codegen was doing before segfaulting was. This is usually straightforward because the logs are detailed.
+This will give you a detailed summary of almost every action the codegen backend has done, you can examine the final few logs to 
+check what the last action the codegen backend was doing before segfaulting was. This is usually straightforward because the logs are detailed.
 
 - Building LLVM 7 with debug assertions. This, coupled with logging should give all the info needed to debug a segfault. It should 
 get LLVM to throw an exception whenever something bad happens.
@@ -47,7 +47,7 @@ If that doesn't work, then it might be a bug inside of CUDA itself, but that sho
 is to set up the crate for debug (and see if it still happens in debug). Then you can run your executable under NSight Compute, go to the source tab, and 
 examine the SASS (basically an assembly lower than PTX) to see if ptxas miscompiled it.
 
-If you set up the codegen for debug, it should give you a mapping from Rust code to SASS which should hopefully help to see what exactly is breaking.
+If you set up the codegen backend for debug, it should give you a mapping from Rust code to SASS which should hopefully help to see what exactly is breaking.
 
 Here is an example of the screen you should see:
 
diff --git a/guide/src/nvvm/nvvm.md b/guide/src/nvvm/nvvm.md
@@ -12,7 +12,7 @@ Source code -> Typechecking -> MIR -> SSA Codegen -> LLVM IR (NVVM IR) -> PTX ->
 ```
 
 Before we do anything, rustc does its normal job, it typechecks, converts everything to MIR, etc. Then, 
-rustc loads our codegen shared lib and invokes it to codegen the MIR. It creates an instance of
+rustc loads our codegen backend shared lib and invokes it to codegen the MIR. It creates an instance of
 `NvvmCodegenBackend` and it invokes `codegen_crate`. You could do anything inside `codegen_crate` but 
 we just defer back to rustc_codegen_ssa and tell it to do the job for us:
 
@@ -34,9 +34,9 @@ fn codegen_crate<'tcx>(
 ```
 
 After that, the codegen logic is kind of abstracted away from us, which is a good thing!
-We just need to provide the SSA codegen whatever it needs to do its thing. This is 
+We just need to provide the SSA codegen crate whatever it needs to do its thing. This is 
 done in the form of traits, lots and lots and lots of traits, more traits than you've ever seen, traits
-your subconscious has warned you of in nightmares, anyways. Because talking about how the SSA codegen
+your subconscious has warned you of in nightmares, anyways. Because talking about how the SSA codegen crate
 works is kind of useless, we will instead talk first about general concepts and terminology, then 
 dive into each trait. 
 
@@ -57,15 +57,15 @@ But first, let's talk about the end of the codegen, it is pretty simple, we do a
 
 We will cover the libNVVM steps in more detail later on.
 
-# Codegen Units (CGUs)
+# Codegen units (CGUs)
 
 Ah codegen units, the thing everyone just tells you to set to `1` in Cargo.toml, but what are they?
 Well, to put it simply, codegen units are rustc splitting up a crate into different modules to then 
 run LLVM in parallel over. For example, rustc can run LLVM over two different modules in parallel and 
 save time.
 
 This gets a little bit more complex with generics, because MIR is not monomorphized and monomorphized MIR is not a thing,
-the codegen monomorphizes instances on the fly. Therefore rustc needs to put any generic functions that one CGU relies on
+the compiler monomorphizes instances on the fly. Therefore rustc needs to put any generic functions that one CGU relies on
 inside of the same CGU because it needs to monomorphize them.
 
 # Rlibs
diff --git a/guide/src/nvvm/ptxgen.md b/guide/src/nvvm/ptxgen.md
diff --git a/guide/src/nvvm/types.md b/guide/src/nvvm/types.md
