13 changes: 12 additions & 1 deletion src/SUMMARY.md
Original file line number Diff line number Diff line change
@@ -228,11 +228,22 @@
- [Debugging LLVM](./backend/debugging.md)
- [Backend Agnostic Codegen](./backend/backend-agnostic.md)
- [Implicit caller location](./backend/implicit-caller-location.md)
- [Debug Info](./debuginfo/intro.md)
- [Rust Codegen](./debuginfo/rust-codegen.md)
- [LLVM Codegen](./debuginfo/llvm-codegen.md)
- [Debugger Internals](./debuginfo/debugger-internals.md)
- [LLDB Internals](./debuginfo/lldb-internals.md)
- [GDB Internals](./debuginfo/gdb-internals.md)
- [Debugger Visualizers](./debuginfo/debugger-visualizers.md)
- [LLDB - Python Providers](./debuginfo/lldb-visualizers.md)
- [GDB - Python Providers](./debuginfo/gdb-visualizers.md)
- [CDB - Natvis](./debuginfo/natvis-visualizers.md)
- [Testing](./debuginfo/testing.md)
- [(Lecture Notes) Debugging support in the Rust compiler](./debugging-support-in-rustc.md)
- [Libraries and metadata](./backend/libs-and-metadata.md)
- [Profile-guided optimization](./profile-guided-optimization.md)
- [LLVM source-based code coverage](./llvm-coverage-instrumentation.md)
- [Sanitizers support](./sanitizers.md)
- [Debugging support in the Rust compiler](./debugging-support-in-rustc.md)

---

Binary file added src/debuginfo/CodeView.pdf
Binary file not shown.
14 changes: 14 additions & 0 deletions src/debuginfo/debugger-internals.md
@@ -0,0 +1,14 @@
# Debugger Internals

It is the debugger's job to convert the debug info into an in-memory representation. Both the
interpretation of the debug info and the in-memory representation are arbitrary; anything will do
so long as meaningful information can be reconstructed while the program is running. The pipeline
from raw debug info to usable types can be quite complicated.

Once the information is in a workable format, the debugger front-end then must provide a way to
interpret and display the data, a way for users to interact with it, and an API for extensibility.

Debuggers are vast systems and cannot be covered completely here. This section will provide a brief
overview of the subsystems directly relevant to the Rust debugging experience.

Microsoft's debugging engine is closed source, so it will not be covered here.
62 changes: 62 additions & 0 deletions src/debuginfo/debugger-visualizers.md
@@ -0,0 +1,62 @@
# Debugger Visualizers

These are typically the last step before the debugger displays the information, but the results may
be piped through a debug adapter such as an IDE's debugger API.

The term "Visualizer" is a bit of a misnomer. The real goal isn't just to prettify the output, but
to provide an interface for the user to interact with that is as useful as possible. In many cases
this means reconstructing the original type as closely as possible to its Rust representation, but
not always.

The visualizer interface allows generating "synthetic children" - fields that don't exist in the
debug info, but can be derived from invariants about the language and the type itself. A simple
example is allowing one to interact with the elements of a `Vec<T>` instead of just its `*mut u8`
heap pointer, length, and capacity.
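
As a rough illustration of the idea, here is a minimal sketch in plain Python that is not tied to any debugger's API. `RawVec` and `read_memory` are hypothetical names standing in for what the debug info exposes and for the debugger's memory reader, and the element type is fixed to `u32` purely to keep the example short.

```python
import struct
from typing import Callable, NamedTuple


class RawVec(NamedTuple):
    """What the debug info alone exposes for a `Vec<u32>` (hypothetical model)."""

    ptr: int  # address of the heap buffer
    length: int  # number of initialized elements
    capacity: int  # allocated capacity, in elements


def synthetic_children(vec: RawVec, read_memory: Callable[[int, int], bytes]) -> tuple:
    """Derive the elements from the invariant that `length` initialized `u32`
    values are stored contiguously starting at `ptr`."""
    if vec.length == 0:
        return ()
    raw = read_memory(vec.ptr, vec.length * 4)
    # "<" = little-endian, "I" = unsigned 32-bit; one "I" per element.
    return struct.unpack(f"<{vec.length}I", raw)


# Simulated debuggee memory: a 16-byte buffer mapped at address 0x1000.
BASE = 0x1000
MEMORY = struct.pack("<4I", 10, 20, 30, 0)  # capacity 4, 3 initialized


def read_memory(addr: int, size: int) -> bytes:
    """Stand-in for the debugger's memory reader."""
    start = addr - BASE
    return MEMORY[start:start + size]


children = synthetic_children(RawVec(ptr=BASE, length=3, capacity=4), read_memory)
```

The elements in `children` never appear as fields in the debug info; they are reconstructed from the pointer and length alone, which is the essence of a synthetic child.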

## Performance

Before tackling the visualizers themselves, it's important to note that these are part of a
performance-sensitive system. Please excuse the break in formality, but: if I have to spend
significant time debugging, I'm annoyed. If I have to *wait on my debugger*, I'm pissed.

Every millisecond spent in these visualizers is a millisecond longer for the user to see output.
This can be especially painful for large stackframes that contain many/large container types.
Debugger GUIs such as VSCode will request the whole stack frame at once, and this can result in
delays of tens of seconds (or even minutes) before being able to interact with any variables in the
frame.

There is a tendency to balk at the idea of optimizing Python code, but it really can have a
substantial impact. Remember, there is no compiler to help keep the code fast. Even simple
transformations are not done for you. It can be difficult to find Python performance tips through
all the noise of people suggesting you don't bother optimizing Python, so here are some things to
keep in mind that are relevant to these scripts:

* Everything allocates, even `int`
* Use tuples when possible. `list` is effectively `Vec<Box<[Any]>>`, whereas tuples are equivalent
to `Box<[Any]>`. They have one less layer of indirection, don't carry extra capacity and can't
grow/shrink which can be advantageous in many cases. An additional benefit is that Python caches and
recycles the underlying allocations of all tuples up to size 20.
* Regexes are slow and should be avoided when simple string manipulation will do.
* Strings are immutable, so many string operations implicitly copy the contents.
* When concatenating large lists of strings, `"".join(iterable_of_strings)` is typically the fastest
way to do it.
* f-strings are generally the fastest way to do small, simple string transformations such as
surrounding a string with parentheses.
* The act of calling a function is somewhat slow (even if the function is completely empty). If the
code section is very hot, consider inlining the function manually.
* Local variable access is significantly faster than global and built-in function access.
* Member/method access via the `.` operator is also slow. Consider reassigning deeply nested values
to local variables to avoid this cost (e.g. `h = a.b.c.d.e.f.g.h`).
* Accessing inherited methods and fields is about 2x slower than accessing methods and fields
defined directly on the class. Avoid inheritance whenever possible.
* Use [`__slots__`](https://wiki.python.org/moin/UsingSlots) wherever possible. `__slots__` is a way
to indicate to Python that your class's fields won't change and speeds up field access by a
noticeable amount. This does require you to name your fields in advance and initialize them in
`__init__`, but it's a small price to pay for the benefits.
* Match statements/if..elif..else are not optimized in any way. The conditions are checked in order,
one by one. If possible, use an alternative such as dictionary dispatch or a table of values.
* Compute lazily when possible.
* List comprehensions are typically faster than loops. Generator comprehensions are a bit slower
than list comprehensions, but use less memory. You can think of comprehensions as equivalent to
Rust's `iter.map()`. List comprehensions effectively call `collect::<Vec<_>>` at the end, whereas
generator comprehensions do not.
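
A few of these tips can be combined in one short sketch. Everything here (`Summary`, `FORMATTERS`, `describe_all`, and the formatter functions) is a hypothetical name invented for illustration, not part of the actual visualizer scripts. It shows `__slots__`, dictionary dispatch in place of an if/elif chain, binding lookups to locals, and a single `join` over a list comprehension.

```python
class Summary:
    """A small record type; `__slots__` fixes the set of fields up front."""

    __slots__ = ("name", "size")

    def __init__(self, name: str, size: int):
        self.name = name
        self.size = size


def _fmt_vec(size: int) -> str:
    return f"Vec(size={size})"


def _fmt_string(size: int) -> str:
    return f"String(len={size})"


def _fmt_unknown(size: int) -> str:
    return "<unknown>"


# Dictionary dispatch: one hash lookup instead of walking an if/elif chain.
FORMATTERS = {"Vec": _fmt_vec, "String": _fmt_string}


def describe_all(items) -> str:
    # Bind the bound method and the fallback to locals once, outside the loop,
    # so the comprehension avoids repeated global and attribute lookups.
    get_formatter = FORMATTERS.get
    default = _fmt_unknown
    # Build every piece first, then concatenate with a single join.
    return ", ".join([get_formatter(item.name, default)(item.size) for item in items])


# A tuple rather than a list, since the collection never grows or shrinks.
ITEMS = (Summary("Vec", 3), Summary("String", 5), Summary("HashMap", 1))
result = describe_all(ITEMS)
```

Whether each of these pays off depends on how hot the code path is, so it is worth measuring (for example with `timeit`) before and after a change.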
4 changes: 4 additions & 0 deletions src/debuginfo/gdb-internals.md
@@ -0,0 +1,4 @@
# (WIP) GDB Internals

GDB's Rust support lives at `gdb/rust-lang.h` and `gdb/rust-lang.c`. The expression parsing support
can be found in `gdb/rust-exp.h` and `gdb/rust-parse.c`.
9 changes: 9 additions & 0 deletions src/debuginfo/gdb-visualizers.md
@@ -0,0 +1,9 @@
# (WIP) GDB - Python Providers

Below are links to relevant parts of the GDB documentation:

* [Overview on writing a pretty printer](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Writing-a-Pretty_002dPrinter.html#Writing-a-Pretty_002dPrinter)
* [Pretty Printer API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Pretty-Printing-API.html#Pretty-Printing-API) (equivalent to LLDB's `SyntheticProvider`)
* [Value API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Values-From-Inferior.html#Values-From-Inferior) (equivalent to LLDB's `SBValue`)
* [Type API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Types-In-Python.html#Types-In-Python) (equivalent to LLDB's `SBType`)
* [Type Printing API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Type-Printing-API.html#Type-Printing-API) (equivalent to LLDB's `SyntheticProvider.get_type_name`)
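
To give a feel for the shape of the pretty printer API linked above, here is a minimal sketch. The struct `Point { x, y }`, the tag `example::Point`, and the names `PointPrinter`, `PRINTERS`, `lookup`, and `register` are all hypothetical and are not taken from Rust's actual GDB scripts. The sketch assumes GDB reports the struct's tag as `example::Point`. The `gdb` module is imported only inside `register`, since it is available only within a GDB session.

```python
class PointPrinter:
    """Pretty printer for a hypothetical Rust struct `Point { x, y }`."""

    def __init__(self, val):
        self._val = val

    def to_string(self):
        # Text shown for the value itself.
        return "Point"

    def children(self):
        # Yields (name, value) pairs; `val["field"]` reads a struct field.
        val = self._val
        yield "x", val["x"]
        yield "y", val["y"]


# Maps a type's tag to its printer class (hypothetical type name).
PRINTERS = {"example::Point": PointPrinter}


def lookup(val):
    """Lookup function: return a printer for `val`, or None to decline."""
    printer = PRINTERS.get(val.type.strip_typedefs().tag)
    return printer(val) if printer is not None else None


def register():
    """Install the lookup function; call this from a script sourced by GDB."""
    import gdb  # only importable inside a GDB session

    gdb.pretty_printers.append(lookup)
```

Matching on the type tag with a dictionary avoids running a regex against every value, at the cost of not handling generic type names, which would need some form of prefix or pattern matching.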
114 changes: 114 additions & 0 deletions src/debuginfo/intro.md
@@ -0,0 +1,114 @@
# Debug Info

Debug info is a collection of information generated by the compiler that allows debuggers to
correctly interpret the state of a program while it is running. That includes things like mapping
instruction addresses to lines of code in the source file, and type layout information so that
bytes in memory can be read and displayed in a meaningful way.

Debug info can be a slightly overloaded term, covering all the layers between Rust MIR and the
end-user seeing the output of their debugger onscreen. In brief, the stack from beginning to end is
as follows:

1. Rustc inspects the MIR and communicates the relevant source, symbol, and type information to LLVM
2. LLVM translates this information into a target-specific debug info format during compilation
3. A debugger reads and interprets the debug info, mapping source-lines and allowing the debuggee's
variables in memory to be located and read with the correct layout
4. Built-in debugger formatting and styling is applied to variables
5. User-defined scripts are run, formatting and styling the variables further
6. The debugger frontend displays the variable to the user, possibly through the means of additional
API layers (e.g. VSCode extension by way of the
[Debug Adapter Protocol](https://microsoft.github.io/debug-adapter-protocol/))


> NOTE: This subsection of the dev guide is perhaps more detailed than necessary. It aims to collect
> a large amount of scattered information into one place and equip the reader with as firm a grasp of
> the entire debug stack as possible.
>
> If you are only interested in working on the visualizer
> scripts, the information in the [debugger-visualizers](./debugger-visualizers.md) and
> [testing](./testing.md) will suffice. If you need to make changes to Rust's debug node generation,
> please see [rust-codegen](./rust-codegen.md). All other sections are supplementary, but can be
> vital to understanding some of the compromises the visualizers or codegen need to make. It can
> also be valuable to know when a problem might be better solved in LLVM or the debugger itself.

# DWARF

This is the primary debug info format for `*-gnu` targets. It is typically bundled in with the
binary, but it [can be generated as a separate file](https://gcc.gnu.org/wiki/DebugFission). The
DWARF standard is available [here](https://dwarfstd.org/).

> NOTE: To inspect DWARF debug info, [gimli](https://crates.io/crates/gimli) can be used
> programmatically. If you prefer a GUI, the author recommends [DWEX](https://github.com/sevaa/dwex).

# PDB/CodeView

This is the primary debug info format for `*-msvc` targets. PDB is a proprietary container format
created by Microsoft that, unfortunately,
[has multiple meanings](https://docs.rs/ms-pdb/0.1.10/ms_pdb/taster/enum.Flavor.html).
We are concerned with ordinary PDB files, as Portable PDB is used mainly for .NET applications. PDB
files are separate from the compiled binary and use the `.pdb` extension.

PDB files contain CodeView objects, equivalent to DWARF's tags. CodeView, the debugger that
consumed CodeView objects, was originally released in 1985. It was originally intended for C
debugging and was later extended to support Visual C++. Minor alterations are still made to the
format to support modern architectures and languages, but many of these changes are undocumented
and/or sparsely used.

It is important to keep this context in mind when working with CodeView objects. Due to its origins,
the "feature-set" of these objects is very limited, and focused around the core features of C. It
does not have many of the conveniences or features of modern DWARF standards. A fair number of
workarounds exist within the debug info stack to compensate for CodeView's shortcomings.

Due to its proprietary nature, it is very difficult to find information about PDB and CodeView. Many
of the sources were made at vastly different times and contain incomplete or somewhat contradictory
information. As such, this page will aim to collect as many sources as possible.

* [CodeView 1.0 specification](./CodeView.pdf)
* LLVM
  * [CodeView Overview](https://llvm.org/docs/SourceLevelDebugging.html#codeview-debug-info-format)
  * [PDB Overview and technical details](https://llvm.org/docs/PDB/index.html)
* Microsoft
  * [microsoft-pdb](https://github.com/microsoft/microsoft-pdb) - A C/C++ implementation of a PDB
    reader. The implementation does not contain the full PDB or CodeView specification, but does
    contain enough information for other PDB consumers to be written. At time of writing (Nov 2025),
    this repo has been archived for several years.
  * [pdb-rs](https://github.com/microsoft/pdb-rs/) - A Rust-based PDB reader and writer based on
    other publicly-available information. Does not guarantee stability or spec compliance. Also
    contains `pdbtool`, which can dump PDB files (`cargo install pdbtool`)
  * [Debug Interface Access SDK](https://learn.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/getting-started-debug-interface-access-sdk).
    While it does not document the PDB format directly, details can be gleaned from the interface
    itself.

# Debuggers

Rust supports 3 major debuggers: GDB, LLDB, and CDB. Each has its own set of requirements,
limitations, and quirks. This unfortunately creates a large surface area to account for.

> NOTE: CDB is a proprietary debugger created by Microsoft. The underlying engine also powers
> WinDbg, KD, the Microsoft C/C++ extension for VSCode, and part of the Visual Studio Debugger. In
> these docs, it will be referred to as CDB for consistency.

While GDB and LLDB do offer facilities to natively support Rust's value layout, this isn't
completely necessary. Rust currently outputs debug info very similar to that of C++, allowing
debuggers without Rust support to work with a slightly degraded experience. More detail will be
included in later sections, but here is a quick reference for the capabilities of each debugger:

| Debugger | Debug Info Format | Native Rust support | Expression Style | Visualizer Scripts |
| --- | --- | --- | --- | --- |
| GDB | DWARF | Full | Rust | Python |
| LLDB | DWARF and PDB | Partial | C/C++ | Python |
| CDB | PDB | None | C/C++ | Natvis |

> IMPORTANT: CDB can be assumed to run only on Windows. No assumptions can be made about the OS
> running GDB or LLDB.

## Unsupported

Below are several unsupported debuggers that are of particular note due to their potential impact
in the future.

* [BugStalker](https://github.com/godzie44/BugStalker) is an x86-64 Linux debugger written in Rust,
specifically to debug Rust programs. While promising, it is still in early development.
* [RAD Debugger](https://github.com/EpicGamesExt/raddebugger) is a Windows-only GUI debugger. It has
a custom debug info format that PDB is translated into. The project also includes a linker that can
generate their new debug info format during the linking phase.