Commit 0d2136c

add debuginfo subsection
1 parent beaafed commit 0d2136c

13 files changed (+1288, -1 lines)

src/SUMMARY.md

Lines changed: 12 additions & 1 deletion
@@ -228,11 +228,22 @@
 - [Debugging LLVM](./backend/debugging.md)
 - [Backend Agnostic Codegen](./backend/backend-agnostic.md)
 - [Implicit caller location](./backend/implicit-caller-location.md)
+- [Debug Info](./debuginfo/intro.md)
+- [Rust Codegen](./debuginfo/rust-codegen.md)
+- [LLVM Codegen](./debuginfo/llvm-codegen.md)
+- [Debugger Internals](./debuginfo/debugger-internals.md)
+- [LLDB Internals](./debuginfo/lldb-internals.md)
+- [GDB Internals](./debuginfo/gdb-internals.md)
+- [Debugger Visualizers](./debuginfo/debugger-visualizers.md)
+- [LLDB - Python Providers](./debuginfo/lldb-visualizers.md)
+- [GDB - Python Providers](./debuginfo/gdb-visualizers.md)
+- [CDB - Natvis](./debuginfo/natvis-visualizers.md)
+- [Testing](./debuginfo/testing.md)
+- [(Lecture Notes) Debugging support in the Rust compiler](./debugging-support-in-rustc.md)
 - [Libraries and metadata](./backend/libs-and-metadata.md)
 - [Profile-guided optimization](./profile-guided-optimization.md)
 - [LLVM source-based code coverage](./llvm-coverage-instrumentation.md)
 - [Sanitizers support](./sanitizers.md)
-- [Debugging support in the Rust compiler](./debugging-support-in-rustc.md)
 
 ---
 
src/debuginfo/CodeView.pdf

209 KB
Binary file not shown.
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Debugger Internals

It is the debugger's job to convert the debug info into an in-memory representation. Both the
interpretation of the debug info and the in-memory representation are up to each debugger; anything
will do so long as meaningful information can be reconstructed while the program is running. The
pipeline from raw debug info to usable types can be quite complicated.

Once the information is in a workable format, the debugger front-end must then provide a way to
interpret and display the data, a way for users to interact with it, and an API for extensibility.

Debuggers are vast systems and cannot be covered completely here. This section provides a brief
overview of the subsystems directly relevant to the Rust debugging experience.

Microsoft's debugging engine is closed source, so it is not covered here.
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# Debugger Visualizers

Visualizers are typically the last step before the debugger displays information, though the
results may also be piped through a debug adapter such as an IDE's debugger API.

The term "visualizer" is a bit of a misnomer. The real goal isn't just to prettify the output, but
to provide an interface for the user to interact with that is as useful as possible. In many cases
this means reconstructing the original type as closely as possible to its Rust representation, but
not always.

The visualizer interface allows generating "synthetic children": fields that don't exist in the
debug info, but can be derived from invariants about the language and the type itself. A simple
example is allowing one to interact with the elements of a `Vec<T>` instead of just its `*mut u8`
heap pointer, length, and capacity.
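To make the idea concrete, here is a minimal, self-contained sketch of how synthetic children for a `Vec<u32>` could be derived from its raw parts. It assumes little-endian memory and uses a plain `bytes` buffer as a stand-in for the debuggee's heap. The class name `VecOfU32Children` is hypothetical. The method names mirror the shape of LLDB's synthetic child provider interface (`num_children`, `get_child_index`, `get_child_at_index`). A real provider, however, is constructed from an `SBValue` and returns `SBValue` children rather than plain integers.

```python
import struct
from typing import Optional


class VecOfU32Children:
    """Hypothetical synthetic-children sketch for a `Vec<u32>`.

    `memory` stands in for the debuggee's address space; a real LLDB
    provider would read target memory through the SBValue it is given.
    """

    __slots__ = ("_memory", "_ptr", "_len")

    ELEM_SIZE = 4  # size_of::<u32>()

    def __init__(self, memory: bytes, ptr: int, length: int) -> None:
        self._memory = memory
        self._ptr = ptr      # offset of the first element (stands in for the heap pointer)
        self._len = length   # only the first `len` slots are initialized; capacity is ignored

    def num_children(self) -> int:
        return self._len

    def get_child_index(self, name: str) -> int:
        # Children are named "[0]", "[1]", ...; return -1 for anything else.
        if name.startswith("[") and name.endswith("]"):
            inner = name[1:-1]
            if inner.isdigit():
                return int(inner)
        return -1

    def get_child_at_index(self, index: int) -> Optional[int]:
        if not 0 <= index < self._len:
            return None
        # Invariant: elements are stored contiguously, so element `i` lives at
        # ptr + i * size_of::<u32>() (little-endian assumed here).
        offset = self._ptr + index * self.ELEM_SIZE
        return struct.unpack_from("<I", self._memory, offset)[0]
```

Each element is decoded only when the debugger asks for that child, so expanding a large `Vec` does not require reading every element up front.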

## Performance

Before tackling the visualizers themselves, it's important to note that these are part of a
performance-sensitive system. Please excuse the break in formality, but: if I have to spend
significant time debugging, I'm annoyed. If I have to *wait on my debugger*, I'm pissed.

Every millisecond spent in these visualizers is a millisecond longer before the user sees output.
This can be especially painful for large stack frames that contain many or large container types.
Debugger GUIs such as VSCode will request the whole stack frame at once, and this can result in
delays of tens of seconds (or even minutes) before being able to interact with any variables in the
frame.

There is a tendency to balk at the idea of optimizing Python code, but it really can have a
substantial impact. Remember, there is no compiler to help keep the code fast. Even simple
transformations, such as hoisting a repeated lookup out of a loop, are not done for you. It can be
difficult to find Python performance tips through all the noise of people suggesting you don't
bother optimizing Python, so here are some things to keep in mind that are relevant to these
scripts:

* Everything allocates, even `int` (aside from CPython's cache of the small integers -5 through 256).
* Use tuples when possible. A `list` is roughly `Box<Vec<Box<dyn Any>>>`, whereas a tuple is
  roughly `Box<[Box<dyn Any>]>`: tuples have one less layer of indirection, don't carry extra
  capacity, and can't grow/shrink, which can be advantageous in many cases. An additional benefit
  is that Python caches and recycles the underlying allocations of all tuples up to size 20.
* Regexes are slow and should be avoided when simple string manipulation will do.
* Strings are immutable, thus many string operations implicitly copy the contents.
* When concatenating large lists of strings, `"".join(iterable_of_strings)` is typically the
  fastest way to do it.
* f-strings are generally the fastest way to do small, simple string transformations such as
  surrounding a string with parentheses.
* The act of calling a function is somewhat slow (even if the function is completely empty). If
  the code section is very hot, consider inlining the function manually.
* Local variable access is significantly faster than global and built-in function access.
* Member/method access via the `.` operator is also slow; consider reassigning deeply nested values
  to local variables to avoid this cost (e.g. `h = a.b.c.d.e.f.g.h`).
* Accessing inherited methods and fields is about 2x slower than accessing those defined directly
  on the class. Avoid inheritance whenever possible.
* Use [`__slots__`](https://wiki.python.org/moin/UsingSlots) wherever possible. `__slots__`
  declares a fixed set of attribute names for a class, so instances don't need a per-instance
  `__dict__`; this speeds up field access by a noticeable amount and reduces memory usage. It does
  require you to name your fields in advance and initialize them in `__init__`, but it's a small
  price to pay for the benefits.
* Match statements and `if`/`elif`/`else` chains are not optimized in any way. The conditions are
  checked in order, one by one. If possible, use an alternative such as dictionary dispatch or a
  table of values.
* Compute lazily when possible.
* List comprehensions are typically faster than loops. Generator expressions are a bit slower than
  list comprehensions, but use less memory. You can think of comprehensions as equivalent to Rust's
  `iter.map()`: list comprehensions effectively call `collect::<Vec<_>>()` at the end, whereas
  generator expressions do not.
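The following sketch combines several of the tips above. It formats a list of child records using `__slots__`, dictionary dispatch instead of an `if`/`elif` chain, a locally bound method, and a list comprehension feeding a single `str.join`. The `Child` class, the formatter table, and `format_children` are hypothetical illustrations. They are not taken from Rust's actual visualizer scripts.

```python
from typing import List


class Child:
    """Hypothetical record for one synthetic child (name plus already-read value)."""

    # __slots__ fixes the attribute set, so instances carry no per-instance __dict__.
    __slots__ = ("name", "value")

    def __init__(self, name: str, value: object) -> None:
        self.name = name
        self.value = value


def _fmt_int(v: object) -> str:
    return str(v)


def _fmt_bool(v: object) -> str:
    return "true" if v else "false"


# Dictionary dispatch: one hash lookup on the exact type, instead of an
# if/elif chain that tests each condition in order.
FORMATTERS = {int: _fmt_int, bool: _fmt_bool}


def format_children(children: List[Child]) -> str:
    # Bind FORMATTERS.get once, rather than resolving the global name and the
    # `.get` attribute on every iteration.
    get_formatter = FORMATTERS.get
    # Unknown types fall back to repr(); the comprehension builds every part
    # before a single join, instead of growing a string with repeated `+=`.
    parts = [f"{c.name}: {get_formatter(type(c.value), repr)(c.value)}" for c in children]
    return "{ " + ", ".join(parts) + " }"
```

Dispatching on `type(value)` is an exact-type match, so subclasses of the listed types fall through to the `repr` default. That trade-off is usually acceptable when the set of types is known in advance.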

src/debuginfo/gdb-internals.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# (WIP) GDB Internals

GDB's Rust support lives in `gdb/rust-lang.h` and `gdb/rust-lang.c`. The expression parsing support
can be found in `gdb/rust-exp.h` and `gdb/rust-parse.c`.

src/debuginfo/gdb-visualizers.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# (WIP) GDB - Python Providers

Below are links to relevant parts of the GDB documentation:

* [Overview of writing a pretty printer](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Writing-a-Pretty_002dPrinter.html#Writing-a-Pretty_002dPrinter)
* [Pretty Printer API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Pretty-Printing-API.html#Pretty-Printing-API) (equivalent to LLDB's `SyntheticProvider`)
* [Value API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Values-From-Inferior.html#Values-From-Inferior) (equivalent to LLDB's `SBValue`)
* [Type API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Types-In-Python.html#Types-In-Python) (equivalent to LLDB's `SBType`)
* [Type Printing API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Type-Printing-API.html#Type-Printing-API) (equivalent to LLDB's `SyntheticProvider.get_type_name`)
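For orientation, a GDB pretty-printer is a plain Python object exposing `to_string()` (and optionally `children()` and `display_hint()`). It is returned by a lookup function registered in `gdb.pretty_printers`. The sketch below prints a `core::ops::Range` as `start..end`. The names `RangePrinter` and `lookup_range_printer` are hypothetical, and the type-name prefix check assumes the debug info names the type by its full path. It is also far simpler than the printers Rust actually ships. The `gdb` module only exists inside GDB's embedded Python, so registration is skipped anywhere else.

```python
class RangePrinter:
    """Hypothetical printer for core::ops::Range<T>, displayed as `start..end`."""

    __slots__ = ("_val",)

    def __init__(self, val) -> None:
        # Inside GDB, `val` is a gdb.Value; indexing it by field name returns that field.
        self._val = val

    def to_string(self) -> str:
        return f"{self._val['start']}..{self._val['end']}"


def lookup_range_printer(val):
    # GDB calls each registered lookup function per value; return a printer, or None to decline.
    # Assumption: the type is named by its full path, e.g. "core::ops::range::Range<i32>".
    tag = val.type.strip_typedefs().tag
    if tag is not None and tag.startswith("core::ops::range::Range<"):
        return RangePrinter(val)
    return None


try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None

if gdb is not None:
    gdb.pretty_printers.append(lookup_range_printer)
```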

src/debuginfo/intro.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
# Debug Info

Debug info is a collection of information generated by the compiler that allows debuggers to
correctly interpret the state of a program while it is running. That includes things like mapping
instruction addresses to lines of code in the source file, and type layout information so that
bytes in memory can be read and displayed in a meaningful way.
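The sketch below roughly illustrates the address-to-line mapping by looking up a program counter in an already-decoded line table. It is a simplification, and the table contents and `SEQUENCE_END` are hypothetical. Real formats encode this data differently: DWARF, for instance, stores it as a compact state-machine program that the debugger decodes into rows. The key idea is that rows are sorted by address and each row applies until the next one begins, so a binary search finds the row covering any address.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Row(NamedTuple):
    address: int  # first instruction address covered by this row
    file: str
    line: int


# Hypothetical decoded line table for one contiguous sequence, sorted by address.
LINE_TABLE = (
    Row(0x1000, "src/main.rs", 3),
    Row(0x1008, "src/main.rs", 4),
    Row(0x1014, "src/main.rs", 6),
)
SEQUENCE_END = 0x1020  # hypothetical address one past the last instruction

ADDRESSES = tuple(row.address for row in LINE_TABLE)


def lookup(pc: int) -> Optional[Row]:
    """Return the row whose address range contains `pc`, or None if outside the sequence."""
    if not ADDRESSES or not (ADDRESSES[0] <= pc < SEQUENCE_END):
        return None
    # bisect_right finds the first row starting *after* pc; the row before it covers pc.
    return LINE_TABLE[bisect_right(ADDRESSES, pc) - 1]
```

Setting a breakpoint on a source line uses the same table in the other direction, mapping a line to the addresses whose rows refer to it.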

Debug info can be a slightly overloaded term, covering all the layers between Rust MIR and the
end-user seeing the output of their debugger onscreen. In brief, the stack from beginning to end is
as follows:

1. Rustc inspects the MIR and communicates the relevant source, symbol, and type information to LLVM
2. LLVM translates this information into a target-specific debug info format during compilation
3. A debugger reads and interprets the debug info, mapping source lines and allowing the debuggee's
   variables in memory to be located and read with the correct layout
4. Built-in debugger formatting and styling are applied to variables
5. User-defined scripts are run, formatting and styling the variables further
6. The debugger frontend displays the variable to the user, possibly through additional API layers
   (e.g. a VSCode extension by way of the
   [Debug Adapter Protocol](https://microsoft.github.io/debug-adapter-protocol/))

> NOTE: This subsection of the dev guide is perhaps more detailed than necessary. It aims to collect
> a large amount of scattered information into one place and equip the reader with as firm a grasp of
> the entire debug stack as possible.
>
> If you are only interested in working on the visualizer scripts, the information in the
> [debugger-visualizers](./debugger-visualizers.md) and [testing](./testing.md) sections will
> suffice. If you need to make changes to Rust's debug node generation, please see
> [rust-codegen](./rust-codegen.md). All other sections are supplementary, but can be vital to
> understanding some of the compromises the visualizers or codegen need to make. It can also be
> valuable to know when a problem might be better solved in LLVM or the debugger itself.

# DWARF

DWARF is the primary debug info format for `*-gnu` targets. It is typically bundled in with the
binary, but it [can be generated as a separate file](https://gcc.gnu.org/wiki/DebugFission). The
DWARF standard is available [here](https://dwarfstd.org/).

> NOTE: To inspect DWARF debug info, [gimli](https://crates.io/crates/gimli) can be used
> programmatically. If you prefer a GUI, the author recommends [DWEX](https://github.com/sevaa/dwex).

# PDB/CodeView

PDB/CodeView is the primary debug info format for `*-msvc` targets. PDB is a proprietary container
format created by Microsoft that, unfortunately,
[has multiple meanings](https://docs.rs/ms-pdb/0.1.10/ms_pdb/taster/enum.Flavor.html).
We are concerned with ordinary PDB files, as Portable PDB is used mainly for .NET applications. PDB
files are separate from the compiled binary and use the `.pdb` extension.

PDB files contain CodeView objects, equivalent to DWARF's tags. CodeView, the debugger that
consumed CodeView objects, was originally released in 1985. It was designed for C debugging and was
later extended to support Visual C++. The format still receives minor alterations to support modern
architectures and languages, but many of these changes are undocumented and/or sparsely used.

It is important to keep this context in mind when working with CodeView objects. Due to its origins,
the feature set of these objects is very limited and focused on the core features of C. It lacks
many of the conveniences and features of modern DWARF standards. A fair number of workarounds exist
within the debug info stack to compensate for CodeView's shortcomings.

Due to its proprietary nature, it is very difficult to find information about PDB and CodeView. Many
of the sources were made at vastly different times and contain incomplete or somewhat contradictory
information. As such, this page aims to collect as many sources as possible.

* [CodeView 1.0 specification](./CodeView.pdf)
* LLVM
  * [CodeView Overview](https://llvm.org/docs/SourceLevelDebugging.html#codeview-debug-info-format)
  * [PDB Overview and technical details](https://llvm.org/docs/PDB/index.html)
* Microsoft
  * [microsoft-pdb](https://github.com/microsoft/microsoft-pdb) - A C/C++ implementation of a PDB
    reader. The implementation does not contain the full PDB or CodeView specification, but does
    contain enough information for other PDB consumers to be written. At the time of writing (Nov
    2025), this repo has been archived for several years.
  * [pdb-rs](https://github.com/microsoft/pdb-rs/) - A Rust-based PDB reader and writer based on
    other publicly available information. Does not guarantee stability or spec compliance. Also
    contains `pdbtool`, which can dump PDB files (`cargo install pdbtool`).
  * [Debug Interface Access SDK](https://learn.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/getting-started-debug-interface-access-sdk).
    While it does not document the PDB format directly, details can be gleaned from the interface
    itself.

# Debuggers

Rust supports three major debuggers: GDB, LLDB, and CDB. Each has its own set of requirements,
limitations, and quirks. This unfortunately creates a large surface area to account for.

> NOTE: CDB is a proprietary debugger created by Microsoft. The underlying engine also powers
> WinDbg, KD, the Microsoft C/C++ extension for VSCode, and part of the Visual Studio Debugger. In
> these docs, it will be referred to as CDB for consistency.

While GDB and LLDB do offer facilities to natively support Rust's value layout, this isn't
completely necessary. Rust currently outputs debug info very similar to that of C++, allowing
debuggers without Rust support to work with a slightly degraded experience. More detail will be
included in later sections, but here is a quick reference for the capabilities of each debugger:

| Debugger | Debug Info Format | Native Rust support | Expression Style | Visualizer Scripts |
| --- | --- | --- | --- | --- |
| GDB | DWARF | Full | Rust | Python |
| LLDB | DWARF and PDB | Partial | C/C++ | Python |
| CDB | PDB | None | C/C++ | Natvis |

> IMPORTANT: CDB can be assumed to run only on Windows. No assumptions can be made about the OS
> running GDB or LLDB.

## Unsupported

Below are several unsupported debuggers that are of particular note due to their potential future
impact.

* [BugStalker](https://github.com/godzie44/BugStalker) is an x86-64 Linux debugger written in Rust,
  specifically to debug Rust programs. While promising, it is still in early development.
* [RAD Debugger](https://github.com/EpicGamesExt/raddebugger) is a Windows-only GUI debugger. It has
  a custom debug info format that PDB is translated into. The project also includes a linker that
  can generate this debug info format during the linking phase.
