
Commit d3bb29d

krzysz00, kuhar, and ScottTodd authored and committed
[docs] Add LLVM debugging and some AMDGPU-specific tips (iree-org#22146)
Co-authored-by: Jakub Kuderski <[email protected]> Co-authored-by: Scott Todd <[email protected]> Signed-off-by: Philipp <[email protected]>
1 parent 8662459 commit d3bb29d


3 files changed

+237
-0
lines changed


docs/website/docs/developers/debugging/gpu.md

Lines changed: 79 additions & 0 deletions
@@ -175,6 +175,21 @@ investigate by comparing with different paths and inputs:
175175
* [vendor-specific tools](../performance/profiling-gpu-vulkan.md) to
176176
understand kernel internal counters to identify the bottleneck.
177177

178+
!!! tip "[correctness]"
179+
180+
Some targets support the `gpu.printf` operation for printing out values from
181+
within GPU code, and many of the targets that don't _could_ support it with
182+
some work in IREE or upstream MLIR.
183+
184+
!!! tip "[correctness]"
185+
186+
If you suspect an issue in an LLVM backend, check
187+
[the LLVM debugging playbook](./llvm.md) for general recommendations.
188+
189+
[:simple-amd:] An occasional source of failures has been disagreements about
190+
code object version. Ensure that both `amdhsa_code_object_version` metadata
191+
and `__oclc_ABI_version` are set and agree.
192+
178193
## Pinpointing runtime issues
179194

180195
On the other side, if we suspect that it's a runtime issue, here are some
@@ -236,3 +251,67 @@ useful approachs and tips:
236251
* [:simple-vulkan:] Use `--vulkan_robust_buffer_access=true` to `iree-run-module`
237252
especially when seeing nondeterministic/corrupted contents in buffers and
238253
suspecting there are buffer allocation/indexing issues.
254+
255+
## Binary substitution for ROCm
256+
257+
[:simple-amd:] The AMD ROCm target supports binary substitution on HSA code objects
258+
(`.hsaco` files).
259+
260+
These files are, under the hood, ELF shared libraries containing kernel code.
261+
262+
If you have manually produced a binary you want to test, such as by manually
263+
running `llc` with different optimization flags, you can turn the `.o` into
264+
a `.hsaco` with
265+
266+
``` shell
267+
ld.lld -o [filename].hsaco -shared [filename].o
268+
```
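
Because the result is an ELF file, a quick sanity check on the linker output is possible before substituting it. The following is a minimal sketch, assuming a bash shell with `head`, `od`, and `tr` available; `is_elf` is a hypothetical helper name and not part of IREE or LLVM.

```shell
# Hypothetical helper: succeeds if the file starts with the ELF magic
# bytes (0x7f 'E' 'L' 'F'). It does not validate the rest of the file.
is_elf() {
  local magic
  magic="$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')"
  [ "$magic" = "7f454c46" ]
}

# Usage: is_elf [filename].hsaco && echo "looks like an ELF file"
```

This only checks the file header, so it catches mistakes such as passing an assembly file by accident, not deeper linking problems.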
269+
270+
In full, if you have a dispatch in `dispatch.mlir` and want to recompile it
271+
while potentially making modifications, the process is
272+
273+
``` shell
274+
# A PATH edit is not strictly required. It is used here to point out that the
275+
# LLVM binaries used should be built from the same LLVM sources IREE uses.
276+
export PATH="[build-directory]/llvm-project/bin:[build-directory]/tools/bin:$PATH"
277+
278+
iree-compile dispatch.mlir \
279+
--iree-hal-target-device=hip \
280+
--iree-hip-target=<target> \
281+
-o original.vmfb \
282+
--iree-hal-dump-executable-files-to=dump
283+
# Opt flags are in dump/[...].optimized.ll. This writes the optimized IR to a file.
284+
opt -o - [opt flags] <dump/[...].linked.ll >altered.opt.ll
285+
# llc flags are in dump/[...].rocmasm.
286+
# To produce an assembly file.
287+
llc [llc flags] altered.opt.ll -o altered.rocmasm
288+
# To produce an object file.
289+
llc [llc flags] altered.opt.ll -o altered.o --filetype=obj
290+
# Linking to an HSACO.
291+
ld.lld -o altered.hsaco -shared altered.o
292+
# Re-compile with substitution. [dispatch_name] is the name of the
293+
# `hal.executable` op symbol, not the variant within it. This can
294+
# be found by looking at the relevant configured_*.mlir file in dump/, for
295+
# example.
296+
iree-compile dispatch.mlir \
297+
--iree-hal-target-device=hip \
298+
--iree-hip-target=<target> \
299+
-o altered.vmfb \
300+
--iree-hal-substitute-executable-object=[dispatch_name]=altered.hsaco
301+
```
302+
303+
If successful, `iree-compile` will print a message stating
304+
305+
``` shell
306+
NOTE: hal.executable `[executable name]` substituted with object file at`altered.hsaco`
307+
```
308+
309+
During each of these steps, you can change the flags or manually edit the `.ll`
310+
(or even `.rocmasm`) files to attempt to get potentially-different behavior.
311+
312+
!!! note
313+
314+
The binary substitution process could be used to replace a dispatch with a
315+
completely foreign implementation, such as one written in C, so long as the
316+
function names and argument handling schemes agree. If you do this, please
317+
document the steps here.
Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
1+
---
2+
hide:
3+
- tags
4+
tags:
5+
- CPU
6+
- CUDA
7+
- GPU
8+
- ROCm
9+
icon: octicons/bug-16
10+
---
11+
12+
# LLVM debugging playbook
13+
14+
This page aims to collect notes on how to debug or reduce issues that
15+
appear to arise from within LLVM itself and how to generate useful LLVM
16+
bug reports.
17+
18+
This guide contains platform-independent notes applicable to both
19+
CPU and GPU compilation. Additional GPU-specific notes (such as how
20+
to perform binary substitutions in an AMD GPU context) are contained
21+
within [the GPU debugging playbook](./gpu.md).
22+
23+
## Generating LLVM IR
24+
25+
When bisecting, reducing, or debugging an issue that might manifest within
26+
LLVM, it can be helpful to use the
27+
`--iree-hal-dump-executable-intermediates-to=[directory]` (or the more general
28+
`--iree-hal-dump-executable-files-to=[directory]`) flags to `iree-compile`
29+
or `iree-opt`. These flags will cause IREE to write out the compiled LLVM module
30+
to the specified directory so you can operate on it directly.
31+
32+
Generally, there will be a `.linked` file, which contains the LLVM IR shortly after
33+
it was generated by MLIR (though after steps like bitcode library linking where
34+
applicable) and a `.optimized` file, which contains the IR after the `opt` passes
35+
have been run. The `.optimized` file may include reproduction instructions (if
36+
it doesn't, the relevant compiler plugin hasn't been updated to add them).
37+
38+
Similarly, the final generated assembly (a `.s`, a `.rocmasm`, and so on) may
39+
include reproduction instructions. Where those are present, they should be helpful
40+
in manually recreating the LLVM compilation so that you no longer have to
41+
route any changes through IREE.
42+
43+
For more details and other related flags, see
44+
[the documentation on dumping intermediate files](../general/developer-tips/#dumping-executable-files).
45+
46+
!!! tip
47+
48+
While `opt` is "target-independent", many passes (such as vectorization)
49+
have substantial dependencies on target information. Ensure your LLVM IR
50+
contains a `target triple` or that you're passing `-mtriple=` to your `opt`
51+
invocations. There are fewer dependencies on `-mcpu=`, but it should also be
52+
preserved to reduce debug variability.
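
To make the triple check concrete, here is a minimal sketch that looks for a `target triple` line in a textual IR file before it is handed to `opt`. The helper name `check_triple` is hypothetical, and the check assumes the IR is in textual `.ll` form rather than bitcode.

```shell
# Hypothetical helper: print the module's `target triple` line, or fail
# with a hint if the textual IR file does not contain one.
check_triple() {
  if ! grep -m1 '^target triple = ' "$1"; then
    echo "no target triple in $1; pass -mtriple= to opt" >&2
    return 1
  fi
}

# Usage: check_triple altered.ll && opt [opt flags] altered.ll -S -o -
```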
53+
54+
## LLVM binaries
55+
56+
To create LLVM binaries built from the same LLVM commit as your IREE checkout,
57+
use
58+
59+
``` shell
60+
cmake --build [build-directory] --target opt llc
61+
```
62+
63+
to produce binaries in `[build-directory]/llvm-project/bin`. You can similarly
64+
produce other utility binaries such as `llvm-reduce`, which aren't built by default.
65+
66+
## Reducing optimization levels in IREE
67+
68+
If you suspect an LLVM bug, try disabling (or reducing) one or both optimization
69+
levels. LLVM has two places where a `-O[n]` is applied: the middle-end
70+
(`opt`) and the backend/codegen (`llc`). The backend optimization level
71+
is selected by values of the `llvm::CodeGenOptLevel` enum, which is passed to
72+
a `createTargetMachine` call in the compiler plugin. This level defaults to
73+
`-O3`. On the other hand, the generic/middle-end `opt` optimization level
74+
is controlled by `llvm::OptimizationLevel` and defaults to `-O2` currently.
75+
76+
Setting one or both of these values to the `-O0` or `-O1` equivalent
77+
and seeing the issue go away is an indicator that there may be an LLVM bug
78+
in play. It may, however, also indicate that there's a race condition or
79+
other correctness issue in the generated LLVM IR that is masked by a lack
80+
of compiler optimizations.
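
Outside of IREE, a similar experiment can be approximated on dumped IR by rerunning `llc` at each backend optimization level. The sketch below is one way to do that, not IREE's own mechanism: the `llc_all_levels` name, the `LLC` override variable, and the output naming scheme are all assumptions made for illustration.

```shell
# Hypothetical helper: compile one IR file at every llc optimization level.
# Extra arguments (for example the llc flags recorded in the dumped
# assembly) are forwarded unchanged. A failing level is reported on
# stderr and the loop continues with the next level.
llc_all_levels() {
  local input="$1" level
  shift
  for level in 0 1 2 3; do
    if ! "${LLC:-llc}" "-O${level}" "$@" "$input" --filetype=obj \
        -o "${input%.ll}.O${level}.o"; then
      echo "llc -O${level} failed on ${input}" >&2
    fi
  done
}

# Usage: llc_all_levels altered.opt.ll [llc flags]
```

Each resulting object can then be tested separately to see at which level the behavior changes.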
81+
82+
## Useful flags for `opt` and `llc`
83+
84+
- `-print-after-all` and `-print-before/after=[passname]` can help
85+
locate places where suspect IR is introduced or where crashes occur, just as their
86+
MLIR equivalents can be used in IREE.
87+
- `-print-module-scope` ensures IR dumps include attributes and metadata
88+
if those are relevant.
89+
- The exact process for feeding a binary back into IREE after manually compiling
90+
it is target-specific, but will generally involve
91+
`--iree-hal-substitute-executable-object=[executable]=[filename]`.
92+
- `-global-isel=1` (changing the instruction selection system in `llc`)
93+
can be helpful in localizing a bug to instruction selection. If it solves
94+
your problem (or turns it into a different bug), you've substantially narrowed
95+
down the code that needs to be searched.
96+
- `opt` produces human-readable output when passed the `-S` flag, and often
97+
needs a `-o -` to send its results to standard output. `llc` takes a
98+
`--filetype={asm,obj}` argument to control whether assembly or
99+
assembled objects are produced.
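
As a sketch of how the printing flags above combine, the helper below captures the per-pass IR dumps from `opt` into a log file. The dumps are written to standard error, so that stream is what gets redirected. The `dump_passes` name and the `OPT` override variable are assumptions for illustration.

```shell
# Hypothetical helper: run opt with per-pass IR dumps enabled and save the
# dumps (which opt writes to stderr) into a log file. Any further
# arguments, such as the recorded opt flags, are forwarded to opt.
dump_passes() {
  local input="$1" log="$2"
  shift 2
  "${OPT:-opt}" "$@" -print-after-all -print-module-scope \
    -disable-output "$input" 2>"$log"
}

# Usage: dump_passes dump/[...].linked.ll passes.log [opt flags]
```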
100+
101+
## `llvm-diff`
102+
103+
When adjusting an `opt` invocation to isolate misbehaving passes or when
104+
comparing LLVM IR from a working and a broken commit, you may be able to
105+
use the `llvm-diff` tool to compare two LLVM IR files without the noise that
106+
is induced by LLVM's IR numbering scheme. Note that `llvm-diff` output is written
107+
to standard error and should be redirected with a `2>&1`.
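
A minimal sketch of such a comparison follows. The `diff_ir` wrapper and the `LLVM_DIFF` override variable are hypothetical names; the only behavior relied on is that the report arrives on standard error.

```shell
# Hypothetical helper: compare two IR files and fold llvm-diff's report,
# which is written to stderr, into stdout so it can be piped or saved.
diff_ir() {
  "${LLVM_DIFF:-llvm-diff}" "$1" "$2" 2>&1
}

# Usage: diff_ir working.ll broken.ll | less
```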
108+
109+
## `llvm-reduce`
110+
111+
In some cases, particularly compiler crashes, the `llvm-reduce` program
112+
(part of LLVM) may be useful. It takes an LLVM IR file and an "interestingness
113+
script" which returns 0 (success) if there **is** a problem with a proposed
114+
reduced input and fails otherwise.
115+
116+
When writing such a script, the `not` tool (especially its
117+
`not --crash [program] [args]` mode) and `FileCheck` from the LLVM test suite
118+
are often useful.
119+
120+
For example, the interestingness script used to reduce the crash in
121+
[issue #22001](https://github.com/iree-org/iree/issues/22001) was
122+
123+
``` shell
124+
#!/usr/bin/env bash
125+
126+
[llvm-bin]/not --crash [llvm-bin]/opt -passes='amdgpu-lower-buffer-fat-pointers' -disable-output "$@" 2>/dev/null
127+
```
128+
129+
which was used with `llvm-reduce -test=interesting.sh pre-buffer-loads.ll`
130+
on output created by adding
131+
`--print-before=amdgpu-lower-buffer-fat-pointers --print-module-scope`
132+
to the crashing `llc` invocation (the crashing pass was located through
133+
backtraces and `--print-after-all`).
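
The reduction step itself can be sketched as a small wrapper. This is an illustration under assumptions, not the exact commands used for the issue: `reduce_ir` and the `LLVM_REDUCE` override are hypothetical names, and the default output name `reduced.ll` is chosen here for the example.

```shell
# Hypothetical helper: make the interestingness script executable, then
# run llvm-reduce on the input IR and write the result to an output file.
reduce_ir() {
  local script="$1" input="$2" output="${3:-reduced.ll}"
  chmod +x "$script" || return 1
  "${LLVM_REDUCE:-llvm-reduce}" --test="$script" -o "$output" "$input"
}

# Usage: reduce_ir interesting.sh pre-buffer-loads.ll
```

Before starting a long reduction, it is worth running the script by hand on the unreduced input and confirming that it exits with status 0.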
134+
135+
[This is a helpful LLVM slide deck on how to operate `llvm-reduce`](https://www.llvm.org/devmtg/2025-04/slides/tutorial/arsenault_reduce.pdf).
136+
These slides include other useful flags and tips.
137+
138+
## Creating reproducers
139+
140+
If you're planning to file a bug against LLVM, it's helpful to create a small reproducer.
141+
142+
In many cases, an input that demonstrates the behavior you've identified as a bug
143+
can be created either with `llvm-reduce` or by hand.
144+
145+
However, in some cases (such as incorrect dispatch results that aren't
146+
clearly attributable to a particular change) all you can do is create a
147+
reproduction harness. The exact process for creating these is target-specific,
148+
but such a harness should be a piece of standalone code that links against / loads
149+
different versions of the misbehaving input, calls the function at issue,
150+
and reports the results (likely checking against a naive implementation).
151+
152+
This wrapper program should be accompanied by a simple build process that doesn't
153+
depend on IREE and instructions on how to run it. The build should produce binaries
154+
from LLVM IR (ideally the post-`opt` IR) by calling `llc` (or, if needed, `opt`).
155+
156+
If the bug goes away at different optimization levels, you should build a working
157+
and a non-working binary.

docs/website/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -240,6 +240,7 @@ nav:
240240
- "developers/debugging/android-with-lldb.md"
241241
- "developers/debugging/compile-time-regressions.md"
242242
- "developers/debugging/gpu.md"
243+
- "developers/debugging/llvm.md"
243244
- "developers/debugging/integration-tests.md"
244245
- "developers/debugging/model-development.md"
245246
- "developers/debugging/releases.md"
