@miguelcsx
Contributor

@miguelcsx miguelcsx commented Jun 24, 2025

Summary

Fixes an issue where kernel-info pass remarks were not being saved to YAML optimization record files when using -fsave-optimization-record, despite appearing in terminal output.

Problem

The kernel-info pass was registered using FullLinkTimeOptimizationLastEPCallback, which runs after the LTO pipeline completes and after the remark streamer has been finalized. As a result:

  • Kernel-info remarks appeared in terminal output
  • Kernel-info remarks were missing from YAML files

Solution

Move kernel-info pass registration from FullLinkTimeOptimizationLastEPCallback to OptimizerLastEPCallback, which runs during the LTO optimization pipeline while the remark streamer is still active.

Resulting Diff in YAML Output

Example of YAML output before and after this change:

[image: screenshot of the YAML output before and after this change]

Sorry for the light theme hehe

Targets Affected

  • NVPTX
  • AMDGPU

Testing

Tested with:

clang -O2 -g -fopenmp --offload-arch=native main.c -foffload-lto \
  -Rpass=kernel-info -fsave-optimization-record

The fix provides meaningful source locations by falling back to the
containing function's subprogram information instead of showing unknown
locations.
@miguelcsx miguelcsx force-pushed the refac/kernel-info branch from 06bb439 to 131546e Compare June 25, 2025 00:38
@miguelcsx miguelcsx changed the title from "[Analysis] Improve KernelInfo debug location handling for compiler-generated code" to "[Target][KernelInfo] Fix kernel-info remarks missing from YAML optimization records" Jun 25, 2025
@github-actions

github-actions bot commented Jun 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

… for YAML remark output

The kernel-info pass was registered using FullLinkTimeOptimizationLastEPCallback,
which runs after the optimization record YAML files have been finalized. This
caused kernel-info remarks to appear in terminal output but not in YAML files
when using -fsave-optimization-record.

Move kernel-info registration to OptimizerLastEPCallback, which runs during
the LTO optimization pipeline while the remark streamer is still active.

This ensures kernel-info remarks (including NVVM GPU intrinsics like
@llvm.nvvm.read.ptx.sreg.tid.x) are captured in both terminal output and
YAML optimization record files.

Affects NVPTX and AMDGPU targets.
@miguelcsx miguelcsx force-pushed the refac/kernel-info branch from 131546e to 9f77d27 Compare June 25, 2025 00:46
@jdenny-ornl
Collaborator

@jdoerfert had suggested I place KernelInfo as late in the pipeline as possible. I'm concerned that moving the pass earlier will make its remarks reflect less closely the hardware instructions that will actually execute.

Instead, we can try to tell offload LTO to generate the yaml. @miguelcsx Do these clang command-line options work for you?

-Xoffload-linker --opt-remarks-filename -Xoffload-linker offload-remarks.yaml

That's a bit ugly. In the future, maybe we need some way for clang's -fsave-optimization-record to generate something like that for us. @jhuber6?

@jhuber6
Contributor

jhuber6 commented Jul 15, 2025

> That's a bit ugly. In the future, maybe we need some way for clang's -fsave-optimization-record to generate something like that for us. @jhuber6?

Shouldn't this be forwarded automatically via https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Clang.cpp#L9117?

@jdenny-ornl
Collaborator

Adding -save-temps to -fsave-optimization-record makes it produce an extra a.out.amdgcn.gfx906.img.opt.ld.yaml that has the remarks we need. I haven't looked into why yet.

@jdenny-ornl
Collaborator

Well, it's generating -plugin-opt=opt-remarks-filename=/tmp/a.out.amdgcn.gfx906-b404c4.img.opt.ld.yaml without -save-temps, so we just need to get it to stop making it a tmp file.
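
The naming problem can be sketched in a few lines of C. This is a toy model, not clang's driver code: `remarks_file_for` and `is_under_tmp` are hypothetical helpers, and the model assumes only that the remarks file name is the link output path with `.opt.ld.yaml` appended, which is what the two file names quoted above suggest.

```c
#include <stdio.h>
#include <string.h>

/* Toy model (assumption): derive the remarks file name by appending
   ".opt.ld.yaml" to the link output path.
   Returns 0 on success, -1 if buf is too small or formatting fails. */
static int remarks_file_for(const char *output, char *buf, size_t buflen) {
  int n = snprintf(buf, buflen, "%s.opt.ld.yaml", output);
  return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}

/* Returns 1 if path starts with "/tmp/", else 0. A stand-in for
   "this file will be cleaned up before the user sees it". */
static int is_under_tmp(const char *path) {
  return strncmp(path, "/tmp/", 5) == 0;
}
```

Given the output `/tmp/a.out.amdgcn.gfx906-b404c4.img`, `remarks_file_for` yields the temporary YAML path quoted above and `is_under_tmp` returns 1. Given `a.out.amdgcn.gfx906.img`, the name produced with -save-temps, the YAML file lands in the working directory.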

jdenny-ornl added a commit to jdenny-ornl/llvm-project that referenced this pull request Jul 16, 2025
As discussed in PR llvm#145603, the following command fails to produce a
YAML remarks file for offload LTO passes and thus for kernel-info:

```
clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
  -Rpass=kernel-info -fsave-optimization-record
```

The problem is that, in clang-linker-wrapper's clang call, clang names
the file based on clang's main output file (from `-o`).  That is a
temporary file, so the YAML file becomes a temporary file, which the
user never sees.

This patch:
- Extends clang with a hidden `-foutput-file-base=BASE` option that
  overrides the main output file as the base for other output files.
- Makes clang honor that option only for the default YAML remarks
  file, but future patches could use it for other output files too.
- Extends clang-linker-wrapper to specify that option to clang.
jdenny-ornl added a commit that referenced this pull request Jul 30, 2025
As discussed in PR #145603, the following command seems to fail to
produce a YAML remarks file for offload LTO passes and thus for
kernel-info:

```
clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
  -Rpass=kernel-info -fsave-optimization-record
```

The problem is that, in clang-linker-wrapper's clang call, clang names
the file based on clang's main output file (from `-o`). That is a
temporary file, so the YAML file becomes a temporary file, which the
user never sees.

This patch:
- Makes clang honor `-dumpdir` for the default YAML remarks file in the
  case of LTO.
- Extends clang-linker-wrapper to specify that option to clang.

To demonstrate the appeal of the generality of `-dumpdir` (as opposed to
a one-off `-fsave-optimization-record` solution in
clang-linker-wrapper), this patch also fixes `-gsplit-dwarf`. Without
this patch, when using `-gsplit-dwarf` and later debugging using rocgdb,
the dwo directory for offload is a temporary file, so temporary file
cleanup causes rocgdb to lose debug symbols for offload code.

WARNING: The clang driver passes `-dumpdir` to various clang frontend
calls. For LTO, that was previously being ignored, and now it's not.
That changes some auxiliary file names, as revealed by changes in some
existing tests' expected output: `clang/test/Driver/opt-record.c` and
`clang/test/Driver/lto-dwo.c`. Hopefully this change does not introduce
a backward compatibility issue for users.