@shiltian shiltian commented Aug 16, 2025

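The existing `amdgpu-kernarg-preload-count` option can't be used as a switch to turn kernel argument preloading off, because setting it to 0 does not disable the pass. This PR adds a separate option, `amdgpu-kernarg-preload`, to turn preloading off.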

Fixes SWDEV-550147.

llvmbot commented Aug 16, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Full diff: https://github.com/llvm/llvm-project/pull/153975.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPreloadKernelArguments.cpp (+8)
  • (added) llvm/test/CodeGen/AMDGPU/disable-preload-kernargs.ll (+29)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernelArguments.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernelArguments.cpp
index 984c1ee89309e..a386fe621a553 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernelArguments.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernelArguments.cpp
@@ -37,6 +37,11 @@ static cl::opt<unsigned> KernargPreloadCount(
     "amdgpu-kernarg-preload-count",
     cl::desc("How many kernel arguments to preload onto SGPRs"), cl::init(0));
 
+static cl::opt<bool>
+    EnableKernargPreload("amdgpu-kernarg-preload",
+                         cl::desc("Enable preload kernel arguments to SGPRs"),
+                         cl::init(true));
+
 namespace {
 
 class AMDGPUPreloadKernelArgumentsLegacy : public ModulePass {
@@ -275,6 +280,9 @@ AMDGPUPreloadKernelArgumentsLegacy::AMDGPUPreloadKernelArgumentsLegacy(
     : ModulePass(ID), TM(TM) {}
 
 static bool markKernelArgsAsInreg(Module &M, const TargetMachine &TM) {
+  if (!EnableKernargPreload)
+    return false;
+
   SmallVector<Function *, 4> FunctionsToErase;
   bool Changed = false;
   for (auto &F : M) {
diff --git a/llvm/test/CodeGen/AMDGPU/disable-preload-kernargs.ll b/llvm/test/CodeGen/AMDGPU/disable-preload-kernargs.ll
new file mode 100644
index 0000000000000..75aaec6f1fa70
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/disable-preload-kernargs.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -passes=amdgpu-preload-kernel-arguments -amdgpu-kernarg-preload=0 %s -o - | FileCheck -check-prefix=NO-PRELOAD %s
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -passes=amdgpu-preload-kernel-arguments %s -o - | FileCheck -check-prefix=DEFAULT-PRELOAD %s
+
+@g1 = protected addrspace(1) externally_initialized global i16 0, align 2
+
+define amdgpu_kernel void @test_kernel_with_zero_kernel_arg() {
+; NO-PRELOAD-LABEL: define amdgpu_kernel void @test_kernel_with_zero_kernel_arg(
+; NO-PRELOAD-SAME: ) #[[ATTR0:[0-9]+]] {
+; NO-PRELOAD-NEXT:    [[IMPLICITARG_PTR:%.*]] = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+; NO-PRELOAD-NEXT:    [[GEP:%.*]] = getelementptr inbounds i8, ptr addrspace(4) [[IMPLICITARG_PTR]], i64 12
+; NO-PRELOAD-NEXT:    [[GROUP_SIZE_X:%.*]] = load i16, ptr addrspace(4) [[GEP]], align 2
+; NO-PRELOAD-NEXT:    store i16 [[GROUP_SIZE_X]], ptr addrspace(1) @g1, align 2
+; NO-PRELOAD-NEXT:    ret void
+;
+; DEFAULT-PRELOAD-LABEL: define amdgpu_kernel void @test_kernel_with_zero_kernel_arg(
+; DEFAULT-PRELOAD-SAME: i32 inreg "amdgpu-hidden-argument" [[_HIDDEN_BLOCK_COUNT_X:%.*]], i32 inreg "amdgpu-hidden-argument" [[_HIDDEN_BLOCK_COUNT_Y:%.*]], i32 inreg "amdgpu-hidden-argument" [[_HIDDEN_BLOCK_COUNT_Z:%.*]], i16 inreg "amdgpu-hidden-argument" [[_HIDDEN_GROUP_SIZE_X:%.*]]) #[[ATTR0:[0-9]+]] {
+; DEFAULT-PRELOAD-NEXT:    [[IMPLICITARG_PTR:%.*]] = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+; DEFAULT-PRELOAD-NEXT:    [[GEP:%.*]] = getelementptr inbounds i8, ptr addrspace(4) [[IMPLICITARG_PTR]], i64 12
+; DEFAULT-PRELOAD-NEXT:    [[GROUP_SIZE_X:%.*]] = load i16, ptr addrspace(4) [[GEP]], align 2
+; DEFAULT-PRELOAD-NEXT:    store i16 [[_HIDDEN_GROUP_SIZE_X]], ptr addrspace(1) @g1, align 2
+; DEFAULT-PRELOAD-NEXT:    ret void
+;
+  %implicitarg.ptr = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+  %gep = getelementptr inbounds i8, ptr addrspace(4) %implicitarg.ptr, i64 12
+  %group_size_x = load i16, ptr addrspace(4) %gep
+  store i16 %group_size_x, ptr addrspace(1) @g1
+  ret void
+}

@tgymnich
Member

Have you considered making amdgpu-kernarg-preload-count=0 work as intended instead? Would be less confusing.

@shiltian
Contributor Author

Yes, I have. As I understand it, amdgpu-kernarg-preload-count only controls how many explicit kernel arguments are preloaded. It doesn't affect existing inreg arguments or implicit kernel arguments.
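
The distinction drawn in this reply can be sketched with a small toy model. Everything below is hypothetical and is not LLVM code: `PreloadOptions`, `ToyKernel`, and `planPreload` are illustrative names, and the model assumes, following the comment above, that the count only limits explicit arguments. The real pass applies additional conditions that this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for the two command-line options discussed here.
struct PreloadOptions {
  bool Enable = true;         // models -amdgpu-kernarg-preload
  unsigned ExplicitCount = 0; // models -amdgpu-kernarg-preload-count
};

// Toy description of a kernel; not an LLVM type.
struct ToyKernel {
  unsigned NumExplicitArgs; // arguments written in the kernel signature
  unsigned NumImplicitArgs; // hidden arguments the toy pass wants to preload
};

// How many arguments of each kind the toy pass would preload.
struct PreloadPlan {
  unsigned Explicit;
  unsigned Implicit;
};

PreloadPlan planPreload(const ToyKernel &K, const PreloadOptions &Opts) {
  // The switch disables everything, mirroring the early return in the diff.
  if (!Opts.Enable)
    return {0, 0};
  // The count caps explicit arguments only.
  unsigned Explicit = std::min(Opts.ExplicitCount, K.NumExplicitArgs);
  // Assumption of this toy model: the count never limits implicit arguments.
  return {Explicit, K.NumImplicitArgs};
}
```

With a kernel that has no explicit arguments, as in the new test, a count of 0 still leaves the implicit arguments in the plan; only setting `Enable` to false empties it.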

@tgymnich tgymnich left a comment

Would be nice to come up with a name that would reflect this subtle difference. But I cannot find a better name either.

@shiltian shiltian merged commit e37eff5 into main Aug 18, 2025
11 checks passed
@shiltian shiltian deleted the users/shiltian/skip-amdgpu-preload-kernel-arguments branch August 18, 2025 13:44
@mikaelholmen
Collaborator

Hi @shiltian

If built with EXPENSIVE_CHECKS, then running the new testcase fails like this:

LLVM ERROR: Module changed by AMDGPUPreloadKernelArgumentsPass without invalidating analyses
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /repo/llvm/build-all-expensive/bin/opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -passes=amdgpu-preload-kernel-arguments /repo/llvm/test/CodeGen/AMDGPU/disable-preload-kernargs.ll -o -
1.	Running pass "amdgpu-preload-kernel-arguments" on module "/repo/llvm/test/CodeGen/AMDGPU/disable-preload-kernargs.ll"
 #0 0x000056368faf3636 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/repo/llvm/build-all-expensive/bin/opt+0x4dab636)
 #1 0x000056368faf0bc5 llvm::sys::RunSignalHandlers() (/repo/llvm/build-all-expensive/bin/opt+0x4da8bc5)
 #2 0x000056368faf4809 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007fbfbdae3990 __restore_rt (/lib64/libpthread.so.0+0x12990)
 #4 0x00007fbfbcb8b52f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007fbfbcb5ee65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x000056368fab9254 llvm::report_fatal_error(llvm::Twine const&, bool) (/repo/llvm/build-all-expensive/bin/opt+0x4d71254)
 #7 0x000056369120b478 void llvm::detail::UniqueFunctionBase<void, llvm::StringRef, llvm::Any, llvm::PreservedAnalyses const&>::CallImpl<llvm::PreservedCFGCheckerInstrumentation::registerCallbacks(llvm::PassInstrumentationCallbacks&, llvm::AnalysisManager<llvm::Module>&)::$_2>(void*, llvm::StringRef, llvm::Any&, llvm::PreservedAnalyses const&) StandardInstrumentations.cpp:0:0
 #8 0x000056368fd43cae llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/repo/llvm/build-all-expensive/bin/opt+0x4ffbcae)
 #9 0x00005636911d96e7 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool, bool) (/repo/llvm/build-all-expensive/bin/opt+0x64916e7)
#10 0x000056368fa8dc9f optMain (/repo/llvm/build-all-expensive/bin/opt+0x4d45c9f)
#11 0x00007fbfbcb777e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#12 0x000056368fa8b2ee _start (/repo/llvm/build-all-expensive/bin/opt+0x4d432ee)
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /repo/llvm/build-all-expensive/bin/FileCheck -check-prefix=DEFAULT-PRELOAD /repo/llvm/test/CodeGen/AMDGPU/disable-preload-kernargs.ll

@mshockwave
Member

The failure was not directly caused by the changes made in this patch; rather, the new test case exposes an existing error that was not previously caught. I sent a fix: #154645

mshockwave added a commit that referenced this pull request Aug 20, 2025
#154645)

#153975 added a new test,
`test/CodeGen/AMDGPU/disable-preload-kernargs.ll`, that triggers an
assertion under `LLVM_ENABLE_EXPENSIVE_CHECKS` complaining that analyses
were not invalidated even though the pass made changes. The cause is
that the pass only invalidates analyses when the number of explicit
arguments is greater than zero, yet some functions can be removed even
when there are no explicit arguments, hence the missed invalidation.
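
The shape of the missed invalidation described in this commit message can be sketched as a toy model. This is not the code from AMDGPUPreloadKernelArguments.cpp: `ToyFunction`, `eraseMarked`, `runBuggy`, and `runFixed` are hypothetical names, and the returned `bool` stands in for the "changed" result that decides whether analyses are invalidated.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model of a function in a module; not an LLVM type.
struct ToyFunction {
  unsigned NumExplicitArgs; // explicit kernel arguments
  bool ToErase;             // the pass wants to remove this function
};

// Removes every marked function and reports whether anything was removed.
static bool eraseMarked(std::vector<ToyFunction> &Module) {
  auto OldSize = Module.size();
  Module.erase(std::remove_if(Module.begin(), Module.end(),
                              [](const ToyFunction &F) { return F.ToErase; }),
               Module.end());
  return Module.size() != OldSize;
}

// Buggy shape: a change is reported only on the explicit-argument path.
bool runBuggy(std::vector<ToyFunction> &Module) {
  bool Changed = false;
  for (const ToyFunction &F : Module)
    if (F.NumExplicitArgs > 0)
      Changed = true; // stands in for marking explicit arguments inreg
  eraseMarked(Module); // result ignored, so the erasure goes unreported
  return Changed;
}

// Fixed shape: erasing a function also counts as a change.
bool runFixed(std::vector<ToyFunction> &Module) {
  bool Changed = false;
  for (const ToyFunction &F : Module)
    if (F.NumExplicitArgs > 0)
      Changed = true;
  Changed |= eraseMarked(Module);
  return Changed;
}
```

For a module holding a single function with zero explicit arguments that is marked for erasure, `runBuggy` empties the module but returns false, which is the situation the expensive checks flag; `runFixed` returns true for the same input.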