Skip to content

Conversation

@skc7
Copy link
Contributor

@skc7 skc7 commented Sep 24, 2025

  • LLVM IR of __asan_malloc_impl has generic pointers accessing private memory because of patch clang: Fix broken implicit cast to generic address space #138863

  • Call to __asan_malloc_impl is introduced late in the pass pipeline by "amdgpu-sw-lower-lds" pass to lowered kernels. Prior passes introduce "amdgpu-no-flat-scratch-init" to the kernels since this call is not present in the IR for kernel to start with.

  • So, to make sure "Flat Addressing for Scratch Memory" works, this PR removes "amdgpu-no-flat-scratch-init" from the lowered kernels.

@llvmbot
Copy link
Member

llvmbot commented Sep 24, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Chaitanya (skc7)

Changes
  • LLVM IR of __asan_malloc_impl has generic pointers accessing private memory because of patch #138863

  • Call to __asan_malloc_impl is introduced late in the pass pipeline by "amdgpu-sw-lower-lds" pass to lowered kernels. Prior passes introduce "amdgpu-no-flat-scratch-init" to the kernels since this call is not present in the IR for kernel to start with.

  • So, to make sure "Flat Addressing for Scratch Memory" works, this PR removes "amdgpu-no-flat-scratch-init" from the lowered kernels.


Full diff: https://github.com/llvm/llvm-project/pull/160541.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp (+2-1)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-no-scratch-init.ll (+18)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
index 4a9437b37aa39..d610fbf9bb06b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
@@ -1213,7 +1213,8 @@ bool AMDGPUSwLowerLDS::run() {
       removeFnAttrFromReachable(
           CG, Func,
           {"amdgpu-no-workitem-id-x", "amdgpu-no-workitem-id-y",
-           "amdgpu-no-workitem-id-z", "amdgpu-no-heap-ptr"});
+           "amdgpu-no-workitem-id-z", "amdgpu-no-heap-ptr",
+           "amdgpu-no-flat-scratch-init"});
       if (!LDSParams.IndirectAccess.StaticLDSGlobals.empty() ||
           !LDSParams.IndirectAccess.DynamicLDSGlobals.empty())
         removeFnAttrFromReachable(CG, Func, {"amdgpu-no-lds-kernel-id"});
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-no-scratch-init.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-no-scratch-init.ll
new file mode 100644
index 0000000000000..660b0f32590ba
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-no-scratch-init.ll
@@ -0,0 +1,18 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 4
+; RUN: opt < %s -passes=amdgpu-sw-lower-lds -S -amdgpu-asan-instrument-lds=false -mtriple=amdgcn-amd-amdhsa | FileCheck %s
+; RUN: llc < %s -enable-new-pm -stop-after=amdgpu-sw-lower-lds -amdgpu-asan-instrument-lds=false -mtriple=amdgcn-amd-amdhsa | FileCheck %s
+
+; Test to check if amdgpu-no-flat-scratch-init is removed from the kernel.
+@lds_1 = internal addrspace(3) global [1 x i8] poison, align 4
+
+define amdgpu_kernel void @k0() #0 {
+  store i8 7, ptr addrspace(3) @lds_1, align 4
+  ret void
+}
+
+attributes #0 = { sanitize_address "amdgpu-no-flat-scratch-init"}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 4, !"nosanitize_address", i32 1}
+
+; CHECK: attributes #[[ATTR0:.*]] = { sanitize_address "amdgpu-lds-size"="8" }

@arsenm
Copy link
Contributor

arsenm commented Sep 24, 2025

LLVM IR of __asan_malloc_impl has generic pointers accessing private memory

I think it would be better to change the implementation to make sure that is not possible

@b-sumner
Copy link

LLVM IR of __asan_malloc_impl has generic pointers accessing private memory

I think it would be better to change the implementation to make sure that is not possible

I haven't any idea where that could be happening. It was certainly not deliberate. @skc7 do you know where it's coming from?

@arsenm
Copy link
Contributor

arsenm commented Sep 25, 2025

The runtime should be compiled as OpenCL 1.x to ensure this isn't possible. IIRC we are compiling as 2.x to work around the wrong address space numbering bug. I'm not sure what the behavior is if you use 1.x/3.x with the generic extension enabled. Can you try compiling the library in that mode and see what IR you get?

@skc7
Copy link
Contributor Author

skc7 commented Sep 25, 2025

LLVM IR of __asan_malloc_impl has generic pointers accessing private memory

I think it would be better to change the implementation to make sure that is not possible

I haven't any idea where that could be happening. It was certainly not deliberate. @skc7 do you know where it's coming from?

image

Above image is a diff of LLVM IR for get_current_slab method from asanrtl/src/dm.cl, before and after the #138863.
STORE were previously using the addrspace(5) ptrs, which are now using generic ptrs.
My assumption is these addrspacecasts will not vanish though we change the logic.

@arsenm
Copy link
Contributor

arsenm commented Sep 25, 2025

My assumption is these addrspacecasts will not vanish though we change the logic.

You need to change the language in the runtime build. We also could do a better job emitting cleaner IR in clang, but you need to change the language to make it structurally impossible to require the addrspacecast

@skc7 skc7 requested a review from ampandey-1995 September 25, 2025 08:07
@b-sumner
Copy link

I see now. That private pointer is probably the "expected" argument to the compare_exchange. It ought to be optimized away completely later. I don't understand why we would be casting it to generic. Is the builtin requiring it?

@b-sumner
Copy link

You need to change the language in the runtime build. We also could do a better job emitting cleaner IR in clang, but you need to change the language to make it structurally impossible to require the addrspacecast

We can't simply change the way the runtime is built. There are functions that must accept generic pointers. Removing an incorrectly added attribute seems like the right way to go. Do you disagree that the attribute has become incorrect?

@arsenm
Copy link
Contributor

arsenm commented Sep 26, 2025

We can't simply change the way the runtime is built. There are functions that must accept generic pointers.

Which can be managed to minimize the use of generic pointers to that interface. The actual library content should not be using generic to access scratch

Removing an incorrectly added attribute seems like the right way to go. Do you disagree that the attribute has become incorrect?

I think any compiler runtime library needs to have its properties carefully controlled to not cause trouble for the compiler. There is no reason for such a cast to exist in the library. We should have a custom verification that we can assume none of these functions rely on any implicit arguments.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be addressed in the compiler

Copy link

@b-sumner b-sumner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can separately and thoughtfully pursue changing the device library build and implementation. We need to fix this before the problem appears in yet another release. Let's land this for the time being.

@arsenm
Copy link
Contributor

arsenm commented Sep 26, 2025

Let's not. I have low confidence this will ever be removed

@arsenm
Copy link
Contributor

arsenm commented Sep 29, 2025

The correct fix of this is in the runtime library build. There are 4 plausible fixes:

  1. The runtime function unnecessarily uses optnone, which prevents optimizing out this cast. We should fix this regardless. However, using that as a "fix" isn't great since it's still relying on optimizations, just in the device library build. We could add custom validation to the library build to make sure it satisfies feature expectations.
  2. The runtime library should be managing its casts more closely by changing the language options to ensure these casts cannot be formed, this is the higher effort version of fixing the library build
  3. Stop inferring amdgpu-no-flat-scratch-init in sanitized functions. We do similar already for hostcall, this would be an extension of that logic in AMDGPUAttributor
  4. Undo the AMDGPUAttributor optimization later, which is what this does. This regresses on depending on whole program visibility, so if we're going to hack around this in the compiler I'd rather do option 3

@skc7
Copy link
Contributor Author

skc7 commented Sep 30, 2025

The correct fix of this is in the runtime library build. There are 4 plausible fixes:

  1. The runtime function unnecessarily uses optnone, which prevents optimizing out this cast. We should fix this regardless. However, using that as a "fix" isn't great since it's still relying on optimizations, just in the device library build. We could add custom validation to the library build to make sure it satisfies feature expectations.
  2. The runtime library should be managing its casts more closely by changing the language options to ensure these casts cannot be formed, this is the higher effort version of fixing the library build
  3. Stop inferring amdgpu-no-flat-scratch-init in sanitized functions. We do similar already for hostcall, this would be an extension of that logic in AMDGPUAttributor
  4. Undo the AMDGPUAttributor optimization later, which is what this does. This regresses on depending on whole program visibility, so if we're going to hack around this in the compiler I'd rather do option 3

Thanks @arsenm for feedback. Raised PR #161319 for option 3.

@skc7
Copy link
Contributor Author

skc7 commented Oct 3, 2025

Closing this PR since #161319 is merged.

@skc7 skc7 closed this Oct 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants