[AMDGPU][LowerBufferFatPointers] Handle addrspacecast null to p7 #140775

krzysz00 · 2025-05-20T18:12:04Z

Some application code operating on generic pointers (that then gete initialized to buffer fat pointers) may perform tests against nullptr. After address space inference, this results in comparisons against addrspacecast (ptr null to ptr addrspace(7)), which were crashing.

However, while general casts to ptr addrspace(7) from generic pointers aren't supposted, it is possible to cast null pointers to the all-zerose bufer resource and 0 offset, which this patch adds.

It also adds a TODO for casting out of buffer resources, which isn't implemented here but could be.

Some application code operating on generic pointers (that then gete initialized to buffer fat pointers) may perform tests against nullptr. After address space inference, this results in comparisons against `addrspacecast (ptr null to ptr addrspace(7))`, which were crashing. However, while general casts to ptr addrspace(7) from generic pointers aren't supposted, it is possible to cast null pointers to the all-zerose bufer resource and 0 offset, which this patch adds. It also adds a TODO for casting _out_ of buffer resources, which isn't implemented here but could be.

llvmbot · 2025-05-20T18:12:41Z

@llvm/pr-subscribers-backend-amdgpu

Author: Krzysztof Drewniak (krzysz00)

Changes

Some application code operating on generic pointers (that then gete initialized to buffer fat pointers) may perform tests against nullptr. After address space inference, this results in comparisons against addrspacecast (ptr null to ptr addrspace(7)), which were crashing.

However, while general casts to ptr addrspace(7) from generic pointers aren't supposted, it is possible to cast null pointers to the all-zerose bufer resource and 0 offset, which this patch adds.

It also adds a TODO for casting out of buffer resources, which isn't implemented here but could be.

Full diff: https://github.com/llvm/llvm-project/pull/140775.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp (+19-4)
(modified) llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll (+32)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
index 275b3a6f16e54..fe873508f46d2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
@@ -1980,6 +1980,8 @@ PtrParts SplitPtrStructs::visitIntToPtrInst(IntToPtrInst &IP) {
 }
 
 PtrParts SplitPtrStructs::visitAddrSpaceCastInst(AddrSpaceCastInst &I) {
+  // TODO(krzysz00): handle casts from ptr addrspace(7) to global pointers
+  // by computing the effective address.
   if (!isSplitFatPtr(I.getType()))
     return {nullptr, nullptr};
   IRB.SetInsertPoint(&I);
@@ -1990,11 +1992,24 @@ PtrParts SplitPtrStructs::visitAddrSpaceCastInst(AddrSpaceCastInst &I) {
     SplitUsers.insert(&I);
     return {Rsrc, Off};
   }
-  if (I.getSrcAddressSpace() != AMDGPUAS::BUFFER_RESOURCE)
-    report_fatal_error("Only buffer resources (addrspace 8) can be cast to "
-                       "buffer fat pointers (addrspace 7)");
-  Type *OffTy = cast<StructType>(I.getType())->getElementType(1);
+
+  auto *ResTy = cast<StructType>(I.getType());
+  Type *RsrcTy = ResTy->getElementType(0);
+  Type *OffTy = ResTy->getElementType(1);
   Value *ZeroOff = Constant::getNullValue(OffTy);
+
+  // Special case for null pointers, which can be created by address space
+  // propagation.
+  auto *InConst = dyn_cast<Constant>(In);
+  if (InConst && InConst->isNullValue()) {
+    Value *NullRsrc = Constant::getNullValue(RsrcTy);
+    SplitUsers.insert(&I);
+    return {NullRsrc, ZeroOff};
+  }
+
+  if (I.getSrcAddressSpace() != AMDGPUAS::BUFFER_RESOURCE)
+    report_fatal_error("Only buffer resources (addrspace 8) and null pointers "
+                       "can be cast to buffer fat pointers (addrspace 7)");
   SplitUsers.insert(&I);
   return {In, ZeroOff};
 }
diff --git a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
index fd5bf579cdeb2..cc5e2c833cc39 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
+++ b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
@@ -332,6 +332,38 @@ define <2 x ptr addrspace(7)> @addrspacecast_vec(<2 x ptr addrspace(8)> %buf) {
   ret <2 x ptr addrspace(7)> %ret
 }
 
+define ptr addrspace(7) @addrspacecast_null() {
+; CHECK-LABEL: define { ptr addrspace(8), i32 } @addrspacecast_null
+; CHECK-SAME: () #[[ATTR0]] {
+; CHECK-NEXT:    ret { ptr addrspace(8), i32 } zeroinitializer
+;
+  %ret = addrspacecast ptr null to ptr addrspace(7)
+  ret ptr addrspace(7) %ret
+}
+
+define <2 x ptr addrspace(7)> @addrspacecast_null_vec() {
+; CHECK-LABEL: define { <2 x ptr addrspace(8)>, <2 x i32> } @addrspacecast_null_vec
+; CHECK-SAME: () #[[ATTR0]] {
+; CHECK-NEXT:    ret { <2 x ptr addrspace(8)>, <2 x i32> } zeroinitializer
+;
+  %ret = addrspacecast <2 x ptr> zeroinitializer to <2 x ptr addrspace(7)>
+  ret <2 x ptr addrspace(7)> %ret
+}
+
+define i1 @test_null(ptr addrspace(7) %p) {
+; CHECK-LABEL: define i1 @test_null
+; CHECK-SAME: ({ ptr addrspace(8), i32 } [[P:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[P_RSRC:%.*]] = extractvalue { ptr addrspace(8), i32 } [[P]], 0
+; CHECK-NEXT:    [[P_OFF:%.*]] = extractvalue { ptr addrspace(8), i32 } [[P]], 1
+; CHECK-NEXT:    [[IS_NULL_RSRC:%.*]] = icmp eq ptr addrspace(8) [[P_RSRC]], null
+; CHECK-NEXT:    [[IS_NULL_OFF:%.*]] = icmp eq i32 [[P_OFF]], 0
+; CHECK-NEXT:    [[IS_NULL:%.*]] = and i1 [[IS_NULL_RSRC]], [[IS_NULL_OFF]]
+; CHECK-NEXT:    ret i1 [[IS_NULL]]
+;
+  %is.null = icmp eq ptr addrspace(7) %p, addrspacecast (ptr null to ptr addrspace(7))
+  ret i1 %is.null
+}
+
 declare ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1), i16, i32, i32)
 
 define ptr addrspace(7) @make_buffer_rsrc(ptr addrspace(1) %buf, i16 %stride, i32 %numRecords, i32 %flags) {

llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll

llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp

shiltian · 2025-05-20T18:20:51Z

The reason we have addrspacecast (ptr null to ptr addrspace(7)) thingy is ConstantPointerNull doesn't really mean a nullptr; otherwise this could be folded to just ptr addrspace(7) null and then everything just pans out. I'm working on changing the semantics of ConstantPointerNull but current run into multiple issues. Hopefully I can get it through. :-D

For this particular case, a ptr addrspace(7) null does mean a nullptr though.

github-actions · 2025-05-20T18:29:32Z

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:

git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll

The following files introduce new uses of undef:

llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

krzysz00 · 2025-05-20T23:12:53Z

Landing over the objections of the undef lint since this is a test that actually handles undef

This PR mainly add two changes relevant to ukernel support for argmax op. **Change 1: BF16 support** this PR adds the uKernels iree_uk_amdgpu_argmax_bf16i64 and iree_uk_amdgpu_argmax_bf16i32. These implementations are adapted from the existing float versions, as low-level bf16 support typically involves converting to float32 for computation. **Change 2: Support Returning the Maximum Value** Previously, the compiler restricted argmax lowering to pure-index-only cases using the following check: ```C++ // If max value is being used, it is not a pure argmax. if (!genericOp.getResults()[0].use_empty()) { return false; } ``` This check has been removed to enable the microkernel to support both: - Returning only the index (pure argmax) - Returning both the maximum value and its index To support this at the ukernel level, I added a writeValue boolean flag to control whether the value output should be written. Once this LLVM PR is integrated: llvm/llvm-project#140775, I plan to further simplify the implementation by checking whether outputBufferVal is nullptr instead of relying on an explicit flag. The PR also includes a corresponding test. In addition, I performed manual correctness checks to validate the behavior, I put my scripts here: https://github.com/bangtianliu/work-scripts/tree/master/argmax. Issue: #20650 --------- Signed-off-by: Bangtian Liu <[email protected]>

This PR mainly add two changes relevant to ukernel support for argmax op. **Change 1: BF16 support** this PR adds the uKernels iree_uk_amdgpu_argmax_bf16i64 and iree_uk_amdgpu_argmax_bf16i32. These implementations are adapted from the existing float versions, as low-level bf16 support typically involves converting to float32 for computation. **Change 2: Support Returning the Maximum Value** Previously, the compiler restricted argmax lowering to pure-index-only cases using the following check: ```C++ // If max value is being used, it is not a pure argmax. if (!genericOp.getResults()[0].use_empty()) { return false; } ``` This check has been removed to enable the microkernel to support both: - Returning only the index (pure argmax) - Returning both the maximum value and its index To support this at the ukernel level, I added a writeValue boolean flag to control whether the value output should be written. Once this LLVM PR is integrated: llvm/llvm-project#140775, I plan to further simplify the implementation by checking whether outputBufferVal is nullptr instead of relying on an explicit flag. The PR also includes a corresponding test. In addition, I performed manual correctness checks to validate the behavior, I put my scripts here: https://github.com/bangtianliu/work-scripts/tree/master/argmax. Issue: iree-org#20650 --------- Signed-off-by: Bangtian Liu <[email protected]> Signed-off-by: Thomas Ziereis <[email protected]>

llvmbot added the backend:AMDGPU label May 20, 2025

krzysz00 requested a review from arsenm May 20, 2025 18:12

krzysz00 requested review from lialan and shiltian May 20, 2025 18:12

arsenm reviewed May 20, 2025

View reviewed changes

llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll Show resolved Hide resolved

arsenm reviewed May 20, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp Outdated Show resolved Hide resolved

Undef and poison

68dac67

arsenm approved these changes May 20, 2025

View reviewed changes

lialan approved these changes May 20, 2025

View reviewed changes

bangtianliu mentioned this pull request May 20, 2025

[Codegen] Add ukernel support for argmax on BF16 and enable optional max value return iree-org/iree#20768

Merged

krzysz00 merged commit 6b9da28 into llvm:main May 20, 2025
10 of 11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU][LowerBufferFatPointers] Handle addrspacecast null to p7 #140775

[AMDGPU][LowerBufferFatPointers] Handle addrspacecast null to p7 #140775

Uh oh!

krzysz00 commented May 20, 2025

Uh oh!

llvmbot commented May 20, 2025

Uh oh!

Uh oh!

Uh oh!

shiltian commented May 20, 2025

Uh oh!

github-actions bot commented May 20, 2025

Uh oh!

krzysz00 commented May 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[AMDGPU][LowerBufferFatPointers] Handle addrspacecast null to p7 #140775

[AMDGPU][LowerBufferFatPointers] Handle addrspacecast null to p7 #140775

Uh oh!

Conversation

krzysz00 commented May 20, 2025

Uh oh!

llvmbot commented May 20, 2025

Uh oh!

Uh oh!

Uh oh!

shiltian commented May 20, 2025

Uh oh!

github-actions bot commented May 20, 2025

Uh oh!

krzysz00 commented May 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants