Skip to content

Conversation

@krzysz00
Copy link
Contributor

Some application code operating on generic pointers (that then gete initialized to buffer fat pointers) may perform tests against nullptr. After address space inference, this results in comparisons against addrspacecast (ptr null to ptr addrspace(7)), which were crashing.

However, while general casts to ptr addrspace(7) from generic pointers aren't supposted, it is possible to cast null pointers to the all-zerose bufer resource and 0 offset, which this patch adds.

It also adds a TODO for casting out of buffer resources, which isn't implemented here but could be.

Some application code operating on generic pointers (that then gete
initialized to buffer fat pointers) may perform tests against nullptr.
After address space inference, this results in comparisons against
`addrspacecast (ptr null to ptr addrspace(7))`, which were crashing.

However, while general casts to ptr addrspace(7) from generic pointers
aren't supposted, it is possible to cast null pointers to the
all-zerose bufer resource and 0 offset, which this patch adds.

It also adds a TODO for casting _out_ of buffer resources, which isn't
implemented here but could be.
@krzysz00 krzysz00 requested a review from arsenm May 20, 2025 18:12
@llvmbot
Copy link
Member

llvmbot commented May 20, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Krzysztof Drewniak (krzysz00)

Changes

Some application code operating on generic pointers (that then gete initialized to buffer fat pointers) may perform tests against nullptr. After address space inference, this results in comparisons against addrspacecast (ptr null to ptr addrspace(7)), which were crashing.

However, while general casts to ptr addrspace(7) from generic pointers aren't supposted, it is possible to cast null pointers to the all-zerose bufer resource and 0 offset, which this patch adds.

It also adds a TODO for casting out of buffer resources, which isn't implemented here but could be.


Full diff: https://github.com/llvm/llvm-project/pull/140775.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp (+19-4)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll (+32)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
index 275b3a6f16e54..fe873508f46d2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
@@ -1980,6 +1980,8 @@ PtrParts SplitPtrStructs::visitIntToPtrInst(IntToPtrInst &IP) {
 }
 
 PtrParts SplitPtrStructs::visitAddrSpaceCastInst(AddrSpaceCastInst &I) {
+  // TODO(krzysz00): handle casts from ptr addrspace(7) to global pointers
+  // by computing the effective address.
   if (!isSplitFatPtr(I.getType()))
     return {nullptr, nullptr};
   IRB.SetInsertPoint(&I);
@@ -1990,11 +1992,24 @@ PtrParts SplitPtrStructs::visitAddrSpaceCastInst(AddrSpaceCastInst &I) {
     SplitUsers.insert(&I);
     return {Rsrc, Off};
   }
-  if (I.getSrcAddressSpace() != AMDGPUAS::BUFFER_RESOURCE)
-    report_fatal_error("Only buffer resources (addrspace 8) can be cast to "
-                       "buffer fat pointers (addrspace 7)");
-  Type *OffTy = cast<StructType>(I.getType())->getElementType(1);
+
+  auto *ResTy = cast<StructType>(I.getType());
+  Type *RsrcTy = ResTy->getElementType(0);
+  Type *OffTy = ResTy->getElementType(1);
   Value *ZeroOff = Constant::getNullValue(OffTy);
+
+  // Special case for null pointers, which can be created by address space
+  // propagation.
+  auto *InConst = dyn_cast<Constant>(In);
+  if (InConst && InConst->isNullValue()) {
+    Value *NullRsrc = Constant::getNullValue(RsrcTy);
+    SplitUsers.insert(&I);
+    return {NullRsrc, ZeroOff};
+  }
+
+  if (I.getSrcAddressSpace() != AMDGPUAS::BUFFER_RESOURCE)
+    report_fatal_error("Only buffer resources (addrspace 8) and null pointers "
+                       "can be cast to buffer fat pointers (addrspace 7)");
   SplitUsers.insert(&I);
   return {In, ZeroOff};
 }
diff --git a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
index fd5bf579cdeb2..cc5e2c833cc39 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
+++ b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
@@ -332,6 +332,38 @@ define <2 x ptr addrspace(7)> @addrspacecast_vec(<2 x ptr addrspace(8)> %buf) {
   ret <2 x ptr addrspace(7)> %ret
 }
 
+define ptr addrspace(7) @addrspacecast_null() {
+; CHECK-LABEL: define { ptr addrspace(8), i32 } @addrspacecast_null
+; CHECK-SAME: () #[[ATTR0]] {
+; CHECK-NEXT:    ret { ptr addrspace(8), i32 } zeroinitializer
+;
+  %ret = addrspacecast ptr null to ptr addrspace(7)
+  ret ptr addrspace(7) %ret
+}
+
+define <2 x ptr addrspace(7)> @addrspacecast_null_vec() {
+; CHECK-LABEL: define { <2 x ptr addrspace(8)>, <2 x i32> } @addrspacecast_null_vec
+; CHECK-SAME: () #[[ATTR0]] {
+; CHECK-NEXT:    ret { <2 x ptr addrspace(8)>, <2 x i32> } zeroinitializer
+;
+  %ret = addrspacecast <2 x ptr> zeroinitializer to <2 x ptr addrspace(7)>
+  ret <2 x ptr addrspace(7)> %ret
+}
+
+define i1 @test_null(ptr addrspace(7) %p) {
+; CHECK-LABEL: define i1 @test_null
+; CHECK-SAME: ({ ptr addrspace(8), i32 } [[P:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[P_RSRC:%.*]] = extractvalue { ptr addrspace(8), i32 } [[P]], 0
+; CHECK-NEXT:    [[P_OFF:%.*]] = extractvalue { ptr addrspace(8), i32 } [[P]], 1
+; CHECK-NEXT:    [[IS_NULL_RSRC:%.*]] = icmp eq ptr addrspace(8) [[P_RSRC]], null
+; CHECK-NEXT:    [[IS_NULL_OFF:%.*]] = icmp eq i32 [[P_OFF]], 0
+; CHECK-NEXT:    [[IS_NULL:%.*]] = and i1 [[IS_NULL_RSRC]], [[IS_NULL_OFF]]
+; CHECK-NEXT:    ret i1 [[IS_NULL]]
+;
+  %is.null = icmp eq ptr addrspace(7) %p, addrspacecast (ptr null to ptr addrspace(7))
+  ret i1 %is.null
+}
+
 declare ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1), i16, i32, i32)
 
 define ptr addrspace(7) @make_buffer_rsrc(ptr addrspace(1) %buf, i16 %stride, i32 %numRecords, i32 %flags) {

@krzysz00 krzysz00 requested review from lialan and shiltian May 20, 2025 18:12
@shiltian
Copy link
Contributor

The reason we have addrspacecast (ptr null to ptr addrspace(7)) thingy is ConstantPointerNull doesn't really mean a nullptr; otherwise this could be folded to just ptr addrspace(7) null and then everything just pans out. I'm working on changing the semantics of ConstantPointerNull but current run into multiple issues. Hopefully I can get it through. :-D

For this particular case, a ptr addrspace(7) null does mean a nullptr though.

@github-actions
Copy link

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll

The following files introduce new uses of undef:

  • llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
  • llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

@krzysz00
Copy link
Contributor Author

Landing over the objections of the undef lint since this is a test that actually handles undef

@krzysz00 krzysz00 merged commit 6b9da28 into llvm:main May 20, 2025
10 of 11 checks passed
bangtianliu added a commit to iree-org/iree that referenced this pull request May 21, 2025
This PR mainly add two changes relevant to ukernel support for argmax
op.

**Change 1: BF16 support**
this PR adds the uKernels iree_uk_amdgpu_argmax_bf16i64 and
iree_uk_amdgpu_argmax_bf16i32. These implementations are adapted from
the existing float versions, as low-level bf16 support typically
involves converting to float32 for computation.

**Change 2:  Support Returning the Maximum Value**
Previously, the compiler restricted argmax lowering to pure-index-only
cases using the following check:
```C++
// If max value is being used, it is not a pure argmax.
if (!genericOp.getResults()[0].use_empty()) {
  return false;
}
```
This check has been removed to enable the microkernel to support both:
- Returning only the index (pure argmax)
- Returning both the maximum value and its index

To support this at the ukernel level, I added a writeValue boolean flag
to control whether the value output should be written.

Once this LLVM PR is integrated:
llvm/llvm-project#140775, I plan to further
simplify the implementation by checking whether outputBufferVal is
nullptr instead of relying on an explicit flag.

The PR also includes a corresponding test. In addition, I performed
manual correctness checks to validate the behavior, I put my scripts
here: https://github.com/bangtianliu/work-scripts/tree/master/argmax.

Issue: #20650

---------

Signed-off-by: Bangtian Liu <[email protected]>
ziereis pushed a commit to ziereis/iree that referenced this pull request May 22, 2025
This PR mainly add two changes relevant to ukernel support for argmax
op.

**Change 1: BF16 support**
this PR adds the uKernels iree_uk_amdgpu_argmax_bf16i64 and
iree_uk_amdgpu_argmax_bf16i32. These implementations are adapted from
the existing float versions, as low-level bf16 support typically
involves converting to float32 for computation.

**Change 2:  Support Returning the Maximum Value**
Previously, the compiler restricted argmax lowering to pure-index-only
cases using the following check:
```C++
// If max value is being used, it is not a pure argmax.
if (!genericOp.getResults()[0].use_empty()) {
  return false;
}
```
This check has been removed to enable the microkernel to support both:
- Returning only the index (pure argmax)
- Returning both the maximum value and its index

To support this at the ukernel level, I added a writeValue boolean flag
to control whether the value output should be written.

Once this LLVM PR is integrated:
llvm/llvm-project#140775, I plan to further
simplify the implementation by checking whether outputBufferVal is
nullptr instead of relying on an explicit flag.

The PR also includes a corresponding test. In addition, I performed
manual correctness checks to validate the behavior, I put my scripts
here: https://github.com/bangtianliu/work-scripts/tree/master/argmax.

Issue: iree-org#20650

---------

Signed-off-by: Bangtian Liu <[email protected]>
Signed-off-by: Thomas Ziereis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants