-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AMDGPU][LowerBufferFatPointers] Handle addrspacecast null to p7 #140775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Some application code operating on generic pointers (that then gete initialized to buffer fat pointers) may perform tests against nullptr. After address space inference, this results in comparisons against `addrspacecast (ptr null to ptr addrspace(7))`, which were crashing. However, while general casts to ptr addrspace(7) from generic pointers aren't supposted, it is possible to cast null pointers to the all-zerose bufer resource and 0 offset, which this patch adds. It also adds a TODO for casting _out_ of buffer resources, which isn't implemented here but could be.
|
@llvm/pr-subscribers-backend-amdgpu Author: Krzysztof Drewniak (krzysz00) ChangesSome application code operating on generic pointers (that then gete initialized to buffer fat pointers) may perform tests against nullptr. After address space inference, this results in comparisons against However, while general casts to ptr addrspace(7) from generic pointers aren't supposted, it is possible to cast null pointers to the all-zerose bufer resource and 0 offset, which this patch adds. It also adds a TODO for casting out of buffer resources, which isn't implemented here but could be. Full diff: https://github.com/llvm/llvm-project/pull/140775.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
index 275b3a6f16e54..fe873508f46d2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
@@ -1980,6 +1980,8 @@ PtrParts SplitPtrStructs::visitIntToPtrInst(IntToPtrInst &IP) {
}
PtrParts SplitPtrStructs::visitAddrSpaceCastInst(AddrSpaceCastInst &I) {
+ // TODO(krzysz00): handle casts from ptr addrspace(7) to global pointers
+ // by computing the effective address.
if (!isSplitFatPtr(I.getType()))
return {nullptr, nullptr};
IRB.SetInsertPoint(&I);
@@ -1990,11 +1992,24 @@ PtrParts SplitPtrStructs::visitAddrSpaceCastInst(AddrSpaceCastInst &I) {
SplitUsers.insert(&I);
return {Rsrc, Off};
}
- if (I.getSrcAddressSpace() != AMDGPUAS::BUFFER_RESOURCE)
- report_fatal_error("Only buffer resources (addrspace 8) can be cast to "
- "buffer fat pointers (addrspace 7)");
- Type *OffTy = cast<StructType>(I.getType())->getElementType(1);
+
+ auto *ResTy = cast<StructType>(I.getType());
+ Type *RsrcTy = ResTy->getElementType(0);
+ Type *OffTy = ResTy->getElementType(1);
Value *ZeroOff = Constant::getNullValue(OffTy);
+
+ // Special case for null pointers, which can be created by address space
+ // propagation.
+ auto *InConst = dyn_cast<Constant>(In);
+ if (InConst && InConst->isNullValue()) {
+ Value *NullRsrc = Constant::getNullValue(RsrcTy);
+ SplitUsers.insert(&I);
+ return {NullRsrc, ZeroOff};
+ }
+
+ if (I.getSrcAddressSpace() != AMDGPUAS::BUFFER_RESOURCE)
+ report_fatal_error("Only buffer resources (addrspace 8) and null pointers "
+ "can be cast to buffer fat pointers (addrspace 7)");
SplitUsers.insert(&I);
return {In, ZeroOff};
}
diff --git a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
index fd5bf579cdeb2..cc5e2c833cc39 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
+++ b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
@@ -332,6 +332,38 @@ define <2 x ptr addrspace(7)> @addrspacecast_vec(<2 x ptr addrspace(8)> %buf) {
ret <2 x ptr addrspace(7)> %ret
}
+define ptr addrspace(7) @addrspacecast_null() {
+; CHECK-LABEL: define { ptr addrspace(8), i32 } @addrspacecast_null
+; CHECK-SAME: () #[[ATTR0]] {
+; CHECK-NEXT: ret { ptr addrspace(8), i32 } zeroinitializer
+;
+ %ret = addrspacecast ptr null to ptr addrspace(7)
+ ret ptr addrspace(7) %ret
+}
+
+define <2 x ptr addrspace(7)> @addrspacecast_null_vec() {
+; CHECK-LABEL: define { <2 x ptr addrspace(8)>, <2 x i32> } @addrspacecast_null_vec
+; CHECK-SAME: () #[[ATTR0]] {
+; CHECK-NEXT: ret { <2 x ptr addrspace(8)>, <2 x i32> } zeroinitializer
+;
+ %ret = addrspacecast <2 x ptr> zeroinitializer to <2 x ptr addrspace(7)>
+ ret <2 x ptr addrspace(7)> %ret
+}
+
+define i1 @test_null(ptr addrspace(7) %p) {
+; CHECK-LABEL: define i1 @test_null
+; CHECK-SAME: ({ ptr addrspace(8), i32 } [[P:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[P_RSRC:%.*]] = extractvalue { ptr addrspace(8), i32 } [[P]], 0
+; CHECK-NEXT: [[P_OFF:%.*]] = extractvalue { ptr addrspace(8), i32 } [[P]], 1
+; CHECK-NEXT: [[IS_NULL_RSRC:%.*]] = icmp eq ptr addrspace(8) [[P_RSRC]], null
+; CHECK-NEXT: [[IS_NULL_OFF:%.*]] = icmp eq i32 [[P_OFF]], 0
+; CHECK-NEXT: [[IS_NULL:%.*]] = and i1 [[IS_NULL_RSRC]], [[IS_NULL_OFF]]
+; CHECK-NEXT: ret i1 [[IS_NULL]]
+;
+ %is.null = icmp eq ptr addrspace(7) %p, addrspacecast (ptr null to ptr addrspace(7))
+ ret i1 %is.null
+}
+
declare ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1), i16, i32, i32)
define ptr addrspace(7) @make_buffer_rsrc(ptr addrspace(1) %buf, i16 %stride, i32 %numRecords, i32 %flags) {
|
|
The reason we have For this particular case, a |
You can test this locally with the following command:git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.llThe following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields In tests, avoid using For example, this is considered a bad practice: define void @fn() {
...
br i1 undef, ...
}Please use the following instead: define void @fn(i1 %cond) {
...
br i1 %cond, ...
}Please refer to the Undefined Behavior Manual for more information. |
|
Landing over the objections of the undef lint since this is a test that actually handles undef |
This PR mainly add two changes relevant to ukernel support for argmax
op.
**Change 1: BF16 support**
this PR adds the uKernels iree_uk_amdgpu_argmax_bf16i64 and
iree_uk_amdgpu_argmax_bf16i32. These implementations are adapted from
the existing float versions, as low-level bf16 support typically
involves converting to float32 for computation.
**Change 2: Support Returning the Maximum Value**
Previously, the compiler restricted argmax lowering to pure-index-only
cases using the following check:
```C++
// If max value is being used, it is not a pure argmax.
if (!genericOp.getResults()[0].use_empty()) {
return false;
}
```
This check has been removed to enable the microkernel to support both:
- Returning only the index (pure argmax)
- Returning both the maximum value and its index
To support this at the ukernel level, I added a writeValue boolean flag
to control whether the value output should be written.
Once this LLVM PR is integrated:
llvm/llvm-project#140775, I plan to further
simplify the implementation by checking whether outputBufferVal is
nullptr instead of relying on an explicit flag.
The PR also includes a corresponding test. In addition, I performed
manual correctness checks to validate the behavior, I put my scripts
here: https://github.com/bangtianliu/work-scripts/tree/master/argmax.
Issue: #20650
---------
Signed-off-by: Bangtian Liu <[email protected]>
This PR mainly add two changes relevant to ukernel support for argmax
op.
**Change 1: BF16 support**
this PR adds the uKernels iree_uk_amdgpu_argmax_bf16i64 and
iree_uk_amdgpu_argmax_bf16i32. These implementations are adapted from
the existing float versions, as low-level bf16 support typically
involves converting to float32 for computation.
**Change 2: Support Returning the Maximum Value**
Previously, the compiler restricted argmax lowering to pure-index-only
cases using the following check:
```C++
// If max value is being used, it is not a pure argmax.
if (!genericOp.getResults()[0].use_empty()) {
return false;
}
```
This check has been removed to enable the microkernel to support both:
- Returning only the index (pure argmax)
- Returning both the maximum value and its index
To support this at the ukernel level, I added a writeValue boolean flag
to control whether the value output should be written.
Once this LLVM PR is integrated:
llvm/llvm-project#140775, I plan to further
simplify the implementation by checking whether outputBufferVal is
nullptr instead of relying on an explicit flag.
The PR also includes a corresponding test. In addition, I performed
manual correctness checks to validate the behavior, I put my scripts
here: https://github.com/bangtianliu/work-scripts/tree/master/argmax.
Issue: iree-org#20650
---------
Signed-off-by: Bangtian Liu <[email protected]>
Signed-off-by: Thomas Ziereis <[email protected]>
Some application code operating on generic pointers (that then gete initialized to buffer fat pointers) may perform tests against nullptr. After address space inference, this results in comparisons against
addrspacecast (ptr null to ptr addrspace(7)), which were crashing.However, while general casts to ptr addrspace(7) from generic pointers aren't supposted, it is possible to cast null pointers to the all-zerose bufer resource and 0 offset, which this patch adds.
It also adds a TODO for casting out of buffer resources, which isn't implemented here but could be.