-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[AMDGPU][Attributor] Fix the merge of two range states of flat workgroup size #125889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU][Attributor] Fix the merge of two range states of flat workgroup size #125889
Conversation
…oup size This PR updates the logic for merging two flat workgroup size pairs. The current implementation, which uses `clampStateAndIndicateChange` to compute a union, is not ideal. If we think from ensure proper resource allocation perspective, for instance, if one pair specifies a minimum of 32 flat workgroup size, and another specifies a minimum of 64, we should guarantee that 64 flat workgroup size can be supported, as failing to do so could result in excessive resource allocation. A similar principle applies to the upper bound. Thus, the PR uses the following approach for merging two pairs, `lo_a,up_a and lo_b,up_b: max(lo_a, lo_b), max(up_a, up_b)`. This ensures that resource allocation adheres to the stricter constraints from both inputs. This PR only updates the merge for `AAAMDFlatWorkGroupSize`. `AAAMDWavesPerEU` will be rewritten in a follow-up.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
@llvm/pr-subscribers-backend-amdgpu Author: Shilei Tian (shiltian) ChangesThis PR updates the logic for merging two flat workgroup size pairs. The current
This ensures that resource allocation adheres to the stricter constraints from This PR only updates the merge for Full diff: https://github.com/llvm/llvm-project/pull/125889.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 546db318c17d53..16be7d50127083 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -799,7 +799,31 @@ AAAMDAttributes &AAAMDAttributes::createForPosition(const IRPosition &IRP,
llvm_unreachable("AAAMDAttributes is only valid for function position");
}
-/// Base class to derive different size ranges.
+/// Merge two range states and indicate changes. The range can apply to
+/// flat-workgroup-size and waves-per-eu attributes. The growth/merge of these
+/// two attributes is neither a union nor an intersection. Instead, it follows
+/// the rule below:
+///
+/// Lower = max(S.Lower, R.Lower)
+/// Upper = max(S.Upper, R.Upper)
+ChangeStatus mergeRangeStateAndIndicateChange(IntegerRangeState &S,
+ const IntegerRangeState &R) {
+ auto Assumed = S.getAssumed();
+ unsigned Min = R.getAssumed().getLower().getZExtValue();
+ unsigned Max = R.getAssumed().getUpper().getZExtValue();
+ Min = std::max(Min, static_cast<unsigned>(Assumed.getLower().getZExtValue()));
+ Max = std::max(Max, static_cast<unsigned>(Assumed.getUpper().getZExtValue()));
+ ConstantRange Range(APInt(32, Min), APInt(32, Max));
+ IntegerRangeState RangeState(Range);
+ S = RangeState;
+ return S == Assumed ? ChangeStatus::UNCHANGED : ChangeStatus::CHANGED;
+}
+
+/// Base class for deriving different size ranges. This size range is
+/// specifically intended for flat-workgroup-size and waves-per-eu attributes.
+/// Therefore, updating the attribute should use
+/// \p mergeRangeStateAndIndicateChange. Given this, the attribute may not be
+/// applicable to other potential size ranges in the future.
struct AAAMDSizeRangeAttribute
: public StateWrapper<IntegerRangeState, AbstractAttribute, uint32_t> {
using Base = StateWrapper<IntegerRangeState, AbstractAttribute, uint32_t>;
@@ -826,8 +850,8 @@ struct AAAMDSizeRangeAttribute
if (!CallerInfo || !CallerInfo->isValidState())
return false;
- Change |=
- clampStateAndIndicateChange(this->getState(), CallerInfo->getState());
+ Change |= mergeRangeStateAndIndicateChange(this->getState(),
+ CallerInfo->getState());
return true;
};
@@ -903,15 +927,14 @@ struct AAAMDFlatWorkGroupSize : public AAAMDSizeRangeAttribute {
}
}
- // We don't want to directly clamp the state if it's the max range because
+ // We don't want to directly merge the state if it's the max range because
// that is basically the worst state.
if (Range == MaxRange)
return;
auto [Min, Max] = Range;
ConstantRange CR(APInt(32, Min), APInt(32, Max + 1));
- IntegerRangeState IRS(CR);
- clampStateAndIndicateChange(this->getState(), IRS);
+ this->getState() = IntegerRangeState(CR);
if (HasAttr || AMDGPU::isEntryFunctionCC(F->getCallingConv()))
indicateOptimisticFixpoint();
diff --git a/llvm/test/CodeGen/AMDGPU/propagate-flat-work-group-size.ll b/llvm/test/CodeGen/AMDGPU/propagate-flat-work-group-size.ll
index 1afd31c6d45e7d..0a3121f9de1676 100644
--- a/llvm/test/CodeGen/AMDGPU/propagate-flat-work-group-size.ll
+++ b/llvm/test/CodeGen/AMDGPU/propagate-flat-work-group-size.ll
@@ -86,7 +86,7 @@ define internal void @flat_group_512_1024() #4 {
define amdgpu_kernel void @kernel_128_512() #5 {
; CHECK-LABEL: define {{[^@]+}}@kernel_128_512
-; CHECK-SAME: () #[[ATTR2]] {
+; CHECK-SAME: () #[[ATTR6:[0-9]+]] {
; CHECK-NEXT: call void @default_to_128_512()
; CHECK-NEXT: call void @flat_group_64_64()
; CHECK-NEXT: ret void
@@ -98,7 +98,7 @@ define amdgpu_kernel void @kernel_128_512() #5 {
define amdgpu_kernel void @kernel_512_512() #6 {
; CHECK-LABEL: define {{[^@]+}}@kernel_512_512
-; CHECK-SAME: () #[[ATTR6:[0-9]+]] {
+; CHECK-SAME: () #[[ATTR2]] {
; CHECK-NEXT: call void @default_to_128_512()
; CHECK-NEXT: call void @flat_group_512_1024()
; CHECK-NEXT: ret void
@@ -111,7 +111,7 @@ define amdgpu_kernel void @kernel_512_512() #6 {
; Called from kernels with 128,256 and 64,128 => 64,256
define internal void @default_to_64_256() {
; CHECK-LABEL: define {{[^@]+}}@default_to_64_256
-; CHECK-SAME: () #[[ATTR7:[0-9]+]] {
+; CHECK-SAME: () #[[ATTR4]] {
; CHECK-NEXT: ret void
;
ret void
@@ -153,7 +153,7 @@ define internal void @merge_cycle_1() #3 {
define amdgpu_kernel void @kernel_64_256() #7 {
; CHECK-LABEL: define {{[^@]+}}@kernel_64_256
-; CHECK-SAME: () #[[ATTR7]] {
+; CHECK-SAME: () #[[ATTR7:[0-9]+]] {
; CHECK-NEXT: call void @merge_cycle_0()
; CHECK-NEXT: call void @default_captured_address()
; CHECK-NEXT: call void @externally_visible_default()
@@ -205,11 +205,11 @@ attributes #7 = { "amdgpu-flat-work-group-size"="64,256" }
;.
; CHECK: attributes #[[ATTR0]] = { "amdgpu-flat-work-group-size"="1,256" "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR1]] = { "amdgpu-flat-work-group-size"="64,128" "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR2]] = { "amdgpu-flat-work-group-size"="128,512" "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="2,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR2]] = { "amdgpu-flat-work-group-size"="512,512" "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="2,10" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR3]] = { "amdgpu-flat-work-group-size"="64,64" "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR4]] = { "amdgpu-flat-work-group-size"="128,256" "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR5]] = { "amdgpu-flat-work-group-size"="512,1024" "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR6]] = { "amdgpu-flat-work-group-size"="512,512" "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="2,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR6]] = { "amdgpu-flat-work-group-size"="128,512" "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="2,10" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR7]] = { "amdgpu-flat-work-group-size"="64,256" "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR8]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
;.
|
| ret void | ||
| } | ||
|
|
||
| ; Called from kernels with 128,256 and 64,128 => 64,256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should also update the comments
|
This does not sound correct. You can't just start executing functions with a higher minimum group size than the callers. This is not just about resourcing, but will break known bits optimizations on the work item IDs |
|
If we have a function
Can you explain more? |
|

This PR updates the logic for merging two flat workgroup size pairs. The current
implementation, which uses
clampStateAndIndicateChangeto compute a union, isnot ideal. If we think from ensure proper resource allocation perspective, for
instance, if one pair specifies a minimum of 32 flat workgroup size, and another
specifies a minimum of 64, we should guarantee that 64 flat workgroup size can
be supported, as failing to do so could result in excessive resource allocation.
A similar principle applies to the upper bound. Thus, the PR uses the following
approach for merging two pairs,
lo_a,up_a and lo_b,up_b: max(lo_a, lo_b), max(up_a, up_b).This ensures that resource allocation adheres to the stricter constraints from
both inputs.
This PR only updates the merge for
AAAMDFlatWorkGroupSize.AAAMDWavesPerEUwill be rewritten in a follow-up.