-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[AMDGPU] Tests for unnecessary S_WAIT_XCNT insertion #145688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-backend-amdgpu Author: Jay Foad (jayfoad) ChangesFull diff: https://github.com/llvm/llvm-project/pull/145688.diff 1 Files Affected:
diff --git a/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir b/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir
index 73b994ab2ab8c..884f14094695d 100644
--- a/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir
+++ b/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir
@@ -920,3 +920,45 @@ body: |
GLOBAL_STORE_DWORD $vgpr0_vgpr1, $vgpr3, 0, 0, implicit $exec
$vgpr2 = V_MOV_B32_e32 1, implicit $exec
...
+
+# TODO: Unnecessary wait before overwriting vgpr0.
+---
+name: overwrite_vgpr_after_smem
+tracksRegLiveness: true
+machineFunctionInfo:
+ isEntryFunction: true
+body: |
+ bb.0:
+ liveins: $vgpr0_vgpr1, $sgpr0_sgpr1
+ ; GCN-LABEL: name: overwrite_vgpr_after_smem
+ ; GCN: liveins: $vgpr0_vgpr1, $sgpr0_sgpr1
+ ; GCN-NEXT: {{ $}}
+ ; GCN-NEXT: $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
+ ; GCN-NEXT: $sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 0, 0
+ ; GCN-NEXT: S_WAIT_XCNT 0
+ ; GCN-NEXT: $vgpr0 = V_MOV_B32_e32 0, implicit $exec
+ $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
+ $sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 0, 0
+ $vgpr0 = V_MOV_B32_e32 0, implicit $exec
+...
+
+# TODO: Unnecessary wait before overwriting sgpr0.
+---
+name: overwrite_sgpr_after_vmem
+tracksRegLiveness: true
+machineFunctionInfo:
+ isEntryFunction: true
+body: |
+ bb.0:
+ liveins: $vgpr0_vgpr1, $sgpr0_sgpr1
+ ; GCN-LABEL: name: overwrite_sgpr_after_vmem
+ ; GCN: liveins: $vgpr0_vgpr1, $sgpr0_sgpr1
+ ; GCN-NEXT: {{ $}}
+ ; GCN-NEXT: $sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 0, 0
+ ; GCN-NEXT: $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
+ ; GCN-NEXT: S_WAIT_XCNT 0
+ ; GCN-NEXT: $sgpr0 = S_MOV_B32 0
+ $sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 0, 0
+ $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
+ $sgpr0 = S_MOV_B32 0
+...
|
| ; GCN-NEXT: {{ $}} | ||
| ; GCN-NEXT: $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec | ||
| ; GCN-NEXT: $sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 0, 0 | ||
| ; GCN-NEXT: S_WAIT_XCNT 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it unnecessary? Because of the S_LOAD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no dependence afterwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it unnecessary?
Yeah, I should have explained: hardware does an implicit "S_WAIT_XCNT 0" between SMEM and VMEM instructions, so you never have outstanding address translations for both SMEM and VMEM at the same time.
| ; GCN-NEXT: {{ $}} | ||
| ; GCN-NEXT: $sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 0, 0 | ||
| ; GCN-NEXT: $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec | ||
| ; GCN-NEXT: S_WAIT_XCNT 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, why is it not needed?
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/16485 Here is the relevant piece of the build log for the reference |
Hardware does an implicit "S_WAIT_XCNT 0" between SMEM and VMEM instructions, so there will never be outstanding address translations for both SMEM and VMEM at the same time.
Hardware does an implicit "S_WAIT_XCNT 0" between SMEM and VMEM instructions, so there will never be outstanding address translations for both SMEM and VMEM at the same time.