-
Notifications
You must be signed in to change notification settings - Fork 15.3k
Address Codegen bug related to marking subregister MachineOperand defines as undef #134929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-llvm-regalloc @llvm/pr-subscribers-backend-amdgpu Author: Ryan Buchner (bababuck) ChangesFixes #130884. In the failing case, the following COPIES are moved such that they swap in order: However, the This causes the liveness verifier to flag this as an error. The proper result code should be: Notes:
@arsenm Full diff: https://github.com/llvm/llvm-project/pull/134929.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 5086ee8829b25..4999e5af7f1e4 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -3979,6 +3979,31 @@ void GenericScheduler::reschedulePhysReg(SUnit *SU, bool isTop) {
continue;
LLVM_DEBUG(dbgs() << " Rescheduling physreg copy ";
DAG->dumpNode(*Dep.getSUnit()));
+
+ // Check to make sure that there are no subreg defintions of the given
+ // register between it's new and old location that are marked as undef. If
+ // so, mark the current instruction as undef instead.
+ SmallVector<MachineOperand*, 1> SubregDefs;
+ for (MachineOperand& MO : Copy->operands()) {
+ if (MO.isReg() && MO.isDef() && MO.getSubReg() != 0) {
+ SubregDefs.push_back(&MO);
+ }
+ }
+ if (SubregDefs.size()) {
+ for (auto CurrInst = InsertPos; CurrInst != Copy; ++CurrInst) {
+ for (MachineOperand& MO : CurrInst->operands()) {
+ if (MO.isReg() && MO.isDef() && MO.isUndef() && MO.getSubReg() != 0) {
+ for (auto *MISubregDef : SubregDefs) {
+ if (MISubregDef->getReg() == MO.getReg()) {
+ assert(!MISubregDef->isUndef() && "Register defined as undef twice.");
+ MO.setIsUndef(false);
+ MISubregDef->setIsUndef(true);
+ }
+ }
+ }
+ }
+ }
+ }
DAG->moveInstruction(Copy, InsertPos);
}
}
diff --git a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
index 936118750ff10..71e0e5afa81a0 100644
--- a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
+++ b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
@@ -1,8 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; FIXME: Fails expensive checks, should re-enable verifier, see issue #130884
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX90A %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX940 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX90A %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX940 %s
; Test that we can form v_pk_mov_b32 in certain shuffles when they
; originate from 32-bit physreg copy sequences.
@@ -735,10 +734,10 @@ define i32 @shufflevector_v4i32_3210_physreg_even_vgpr_quad_copy_other_use_elt(p
; GFX900-NEXT: ;;#ASMSTART
; GFX900-NEXT: ; def v4, v5, v6, v7
; GFX900-NEXT: ;;#ASMEND
-; GFX900-NEXT: v_mov_b32_e32 v9, v5
-; GFX900-NEXT: v_mov_b32_e32 v8, v6
-; GFX900-NEXT: v_mov_b32_e32 v10, v4
-; GFX900-NEXT: global_store_dwordx4 v0, v[7:10], s[16:17]
+; GFX900-NEXT: v_mov_b32_e32 v3, v5
+; GFX900-NEXT: v_mov_b32_e32 v2, v6
+; GFX900-NEXT: v_mov_b32_e32 v1, v7
+; GFX900-NEXT: global_store_dwordx4 v0, v[1:4], s[16:17]
; GFX900-NEXT: v_mov_b32_e32 v0, v6
; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]
|
7dbb3b0 to
3308dfe
Compare
|
Update: Fixed clang-format errors. |
|
The format check is flagging the use of Is this MR is okay with respect to the deprecation of |
arsenm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should get a dedicated MIR test
| } | ||
| } | ||
| if (SubregDefs.size()) { | ||
| for (auto CurrInst = InsertPos; CurrInst != Copy; ++CurrInst) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to loop through all the instructions here. You should rely on the LiveIntervals to check if the lanes are undefined or not, after they've been updated in moveInstruction (although actually, maybe LiveIntervals::handleMove ought to be taking care of this in the first place)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the new functionality into LiveIntervals::handleMove(), the code now uses the LiveRanges to find the instruction to compare with. I pushed this new version but I'm not sure it's fully correct.
The code looks for the case like the following (pseudo-IR):
undef %0.sub0 = SOME_FUNCTION %1
...
%0.sub1 = SOME_FUNCTION %2
gets changed to (by some pass):
%0.sub1 = SOME_FUNCTION %2
undef %0.sub0 = SOME_FUNCTION %1
and the new functionality changes it to :
undef %0.sub1 = SOME_FUNCTION %2
undef %0.sub0 = SOME_FUNCTION %1
However, it doesn't handle a case like the following:
%0.sub0 = SOME_FUNCTION %1
...
undef %0.sub1 = SOME_FUNCTION %2
%0.sub2 = SOME_FUNCTION %3
getting changed too (by some pass):
%0.sub2 = SOME_FUNCTION %3
%0.sub0 = SOME_FUNCTION %1
...
undef %0.sub1 = SOME_FUNCTION %2
The reasoning is that the starting IR of that case is already erroneous, but I'm not confident in that analysis.
An alternative would be to look at every LiveInterval::Segment for an undef between the old and new location of the instruction.
In addition, this only handled the Up case, not the case where an undef instruction is moved Down in the program to after a non-undef definition of a sub-register. Would it be desired to MR to include handling of that second case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The starting MIR is correct, the new output with 2 undefs is wrong. When the first undef subregister def moves past the second, the undef flag needs to move to the first subregister def. They both should not end up with the undef flag
| if (MO.isReg() && MO.isDef() && MO.isUndef() && MO.getSubReg() != 0) { | ||
| for (auto *MISubregDef : SubregDefs) { | ||
| if (MISubregDef->getReg() == MO.getReg()) { | ||
| assert(!MISubregDef->isUndef() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be a hazard with overlapping subregister defs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen in some of the tests (notably the test that changed with my latest change) situations like this where:
undef %0.sub0 = SOME_FUNCTION %1
undef %0.sub1 = SOME_FUNCTION %2
Why is this considered legal/desired. My understanding is that without the undef the other sub-registers are considered un-modified. However, with the undef the other sub-registers are now undefined, so after this sequence %0.sub0 would be undefined. Wouldn't that violate SSA constraints since %0.sub0 is assigned twice, once with a value and secondly as undef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subregister defs only exist post-SSA, where the scheduler runs
Yes |
3308dfe to
a6c19ae
Compare
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
a6c19ae to
e591af5
Compare
llvm/lib/CodeGen/LiveIntervals.cpp
Outdated
|
|
||
| HMEditor HME(*this, *MRI, *TRI, OldIndex, NewIndex, UpdateFlags); | ||
| HME.updateAllRanges(&MI); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure if handleMove should be responsible for updating the actual MI. updateAllRanges conservatively strips kill flags (which is kind of annoying), but seems to not touch the MIR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would moving it back to ScheduleDagMI::moveInstruction be better in your opionion? We have access to a LiveIntervals object there, so the same logic could be used.
llvm/lib/CodeGen/LiveIntervals.cpp
Outdated
| // Check to make sure that there are no subreg defintions marked undef | ||
| // after the moved operation. If so, mark the current instruction as undef | ||
| // instead. | ||
| for (MachineOperand &MO : MI.operands()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory the LiveIntervals are correct at this point, and the MIR is out of sync. It would probably be easier to express this in terms of operand-is-not-undef-but-not-live-at-this-point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this simplifies the resulting code any if I'm understanding correctly. We can add the following:
if (!LI.liveAt(Index.getNextSlot()))
MO.setIsUndef(true);
but this only works for flagging the moved instruction as undef, we still have to execute the same logic as before to remove the undef from the next definition of that register (if present).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did do some more digging and was able to better utilize the MachineInst functions to clean up some of the code.
I think that expressing this in terms of operand-is-not-undef-but-not-live-at-this-point does make sense because that is more similar to how the verifier works.
This reverts commit bf2d1c4.
If a subregister def is moved such that it is not live prior to that instruction, mark as Undef. Also remove any Undef flag from the next def of this register. Fixes llvm#130884.
Fixes #130884.
In the failing case, the following COPIES are moved such that they swap in order:
However, the
undefflag is improperly left of the 112B instrcution, resulting in:This causes the liveness verifier to flag this as an error.
The proper result code should be:
Notes:
An alternative solution would be to prevent the re-ordering in this situation.
I made the change inside
reschedulePhysReg(), but we could instead move the change up the call tree tomoveInstruction(), thoughts? I avoided doing that to limit the scope of this change.Passes lit and unit tests.
@arsenm