Skip to content

Conversation

@bababuck
Copy link
Contributor

@bababuck bababuck commented Apr 8, 2025

Fixes #130884.

In the failing case, the following COPIES are moved such that they swap in order:

  96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
  112B undef %18.sub1:vreg_64 = COPY killed $vgpr4
  128B %18.sub0:vreg_64 = COPY killed $vgpr5

However, the undef flag is improperly left of the 112B instrcution, resulting in:

  96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
  128B %18.sub0:vreg_64 = COPY killed $vgpr5
  112B undef %18.sub1:vreg_64 = COPY killed $vgpr4

This causes the liveness verifier to flag this as an error.

The proper result code should be:

  96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
  128B undef %18.sub0:vreg_64 = COPY killed $vgpr5
  112B %18.sub1:vreg_64 = COPY killed $vgpr4

Notes:

  • An alternative solution would be to prevent the re-ordering in this situation.

  • I made the change inside reschedulePhysReg(), but we could instead move the change up the call tree to moveInstruction(), thoughts? I avoided doing that to limit the scope of this change.

  • Passes lit and unit tests.

@arsenm

@llvmbot
Copy link
Member

llvmbot commented Apr 8, 2025

@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-backend-amdgpu

Author: Ryan Buchner (bababuck)

Changes

Fixes #130884.

In the failing case, the following COPIES are moved such that they swap in order:

  96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
  112B undef %18.sub1:vreg_64 = COPY killed $vgpr4
  128B %18.sub0:vreg_64 = COPY killed $vgpr5

However, the undef flag is improperly left of the 112B instrcution, resulting in:

  96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
  128B %18.sub0:vreg_64 = COPY killed $vgpr5
  112B undef %18.sub1:vreg_64 = COPY killed $vgpr4

This causes the liveness verifier to flag this as an error.

The proper result code should be:

  96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
  128B undef %18.sub0:vreg_64 = COPY killed $vgpr5
  112B %18.sub1:vreg_64 = COPY killed $vgpr4

Notes:

  • An alternative solution would be to prevent the re-ordering in this situation.

  • I made the change inside reschedulePhysReg(), but we could instead move the change up the call tree to moveInstruction(), thoughts? I avoided doing that to limit the scope of this change.

  • Passes lit and unit tests.

@arsenm


Full diff: https://github.com/llvm/llvm-project/pull/134929.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineScheduler.cpp (+25)
  • (modified) llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll (+7-8)
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 5086ee8829b25..4999e5af7f1e4 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -3979,6 +3979,31 @@ void GenericScheduler::reschedulePhysReg(SUnit *SU, bool isTop) {
       continue;
     LLVM_DEBUG(dbgs() << "  Rescheduling physreg copy ";
                DAG->dumpNode(*Dep.getSUnit()));
+
+    // Check to make sure that there are no subreg defintions of the given
+    // register between it's new and old location that are marked as undef. If
+    // so, mark the current instruction as undef instead.
+    SmallVector<MachineOperand*, 1> SubregDefs;
+    for (MachineOperand& MO : Copy->operands()) {
+      if (MO.isReg() && MO.isDef() && MO.getSubReg() != 0) {
+        SubregDefs.push_back(&MO);
+      }
+    }
+    if (SubregDefs.size()) {
+      for (auto CurrInst = InsertPos; CurrInst != Copy; ++CurrInst) {
+        for (MachineOperand& MO : CurrInst->operands()) {
+          if (MO.isReg() && MO.isDef() && MO.isUndef() && MO.getSubReg() != 0) {
+            for (auto *MISubregDef : SubregDefs) {
+              if (MISubregDef->getReg() == MO.getReg()) {
+                assert(!MISubregDef->isUndef() && "Register defined as undef twice.");
+                MO.setIsUndef(false);
+                MISubregDef->setIsUndef(true);
+              }
+            }
+          }
+        }
+      }
+    }
     DAG->moveInstruction(Copy, InsertPos);
   }
 }
diff --git a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
index 936118750ff10..71e0e5afa81a0 100644
--- a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
+++ b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
@@ -1,8 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; FIXME: Fails expensive checks, should re-enable verifier, see issue #130884
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX90A %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX940 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX90A %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX940 %s
 
 ; Test that we can form v_pk_mov_b32 in certain shuffles when they
 ; originate from 32-bit physreg copy sequences.
@@ -735,10 +734,10 @@ define i32 @shufflevector_v4i32_3210_physreg_even_vgpr_quad_copy_other_use_elt(p
 ; GFX900-NEXT:    ;;#ASMSTART
 ; GFX900-NEXT:    ; def v4, v5, v6, v7
 ; GFX900-NEXT:    ;;#ASMEND
-; GFX900-NEXT:    v_mov_b32_e32 v9, v5
-; GFX900-NEXT:    v_mov_b32_e32 v8, v6
-; GFX900-NEXT:    v_mov_b32_e32 v10, v4
-; GFX900-NEXT:    global_store_dwordx4 v0, v[7:10], s[16:17]
+; GFX900-NEXT:    v_mov_b32_e32 v3, v5
+; GFX900-NEXT:    v_mov_b32_e32 v2, v6
+; GFX900-NEXT:    v_mov_b32_e32 v1, v7
+; GFX900-NEXT:    global_store_dwordx4 v0, v[1:4], s[16:17]
 ; GFX900-NEXT:    v_mov_b32_e32 v0, v6
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]

@bababuck
Copy link
Contributor Author

bababuck commented Apr 8, 2025

Update: Fixed clang-format errors.

@bababuck
Copy link
Contributor Author

bababuck commented Apr 8, 2025

The format check is flagging the use of undef in some of my comments.

Is this MR is okay with respect to the deprecation of undef?

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should get a dedicated MIR test

}
}
if (SubregDefs.size()) {
for (auto CurrInst = InsertPos; CurrInst != Copy; ++CurrInst) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to loop through all the instructions here. You should rely on the LiveIntervals to check if the lanes are undefined or not, after they've been updated in moveInstruction (although actually, maybe LiveIntervals::handleMove ought to be taking care of this in the first place)

Copy link
Contributor Author

@bababuck bababuck Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the new functionality into LiveIntervals::handleMove(), the code now uses the LiveRanges to find the instruction to compare with. I pushed this new version but I'm not sure it's fully correct.

The code looks for the case like the following (pseudo-IR):

undef %0.sub0 = SOME_FUNCTION %1
...
%0.sub1 = SOME_FUNCTION %2

gets changed to (by some pass):

%0.sub1 = SOME_FUNCTION %2
undef %0.sub0 = SOME_FUNCTION %1

and the new functionality changes it to :

undef %0.sub1 = SOME_FUNCTION %2
undef %0.sub0 = SOME_FUNCTION %1

However, it doesn't handle a case like the following:

%0.sub0 = SOME_FUNCTION %1
...
undef %0.sub1 = SOME_FUNCTION %2
%0.sub2 = SOME_FUNCTION %3

getting changed too (by some pass):

%0.sub2 = SOME_FUNCTION %3
%0.sub0 = SOME_FUNCTION %1
...
undef %0.sub1 = SOME_FUNCTION %2

The reasoning is that the starting IR of that case is already erroneous, but I'm not confident in that analysis.

An alternative would be to look at every LiveInterval::Segment for an undef between the old and new location of the instruction.

In addition, this only handled the Up case, not the case where an undef instruction is moved Down in the program to after a non-undef definition of a sub-register. Would it be desired to MR to include handling of that second case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The starting MIR is correct, the new output with 2 undefs is wrong. When the first undef subregister def moves past the second, the undef flag needs to move to the first subregister def. They both should not end up with the undef flag

if (MO.isReg() && MO.isDef() && MO.isUndef() && MO.getSubReg() != 0) {
for (auto *MISubregDef : SubregDefs) {
if (MISubregDef->getReg() == MO.getReg()) {
assert(!MISubregDef->isUndef() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a hazard with overlapping subregister defs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen in some of the tests (notably the test that changed with my latest change) situations like this where:

undef %0.sub0 = SOME_FUNCTION %1
undef %0.sub1 = SOME_FUNCTION %2

Why is this considered legal/desired. My understanding is that without the undef the other sub-registers are considered un-modified. However, with the undef the other sub-registers are now undefined, so after this sequence %0.sub0 would be undefined. Wouldn't that violate SSA constraints since %0.sub0 is assigned twice, once with a value and secondly as undef.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subregister defs only exist post-SSA, where the scheduler runs

@arsenm
Copy link
Contributor

arsenm commented Apr 13, 2025

Is this MR is okay with respect to the deprecation of undef?

Yes

@github-actions
Copy link

github-actions bot commented Apr 15, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@bababuck bababuck requested a review from arsenm April 30, 2025 18:03

HMEditor HME(*this, *MRI, *TRI, OldIndex, NewIndex, UpdateFlags);
HME.updateAllRanges(&MI);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure if handleMove should be responsible for updating the actual MI. updateAllRanges conservatively strips kill flags (which is kind of annoying), but seems to not touch the MIR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would moving it back to ScheduleDagMI::moveInstruction be better in your opionion? We have access to a LiveIntervals object there, so the same logic could be used.

// Check to make sure that there are no subreg defintions marked undef
// after the moved operation. If so, mark the current instruction as undef
// instead.
for (MachineOperand &MO : MI.operands()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory the LiveIntervals are correct at this point, and the MIR is out of sync. It would probably be easier to express this in terms of operand-is-not-undef-but-not-live-at-this-point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this simplifies the resulting code any if I'm understanding correctly. We can add the following:

if (!LI.liveAt(Index.getNextSlot()))
   MO.setIsUndef(true);

but this only works for flagging the moved instruction as undef, we still have to execute the same logic as before to remove the undef from the next definition of that register (if present).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did do some more digging and was able to better utilize the MachineInst functions to clean up some of the code.

I think that expressing this in terms of operand-is-not-undef-but-not-live-at-this-point does make sense because that is more similar to how the verifier works.

bababuck added 3 commits May 1, 2025 14:38
If a subregister def is moved such that it is not live prior to
that instruction, mark as Undef. Also remove any Undef flag from the next
def of this register.

Fixes llvm#130884.
@bababuck bababuck force-pushed the rbuchner/130884 branch from e591af5 to cf5b0f7 Compare May 6, 2025 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll fails machine verifier after scheduling

3 participants