- 
                Notifications
    You must be signed in to change notification settings 
- Fork 15k
[CodeGen] Fix handling dead redefs in finalizeBundle #157427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
A dead redefinition should override any earlier non-dead definition inside a bundle. Also remove KilledDefSet since it can be folded into DeadDefSet.
| @llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-aarch64 Author: Jay Foad (jayfoad) ChangesA dead redefinition should override any earlier non-dead definition Also remove KilledDefSet since it can be folded into DeadDefSet. Full diff: https://github.com/llvm/llvm-project/pull/157427.diff 3 Files Affected: 
 diff --git a/llvm/lib/CodeGen/MachineInstrBundle.cpp b/llvm/lib/CodeGen/MachineInstrBundle.cpp
index 5830ecfe4aa20..da29ffc9d2fed 100644
--- a/llvm/lib/CodeGen/MachineInstrBundle.cpp
+++ b/llvm/lib/CodeGen/MachineInstrBundle.cpp
@@ -133,7 +133,6 @@ void llvm::finalizeBundle(MachineBasicBlock &MBB,
   SmallSetVector<Register, 32> LocalDefs;
   BitVector LocalDefsP(TRI->getNumRegUnits());
   SmallSet<Register, 8> DeadDefSet;
-  SmallSet<Register, 16> KilledDefSet;
   SmallSetVector<Register, 8> ExternUses;
   SmallSet<Register, 8> KilledUseSet;
   SmallSet<Register, 8> UndefUseSet;
@@ -151,7 +150,7 @@ void llvm::finalizeBundle(MachineBasicBlock &MBB,
         MO.setIsInternalRead();
         if (MO.isKill()) {
           // Internal def is now killed.
-          KilledDefSet.insert(Reg);
+          DeadDefSet.insert(Reg);
         }
       } else {
         if (ExternUses.insert(Reg)) {
@@ -171,19 +170,18 @@ void llvm::finalizeBundle(MachineBasicBlock &MBB,
         continue;
 
       if (LocalDefs.insert(Reg)) {
-        if (MO.isDead())
-          DeadDefSet.insert(Reg);
-        else if (Reg.isPhysical())
+        if (!MO.isDead() && Reg.isPhysical()) {
           for (MCRegUnit Unit : TRI->regunits(Reg.asMCReg()))
             LocalDefsP.set(Unit);
+        }
       } else {
-        // Re-defined inside the bundle, it's no longer killed.
-        KilledDefSet.erase(Reg);
         if (!MO.isDead()) {
-          // Previously defined but dead.
+          // Re-defined inside the bundle, it's no longer dead.
           DeadDefSet.erase(Reg);
         }
       }
+      if (MO.isDead())
+        DeadDefSet.insert(Reg);
     }
 
     // Set FrameSetup/FrameDestroy for the bundle. If any of the instructions
@@ -196,7 +194,7 @@ void llvm::finalizeBundle(MachineBasicBlock &MBB,
 
   for (Register Reg : LocalDefs) {
     // If it's not live beyond end of the bundle, mark it dead.
-    bool isDead = DeadDefSet.contains(Reg) || KilledDefSet.contains(Reg);
+    bool isDead = DeadDefSet.contains(Reg);
     MIB.addReg(Reg, getDefRegState(true) | getDeadRegState(isDead) |
                         getImplRegState(true));
   }
diff --git a/llvm/test/CodeGen/AArch64/blr-bti-preserves-operands.mir b/llvm/test/CodeGen/AArch64/blr-bti-preserves-operands.mir
index f41e590c870a4..a4a392c41ec9a 100644
--- a/llvm/test/CodeGen/AArch64/blr-bti-preserves-operands.mir
+++ b/llvm/test/CodeGen/AArch64/blr-bti-preserves-operands.mir
@@ -8,7 +8,7 @@
 # The arguments to the call must become implicit arguments, because the branch
 # only expects to get 1 explicit operand which is the branch target.
 
-# CHECK:    BUNDLE implicit-def $lr, implicit-def $sp, implicit $sp, implicit $x0, implicit $w1 {
+# CHECK:    BUNDLE implicit-def dead $lr, implicit-def $sp, implicit $sp, implicit $x0, implicit $w1 {
 # CHECK:      BL @_setjmp, csr_aarch64_aapcs, implicit-def $lr, implicit $sp, implicit $x0, implicit $w1, implicit-def dead $lr, implicit $sp, implicit-def $sp
 # CHECK:      HINT 36
 # CHECK:    }
diff --git a/llvm/test/CodeGen/AMDGPU/finalizebundle.mir b/llvm/test/CodeGen/AMDGPU/finalizebundle.mir
index 46ca6d347781e..0548bcf304c32 100644
--- a/llvm/test/CodeGen/AMDGPU/finalizebundle.mir
+++ b/llvm/test/CodeGen/AMDGPU/finalizebundle.mir
@@ -16,3 +16,21 @@ body: |
     $vgpr2_vgpr3 = V_LSHLREV_B64_pseudo_e32 1, $vgpr0_vgpr1, implicit $exec
     $vgpr3_vgpr4 = V_LSHLREV_B64_pseudo_e32 1, $vgpr1_vgpr2, implicit $exec
 ...
+
+---
+name: test_dead_redef
+body: |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: test_dead_redef
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: BUNDLE implicit-def dead $vgpr1, implicit-def $vgpr0, implicit $vgpr0, implicit $exec {
+    ; CHECK-NEXT:   $vgpr1 = V_MOV_B32_e32 $vgpr0, implicit $exec
+    ; CHECK-NEXT:   $vgpr0 = V_MOV_B32_e32 internal $vgpr1, implicit $exec
+    ; CHECK-NEXT:   dead $vgpr1 = V_MOV_B32_e32 internal $vgpr0, implicit $exec
+    ; CHECK-NEXT: }
+    $vgpr1 = V_MOV_B32_e32 $vgpr0, implicit $exec
+    $vgpr0 = V_MOV_B32_e32 $vgpr1, implicit $exec
+    dead $vgpr1 = V_MOV_B32_e32 $vgpr0, implicit $exec
+...
 | 
|  | ||
| # CHECK: BUNDLE implicit-def $lr, implicit-def $sp, implicit $sp, implicit $x0, implicit $w1 { | ||
| # CHECK: BUNDLE implicit-def dead $lr, implicit-def $sp, implicit $sp, implicit $x0, implicit $w1 { | ||
| # CHECK: BL @_setjmp, csr_aarch64_aapcs, implicit-def $lr, implicit $sp, implicit $x0, implicit $w1, implicit-def dead $lr, implicit $sp, implicit-def $sp | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an unexpected change. I am not sure what it means for BL to have both implicit-def $lr and implicit-def dead $lr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a definite lack of rules about how to interpret dead flags. In particular I don't know what dead means when you have aliasing register defs with mixed dead flags
| Perhaps a bit unrelated discussion (sorry), but some details of finalizeBundle has bothered me for some time. I haven't really understood the exact semantics that should be expected from finalizeBundle. And we have got a couple of tweaks downstream related to finalizeBundle to satisfy verifiers and out post ra scheduler etc. One question is if the instructions subject to finalizeBundle should be seen as executing in parallel when it comes to liveness. Should it be assumed that the xor is using the value of $reg1 defined by the add, or the value prior to the bundle (i.e. should the use of $reg1 in the xor be marked as "internal read" here)? When we expand pseudos etc after register allocation we want to create such bundles, and then we want reg1 to be seen as a "use" for the bundle. A special case of the above is that some instructions actually could be seen as having internal reads. Typical example would be a call instruction, where the callee would read the registers new value if bundling things like: The use of $reg0 in the call should be marked as internalRead, and there should be no use of $reg0 for the BUNDLE. Downstream we have a TargetInstrInfo hook that describes if a use operand would be "internal" or "external" based on the instruction that is bundled. Upstream it seems like the bundles instructions are semantically seen as a sequence of instructions, rather than a set of instructions executing in parallel. Another problem is related to the UndefUseSet which upstream tracks if there is at least one undef use of a register. Downstream we intead track NonUndefUsetSet to only mark a use as undef if all uses are undef. where upstream version would add "implicit undef $flag" on the BUNDLE, while our downstream implementation would add "implicit undef $flag" as we have at least on non-undef use. Is this perhaps something that could be seen as a flaw in the current implementation? Would it be interesting to have a finalizeBundle option to denote that the instructions being bundled should be seen as truly parallel and not as a sequence? | 
| I've always understood that the bundle should be equivalent to executing the contained instructions serially. | 
| 
 No, bundles have no ordering. It's as if they all executed at the same time | 
| 
 Citation? That's not what  | 
| AMDGPU backend uses BUNDLEs for things that definitely need to execute serially e.g.:  | 
| 
 Bundles have always been ordered. They are just sequences of instructions that need to be kept together. Thumb IT blocks are a sequence of instructions in program order, and were one of the original uses of bundles AFAIU. As are VPT blocks and the way that AArch64 nowadays uses bundles for MOVPRFX+SVE instruction. This use for BTI setjmps is a new one to me, but the added bti instruction occurs after the call returns. | 
| Bundles are supposed to represent VLIW constraints so they are supposed to be considered as being executed in parallel. | 
| There shouldn't be any data dependency within a bundle, hence the actual order of the lowering shouldn't matter | 
| 
 
 I am not super familiar with the history but it seems the initial implementation was by Evan Cheng, and it seems he was strongly in favour of bundles being designed in a way that could support "intra-bundle dependencies": https://discourse.llvm.org/t/vliw-ports/20741/23 In any case, surely at some point the original intention of how they should be used becomes less important than the reality of how they are used today? | 
| 
 The objective with finalizeBundle is to create the BUNDLE instruction, adding register defs/uses to the BUNDLE instruction in a way so that it represents the outcome of executing the bundled instructions. In order to do that it need to somehow consider the data dependencies between the bundled instructions (e.g. to understand which defs that are live out from the BUNDLE, or which uses that are "internal" in the bundle). Our target support lots of different things (even having multiple instructions defining the same register, or bundling sequences of instructions that both def and use the same register). The idea with the BUNDLE is to hide such internal dependencies by seeing those instructions as one VLIW bundle. It would however be really bad if we weren't allowed to have data dependencies between the instructions that are to be bundled. The current (upstream) implementation of finalizeBundle isn't supporting all our use cases, but it do treat the instructions being bundled as a sequence of instructions that are allowed to have data dependencies. It is for example detecting uses of registers defined by earlier instructions in the bundle, adding "internal read" on such uses. It does not consider those uses as being external uses (which I guess it would to do if seeing the set of instruction being bundled as strictly independent of each other). I got a feeling that these things could be better documented, so that backend developers using bundles would understand how to setup the instruction that are to be bundled in order to get the wanted semantics. Such as making sure that any external use must be prior to a def, since finalizeBundle isn't treating the instructions in the input to finalizeBundle as being independent (even though it might treat the final BUNDLE as being one big "parallel" VLIW instruction). For our target the bundled instructions will be seen as executing in parallel. The instructions may nevertheless have data dependencies. One example I mentioned earlier is when having a def that is feeding a use in a call instruction in the same bundle, but we also have situations when instructions wouldn't see defs from within the same bundle. To support such things we need to add a couple of hacks to finalizeBundle that are more or less easy to maintain. One problem is that the upstream version isn't very explicit about how these things are supposed to work (which IMHO both makes it a bit hard to review patches, but also to suggest improvements to finalizeBundle). | 
A dead redefinition should override any earlier non-dead definition
inside a bundle.
Also remove KilledDefSet since it can be folded into DeadDefSet.