Skip to content

Conversation

@arsenm
Copy link
Contributor

@arsenm arsenm commented Sep 23, 2025

This is essentially the same patch as 116ca95;
when trying to match a physreg hint, try to find a compatible physreg if there is
a subregister copy. This has the slight difference of using getSubReg on the hint
instead of getMatchingSuperReg (the other use should also use getSubReg instead,
it's faster).

At the moment this turns out to have very little effect. The adjacent code needs
better handling of subregisters, so continue adding this piecemeal. The X86 test
shows a net reduction in real instructions, plus a few new kills.

This is essentially the same patch as 116ca95;
when trying to match a physreg hint, try to find a compatible physreg if there is
a subregister copy. This has the slight difference of using getSubReg on the hint
instead of getMatchingSuperReg (the other use should also use getSubReg instead,
it's faster).

At the moment this turns out to have very little effect. The adjacent code needs
better handling of subregisters, so continue adding this piecemeal. The X86 test
shows a net reduction in real instructions, plus a few new kills.
Copy link
Contributor Author

arsenm commented Sep 23, 2025

@llvmbot
Copy link
Member

llvmbot commented Sep 23, 2025

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-regalloc

Author: Matt Arsenault (arsenm)

Changes

This is essentially the same patch as 116ca95;
when trying to match a physreg hint, try to find a compatible physreg if there is
a subregister copy. This has the slight difference of using getSubReg on the hint
instead of getMatchingSuperReg (the other use should also use getSubReg instead,
it's faster).

At the moment this turns out to have very little effect. The adjacent code needs
better handling of subregisters, so continue adding this piecemeal. The X86 test
shows a net reduction in real instructions, plus a few new kills.


Full diff: https://github.com/llvm/llvm-project/pull/160294.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/RegAllocGreedy.cpp (+22-8)
  • (modified) llvm/test/CodeGen/X86/atomic-bit-test.ll (+22-18)
diff --git a/llvm/lib/CodeGen/RegAllocGreedy.cpp b/llvm/lib/CodeGen/RegAllocGreedy.cpp
index d004815d2c17a..76ed9dad3456d 100644
--- a/llvm/lib/CodeGen/RegAllocGreedy.cpp
+++ b/llvm/lib/CodeGen/RegAllocGreedy.cpp
@@ -1383,21 +1383,35 @@ bool RAGreedy::trySplitAroundHintReg(MCPhysReg Hint,
   // Compute the cost of assigning a non Hint physical register to VirtReg.
   // We define it as the total frequency of broken COPY instructions to/from
   // Hint register, and after split, they can be deleted.
-  for (const MachineInstr &Instr : MRI->reg_nodbg_instructions(Reg)) {
-    if (!TII->isFullCopyInstr(Instr))
+
+  // FIXME: This is miscounting the costs with subregisters. In particular, this
+  // should support recognizing SplitKit formed copy bundles instead of direct
+  // copy instructions.
+  for (const MachineOperand &Opnd : MRI->reg_nodbg_operands(Reg)) {
+    const MachineInstr &Instr = *Opnd.getParent();
+    if (!Instr.isCopy() || Opnd.isImplicit())
       continue;
-    Register OtherReg = Instr.getOperand(1).getReg();
-    if (OtherReg == Reg) {
-      OtherReg = Instr.getOperand(0).getReg();
-      if (OtherReg == Reg)
-        continue;
+
+    // Look for the other end of the copy.
+    const bool IsDef = Opnd.isDef();
+    const MachineOperand &OtherOpnd = Instr.getOperand(IsDef);
+    Register OtherReg = OtherOpnd.getReg();
+    assert(Reg == Opnd.getReg());
+    if (OtherReg == Reg)
+      continue;
+
+    unsigned SubReg = Opnd.getSubReg();
+    if (!IsDef) {
       // Check if VirtReg interferes with OtherReg after this COPY instruction.
       if (VirtReg.liveAt(LIS->getInstructionIndex(Instr).getRegSlot()))
         continue;
     }
+
     MCRegister OtherPhysReg =
         OtherReg.isPhysical() ? OtherReg.asMCReg() : VRM->getPhys(OtherReg);
-    if (OtherPhysReg == Hint)
+    MCRegister ThisHint =
+        SubReg ? TRI->getSubReg(Hint, SubReg) : MCRegister(Hint);
+    if (OtherPhysReg == ThisHint)
       Cost += MBFI->getBlockFreq(Instr.getParent());
   }
 
diff --git a/llvm/test/CodeGen/X86/atomic-bit-test.ll b/llvm/test/CodeGen/X86/atomic-bit-test.ll
index 8f91f4120842b..b06bef44a5e9e 100644
--- a/llvm/test/CodeGen/X86/atomic-bit-test.ll
+++ b/llvm/test/CodeGen/X86/atomic-bit-test.ll
@@ -469,52 +469,56 @@ entry:
 define i16 @use_in_diff_bb() nounwind {
 ; X86-LABEL: use_in_diff_bb:
 ; X86:       # %bb.0: # %entry
-; X86-NEXT:    pushl %esi
-; X86-NEXT:    movzwl v16, %esi
+; X86-NEXT:    movzwl v16, %eax
 ; X86-NEXT:    .p2align 4
 ; X86-NEXT:  .LBB17_1: # %atomicrmw.start
 ; X86-NEXT:    # =>This Inner Loop Header: Depth=1
-; X86-NEXT:    movl %esi, %ecx
+; X86-NEXT:    movl %eax, %ecx
 ; X86-NEXT:    orl $1, %ecx
-; X86-NEXT:    movl %esi, %eax
+; X86-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X86-NEXT:    lock cmpxchgw %cx, v16
-; X86-NEXT:    movl %eax, %esi
+; X86-NEXT:    # kill: def $ax killed $ax def $eax
 ; X86-NEXT:    jne .LBB17_1
 ; X86-NEXT:  # %bb.2: # %atomicrmw.end
-; X86-NEXT:    xorl %eax, %eax
-; X86-NEXT:    testb %al, %al
+; X86-NEXT:    xorl %ecx, %ecx
+; X86-NEXT:    testb %cl, %cl
 ; X86-NEXT:    jne .LBB17_4
 ; X86-NEXT:  # %bb.3:
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    movl %eax, %esi
 ; X86-NEXT:    calll foo@PLT
-; X86-NEXT:  .LBB17_4:
-; X86-NEXT:    andl $1, %esi
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    popl %esi
+; X86-NEXT:  .LBB17_4:
+; X86-NEXT:    andl $1, %eax
+; X86-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: use_in_diff_bb:
 ; X64:       # %bb.0: # %entry
-; X64-NEXT:    pushq %rbx
-; X64-NEXT:    movzwl v16(%rip), %ebx
+; X64-NEXT:    movzwl v16(%rip), %eax
 ; X64-NEXT:    .p2align 4
 ; X64-NEXT:  .LBB17_1: # %atomicrmw.start
 ; X64-NEXT:    # =>This Inner Loop Header: Depth=1
-; X64-NEXT:    movl %ebx, %ecx
+; X64-NEXT:    movl %eax, %ecx
 ; X64-NEXT:    orl $1, %ecx
-; X64-NEXT:    movl %ebx, %eax
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X64-NEXT:    lock cmpxchgw %cx, v16(%rip)
-; X64-NEXT:    movl %eax, %ebx
+; X64-NEXT:    # kill: def $ax killed $ax def $eax
 ; X64-NEXT:    jne .LBB17_1
 ; X64-NEXT:  # %bb.2: # %atomicrmw.end
-; X64-NEXT:    xorl %eax, %eax
-; X64-NEXT:    testb %al, %al
+; X64-NEXT:    xorl %ecx, %ecx
+; X64-NEXT:    testb %cl, %cl
 ; X64-NEXT:    jne .LBB17_4
 ; X64-NEXT:  # %bb.3:
+; X64-NEXT:    pushq %rbx
+; X64-NEXT:    movl %eax, %ebx
 ; X64-NEXT:    callq foo@PLT
-; X64-NEXT:  .LBB17_4:
-; X64-NEXT:    andl $1, %ebx
 ; X64-NEXT:    movl %ebx, %eax
 ; X64-NEXT:    popq %rbx
+; X64-NEXT:  .LBB17_4:
+; X64-NEXT:    andl $1, %eax
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X64-NEXT:    retq
 entry:
   %0 = atomicrmw or ptr @v16, i16 1 monotonic, align 2

@arsenm arsenm marked this pull request as ready for review September 23, 2025 13:10
const MachineOperand &OtherOpnd = Instr.getOperand(IsDef);
Register OtherReg = OtherOpnd.getReg();
assert(Reg == Opnd.getReg());
if (OtherReg == Reg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there instructions which copy from one subreg to another subreg on the same instruction? (Asking from ignorance, I don't know if this is possible.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the same instruction
What do you mean by same instruction?

Copies can generally move registers from one subreg to another. It just depends how the super registers are formed, but I'm not sure this is what you are asking.

E.g., this is legal

v1.sub1 = COPY v2.sub0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the case I was wondering about. If I'm reading it correctly, the check I commented on would ignore this copy, even though it likely needs to be materialized to a real instruction. I think we need to check that the subreg on the two operands are the same too don't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this check originally but it doesn't appear to have any effect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure why this needs any of the pre-filtering though; instead could just directly use the liveAt check?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the check I commented on would ignore this copy, even though it likely needs to be materialized to a real instruction

Not necessarily. E.g., let's say we have a v2x64 and a 64 and we copy the low 32-bit, the copy would be:
v1.vsub0_sub0 = v2.sub0

The copy uses different subregs indices, but it can still be a no-op if the low 64 bit of v1 is coalesced on v2.

That said, I haven't checked the code yet, so I don't know if there's a problem at this point :).

Copy link
Contributor Author

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

For the record, not convinced the sub-reg handling is optimal here, but I think this is an improvement over the prior code, so I'm fine with that being addressed later on.

@arsenm arsenm merged commit 129394e into main Sep 26, 2025
9 checks passed
@arsenm arsenm deleted the users/arsenm/greedy/trySplitAroundHintReg-subreg-liveness-checks branch September 26, 2025 15:14
@arsenm
Copy link
Contributor Author

arsenm commented Sep 26, 2025

For the record, not convinced the sub-reg handling is optimal here, but I think this is an improvement over the prior code, so I'm fine with that being addressed later on.

Yes, it's all in pretty bad shape and extremely time consuming to fix each piece

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…ies (llvm#160294)

This is essentially the same patch as
116ca95;
when trying to match a physreg hint, try to find a compatible physreg if
there is
a subregister copy. This has the slight difference of using getSubReg on
the hint
instead of getMatchingSuperReg (the other use should also use getSubReg
instead,
it's faster).

At the moment this turns out to have very little effect. The adjacent
code needs
better handling of subregisters, so continue adding this piecemeal. The
X86 test
shows a net reduction in real instructions, plus a few new kills.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants