Greedy: Make trySplitAroundHintReg try to match hints with subreg copies #160294
This is essentially the same patch as 116ca95; when trying to match a physreg hint, try to find a compatible physreg if there is a subregister copy. This has the slight difference of using getSubReg on the hint instead of getMatchingSuperReg (the other use should also use getSubReg instead, it's faster). At the moment this turns out to have very little effect. The adjacent code needs better handling of subregisters, so continue adding this piecemeal. The X86 test shows a net reduction in real instructions, plus a few new kills.
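The hint-narrowing step described above can be illustrated with a small stand-alone model. This is only a sketch: `ToyReg`, `ToySubIdx`, `SubRegTable`, `toyGetSubReg`, and `copyMatchesHint` are hypothetical names invented for the illustration, and the table covers just a few X86-like registers. The real code calls `TRI->getSubReg(Hint, SubReg)` on LLVM's register info. The `NoReg` guard is an addition of this sketch and is not part of the patch.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Toy stand-ins for physical registers and subregister indices.
// These enums and the table below are assumptions for illustration only.
enum ToyReg : unsigned { NoReg = 0, EAX, AX, AL, ECX, CX, CL };
enum ToySubIdx : unsigned { NoSubIdx = 0, sub_16bit, sub_8bit };

// (super register, subregister index) -> subregister.
static const std::map<std::pair<ToyReg, ToySubIdx>, ToyReg> SubRegTable = {
    {{EAX, sub_16bit}, AX}, {{EAX, sub_8bit}, AL}, {{AX, sub_8bit}, AL},
    {{ECX, sub_16bit}, CX}, {{ECX, sub_8bit}, CL}, {{CX, sub_8bit}, CL},
};

// Forward lookup in the spirit of getSubReg: a single table probe.
ToyReg toyGetSubReg(ToyReg Reg, ToySubIdx Idx) {
  auto It = SubRegTable.find({Reg, Idx});
  return It == SubRegTable.end() ? NoReg : It->second;
}

// Narrow the hint by the subregister index used on VirtReg's operand, then
// compare it with the physical register at the other end of the copy.
// The NoReg guard is this sketch's own addition.
bool copyMatchesHint(ToyReg Hint, ToySubIdx SubIdx, ToyReg OtherPhysReg) {
  ToyReg ThisHint = SubIdx ? toyGetSubReg(Hint, SubIdx) : Hint;
  return ThisHint != NoReg && OtherPhysReg == ThisHint;
}
```

In this model a copy such as `$ax = COPY %v.sub_16bit` with hint `EAX` counts as a match, whereas comparing `AX` directly against the full-width hint `EAX` would not. Going from the hint down to its subregister is a forward lookup, while `getMatchingSuperReg` has to search upward from the subregister, which is presumably why the description calls `getSubReg` faster.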
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-llvm-regalloc
Author: Matt Arsenault (arsenm)
Full diff: https://github.com/llvm/llvm-project/pull/160294.diff
2 Files Affected:
diff --git a/llvm/lib/CodeGen/RegAllocGreedy.cpp b/llvm/lib/CodeGen/RegAllocGreedy.cpp
index d004815d2c17a..76ed9dad3456d 100644
--- a/llvm/lib/CodeGen/RegAllocGreedy.cpp
+++ b/llvm/lib/CodeGen/RegAllocGreedy.cpp
@@ -1383,21 +1383,35 @@ bool RAGreedy::trySplitAroundHintReg(MCPhysReg Hint,
// Compute the cost of assigning a non Hint physical register to VirtReg.
// We define it as the total frequency of broken COPY instructions to/from
// Hint register, and after split, they can be deleted.
- for (const MachineInstr &Instr : MRI->reg_nodbg_instructions(Reg)) {
- if (!TII->isFullCopyInstr(Instr))
+
+ // FIXME: This is miscounting the costs with subregisters. In particular, this
+ // should support recognizing SplitKit formed copy bundles instead of direct
+ // copy instructions.
+ for (const MachineOperand &Opnd : MRI->reg_nodbg_operands(Reg)) {
+ const MachineInstr &Instr = *Opnd.getParent();
+ if (!Instr.isCopy() || Opnd.isImplicit())
continue;
- Register OtherReg = Instr.getOperand(1).getReg();
- if (OtherReg == Reg) {
- OtherReg = Instr.getOperand(0).getReg();
- if (OtherReg == Reg)
- continue;
+
+ // Look for the other end of the copy.
+ const bool IsDef = Opnd.isDef();
+ const MachineOperand &OtherOpnd = Instr.getOperand(IsDef);
+ Register OtherReg = OtherOpnd.getReg();
+ assert(Reg == Opnd.getReg());
+ if (OtherReg == Reg)
+ continue;
+
+ unsigned SubReg = Opnd.getSubReg();
+ if (!IsDef) {
// Check if VirtReg interferes with OtherReg after this COPY instruction.
if (VirtReg.liveAt(LIS->getInstructionIndex(Instr).getRegSlot()))
continue;
}
+
MCRegister OtherPhysReg =
OtherReg.isPhysical() ? OtherReg.asMCReg() : VRM->getPhys(OtherReg);
- if (OtherPhysReg == Hint)
+ MCRegister ThisHint =
+ SubReg ? TRI->getSubReg(Hint, SubReg) : MCRegister(Hint);
+ if (OtherPhysReg == ThisHint)
Cost += MBFI->getBlockFreq(Instr.getParent());
}
diff --git a/llvm/test/CodeGen/X86/atomic-bit-test.ll b/llvm/test/CodeGen/X86/atomic-bit-test.ll
index 8f91f4120842b..b06bef44a5e9e 100644
--- a/llvm/test/CodeGen/X86/atomic-bit-test.ll
+++ b/llvm/test/CodeGen/X86/atomic-bit-test.ll
@@ -469,52 +469,56 @@ entry:
define i16 @use_in_diff_bb() nounwind {
; X86-LABEL: use_in_diff_bb:
; X86: # %bb.0: # %entry
-; X86-NEXT: pushl %esi
-; X86-NEXT: movzwl v16, %esi
+; X86-NEXT: movzwl v16, %eax
; X86-NEXT: .p2align 4
; X86-NEXT: .LBB17_1: # %atomicrmw.start
; X86-NEXT: # =>This Inner Loop Header: Depth=1
-; X86-NEXT: movl %esi, %ecx
+; X86-NEXT: movl %eax, %ecx
; X86-NEXT: orl $1, %ecx
-; X86-NEXT: movl %esi, %eax
+; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: lock cmpxchgw %cx, v16
-; X86-NEXT: movl %eax, %esi
+; X86-NEXT: # kill: def $ax killed $ax def $eax
; X86-NEXT: jne .LBB17_1
; X86-NEXT: # %bb.2: # %atomicrmw.end
-; X86-NEXT: xorl %eax, %eax
-; X86-NEXT: testb %al, %al
+; X86-NEXT: xorl %ecx, %ecx
+; X86-NEXT: testb %cl, %cl
; X86-NEXT: jne .LBB17_4
; X86-NEXT: # %bb.3:
+; X86-NEXT: pushl %esi
+; X86-NEXT: movl %eax, %esi
; X86-NEXT: calll foo@PLT
-; X86-NEXT: .LBB17_4:
-; X86-NEXT: andl $1, %esi
; X86-NEXT: movl %esi, %eax
; X86-NEXT: popl %esi
+; X86-NEXT: .LBB17_4:
+; X86-NEXT: andl $1, %eax
+; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: retl
;
; X64-LABEL: use_in_diff_bb:
; X64: # %bb.0: # %entry
-; X64-NEXT: pushq %rbx
-; X64-NEXT: movzwl v16(%rip), %ebx
+; X64-NEXT: movzwl v16(%rip), %eax
; X64-NEXT: .p2align 4
; X64-NEXT: .LBB17_1: # %atomicrmw.start
; X64-NEXT: # =>This Inner Loop Header: Depth=1
-; X64-NEXT: movl %ebx, %ecx
+; X64-NEXT: movl %eax, %ecx
; X64-NEXT: orl $1, %ecx
-; X64-NEXT: movl %ebx, %eax
+; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: lock cmpxchgw %cx, v16(%rip)
-; X64-NEXT: movl %eax, %ebx
+; X64-NEXT: # kill: def $ax killed $ax def $eax
; X64-NEXT: jne .LBB17_1
; X64-NEXT: # %bb.2: # %atomicrmw.end
-; X64-NEXT: xorl %eax, %eax
-; X64-NEXT: testb %al, %al
+; X64-NEXT: xorl %ecx, %ecx
+; X64-NEXT: testb %cl, %cl
; X64-NEXT: jne .LBB17_4
; X64-NEXT: # %bb.3:
+; X64-NEXT: pushq %rbx
+; X64-NEXT: movl %eax, %ebx
; X64-NEXT: callq foo@PLT
-; X64-NEXT: .LBB17_4:
-; X64-NEXT: andl $1, %ebx
; X64-NEXT: movl %ebx, %eax
; X64-NEXT: popq %rbx
+; X64-NEXT: .LBB17_4:
+; X64-NEXT: andl $1, %eax
+; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
entry:
%0 = atomicrmw or ptr @v16, i16 1 monotonic, align 2
const MachineOperand &OtherOpnd = Instr.getOperand(IsDef);
Register OtherReg = OtherOpnd.getReg();
assert(Reg == Opnd.getReg());
if (OtherReg == Reg)
Are there instructions which copy from one subreg to another subreg on the same instruction? (Asking from ignorance, I don't know if this is possible.)
> on the same instruction
What do you mean by same instruction?
Copies can generally move registers from one subreg to another. It just depends on how the super registers are formed, but I'm not sure this is what you are asking.
E.g., this is legal
v1.sub1 = COPY v2.sub0
This is exactly the case I was wondering about. If I'm reading it correctly, the check I commented on would ignore this copy, even though it likely needs to be materialized to a real instruction. I think we need to check that the subregs on the two operands are the same too, don't we?
I had this check originally, but it doesn't appear to have any effect.
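To make the two variants of the filter concrete, here is a minimal sketch with hypothetical `ToyOperand` and `ToyCopy` types (they are not LLVM's `MachineOperand` or `MachineInstr`). It contrasts the register-only comparison from the quoted snippet with the stricter comparison the reviewer asks about. It only shows which copies each predicate skips; it says nothing about whether the difference matters in practice, and the author reports above that it did not appear to.

```cpp
#include <cassert>

// Toy copy operand: a virtual register number plus a subregister index,
// where 0 means "no subregister". Hypothetical types for illustration.
struct ToyOperand {
  unsigned Reg;
  unsigned SubReg;
};

struct ToyCopy {
  ToyOperand Dst;
  ToyOperand Src;
};

// Register-only filter, as in `if (OtherReg == Reg) continue;`.
bool skippedByRegisterCheck(const ToyCopy &C) {
  return C.Dst.Reg == C.Src.Reg;
}

// Stricter variant: skip only when the subregister indices agree as well.
bool skippedByRegisterAndSubRegCheck(const ToyCopy &C) {
  return C.Dst.Reg == C.Src.Reg && C.Dst.SubReg == C.Src.SubReg;
}
```

A copy between two different subregisters of one virtual register, for example `ToyCopy{{1, 2}, {1, 1}}`, is skipped by the first predicate but not by the second. A copy between different virtual registers, such as `v1.sub1 = COPY v2.sub0`, is skipped by neither.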
Not really sure why this needs any of the pre-filtering though; instead could just directly use the liveAt check?
> the check I commented on would ignore this copy, even though it likely needs to be materialized to a real instruction
Not necessarily. E.g., say we have a v2x64 register and a 64-bit register and we copy the low 32 bits; the copy would be:
v1.vsub0_sub0 = v2.sub0
The copy uses different subreg indices, but it can still be a no-op if the low 64 bits of v1 are coalesced with v2.
That said, I haven't checked the code yet, so I don't know if there's a problem at this point :).
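The point about differing subregister indices can be checked on a toy register file. In the sketch below everything is an assumption made for illustration: `Q0` is a 128-bit register made of the 64-bit halves `D0` and `D1`, each `D` register is made of two 32-bit `S` registers, and `vsub0_sub0` is modeled as a composite index naming the low 32 bits of `Q0`. None of these names come from a specific LLVM target.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical register file: Q0 = {D0, D1}, D0 = {S0, S1}, D1 = {S2, S3}.
enum ToyReg : unsigned { NoReg = 0, Q0, D0, D1, S0, S1, S2, S3 };
enum ToySubIdx : unsigned { NoSubIdx = 0, sub0, sub1, vsub0, vsub1, vsub0_sub0 };

// (physical register, subregister index) -> physical subregister.
static const std::map<std::pair<ToyReg, ToySubIdx>, ToyReg> SubRegTable = {
    {{Q0, vsub0}, D0}, {{Q0, vsub1}, D1}, {{Q0, vsub0_sub0}, S0},
    {{D0, sub0}, S0},  {{D0, sub1}, S1},
    {{D1, sub0}, S2},  {{D1, sub1}, S3},
};

// Resolve an assigned physical register plus subregister index to the
// physical register that is actually read or written.
ToyReg resolve(ToyReg Phys, ToySubIdx Idx) {
  if (Idx == NoSubIdx)
    return Phys;
  auto It = SubRegTable.find({Phys, Idx});
  return It == SubRegTable.end() ? NoReg : It->second;
}

// A copy is a no-op when both ends resolve to the same physical register,
// regardless of whether the subregister indices are spelled the same way.
bool copyIsNoOp(ToyReg DstPhys, ToySubIdx DstIdx, ToyReg SrcPhys,
                ToySubIdx SrcIdx) {
  ToyReg Dst = resolve(DstPhys, DstIdx);
  ToyReg Src = resolve(SrcPhys, SrcIdx);
  return Dst != NoReg && Dst == Src;
}
```

With `v1` assigned to `Q0` and `v2` assigned to `D0`, both ends of `v1.vsub0_sub0 = v2.sub0` resolve to `S0`, so the copy disappears even though the indices differ. If `v2` were assigned to `D1` instead, the source would resolve to `S2` and a real move would be needed.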
ping
LGTM
For the record, not convinced the sub-reg handling is optimal here, but I think this is an improvement over the prior code, so I'm fine with that being addressed later on.
Yes, it's all in pretty bad shape and extremely time consuming to fix each piece.