Skip to content

Conversation

@arsenm
Copy link
Contributor

@arsenm arsenm commented Sep 4, 2025

My goal here was to eliminate the use of getGPRsForTailCall.
I don't think it makes sense to use here, considering this function
handles other non-tail call cases. It seems to have been used to
find a non-callee saved register in a roundabout way. We can be more
precise by letting LiveRegUnits figure out the CSR set for the current
function.

There are a few things I find confusing about this function; the API
isn't right. The callers ideally would be maintaining LiveRegUnits
in the context where they need the free register. It also doesn't
provide a RegisterClass to use. Also, this should work for any
block and this shouldn't need to special case this set of return
opcodes. I'm also not sure why there is a special case for RIP. It was introduced in
fe5e5dc but doesn't appear to
be tested.

There are a few codegen test changes. One of them I think is a correct
improvement since the old code didn't consider undef uses as available.
The others I think are just to allocation order changes, since we're now
using the broader GPR_*NOSP classes.

@arsenm arsenm added the backend:X86 label Sep 4, 2025 — with Graphite App
Copy link
Contributor Author

arsenm commented Sep 4, 2025

@llvmbot
Copy link
Member

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-backend-x86

Author: Matt Arsenault (arsenm)

Changes

My goal here was to eliminate the use of getGPRsForTailCall.
I don't think it makes sense to use here, considering this function
handles other non-tail call cases. It seems to have been used to
find a non-callee saved register in a roundabout way. We can be more
precise by letting LiveRegUnits figure out the CSR set for the current
function.

There are a few things I find confusing about this function; the API
isn't right. The callers ideally would be maintaining LiveRegUnits
in the context where they need the free register. It also doesn't
provide a RegisterClass to use. Also, this should work for any
block and this shouldn't need to special case this set of return
opcodes.

There are a few codegen test changes. One of them I think is a correct
improvement since the old code didn't consider undef uses as available.
The others I think are just to allocation order changes, since we're now
using the broader GPR_*NOSP classes.


Full diff: https://github.com/llvm/llvm-project/pull/156817.diff

5 Files Affected:

  • (modified) llvm/lib/Target/X86/X86RegisterInfo.cpp (+13-15)
  • (modified) llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/lvi-hardening-ret.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/pr40289-64bit.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr40289.ll (+1-1)
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.cpp b/llvm/lib/Target/X86/X86RegisterInfo.cpp
index b79e508df7c97..3f4955f28e68b 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.cpp
+++ b/llvm/lib/Target/X86/X86RegisterInfo.cpp
@@ -1002,8 +1002,6 @@ unsigned X86RegisterInfo::findDeadCallerSavedReg(
   if (MF->callsEHReturn())
     return 0;
 
-  const TargetRegisterClass &AvailableRegs = *getGPRsForTailCall(*MF);
-
   if (MBBI == MBB.end())
     return 0;
 
@@ -1025,20 +1023,20 @@ unsigned X86RegisterInfo::findDeadCallerSavedReg(
   case X86::TCRETURNmi64:
   case X86::EH_RETURN:
   case X86::EH_RETURN64: {
-    SmallSet<uint16_t, 8> Uses;
-    for (MachineOperand &MO : MBBI->operands()) {
-      if (!MO.isReg() || MO.isDef())
-        continue;
-      Register Reg = MO.getReg();
-      if (!Reg)
-        continue;
-      for (MCRegAliasIterator AI(Reg, this, true); AI.isValid(); ++AI)
-        Uses.insert(*AI);
+    LiveRegUnits LRU(*this);
+    LRU.addLiveOuts(MBB);
+    LRU.stepBackward(*MBBI);
+
+    // FIXME: Why do we need to special case this register? Is it missing from
+    // return implicit uses?
+    LRU.removeReg(X86::RIP);
+
+    const TargetRegisterClass &RC =
+        Is64Bit ? X86::GR64_NOSPRegClass : X86::GR32_NOSPRegClass;
+    for (MCRegister Reg : RC) {
+      if (LRU.available(Reg))
+        return Reg;
     }
-
-    for (auto CS : AvailableRegs)
-      if (!Uses.count(CS) && CS != X86::RIP && CS != X86::RSP && CS != X86::ESP)
-        return CS;
   }
   }
 
diff --git a/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll b/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
index ad24608d338ad..d6d4db3509103 100644
--- a/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
+++ b/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
@@ -81,7 +81,7 @@ define i32 @csr6_alloc16(ptr %argv) {
 ; LIN-NEXT:    .cfi_def_cfa_offset 32
 ; LIN-NEXT:    pop2 %rbp, %r15
 ; LIN-NEXT:    .cfi_def_cfa_offset 16
-; LIN-NEXT:    popq %rcx
+; LIN-NEXT:    popq %rax
 ; LIN-NEXT:    .cfi_def_cfa_offset 8
 ; LIN-NEXT:    retq
 ;
@@ -116,7 +116,7 @@ define i32 @csr6_alloc16(ptr %argv) {
 ; LIN-PPX-NEXT:    .cfi_def_cfa_offset 32
 ; LIN-PPX-NEXT:    pop2p %rbp, %r15
 ; LIN-PPX-NEXT:    .cfi_def_cfa_offset 16
-; LIN-PPX-NEXT:    popq %rcx
+; LIN-PPX-NEXT:    popq %rax
 ; LIN-PPX-NEXT:    .cfi_def_cfa_offset 8
 ; LIN-PPX-NEXT:    retq
 ;
@@ -180,7 +180,7 @@ define i32 @csr6_alloc16(ptr %argv) {
 ; WIN-NEXT:    pop2 %rbp, %rbx
 ; WIN-NEXT:    pop2 %r13, %r12
 ; WIN-NEXT:    pop2 %r15, %r14
-; WIN-NEXT:    popq %rcx
+; WIN-NEXT:    popq %rax
 ; WIN-NEXT:    .seh_endepilogue
 ; WIN-NEXT:    retq
 ; WIN-NEXT:    .seh_endproc
@@ -211,7 +211,7 @@ define i32 @csr6_alloc16(ptr %argv) {
 ; WIN-PPX-NEXT:    pop2p %rbp, %rbx
 ; WIN-PPX-NEXT:    pop2p %r13, %r12
 ; WIN-PPX-NEXT:    pop2p %r15, %r14
-; WIN-PPX-NEXT:    popq %rcx
+; WIN-PPX-NEXT:    popq %rax
 ; WIN-PPX-NEXT:    .seh_endepilogue
 ; WIN-PPX-NEXT:    retq
 ; WIN-PPX-NEXT:    .seh_endproc
diff --git a/llvm/test/CodeGen/X86/lvi-hardening-ret.ll b/llvm/test/CodeGen/X86/lvi-hardening-ret.ll
index faa8bff8f0943..954985a3798b7 100644
--- a/llvm/test/CodeGen/X86/lvi-hardening-ret.ll
+++ b/llvm/test/CodeGen/X86/lvi-hardening-ret.ll
@@ -41,9 +41,9 @@ entry:
   %add = add nsw i32 %0, %1
   ret i32 %add
 ; CHECK-NOT:  retq
-; CHECK:      popq %rcx
+; CHECK:      popq %rsi
 ; CHECK-NEXT: lfence
-; CHECK-NEXT: jmpq *%rcx
+; CHECK-NEXT: jmpq *%rsi
 }
 
 ; Function Attrs: noinline nounwind optnone uwtable
@@ -52,9 +52,9 @@ define dso_local preserve_mostcc void @preserve_most() #0 {
 entry:
   ret void
 ; CHECK-NOT:  retq
-; CHECK:      popq %rax
+; CHECK:      popq %r11
 ; CHECK-NEXT: lfence
-; CHECK-NEXT: jmpq *%rax
+; CHECK-NEXT: jmpq *%r11
 }
 
 ; Function Attrs: noinline nounwind optnone uwtable
@@ -63,9 +63,9 @@ define dso_local preserve_allcc void @preserve_all() #0 {
 entry:
   ret void
 ; CHECK-NOT:  retq
-; CHECK:      popq %rax
+; CHECK:      popq %r11
 ; CHECK-NEXT: lfence
-; CHECK-NEXT: jmpq *%rax
+; CHECK-NEXT: jmpq *%r11
 }
 
 define { i64, i128 } @ret_i64_i128() #0 {
diff --git a/llvm/test/CodeGen/X86/pr40289-64bit.ll b/llvm/test/CodeGen/X86/pr40289-64bit.ll
index 58da5258a6703..a83674ac81998 100644
--- a/llvm/test/CodeGen/X86/pr40289-64bit.ll
+++ b/llvm/test/CodeGen/X86/pr40289-64bit.ll
@@ -6,5 +6,5 @@ define cc 92 < 9 x i64 > @clobber() {
   ret < 9 x i64 > undef
   ; CHECK-LABEL: clobber:
   ; CHECK-NOT: popq %rsp
-  ; CHECK: addq $8, %rsp
+  ; CHECK: pushq %rax
 }
diff --git a/llvm/test/CodeGen/X86/pr40289.ll b/llvm/test/CodeGen/X86/pr40289.ll
index 851b23c002bdb..21e50931b40f2 100644
--- a/llvm/test/CodeGen/X86/pr40289.ll
+++ b/llvm/test/CodeGen/X86/pr40289.ll
@@ -6,5 +6,5 @@ define < 3 x i32 > @clobber() {
   ret < 3 x i32 > undef
   ; CHECK-LABEL: clobber:
   ; CHECK-NOT: popl %esp
-  ; CHECK: addl $4, %esp
+  ; CHECK: popl %eax
 }

@arsenm arsenm marked this pull request as ready for review September 4, 2025 07:17
My goal here was to eliminate the use of getGPRsForTailCall.
I don't think it makes sense to use here, considering this function
handles other non-tail call cases. It seems to have been used to
find a non-callee saved register in a roundabout way. We can be more
precise by letting LiveRegUnits figure out the CSR set for the current
function.

There are a few things I find confusing about this function; the API
isn't right. The callers ideally would be maintaining LiveRegUnits
in the context where they need the free register. It also doesn't
provide a RegisterClass to use. Also, this should work for any
block and this shouldn't need to special case this set of return
opcodes.

There are a few codegen test changes. One of them I think is a correct
improvement since the old code didn't consider undef uses as available.
The others I think are just to allocation order changes, since we're now
using the broader GPR_*NOSP classes.
@arsenm arsenm force-pushed the users/arsenm/x86/use-liveregunits-findDeadCallerSavedReg branch from 73e2257 to c831a2a Compare September 4, 2025 12:46
@arsenm
Copy link
Contributor Author

arsenm commented Sep 8, 2025

ping

Comment on lines 1030 to 1032
// FIXME: Why do we need to special case this register? Is it missing from
// return implicit uses?
LRU.removeReg(X86::RIP);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't X86::GR64_NOSPRegClass excluding RIP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that's the class constraint an all of these instructions. This is attempting to reproduce the existing logic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@arsenm arsenm enabled auto-merge (squash) September 9, 2025 03:35
@arsenm arsenm merged commit 6076b07 into main Sep 9, 2025
9 checks passed
@arsenm arsenm deleted the users/arsenm/x86/use-liveregunits-findDeadCallerSavedReg branch September 9, 2025 03:52
phoebewang added a commit that referenced this pull request Sep 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants