-
Notifications
You must be signed in to change notification settings - Fork 15.3k
x86: fix musttail sibcall miscompilation #168956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-backend-x86 Author: Folkert de Vries (folkertdev) ChangesCurrently the x86 backend miscompiles straightforward tail calls when the stack is used for argument passing. This program segfaults on any optimization level: https://godbolt.org/z/5xr99jr4v typedef struct {
uint64_t x;
uint64_t y;
uint64_t z;
} S;
__attribute__((noinline))
uint64_t callee(S s) {
return s.x + s.y + s.z;
}
__attribute__((noinline))
uint64_t caller(S s) {
[[clang::musttail]]
return callee(s);
}The immediate issue is that caller:
mov rax, qword ptr [rsp + 24]
mov qword ptr [rsp + 16], rax
movaps xmm0, xmmword ptr [rsp + 8]
movups xmmword ptr [rsp], xmm0 ; <-- that is just all kinds of wrong
movaps xmmword ptr [rsp + 8], xmm0
mov qword ptr [rsp + 24], rax
jmp calleeHowever, I think the actual problem is that the x86 backend never considers This PR essentially copies https://reviews.llvm.org/D131034, but this time I hope we can actually land this, and solve this problem. The aarch64 backend also miscompiled this example, but they appear to have fixed it in LLVM 20. Tail calls just not working for any sort of non-trivial argument types is a blocker for tail call support in rust, see rust-lang/rust#144855 (comment). Full diff: https://github.com/llvm/llvm-project/pull/168956.diff 4 Files Affected:
diff --git a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
index 37d77728882b1..e53e5a2a02cb4 100644
--- a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+++ b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
@@ -2098,15 +2098,18 @@ X86TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
isTailCall = false;
}
- if (isTailCall && !IsMustTail) {
+ if (isTailCall) {
// Check if it's really possible to do a tail call.
- isTailCall = IsEligibleForTailCallOptimization(CLI, CCInfo, ArgLocs,
- IsCalleePopSRet);
+ IsSibcall = IsEligibleForTailCallOptimization(CLI, CCInfo, ArgLocs,
+ IsCalleePopSRet);
+
+ if (!IsMustTail) {
+ isTailCall = IsSibcall;
- // Sibcalls are automatically detected tailcalls which do not require
- // ABI changes.
- if (!IsGuaranteeTCO && isTailCall)
- IsSibcall = true;
+ // Sibcalls are automatically detected tailcalls which do not require
+ // ABI changes.
+ IsSibcall = IsSibcall && !IsGuaranteeTCO;
+ }
if (isTailCall)
++NumTailCalls;
@@ -2128,8 +2131,9 @@ X86TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
else if (IsGuaranteeTCO && canGuaranteeTCO(CallConv))
NumBytes = GetAlignedArgumentStackSize(NumBytes, DAG);
+ // A sibcall is ABI-compatible and does not need to adjust the stack pointer.
int FPDiff = 0;
- if (isTailCall &&
+ if (isTailCall && !IsSibcall &&
shouldGuaranteeTCO(CallConv,
MF.getTarget().Options.GuaranteedTailCallOpt)) {
// Lower arguments at fp - stackoffset + fpdiff.
diff --git a/llvm/test/CodeGen/X86/musttail-struct.ll b/llvm/test/CodeGen/X86/musttail-struct.ll
new file mode 100644
index 0000000000000..0ae6b91d4fc02
--- /dev/null
+++ b/llvm/test/CodeGen/X86/musttail-struct.ll
@@ -0,0 +1,57 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s
+; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s
+
+; Test correct handling of a musttail call with a byval struct argument.
+
+%struct.1xi32 = type { [1 x i32] }
+%struct.3xi32 = type { [3 x i32] }
+%struct.5xi32 = type { [5 x i32] }
+
+declare dso_local i32 @Func1(ptr byval(%struct.1xi32) %0)
+declare dso_local i32 @Func3(ptr byval(%struct.3xi32) %0)
+declare dso_local i32 @Func5(ptr byval(%struct.5xi32) %0)
+declare dso_local i32 @FuncManyArgs(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i8 %6, ptr byval(%struct.5xi32) %7)
+
+define dso_local i32 @test1(ptr byval(%struct.1xi32) %0) {
+; CHECK-LABEL: test1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: jmp Func1 # TAILCALL
+ %r = musttail call i32 @Func1(ptr byval(%struct.1xi32) %0)
+ ret i32 %r
+}
+
+define dso_local i32 @test3(ptr byval(%struct.3xi32) %0) {
+; CHECK-LABEL: test3:
+; CHECK: # %bb.0:
+; CHECK-NEXT: jmp Func3 # TAILCALL
+ %r = musttail call i32 @Func3(ptr byval(%struct.3xi32) %0)
+ ret i32 %r
+}
+
+; sizeof(%struct.5xi32) > 16, in x64 this is passed on stack.
+define dso_local i32 @test5(ptr byval(%struct.5xi32) %0) {
+; CHECK-LABEL: test5:
+; CHECK: # %bb.0:
+; CHECK-NEXT: jmp Func5 # TAILCALL
+ %r = musttail call i32 @Func5(ptr byval(%struct.5xi32) %0)
+ ret i32 %r
+}
+
+; Test passing multiple arguments with different sizes on stack. In x64 Linux
+; the first 6 are passed by register.
+define dso_local i32 @testManyArgs(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i8 %6, ptr byval(%struct.5xi32) %7) {
+; CHECK-LABEL: testManyArgs:
+; CHECK: # %bb.0:
+; CHECK-NEXT: jmp FuncManyArgs # TAILCALL
+ %r = musttail call i32 @FuncManyArgs(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i8 %6, ptr byval(%struct.5xi32) %7)
+ ret i32 %r
+}
+
+define dso_local i32 @testRecursion(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i8 %6, ptr byval(%struct.5xi32) %7) {
+; CHECK-LABEL: testRecursion:
+; CHECK: # %bb.0:
+; CHECK-NEXT: jmp testRecursion # TAILCALL
+ %r = musttail call i32 @testRecursion(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i8 %6, ptr byval(%struct.5xi32) %7)
+ ret i32 %r
+}
diff --git a/llvm/test/CodeGen/X86/musttail-tailcc.ll b/llvm/test/CodeGen/X86/musttail-tailcc.ll
index fae698d53b927..f1ffbcb1142c5 100644
--- a/llvm/test/CodeGen/X86/musttail-tailcc.ll
+++ b/llvm/test/CodeGen/X86/musttail-tailcc.ll
@@ -55,15 +55,6 @@ define dso_local tailcc void @void_test(i32, i32, i32, i32) {
;
; X86-LABEL: void_test:
; X86: # %bb.0: # %entry
-; X86-NEXT: pushl %esi
-; X86-NEXT: .cfi_def_cfa_offset 8
-; X86-NEXT: .cfi_offset %esi, -8
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl %esi, {{[0-9]+}}(%esp)
-; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
-; X86-NEXT: popl %esi
-; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: jmp void_test # TAILCALL
entry:
musttail call tailcc void @void_test( i32 %0, i32 %1, i32 %2, i32 %3)
@@ -77,15 +68,6 @@ define dso_local tailcc i1 @i1test(i32, i32, i32, i32) {
;
; X86-LABEL: i1test:
; X86: # %bb.0: # %entry
-; X86-NEXT: pushl %esi
-; X86-NEXT: .cfi_def_cfa_offset 8
-; X86-NEXT: .cfi_offset %esi, -8
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl %esi, {{[0-9]+}}(%esp)
-; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
-; X86-NEXT: popl %esi
-; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: jmp i1test # TAILCALL
entry:
%4 = musttail call tailcc i1 @i1test( i32 %0, i32 %1, i32 %2, i32 %3)
diff --git a/llvm/test/CodeGen/X86/swifttailcc-store-ret-address-aliasing-stack-slot.ll b/llvm/test/CodeGen/X86/swifttailcc-store-ret-address-aliasing-stack-slot.ll
index cd669768705e5..b901d22f66392 100644
--- a/llvm/test/CodeGen/X86/swifttailcc-store-ret-address-aliasing-stack-slot.ll
+++ b/llvm/test/CodeGen/X86/swifttailcc-store-ret-address-aliasing-stack-slot.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
; RUN: llc %s -o - | FileCheck %s
target triple = "x86_64-apple-macosx"
@@ -24,9 +25,7 @@ define swifttailcc void @test(ptr %0, ptr swiftasync %1, i64 %2, i64 %3, ptr %4,
; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %r15
; CHECK-NEXT: callq _foo
; CHECK-NEXT: movq %r14, (%rax)
-; CHECK-NEXT: movl [[OFF:[0-9]+]](%rsp), %edx
-; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rcx
-; CHECK-NEXT: movq %rcx, [[OFF]](%rsp)
+; CHECK-NEXT: movl {{[0-9]+}}(%rsp), %edx
; CHECK-NEXT: movq %rax, %r14
; CHECK-NEXT: movq %r13, %rdi
; CHECK-NEXT: movq %r15, %rsi
@@ -34,7 +33,6 @@ define swifttailcc void @test(ptr %0, ptr swiftasync %1, i64 %2, i64 %3, ptr %4,
; CHECK-NEXT: addq $8, %rsp
; CHECK-NEXT: popq %rbx
; CHECK-NEXT: popq %r15
-; CHECK-NEXT: addq $16, %rsp
; CHECK-NEXT: jmp _tc_fn ## TAILCALL
entry:
%res = tail call ptr @foo()
|
🐧 Linux x64 Test Results
|
|
As discussed in https://reviews.llvm.org/D131034, this is still broken in cases where the argument lists don't match. If you want an example of how to handle this case, see #109943. |
Interesting, I had been looking for something like that. However, #168506 appears to use a different, simpler approach to order the reads and writes. Does that look right (and perhaps preferable) to you? |
|
As far as I can tell, LoongArchTargetLowering::LowerCall always copies byval arguments to a temporary, then copies them into the argument list. Which is a legal implementation, but not really optimized. |
|
Allright, so generally you'd like to see an x86 implementation that resembles the arm one more? Because currently the x86 implementation is quite different (and, as established, very broken). This will be tough (for me) but I'm willing to give that a go. |
|
I'd prefer the more optimized implementation... but really, I'd be okay with any complete implementation that doesn't regress non-musttail cases. I just don't want to leave a lurking miscompile. |
92b22b4 to
839564d
Compare
| // If we are doing a tail-call, any byval arguments will be written to stack | ||
| // space which was used for incoming arguments. If any the values being used | ||
| // are incoming byval arguments to this function, then they might be | ||
| // overwritten by the stores of the outgoing arguments. To avoid this, we | ||
| // need to make a temporary copy of them in local stack space, then copy back | ||
| // to the argument area. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sort of as a general question: wouldn't more code sharing between backends be a good idea? each backend reinventing this wheel just seems kind of unfortunate.
Anyway, this is my best effort of translating this from the arm backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's been continual effort to refactor call lowering to share various bits of code... but it's hard because every calling convention has its own weird quirks. Not sure how you'd extract out byval lowering, specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From rustc's perspective it is unfortunate that tail call support is so inconsistent between backends, so I'm trying to see if there is some more structural way to fix that.
But yeah I can totally see how it would get messy. I guess we'll see how this PR works out and if there is anything that seems shareable.
839564d to
4636011
Compare
|
Allright, I gave this a go, but I'll need some help polishing this. |
| // If we are doing a tail-call, any byval arguments will be written to stack | ||
| // space which was used for incoming arguments. If any the values being used | ||
| // are incoming byval arguments to this function, then they might be | ||
| // overwritten by the stores of the outgoing arguments. To avoid this, we | ||
| // need to make a temporary copy of them in local stack space, then copy back | ||
| // to the argument area. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's been continual effort to refactor call lowering to share various bits of code... but it's hard because every calling convention has its own weird quirks. Not sure how you'd extract out byval lowering, specifically.
folkertdev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was very helpful, thanks!
05af31d to
238805f
Compare
238805f to
346a911
Compare
Currently this logic is only actually used for musttail calls.
In particular, we don't need to bother with shuffling arguments around on the stack, because in the x86 backend, only functions that do not need to move arguments around on the stack are considered sibcalls.
346a911 to
e5dbe7b
Compare
efriedma-quic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know when you're ready for another round of review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of weird; do you know what's happening?
|
Ah yeah sorry for kind of thinking out loud here. Barring CI failures on some test I missed, I think this should be good. So to summarize, I think we could now make |
|
CI got further than it has so far, so I think the codegen tests are good. I've now added the x32 tests, nothing jumps out to me as incorrect. |
fixes #56891
fixes #72390
Currently the x86 backend miscompiles straightforward tail calls when the stack is used for argument passing. This program segfaults on any optimization level:
https://godbolt.org/z/5xr99jr4v
The immediate issue is that
callerdecides to shuffle values around on the stack, and in the process writes to*rsp, which contains the return address. With the return address trashed, theretincalleejumps to an invalid address.However, I think the actual problem is that the x86 backend never considers
musttailcalls to be sibcalls. For sibcalls, no stack reshuffling is required at all, circumventing the problem here.This PR essentially copies https://reviews.llvm.org/D131034 (cc @huangjd), but this time I hope we can actually land this, and solve this problem.
The aarch64 backend also miscompiled this example, but they appear to have fixed it in LLVM 20.
Tail calls just not working for any sort of non-trivial argument types is a blocker for tail call support in rust, see rust-lang/rust#144855 (comment).