[ARM] Use correct ABI for atomic functions #128891
base: main
Conversation
In AAPCS64, the high bits of registers used to pass small arguments are unspecified, and it is the callee's job to sign- or zero-extend them if needed. This means that we don't need to do this extension in the caller.
The AArch32 PCS requires the caller to sign- or zero-extend small integer types to 32 bits before passing them to a function. For most calls Clang handles this by setting the zeroext or signext parameter attributes, but we were adding calls to library functions in AtomicExpandPass without doing so. Fixes llvm#61880
The AArch32 PCS passes small integer arguments in registers, zero- or sign-extended by the caller, but we were previously generating calls to the __atomic and __sync functions that left stale values in the high bits. This matters in practice for the atomic min/max functions, whose signed versions expect the value to have been correctly sign-extended. Fixes llvm#61880.
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-arm

Author: Oliver Stannard (ostannard)

Changes

The AArch32 PCS passes small integer arguments in registers by zero- or sign-extending them in the caller, but we were previously generating calls to the __atomic and __sync functions which left other values in the high bits. This is important in practice for the atomic min/max functions, which have signed versions which expect the value to have been correctly sign-extended.

This bug existed in two places: the AtomicExpand IR pass, and DAG ISel. For the first, the fix is to add the signext/zeroext parameter attributes to the generated libcall; for the second, the operands are sign- or zero-extended in-register before the libcall is emitted.

While testing this, I noticed that AArch64 was doing this extension in some cases, but that isn't needed because in the AArch64 PCS the high bits are unspecified and the callee does the extension. There was already a target lowering hook to configure this, so I've set it correctly for AArch64.

Fixes #61880

Patch is 39.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128891.diff

11 Files Affected:
diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8..b13d380ae7c7e 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -1999,6 +1999,19 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
Value *IntValue =
Builder.CreateBitOrPointerCast(ValueOperand, SizedIntTy);
Args.push_back(IntValue);
+
+ // Set the zeroext/signext attributes on the parameter if needed to match
+ // the target's ABI.
+ if (TLI->shouldExtendTypeInLibCall(
+ TLI->getMemValueType(DL, SizedIntTy))) {
+ // The only atomic operations affected by signedness are min/max, and
+ // we don't have __atomic_ libcalls for them, so IsSigned is always
+ // false.
+ if (TLI->shouldSignExtendTypeInLibCall(SizedIntTy, false /*IsSigned*/))
+ Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, Attribute::SExt);
+ else
+ Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, Attribute::ZExt);
+ }
} else {
AllocaValue = AllocaBuilder.CreateAlloca(ValueOperand->getType());
AllocaValue->setAlignment(AllocaAlignment);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index f56097fdbb51a..9cd4e42cfd062 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -4386,23 +4386,46 @@ void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) {
AtomicOrdering Order = cast<AtomicSDNode>(Node)->getMergedOrdering();
RTLIB::Libcall LC = RTLIB::getOUTLINE_ATOMIC(Opc, Order, VT);
EVT RetVT = Node->getValueType(0);
+ SDValue ChainIn = Node->getOperand(0);
+ SDValue Pointer = Node->getOperand(1);
+ SDLoc dl(Node);
SmallVector<SDValue, 4> Ops;
+
+ // Zero/sign extend small operands if required by the target's ABI.
+ SmallVector<SDValue, 4> ExtendedOps;
+ for (auto Op = Node->op_begin() + 2, E = Node->op_end(); Op != E; ++Op) {
+ if (TLI.shouldExtendTypeInLibCall(VT)) {
+ bool IsSigned =
+ Opc == ISD::ATOMIC_LOAD_MIN || Opc == ISD::ATOMIC_LOAD_MAX;
+ if (TLI.shouldSignExtendTypeInLibCall(
+ EVT(VT).getTypeForEVT(*DAG.getContext()), IsSigned))
+ ExtendedOps.push_back(DAG.getNode(ISD::SIGN_EXTEND_INREG, dl,
+ Op->getValueType(), *Op,
+ DAG.getValueType(VT)));
+ else
+ ExtendedOps.push_back(DAG.getZeroExtendInReg(*Op, dl, VT));
+
+ } else {
+ ExtendedOps.push_back(*Op);
+ }
+ }
+
if (TLI.getLibcallName(LC)) {
// If outline atomic available, prepare its arguments and expand.
- Ops.append(Node->op_begin() + 2, Node->op_end());
- Ops.push_back(Node->getOperand(1));
+ Ops.append(ExtendedOps.begin(), ExtendedOps.end());
+ Ops.push_back(Pointer);
} else {
LC = RTLIB::getSYNC(Opc, VT);
assert(LC != RTLIB::UNKNOWN_LIBCALL &&
"Unexpected atomic op or value type!");
// Arguments for expansion to sync libcall
- Ops.append(Node->op_begin() + 1, Node->op_end());
+ Ops.push_back(Pointer);
+ Ops.append(ExtendedOps.begin(), ExtendedOps.end());
}
- std::pair<SDValue, SDValue> Tmp = TLI.makeLibCall(DAG, LC, RetVT,
- Ops, CallOptions,
- SDLoc(Node),
- Node->getOperand(0));
+
+ std::pair<SDValue, SDValue> Tmp =
+ TLI.makeLibCall(DAG, LC, RetVT, Ops, CallOptions, dl, ChainIn);
Results.push_back(Tmp.first);
Results.push_back(Tmp.second);
break;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 1987c892ac080..9a009f0eb6980 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -1378,6 +1378,10 @@ class AArch64TargetLowering : public TargetLowering {
bool shouldScalarizeBinop(SDValue VecOp) const override {
return VecOp.getOpcode() == ISD::SETCC;
}
+
+ bool shouldExtendTypeInLibCall(EVT Type) const override {
+ return false;
+ }
};
namespace AArch64 {
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-fadd.ll b/llvm/test/CodeGen/AArch64/atomicrmw-fadd.ll
index 21729b9dfd101..b650040617ecd 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-fadd.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-fadd.ll
@@ -58,13 +58,13 @@ define half @test_atomicrmw_fadd_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB0_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB0_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl __addsf3
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -72,7 +72,7 @@ define half @test_atomicrmw_fadd_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB0_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB0_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB0_3 Depth=2
@@ -146,13 +146,13 @@ define half @test_atomicrmw_fadd_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB1_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB1_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl __addsf3
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -160,7 +160,7 @@ define half @test_atomicrmw_fadd_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB1_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB1_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB1_3 Depth=2
@@ -711,19 +711,19 @@ define <2 x half> @test_atomicrmw_fadd_v2f16_seq_cst_align4(ptr %ptr, <2 x half>
; SOFTFP-NOLSE-NEXT: .LBB7_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB7_3 Depth 2
-; SOFTFP-NOLSE-NEXT: and w0, w19, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w19
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w23, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w23
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w24
; SOFTFP-NOLSE-NEXT: bl __addsf3
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w21, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w21
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w25, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w22
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w25
; SOFTFP-NOLSE-NEXT: bl __addsf3
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-fmax.ll b/llvm/test/CodeGen/AArch64/atomicrmw-fmax.ll
index 9b5e48d2b4217..41c5afe0f64a9 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-fmax.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-fmax.ll
@@ -60,13 +60,13 @@ define half @test_atomicrmw_fmax_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB0_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB0_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl fmaxf
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -74,7 +74,7 @@ define half @test_atomicrmw_fmax_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB0_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB0_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB0_3 Depth=2
@@ -148,13 +148,13 @@ define half @test_atomicrmw_fmax_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB1_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB1_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl fmaxf
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -162,7 +162,7 @@ define half @test_atomicrmw_fmax_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB1_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB1_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB1_3 Depth=2
@@ -591,19 +591,19 @@ define <2 x half> @test_atomicrmw_fmax_v2f16_seq_cst_align4(ptr %ptr, <2 x half>
; SOFTFP-NOLSE-NEXT: .LBB6_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB6_3 Depth 2
-; SOFTFP-NOLSE-NEXT: and w0, w19, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w19
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w23, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w23
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w24
; SOFTFP-NOLSE-NEXT: bl fmaxf
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w21, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w21
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w25, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w22
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w25
; SOFTFP-NOLSE-NEXT: bl fmaxf
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-fmin.ll b/llvm/test/CodeGen/AArch64/atomicrmw-fmin.ll
index f6c542fe7d407..a01bd182e61e6 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-fmin.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-fmin.ll
@@ -60,13 +60,13 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB0_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB0_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl fminf
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -74,7 +74,7 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB0_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB0_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB0_3 Depth=2
@@ -148,13 +148,13 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB1_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB1_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl fminf
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -162,7 +162,7 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB1_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB1_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB1_3 Depth=2
@@ -591,19 +591,19 @@ define <2 x half> @test_atomicrmw_fmin_v2f16_seq_cst_align4(ptr %ptr, <2 x half>
; SOFTFP-NOLSE-NEXT: .LBB6_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB6_3 Depth 2
-; SOFTFP-NOLSE-NEXT: and w0, w19, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w19
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w23, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w23
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w24
; SOFTFP-NOLSE-NEXT: bl fminf
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w21, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w21
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w25, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w22
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w25
; SOFTFP-NOLSE-NEXT: bl fminf
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll b/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll
index 82e0f14e68e26..01beb5c50afdd 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll
@@ -58,13 +58,13 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB0_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB0_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl __subsf3
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -72,7 +72,7 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB0_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB0_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB0_3 Depth=2
@@ -146,13 +146,13 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB1_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB1_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl __subsf3
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -160,7 +160,7 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB1_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB1_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB1_3 Depth=2
@@ -711,19 +711,19 @@ define <2 x half> @test_atomicrmw_fsub_v2f16_seq_cst_align4(ptr %ptr, <2 x half>
; SOFTFP-NOLSE-NEXT: .LBB7_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB7_3 Depth 2
-; SOFTFP-NOLSE-NEXT: and w0, w19, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w19
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w23, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w23
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w24
; SOFTFP-NOLSE-NEXT: bl __subsf3
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w21, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w21
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w25, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w22
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w25
; SOFTFP-NOLSE-NEXT: bl __subsf3
diff --git a/llvm/test/CodeGen/AArch64/strictfp_f16_abi_promote.ll b/llvm/test/CodeGen/AArc...
[truncated]
@llvm/pr-subscribers-backend-aarch64
; SOFTFP-NOLSE-NEXT: // Child Loop BB0_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl fminf
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -74,7 +74,7 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB0_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB0_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB0_3 Depth=2
@@ -148,13 +148,13 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB1_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB1_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl fminf
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -162,7 +162,7 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB1_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB1_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB1_3 Depth=2
@@ -591,19 +591,19 @@ define <2 x half> @test_atomicrmw_fmin_v2f16_seq_cst_align4(ptr %ptr, <2 x half>
; SOFTFP-NOLSE-NEXT: .LBB6_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB6_3 Depth 2
-; SOFTFP-NOLSE-NEXT: and w0, w19, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w19
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w23, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w23
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w24
; SOFTFP-NOLSE-NEXT: bl fminf
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w21, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w21
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w25, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w22
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w25
; SOFTFP-NOLSE-NEXT: bl fminf
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll b/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll
index 82e0f14e68e26..01beb5c50afdd 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll
@@ -58,13 +58,13 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB0_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB0_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl __subsf3
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -72,7 +72,7 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB0_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB0_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB0_3 Depth=2
@@ -146,13 +146,13 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: .LBB1_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB1_3 Depth 2
-; SOFTFP-NOLSE-NEXT: mov w22, w0
-; SOFTFP-NOLSE-NEXT: and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w21, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w20
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT: mov w1, w21
+; SOFTFP-NOLSE-NEXT: mov w22, w0
+; SOFTFP-NOLSE-NEXT: mov w0, w21
+; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT: mov w1, w22
; SOFTFP-NOLSE-NEXT: bl __subsf3
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w8, w0
@@ -160,7 +160,7 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
; SOFTFP-NOLSE-NEXT: // Parent Loop BB1_2 Depth=1
; SOFTFP-NOLSE-NEXT: // => This Inner Loop Header: Depth=2
; SOFTFP-NOLSE-NEXT: ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT: cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT: cmp w0, w21, uxth
; SOFTFP-NOLSE-NEXT: b.ne .LBB1_1
; SOFTFP-NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; SOFTFP-NOLSE-NEXT: // in Loop: Header=BB1_3 Depth=2
@@ -711,19 +711,19 @@ define <2 x half> @test_atomicrmw_fsub_v2f16_seq_cst_align4(ptr %ptr, <2 x half>
; SOFTFP-NOLSE-NEXT: .LBB7_2: // %atomicrmw.start
; SOFTFP-NOLSE-NEXT: // =>This Loop Header: Depth=1
; SOFTFP-NOLSE-NEXT: // Child Loop BB7_3 Depth 2
-; SOFTFP-NOLSE-NEXT: and w0, w19, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w19
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w23, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w23
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w24
; SOFTFP-NOLSE-NEXT: bl __subsf3
; SOFTFP-NOLSE-NEXT: bl __truncsfhf2
; SOFTFP-NOLSE-NEXT: mov w24, w0
-; SOFTFP-NOLSE-NEXT: and w0, w21, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w21
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w25, w0
-; SOFTFP-NOLSE-NEXT: and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT: mov w0, w22
; SOFTFP-NOLSE-NEXT: bl __extendhfsf2
; SOFTFP-NOLSE-NEXT: mov w1, w25
; SOFTFP-NOLSE-NEXT: bl __subsf3
diff --git a/llvm/test/CodeGen/AArch64/strictfp_f16_abi_promote.ll b/llvm/test/CodeGen/AArc...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
The Darwin ABI is a little different here:
    if (TLI->shouldSignExtendTypeInLibCall(SizedIntTy, false /*IsSigned*/))
      Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, Attribute::SExt);
    else
      Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, Attribute::ZExt);
Suggested change (fold the if/else into a ternary; note the declared variable must also be the one passed to addParamAttribute):

    Attribute::AttrKind ExtAttr =
        TLI->shouldSignExtendTypeInLibCall(SizedIntTy, /*IsSigned=*/false)
            ? Attribute::SExt
            : Attribute::ZExt;
    Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, ExtAttr);
      // the target's ABI.
      if (TLI->shouldExtendTypeInLibCall(
              TLI->getMemValueType(DL, SizedIntTy))) {
        // The only atomic operations affected by signedness are min/max, and
        // we don't have __atomic_ libcalls for them, so IsSigned is always
        // false.
        if (TLI->shouldSignExtendTypeInLibCall(SizedIntTy, false /*IsSigned*/))
This is using 3 levels of terrible TargetLowering API. Really we should be able to query the signature of the specific libcall from RuntimeLibcalls, not these global overrides.
The part I understand least is why you need getMemValueType
The AArch32 PCS passes small integer arguments in registers by zero- or sign-extending them in the caller, but we were previously generating calls to the __atomic and __sync functions which left other values in the high bits. This is important in practice for the atomic min/max functions, which have signed versions which expect the value to have been correctly sign-extended.
This bug existed in two places: the AtomicExpand IR pass, and in DAG ISel. For the first, the fix is to add the `zeroext` or `signext` attributes to the function arguments, and for the latter we insert the zero- or sign-extension operations when converting the DAG node to a function call.

While testing this, I noticed that AArch64 is doing this extension in some cases, but that isn't needed because in the AArch64 PCS the high bits are unspecified and the callee does the extension. There was already a target lowering hook to configure this, so I've set it correctly for AArch64.
Fixes #61880