Conversation

@ostannard
Collaborator

The AArch32 PCS passes small integer arguments in registers by zero- or sign-extending them in the caller, but we were previously generating calls to the __atomic and __sync functions which left arbitrary values in the high bits. This is important in practice for the atomic min/max functions, which have signed versions that expect the value to have been correctly sign-extended.

This bug existed in two places: the AtomicExpand IR pass and DAG ISel. For the former, the fix is to add the zeroext or signext attributes to the function arguments; for the latter, we insert zero- or sign-extension operations when converting the DAG node to a function call.

While testing this, I noticed that AArch64 is doing this extension in some cases, but that isn't needed because the AArch64 PCS leaves the high bits unspecified and the callee does the extension. There was already a target lowering hook to configure this, so I've set it correctly for AArch64.
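For illustration, here is a hand-written sketch (not taken from the patch's tests) of what the AtomicExpand fix produces for a 16-bit atomic add expanded to a sized __atomic libcall; the libcall name and the i32 5 (seq_cst) ordering argument follow the usual __atomic ABI:

declare i16 @__atomic_fetch_add_2(ptr, i16 zeroext, i32)

define i16 @fetch_add(ptr %p, i16 %v) {
  ; The zeroext attribute added by the pass obliges the caller to clear
  ; bits 16-31 of the argument register, as the AArch32 PCS requires; a
  ; libcall taking a signed value would get signext instead.
  %old = call i16 @__atomic_fetch_add_2(ptr %p, i16 zeroext %v, i32 5)
  ret i16 %old
}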

Fixes #61880

In AAPCS64, the high bits of registers used to pass small arguments are
unspecified, and it is the callee's job to sign- or zero-extend them if
needed. This means that we don't need to do this extension in the
caller.

The AArch32 PCS requires the caller to sign- or zero-extend small
integer types to 32-bit before passing them to a function. For most
calls this is handled by clang by setting the zeroext or signext
parameter attributes, but we were adding calls to library functions in
AtomicExpandPass without doing this.

Fixes llvm#61880

The AArch32 PCS passes small integer arguments in registers by zero- or
sign-extending them in the caller, but we were previously generating
calls to the __atomic and __sync functions which left arbitrary values
in the high bits. This is important in practice for the atomic min/max
functions, which have signed versions that expect the value to have
been correctly sign-extended.

Fixes llvm#61880.
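As a sketch of the failure mode (hand-written, not one of the PR's tests): on a target without native atomic instructions, the atomicrmw below is lowered to a signed __sync libcall, which under the AArch32 PCS reads its i8 argument as a sign-extended 32-bit value, so the caller must sign-extend it.

define i8 @fetch_min(ptr %p, i8 %v) {
  ; Lowered to a call along the lines of __sync_fetch_and_min_1; the
  ; signed comparison in the callee is only correct if the caller
  ; sign-extended %v to 32 bits, which this change now guarantees.
  %old = atomicrmw min ptr %p, i8 %v seq_cst
  ret i8 %old
}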
@llvmbot
Member

llvmbot commented Feb 26, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-arm

Author: Oliver Stannard (ostannard)

Changes

The AArch32 PCS passes small integer arguments in registers by zero- or sign-extending them in the caller, but we were previously generating calls to the __atomic and __sync functions which left arbitrary values in the high bits. This is important in practice for the atomic min/max functions, which have signed versions that expect the value to have been correctly sign-extended.

This bug existed in two places: the AtomicExpand IR pass and DAG ISel. For the former, the fix is to add the zeroext or signext attributes to the function arguments; for the latter, we insert zero- or sign-extension operations when converting the DAG node to a function call.

While testing this, I noticed that AArch64 is doing this extension in some cases, but that isn't needed because the AArch64 PCS leaves the high bits unspecified and the callee does the extension. There was already a target lowering hook to configure this, so I've set it correctly for AArch64.

Fixes #61880


Patch is 39.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128891.diff

11 Files Affected:

  • (modified) llvm/lib/CodeGen/AtomicExpandPass.cpp (+13)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+30-7)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+4)
  • (modified) llvm/test/CodeGen/AArch64/atomicrmw-fadd.ll (+16-16)
  • (modified) llvm/test/CodeGen/AArch64/atomicrmw-fmax.ll (+16-16)
  • (modified) llvm/test/CodeGen/AArch64/atomicrmw-fmin.ll (+16-16)
  • (modified) llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll (+16-16)
  • (modified) llvm/test/CodeGen/AArch64/strictfp_f16_abi_promote.ll (+7-10)
  • (modified) llvm/test/CodeGen/ARM/atomic-cmpxchg.ll (+10-9)
  • (modified) llvm/test/CodeGen/ARM/atomic-load-store.ll (+37-29)
  • (modified) llvm/test/CodeGen/ARM/thumbv6m-atomic32.ll (+264)
diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8..b13d380ae7c7e 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -1999,6 +1999,19 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
       Value *IntValue =
           Builder.CreateBitOrPointerCast(ValueOperand, SizedIntTy);
       Args.push_back(IntValue);
+
+      // Set the zeroext/signext attributes on the parameter if needed to match
+      // the target's ABI.
+      if (TLI->shouldExtendTypeInLibCall(
+              TLI->getMemValueType(DL, SizedIntTy))) {
+        // The only atomic operations affected by signedness are min/max, and
+        // we don't have __atomic_ libcalls for them, so IsSigned is always
+        // false.
+        if (TLI->shouldSignExtendTypeInLibCall(SizedIntTy, false /*IsSigned*/))
+          Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, Attribute::SExt);
+        else
+          Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, Attribute::ZExt);
+      }
     } else {
       AllocaValue = AllocaBuilder.CreateAlloca(ValueOperand->getType());
       AllocaValue->setAlignment(AllocaAlignment);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index f56097fdbb51a..9cd4e42cfd062 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -4386,23 +4386,46 @@ void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) {
     AtomicOrdering Order = cast<AtomicSDNode>(Node)->getMergedOrdering();
     RTLIB::Libcall LC = RTLIB::getOUTLINE_ATOMIC(Opc, Order, VT);
     EVT RetVT = Node->getValueType(0);
+    SDValue ChainIn = Node->getOperand(0);
+    SDValue Pointer = Node->getOperand(1);
+    SDLoc dl(Node);
     SmallVector<SDValue, 4> Ops;
+
+    // Zero/sign extend small operands if required by the target's ABI.
+    SmallVector<SDValue, 4> ExtendedOps;
+    for (auto Op = Node->op_begin() + 2, E = Node->op_end(); Op != E; ++Op) {
+      if (TLI.shouldExtendTypeInLibCall(VT)) {
+        bool IsSigned =
+            Opc == ISD::ATOMIC_LOAD_MIN || Opc == ISD::ATOMIC_LOAD_MAX;
+        if (TLI.shouldSignExtendTypeInLibCall(
+                EVT(VT).getTypeForEVT(*DAG.getContext()), IsSigned))
+          ExtendedOps.push_back(DAG.getNode(ISD::SIGN_EXTEND_INREG, dl,
+                                            Op->getValueType(), *Op,
+                                            DAG.getValueType(VT)));
+        else
+          ExtendedOps.push_back(DAG.getZeroExtendInReg(*Op, dl, VT));
+
+      } else {
+        ExtendedOps.push_back(*Op);
+      }
+    }
+
     if (TLI.getLibcallName(LC)) {
       // If outline atomic available, prepare its arguments and expand.
-      Ops.append(Node->op_begin() + 2, Node->op_end());
-      Ops.push_back(Node->getOperand(1));
+      Ops.append(ExtendedOps.begin(), ExtendedOps.end());
+      Ops.push_back(Pointer);
 
     } else {
       LC = RTLIB::getSYNC(Opc, VT);
       assert(LC != RTLIB::UNKNOWN_LIBCALL &&
              "Unexpected atomic op or value type!");
       // Arguments for expansion to sync libcall
-      Ops.append(Node->op_begin() + 1, Node->op_end());
+      Ops.push_back(Pointer);
+      Ops.append(ExtendedOps.begin(), ExtendedOps.end());
     }
-    std::pair<SDValue, SDValue> Tmp = TLI.makeLibCall(DAG, LC, RetVT,
-                                                      Ops, CallOptions,
-                                                      SDLoc(Node),
-                                                      Node->getOperand(0));
+
+    std::pair<SDValue, SDValue> Tmp =
+        TLI.makeLibCall(DAG, LC, RetVT, Ops, CallOptions, dl, ChainIn);
     Results.push_back(Tmp.first);
     Results.push_back(Tmp.second);
     break;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 1987c892ac080..9a009f0eb6980 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -1378,6 +1378,10 @@ class AArch64TargetLowering : public TargetLowering {
   bool shouldScalarizeBinop(SDValue VecOp) const override {
     return VecOp.getOpcode() == ISD::SETCC;
   }
+
+  bool shouldExtendTypeInLibCall(EVT Type) const override {
+    return false;
+  }
 };
 
 namespace AArch64 {
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-fadd.ll b/llvm/test/CodeGen/AArch64/atomicrmw-fadd.ll
index 21729b9dfd101..b650040617ecd 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-fadd.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-fadd.ll
@@ -58,13 +58,13 @@ define half @test_atomicrmw_fadd_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:  .LBB0_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB0_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    mov w22, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w21, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w20
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT:    mov w1, w21
+; SOFTFP-NOLSE-NEXT:    mov w22, w0
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
+; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT:    mov w1, w22
 ; SOFTFP-NOLSE-NEXT:    bl __addsf3
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w8, w0
@@ -72,7 +72,7 @@ define half @test_atomicrmw_fadd_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:    // Parent Loop BB0_2 Depth=1
 ; SOFTFP-NOLSE-NEXT:    // => This Inner Loop Header: Depth=2
 ; SOFTFP-NOLSE-NEXT:    ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT:    cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT:    cmp w0, w21, uxth
 ; SOFTFP-NOLSE-NEXT:    b.ne .LBB0_1
 ; SOFTFP-NOLSE-NEXT:  // %bb.4: // %cmpxchg.trystore
 ; SOFTFP-NOLSE-NEXT:    // in Loop: Header=BB0_3 Depth=2
@@ -146,13 +146,13 @@ define half @test_atomicrmw_fadd_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:  .LBB1_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB1_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    mov w22, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w21, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w20
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT:    mov w1, w21
+; SOFTFP-NOLSE-NEXT:    mov w22, w0
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
+; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT:    mov w1, w22
 ; SOFTFP-NOLSE-NEXT:    bl __addsf3
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w8, w0
@@ -160,7 +160,7 @@ define half @test_atomicrmw_fadd_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:    // Parent Loop BB1_2 Depth=1
 ; SOFTFP-NOLSE-NEXT:    // => This Inner Loop Header: Depth=2
 ; SOFTFP-NOLSE-NEXT:    ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT:    cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT:    cmp w0, w21, uxth
 ; SOFTFP-NOLSE-NEXT:    b.ne .LBB1_1
 ; SOFTFP-NOLSE-NEXT:  // %bb.4: // %cmpxchg.trystore
 ; SOFTFP-NOLSE-NEXT:    // in Loop: Header=BB1_3 Depth=2
@@ -711,19 +711,19 @@ define <2 x half> @test_atomicrmw_fadd_v2f16_seq_cst_align4(ptr %ptr, <2 x half>
 ; SOFTFP-NOLSE-NEXT:  .LBB7_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB7_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    and w0, w19, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w19
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w24, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w23, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w23
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w1, w24
 ; SOFTFP-NOLSE-NEXT:    bl __addsf3
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w24, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w21, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w25, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w22
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w1, w25
 ; SOFTFP-NOLSE-NEXT:    bl __addsf3
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-fmax.ll b/llvm/test/CodeGen/AArch64/atomicrmw-fmax.ll
index 9b5e48d2b4217..41c5afe0f64a9 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-fmax.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-fmax.ll
@@ -60,13 +60,13 @@ define half @test_atomicrmw_fmax_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:  .LBB0_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB0_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    mov w22, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w21, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w20
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT:    mov w1, w21
+; SOFTFP-NOLSE-NEXT:    mov w22, w0
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
+; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT:    mov w1, w22
 ; SOFTFP-NOLSE-NEXT:    bl fmaxf
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w8, w0
@@ -74,7 +74,7 @@ define half @test_atomicrmw_fmax_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:    // Parent Loop BB0_2 Depth=1
 ; SOFTFP-NOLSE-NEXT:    // => This Inner Loop Header: Depth=2
 ; SOFTFP-NOLSE-NEXT:    ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT:    cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT:    cmp w0, w21, uxth
 ; SOFTFP-NOLSE-NEXT:    b.ne .LBB0_1
 ; SOFTFP-NOLSE-NEXT:  // %bb.4: // %cmpxchg.trystore
 ; SOFTFP-NOLSE-NEXT:    // in Loop: Header=BB0_3 Depth=2
@@ -148,13 +148,13 @@ define half @test_atomicrmw_fmax_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:  .LBB1_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB1_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    mov w22, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w21, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w20
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT:    mov w1, w21
+; SOFTFP-NOLSE-NEXT:    mov w22, w0
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
+; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT:    mov w1, w22
 ; SOFTFP-NOLSE-NEXT:    bl fmaxf
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w8, w0
@@ -162,7 +162,7 @@ define half @test_atomicrmw_fmax_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:    // Parent Loop BB1_2 Depth=1
 ; SOFTFP-NOLSE-NEXT:    // => This Inner Loop Header: Depth=2
 ; SOFTFP-NOLSE-NEXT:    ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT:    cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT:    cmp w0, w21, uxth
 ; SOFTFP-NOLSE-NEXT:    b.ne .LBB1_1
 ; SOFTFP-NOLSE-NEXT:  // %bb.4: // %cmpxchg.trystore
 ; SOFTFP-NOLSE-NEXT:    // in Loop: Header=BB1_3 Depth=2
@@ -591,19 +591,19 @@ define <2 x half> @test_atomicrmw_fmax_v2f16_seq_cst_align4(ptr %ptr, <2 x half>
 ; SOFTFP-NOLSE-NEXT:  .LBB6_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB6_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    and w0, w19, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w19
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w24, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w23, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w23
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w1, w24
 ; SOFTFP-NOLSE-NEXT:    bl fmaxf
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w24, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w21, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w25, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w22
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w1, w25
 ; SOFTFP-NOLSE-NEXT:    bl fmaxf
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-fmin.ll b/llvm/test/CodeGen/AArch64/atomicrmw-fmin.ll
index f6c542fe7d407..a01bd182e61e6 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-fmin.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-fmin.ll
@@ -60,13 +60,13 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:  .LBB0_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB0_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    mov w22, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w21, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w20
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT:    mov w1, w21
+; SOFTFP-NOLSE-NEXT:    mov w22, w0
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
+; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT:    mov w1, w22
 ; SOFTFP-NOLSE-NEXT:    bl fminf
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w8, w0
@@ -74,7 +74,7 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:    // Parent Loop BB0_2 Depth=1
 ; SOFTFP-NOLSE-NEXT:    // => This Inner Loop Header: Depth=2
 ; SOFTFP-NOLSE-NEXT:    ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT:    cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT:    cmp w0, w21, uxth
 ; SOFTFP-NOLSE-NEXT:    b.ne .LBB0_1
 ; SOFTFP-NOLSE-NEXT:  // %bb.4: // %cmpxchg.trystore
 ; SOFTFP-NOLSE-NEXT:    // in Loop: Header=BB0_3 Depth=2
@@ -148,13 +148,13 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:  .LBB1_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB1_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    mov w22, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w21, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w20
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT:    mov w1, w21
+; SOFTFP-NOLSE-NEXT:    mov w22, w0
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
+; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT:    mov w1, w22
 ; SOFTFP-NOLSE-NEXT:    bl fminf
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w8, w0
@@ -162,7 +162,7 @@ define half @test_atomicrmw_fmin_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:    // Parent Loop BB1_2 Depth=1
 ; SOFTFP-NOLSE-NEXT:    // => This Inner Loop Header: Depth=2
 ; SOFTFP-NOLSE-NEXT:    ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT:    cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT:    cmp w0, w21, uxth
 ; SOFTFP-NOLSE-NEXT:    b.ne .LBB1_1
 ; SOFTFP-NOLSE-NEXT:  // %bb.4: // %cmpxchg.trystore
 ; SOFTFP-NOLSE-NEXT:    // in Loop: Header=BB1_3 Depth=2
@@ -591,19 +591,19 @@ define <2 x half> @test_atomicrmw_fmin_v2f16_seq_cst_align4(ptr %ptr, <2 x half>
 ; SOFTFP-NOLSE-NEXT:  .LBB6_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB6_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    and w0, w19, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w19
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w24, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w23, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w23
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w1, w24
 ; SOFTFP-NOLSE-NEXT:    bl fminf
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w24, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w21, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w25, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w22
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w1, w25
 ; SOFTFP-NOLSE-NEXT:    bl fminf
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll b/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll
index 82e0f14e68e26..01beb5c50afdd 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-fsub.ll
@@ -58,13 +58,13 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:  .LBB0_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB0_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    mov w22, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w21, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w20
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT:    mov w1, w21
+; SOFTFP-NOLSE-NEXT:    mov w22, w0
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
+; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT:    mov w1, w22
 ; SOFTFP-NOLSE-NEXT:    bl __subsf3
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w8, w0
@@ -72,7 +72,7 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align2(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:    // Parent Loop BB0_2 Depth=1
 ; SOFTFP-NOLSE-NEXT:    // => This Inner Loop Header: Depth=2
 ; SOFTFP-NOLSE-NEXT:    ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT:    cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT:    cmp w0, w21, uxth
 ; SOFTFP-NOLSE-NEXT:    b.ne .LBB0_1
 ; SOFTFP-NOLSE-NEXT:  // %bb.4: // %cmpxchg.trystore
 ; SOFTFP-NOLSE-NEXT:    // in Loop: Header=BB0_3 Depth=2
@@ -146,13 +146,13 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:  .LBB1_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB1_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    mov w22, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w20, #0xffff
-; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w21, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w20
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
-; SOFTFP-NOLSE-NEXT:    mov w1, w21
+; SOFTFP-NOLSE-NEXT:    mov w22, w0
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
+; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
+; SOFTFP-NOLSE-NEXT:    mov w1, w22
 ; SOFTFP-NOLSE-NEXT:    bl __subsf3
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w8, w0
@@ -160,7 +160,7 @@ define half @test_atomicrmw_fsub_f16_seq_cst_align4(ptr %ptr, half %value) #0 {
 ; SOFTFP-NOLSE-NEXT:    // Parent Loop BB1_2 Depth=1
 ; SOFTFP-NOLSE-NEXT:    // => This Inner Loop Header: Depth=2
 ; SOFTFP-NOLSE-NEXT:    ldaxrh w0, [x19]
-; SOFTFP-NOLSE-NEXT:    cmp w0, w22, uxth
+; SOFTFP-NOLSE-NEXT:    cmp w0, w21, uxth
 ; SOFTFP-NOLSE-NEXT:    b.ne .LBB1_1
 ; SOFTFP-NOLSE-NEXT:  // %bb.4: // %cmpxchg.trystore
 ; SOFTFP-NOLSE-NEXT:    // in Loop: Header=BB1_3 Depth=2
@@ -711,19 +711,19 @@ define <2 x half> @test_atomicrmw_fsub_v2f16_seq_cst_align4(ptr %ptr, <2 x half>
 ; SOFTFP-NOLSE-NEXT:  .LBB7_2: // %atomicrmw.start
 ; SOFTFP-NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; SOFTFP-NOLSE-NEXT:    // Child Loop BB7_3 Depth 2
-; SOFTFP-NOLSE-NEXT:    and w0, w19, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w19
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w24, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w23, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w23
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w1, w24
 ; SOFTFP-NOLSE-NEXT:    bl __subsf3
 ; SOFTFP-NOLSE-NEXT:    bl __truncsfhf2
 ; SOFTFP-NOLSE-NEXT:    mov w24, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w21, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w21
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w25, w0
-; SOFTFP-NOLSE-NEXT:    and w0, w22, #0xffff
+; SOFTFP-NOLSE-NEXT:    mov w0, w22
 ; SOFTFP-NOLSE-NEXT:    bl __extendhfsf2
 ; SOFTFP-NOLSE-NEXT:    mov w1, w25
 ; SOFTFP-NOLSE-NEXT:    bl __subsf3
diff --git a/llvm/test/CodeGen/AArch64/strictfp_f16_abi_promote.ll b/llvm/test/CodeGen/AArc...
[truncated]

@llvmbot
Member

llvmbot commented Feb 26, 2025

@llvm/pr-subscribers-backend-aarch64


@github-actions

github-actions bot commented Feb 26, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@jroelofs
Contributor

I noticed that AArch64 is doing this extension in some cases, but that isn't needed because the AArch64 PCS leaves the high bits unspecified and the callee does the extension. There was already a target lowering hook to configure this, so I've set it correctly for AArch64.

The Darwin ABI is a little different here:

The caller of a function is responsible for signing or zero-extending any argument with fewer than 32 bits.

https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Pass-arguments-to-functions-correctly
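If that reading of the Darwin ABI is right, the unconditional override in this patch may be too broad for Darwin targets. A minimal sketch of a Darwin-aware variant (hypothetical, not part of this PR, and assuming the Subtarget member is accessible at this point):

  // AAPCS64 leaves the high bits of small arguments unspecified, but the
  // Darwin arm64 ABI makes the caller extend arguments narrower than 32
  // bits, so only skip the caller-side extension on non-Darwin targets.
  bool shouldExtendTypeInLibCall(EVT Type) const override {
    return Subtarget->isTargetDarwin();
  }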

Comment on lines +2010 to +2013
if (TLI->shouldSignExtendTypeInLibCall(SizedIntTy, false /*IsSigned*/))
Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, Attribute::SExt);
else
Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, Attribute::ZExt);
Contributor

Suggested change
if (TLI->shouldSignExtendTypeInLibCall(SizedIntTy, false /*IsSigned*/))
Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, Attribute::SExt);
else
Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, Attribute::ZExt);
Attribute::AttrKind ExtAttr = TLI->shouldSignExtendTypeInLibCall(SizedIntTy, /*IsSigned=*/false) ? Attribute::SExt : Attribute::ZExt;
Attr = Attr.addParamAttribute(Ctx, Args.size() - 1, ExtAttr);

Comment on lines +2004 to +2010
// the target's ABI.
if (TLI->shouldExtendTypeInLibCall(
TLI->getMemValueType(DL, SizedIntTy))) {
// The only atomic operations affected by signedness are min/max, and
// we don't have __atomic_ libcalls for them, so IsSigned is always
// false.
if (TLI->shouldSignExtendTypeInLibCall(SizedIntTy, false /*IsSigned*/))
Contributor

This is using 3 levels of terrible TargetLowering API. Really we should be able to query the signature of the specific libcall from RuntimeLibcalls, not these global overrides.

The part I understand least is why you need getMemValueType.

Development

Successfully merging this pull request may close these issues.

[ARM] Incorrect calling convention when invoking __sync builtins