[SDAG] Merge multiple-result libcall expansion into DAG.expandMultipleResultFPLibCall() #114792

MacDue · 2024-11-04T13:43:38Z

This merges the logic for expanding both FFREXP and FSINCOS into one method DAG.expandMultipleResultFPLibCall(). This reduces duplication and also allows FFREXP to benefit from the stack slot elimination implemented for FSINCOS. This method will also be used in future to implement more multiple-result intrinsics (such as modf and sincospi).

…eResultFPLibCall() This merges the logic for expanding both FFREXP and FSINCOS into one method `DAG.expandMultipleResultFPLibCall(). This reduces duplication and also allows FFREXP to benefit from the stack slot elimination implemented for FSINCOS. This method will also be used in future to implement more multiple-result intrinsics (such as modf and sincospi).

llvmbot · 2024-11-04T13:44:12Z

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Benjamin Maxwell (MacDue)

Changes

This merges the logic for expanding both FFREXP and FSINCOS into one method DAG.expandMultipleResultFPLibCall(). This reduces duplication and also allows FFREXP to benefit from the stack slot elimination implemented for FSINCOS. This method will also be used in future to implement more multiple-result intrinsics (such as modf and sincospi).

Full diff: https://github.com/llvm/llvm-project/pull/114792.diff

5 Files Affected:

(modified) llvm/include/llvm/CodeGen/SelectionDAG.h (+9-2)
(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+6-46)
(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp (+15-3)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+59-27)
(modified) llvm/test/CodeGen/X86/llvm.frexp.ll (+12-33)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index e03773f46ae092..9035aa3ea31278 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -34,6 +34,7 @@
 #include "llvm/IR/ConstantRange.h"
 #include "llvm/IR/DebugLoc.h"
 #include "llvm/IR/Metadata.h"
+#include "llvm/IR/RuntimeLibcalls.h"
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/ArrayRecycler.h"
 #include "llvm/Support/CodeGen.h"
@@ -1595,8 +1596,14 @@ class SelectionDAG {
   SDValue getPartialReduceAdd(SDLoc DL, EVT ReducedTy, SDValue Op1,
                               SDValue Op2);
 
-  /// Expand the specified \c ISD::FSINCOS node as the Legalize pass would.
-  bool expandFSINCOS(SDNode *Node, SmallVectorImpl<SDValue> &Results);
+  /// Expands a node with multiple results to an FP or vector libcall. The
+  /// libcall is expected to take all the operands of the \p Node followed by
+  /// output pointers for each of the results. \p CallRetResNo can be optionally
+  /// set to indicate that one of the results comes from the libcall's return
+  /// value.
+  bool expandMultipleResultFPLibCall(RTLIB::Libcall LC, SDNode *Node,
+                                     SmallVectorImpl<SDValue> &Results,
+                                     std::optional<unsigned> CallRetResNo = {});
 
   /// Expand the specified \c ISD::VAARG node as the Legalize pass would.
   SDValue expandVAArg(SDNode *Node);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 142774ef4f2e40..6c160e6e90a1fb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -132,7 +132,6 @@ class SelectionDAGLegalize {
                         TargetLowering::ArgListTy &&Args, bool isSigned);
   std::pair<SDValue, SDValue> ExpandLibCall(RTLIB::Libcall LC, SDNode *Node, bool isSigned);
 
-  void ExpandFrexpLibCall(SDNode *Node, SmallVectorImpl<SDValue> &Results);
   void ExpandFPLibCall(SDNode *Node, RTLIB::Libcall LC,
                        SmallVectorImpl<SDValue> &Results);
   void ExpandFPLibCall(SDNode *Node, RTLIB::Libcall Call_F32,
@@ -2144,47 +2143,6 @@ std::pair<SDValue, SDValue> SelectionDAGLegalize::ExpandLibCall(RTLIB::Libcall L
   return ExpandLibCall(LC, Node, std::move(Args), isSigned);
 }
 
-void SelectionDAGLegalize::ExpandFrexpLibCall(
-    SDNode *Node, SmallVectorImpl<SDValue> &Results) {
-  SDLoc dl(Node);
-  EVT VT = Node->getValueType(0);
-  EVT ExpVT = Node->getValueType(1);
-
-  SDValue FPOp = Node->getOperand(0);
-
-  EVT ArgVT = FPOp.getValueType();
-  Type *ArgTy = ArgVT.getTypeForEVT(*DAG.getContext());
-
-  TargetLowering::ArgListEntry FPArgEntry;
-  FPArgEntry.Node = FPOp;
-  FPArgEntry.Ty = ArgTy;
-
-  SDValue StackSlot = DAG.CreateStackTemporary(ExpVT);
-  TargetLowering::ArgListEntry PtrArgEntry;
-  PtrArgEntry.Node = StackSlot;
-  PtrArgEntry.Ty = PointerType::get(*DAG.getContext(),
-                                    DAG.getDataLayout().getAllocaAddrSpace());
-
-  TargetLowering::ArgListTy Args = {FPArgEntry, PtrArgEntry};
-
-  RTLIB::Libcall LC = RTLIB::getFREXP(VT);
-  auto [Call, Chain] = ExpandLibCall(LC, Node, std::move(Args), false);
-
-  // FIXME: Get type of int for libcall declaration and cast
-
-  int FrameIdx = cast<FrameIndexSDNode>(StackSlot)->getIndex();
-  auto PtrInfo =
-      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FrameIdx);
-
-  SDValue LoadExp = DAG.getLoad(ExpVT, dl, Chain, StackSlot, PtrInfo);
-  SDValue OutputChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
-                                    LoadExp.getValue(1), DAG.getRoot());
-  DAG.setRoot(OutputChain);
-
-  Results.push_back(Call);
-  Results.push_back(LoadExp);
-}
-
 void SelectionDAGLegalize::ExpandFPLibCall(SDNode* Node,
                                            RTLIB::Libcall LC,
                                            SmallVectorImpl<SDValue> &Results) {
@@ -4562,10 +4520,11 @@ void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) {
     ExpandFPLibCall(Node, RTLIB::TANH_F32, RTLIB::TANH_F64, RTLIB::TANH_F80,
                     RTLIB::TANH_F128, RTLIB::TANH_PPCF128, Results);
     break;
-  case ISD::FSINCOS:
-    // Expand into sincos libcall.
-    (void)DAG.expandFSINCOS(Node, Results);
+  case ISD::FSINCOS: {
+    RTLIB::Libcall LC = RTLIB::getFSINCOS(Node->getValueType(0));
+    DAG.expandMultipleResultFPLibCall(LC, Node, Results);
     break;
+  }
   case ISD::FLOG:
   case ISD::STRICT_FLOG:
     ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64, RTLIB::LOG_F80,
@@ -4649,7 +4608,8 @@ void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) {
                     RTLIB::LDEXP_F128, RTLIB::LDEXP_PPCF128, Results);
     break;
   case ISD::FFREXP: {
-    ExpandFrexpLibCall(Node, Results);
+    RTLIB::Libcall LC = RTLIB::getFREXP(Node->getValueType(0));
+    DAG.expandMultipleResultFPLibCall(LC, Node, Results, /*CallRetResNo=*/0);
     break;
   }
   case ISD::FPOWI:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index 8403c98545187a..6630357410a99b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -27,6 +27,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/ScopeExit.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/VectorUtils.h"
@@ -270,6 +271,15 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
   DenseMap<SDValue, SDValue>::iterator I = LegalizedNodes.find(Op);
   if (I != LegalizedNodes.end()) return I->second;
 
+  // Handle legalizing the root if it changes.
+  auto FixupRoot = make_scope_exit([&, OldRoot = DAG.getRoot()] {
+    SDValue Root = DAG.getRoot();
+    if (Root != OldRoot) {
+      if (SDValue LegalRoot = LegalizeOp(Root))
+        DAG.setRoot(LegalRoot);
+    }
+  });
+
   // Legalize the operands
   SmallVector<SDValue, 8> Ops;
   for (const SDValue &Oper : Op->op_values())
@@ -1192,11 +1202,13 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
       return;
 
     break;
-  case ISD::FSINCOS:
-    if (DAG.expandFSINCOS(Node, Results))
+  case ISD::FSINCOS: {
+    RTLIB::Libcall LC =
+        RTLIB::getFSINCOS(Node->getValueType(0).getVectorElementType());
+    if (DAG.expandMultipleResultFPLibCall(LC, Node, Results))
       return;
-
     break;
+  }
   case ISD::VECTOR_COMPRESS:
     Results.push_back(TLI.expandVECTOR_COMPRESS(Node, DAG));
     return;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 166b6dbf46db87..610e274caaafe3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -2484,13 +2484,12 @@ SDValue SelectionDAG::getPartialReduceAdd(SDLoc DL, EVT ReducedTy, SDValue Op1,
   return Subvectors[0];
 }
 
-bool SelectionDAG::expandFSINCOS(SDNode *Node,
-                                 SmallVectorImpl<SDValue> &Results) {
+bool SelectionDAG::expandMultipleResultFPLibCall(
+    RTLIB::Libcall LC, SDNode *Node, SmallVectorImpl<SDValue> &Results,
+    std::optional<unsigned> CallRetResNo) {
+  LLVMContext &Ctx = *getContext();
   EVT VT = Node->getValueType(0);
-  LLVMContext *Ctx = getContext();
-  Type *Ty = VT.getTypeForEVT(*Ctx);
-  RTLIB::Libcall LC =
-      RTLIB::getFSINCOS(VT.isVector() ? VT.getVectorElementType() : VT);
+  unsigned NumResults = Node->getNumValues();
 
   const char *LCName = TLI->getLibcallName(LC);
   if (!LC || !LCName)
@@ -2506,6 +2505,7 @@ bool SelectionDAG::expandFSINCOS(SDNode *Node,
     return nullptr;
   };
 
+  // For vector types, we must find a vector mapping for the libcall.
   VecDesc const *VD = nullptr;
   if (VT.isVector() && !(VD = getVecDesc()))
     return false;
@@ -2513,71 +2513,103 @@ bool SelectionDAG::expandFSINCOS(SDNode *Node,
   // Find users of the node that store the results (and share input chains). The
   // destination pointers can be used instead of creating stack allocations.
   SDValue StoresInChain{};
-  std::array<StoreSDNode *, 2> ResultStores = {nullptr};
+  SmallVector<StoreSDNode *, 2> ResultStores(NumResults);
   for (SDNode *User : Node->uses()) {
     if (!ISD::isNormalStore(User))
       continue;
     auto *ST = cast<StoreSDNode>(User);
-    if (!ST->isSimple() || ST->getAddressSpace() != 0 ||
-        ST->getAlign() < getDataLayout().getABITypeAlign(Ty->getScalarType()) ||
+    SDValue StoreValue = ST->getValue();
+    unsigned ResNo = StoreValue.getResNo();
+    Type *StoreType = StoreValue.getValueType().getTypeForEVT(Ctx);
+    if (CallRetResNo == ResNo || !ST->isSimple() ||
+        ST->getAddressSpace() != 0 ||
+        ST->getAlign() <
+            getDataLayout().getABITypeAlign(StoreType->getScalarType()) ||
         (StoresInChain && ST->getChain() != StoresInChain) ||
         Node->isPredecessorOf(ST->getChain().getNode()))
       continue;
-    ResultStores[ST->getValue().getResNo()] = ST;
+    ResultStores[ResNo] = ST;
     StoresInChain = ST->getChain();
   }
 
   TargetLowering::ArgListTy Args;
   TargetLowering::ArgListEntry Entry{};
 
-  // Pass the argument.
-  Entry.Node = Node->getOperand(0);
-  Entry.Ty = Ty;
-  Args.push_back(Entry);
+  // Pass the arguments.
+  for (const SDValue &Op : Node->op_values()) {
+    EVT ArgVT = Op.getValueType();
+    Type *ArgTy = ArgVT.getTypeForEVT(Ctx);
+    Entry.Node = Node->getOperand(0);
+    Entry.Ty = ArgTy;
+    Args.push_back(Entry);
+  }
 
-  // Pass the output pointers for sin and cos.
-  SmallVector<SDValue, 2> ResultPtrs{};
-  for (StoreSDNode *ST : ResultStores) {
-    SDValue ResultPtr = ST ? ST->getBasePtr() : CreateStackTemporary(VT);
+  // Pass the output pointers.
+  SmallVector<SDValue, 2> ResultPtrs(NumResults);
+  for (auto [ResNo, ST] : llvm::enumerate(ResultStores)) {
+    if (ResNo == CallRetResNo)
+      continue;
+    EVT ResVT = Node->getValueType(ResNo);
+    SDValue ResultPtr = ST ? ST->getBasePtr() : CreateStackTemporary(ResVT);
     Entry.Node = ResultPtr;
-    Entry.Ty = PointerType::getUnqual(Ty->getContext());
+    Entry.Ty = PointerType::getUnqual(Ctx);
+    ResultPtrs[ResNo] = ResultPtr;
     Args.push_back(Entry);
-    ResultPtrs.push_back(ResultPtr);
   }
 
   SDLoc DL(Node);
 
+  // Pass the vector mask (if required).
   if (VD && VD->isMasked()) {
-    EVT MaskVT = TLI->getSetCCResultType(getDataLayout(), *Ctx, VT);
+    EVT MaskVT = TLI->getSetCCResultType(getDataLayout(), Ctx, VT);
     Entry.Node = getBoolConstant(true, DL, MaskVT, VT);
-    Entry.Ty = MaskVT.getTypeForEVT(*Ctx);
+    Entry.Ty = MaskVT.getTypeForEVT(Ctx);
     Args.push_back(Entry);
   }
 
+  Type *RetType = CallRetResNo.has_value()
+                      ? Node->getValueType(*CallRetResNo).getTypeForEVT(Ctx)
+                      : Type::getVoidTy(Ctx);
   SDValue InChain = StoresInChain ? StoresInChain : getEntryNode();
   SDValue Callee = getExternalSymbol(VD ? VD->getVectorFnName().data() : LCName,
                                      TLI->getPointerTy(getDataLayout()));
   TargetLowering::CallLoweringInfo CLI(*this);
   CLI.setDebugLoc(DL).setChain(InChain).setLibCallee(
-      TLI->getLibcallCallingConv(LC), Type::getVoidTy(*Ctx), Callee,
-      std::move(Args));
+      TLI->getLibcallCallingConv(LC), RetType, Callee, std::move(Args));
 
-  auto [Call, OutChain] = TLI->LowerCallTo(CLI);
+  auto [Call, CallChain] = TLI->LowerCallTo(CLI);
 
   for (auto [ResNo, ResultPtr] : llvm::enumerate(ResultPtrs)) {
+    if (ResNo == CallRetResNo) {
+      Results.push_back(Call);
+      continue;
+    }
     MachinePointerInfo PtrInfo;
     if (StoreSDNode *ST = ResultStores[ResNo]) {
       // Replace store with the library call.
-      ReplaceAllUsesOfValueWith(SDValue(ST, 0), OutChain);
+      ReplaceAllUsesOfValueWith(SDValue(ST, 0), CallChain);
       PtrInfo = ST->getPointerInfo();
     } else {
       PtrInfo = MachinePointerInfo::getFixedStack(
           getMachineFunction(), cast<FrameIndexSDNode>(ResultPtr)->getIndex());
     }
-    SDValue LoadResult = getLoad(VT, DL, OutChain, ResultPtr, PtrInfo);
+    SDValue LoadResult =
+        getLoad(Node->getValueType(ResNo), DL, CallChain, ResultPtr, PtrInfo);
     Results.push_back(LoadResult);
   }
 
+  // FIXME: Find a way to avoid updating the root. This is needed for x86, which
+  // uses a floating-point stack. If (for example) the node to be expanded has
+  // two results one floating-point which is returned by the call, and one
+  // integer result, set returned via an output pointer. If only the integer
+  // result is used then the `CopyFromReg` for the FP result may be optimized
+  // out. This prevents an FP stack pop from being emitted for it. Setting the
+  // root like this ensures there will be a use of the `CopyFromReg` chain, and
+  // ensures the FP pop will be emitted.
+  SDValue OutputChain =
+      getNode(ISD::TokenFactor, DL, MVT::Other, getRoot(), CallChain);
+  setRoot(OutputChain);
+
   return true;
 }
 
diff --git a/llvm/test/CodeGen/X86/llvm.frexp.ll b/llvm/test/CodeGen/X86/llvm.frexp.ll
index cd560ad627de4c..96de34519556d0 100644
--- a/llvm/test/CodeGen/X86/llvm.frexp.ll
+++ b/llvm/test/CodeGen/X86/llvm.frexp.ll
@@ -325,28 +325,27 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) {
 ;
 ; WIN32-LABEL: test_frexp_v4f32_v4i32:
 ; WIN32:       # %bb.0:
-; WIN32-NEXT:    pushl %edi
 ; WIN32-NEXT:    pushl %esi
-; WIN32-NEXT:    subl $60, %esp
+; WIN32-NEXT:    subl $44, %esp
 ; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    leal 24(%esi), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstpl {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Folded Spill
-; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    leal 20(%esi), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstpl {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Folded Spill
-; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    leal 16(%esi), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
-; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    leal 28(%esi), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
@@ -361,22 +360,13 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) {
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edx
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN32-NEXT:    movl %edi, 28(%esi)
-; WIN32-NEXT:    movl %edx, 24(%esi)
-; WIN32-NEXT:    movl %ecx, 20(%esi)
-; WIN32-NEXT:    movl %eax, 16(%esi)
 ; WIN32-NEXT:    fstps 12(%esi)
 ; WIN32-NEXT:    fstps 8(%esi)
 ; WIN32-NEXT:    fstps 4(%esi)
 ; WIN32-NEXT:    fstps (%esi)
 ; WIN32-NEXT:    movl %esi, %eax
-; WIN32-NEXT:    addl $60, %esp
+; WIN32-NEXT:    addl $44, %esp
 ; WIN32-NEXT:    popl %esi
-; WIN32-NEXT:    popl %edi
 ; WIN32-NEXT:    retl
   %result = call { <4 x float>, <4 x i32> } @llvm.frexp.v4f32.v4i32(<4 x float> %a)
   ret { <4 x float>, <4 x i32> } %result
@@ -499,46 +489,35 @@ define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) {
 ;
 ; WIN32-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
 ; WIN32:       # %bb.0:
-; WIN32-NEXT:    pushl %edi
 ; WIN32-NEXT:    pushl %esi
-; WIN32-NEXT:    subl $28, %esp
+; WIN32-NEXT:    subl $12, %esp
 ; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    leal 8(%esi), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
-; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    leal 4(%esi), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
-; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT:    leal 12(%esi), %eax
 ; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
-; WIN32-NEXT:    leal {{[0-9]+}}(%esp), %eax
-; WIN32-NEXT:    movl %eax, {{[0-9]+}}(%esp)
+; WIN32-NEXT:    movl %esi, {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    flds {{[0-9]+}}(%esp)
 ; WIN32-NEXT:    fstpl (%esp)
 ; WIN32-NEXT:    calll _frexp
 ; WIN32-NEXT:    fstp %st(0)
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edx
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN32-NEXT:    movl %edi, 12(%esi)
-; WIN32-NEXT:    movl %edx, 8(%esi)
-; WIN32-NEXT:    movl %ecx, 4(%esi)
-; WIN32-NEXT:    movl %eax, (%esi)
 ; WIN32-NEXT:    movl %esi, %eax
-; WIN32-NEXT:    addl $28, %esp
+; WIN32-NEXT:    addl $12, %esp
 ; WIN32-NEXT:    popl %esi
-; WIN32-NEXT:    popl %edi
 ; WIN32-NEXT:    retl
   %result = call { <4 x float>, <4 x i32> } @llvm.frexp.v4f32.v4i32(<4 x float> %a)
   %result.1 = extractvalue { <4 x float>, <4 x i32> } %result, 1

MacDue · 2024-11-04T13:48:00Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

+  // FIXME: Find a way to avoid updating the root. This is needed for x86, which
+  // uses a floating-point stack. If (for example) the node to be expanded has
+  // two results one floating-point which is returned by the call, and one
+  // integer result, set returned via an output pointer. If only the integer
+  // result is used then the `CopyFromReg` for the FP result may be optimized
+  // out. This prevents an FP stack pop from being emitted for it. Setting the
+  // root like this ensures there will be a use of the `CopyFromReg` chain, and
+  // ensures the FP pop will be emitted.
+  SDValue OutputChain =
+      getNode(ISD::TokenFactor, DL, MVT::Other, getRoot(), CallChain);
+  setRoot(OutputChain);


This logic came from the frexp expansion I added the FIXME/explanation as I was initially quite confused about why this was necessary. This required making the small change to LegalizeVectorOps.cpp included in the patch.

MacDue · 2024-11-04T13:51:53Z

llvm/test/CodeGen/X86/llvm.frexp.ll

-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edx
-; WIN32-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN32-NEXT:    movl %edi, 28(%esi)
-; WIN32-NEXT:    movl %edx, 24(%esi)
-; WIN32-NEXT:    movl %ecx, 20(%esi)
-; WIN32-NEXT:    movl %eax, 16(%esi)


As far as I can tell, this is just removing some redundant stack slots + copies, though I'm less familiar with x86 assembly. Previously, it went stack slot -> memory pointed at by %esi. Now it just directly uses the %esi pointers.

(the same appears to be the case for the other targets updated)

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

arsenm · 2024-11-04T23:12:05Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

+  // FIXME: Find a way to avoid updating the root. This is needed for x86, which
+  // uses a floating-point stack. If (for example) the node to be expanded has
+  // two results one floating-point which is returned by the call, and one
+  // integer result, set returned via an output pointer. If only the integer
+  // result is used then the `CopyFromReg` for the FP result may be optimized
+  // out. This prevents an FP stack pop from being emitted for it. Setting the
+  // root like this ensures there will be a use of the `CopyFromReg` chain, and
+  // ensures the FP pop will be emitted.
+  SDValue OutputChain =
+      getNode(ISD::TokenFactor, DL, MVT::Other, getRoot(), CallChain);
+  setRoot(LegalizeOp ? LegalizeOp(OutputChain) : OutputChain);


I do not understand this, and the explanation sounds like it's working around something else broken. There shouldn't be any need for a callback. The chain logic should be unconditional, both of these cases have the same properties

This workaround was added for the frexp expansion in 1d92b68. I added the FIXME as I think there should be a better solution here, but I could not come up with anything decent 😅

The result chain from a frexp expansion is something like f80,ch,glue = CopyFromReg t23, Register:f80 $fp0 (for x86). That corresponds to the FP result, and the integer result comes from an output pointer.

For the broken cases (fixed in that commit), the FP result is unused. The load of the integer result takes its in-chain from the CopyFromReg (which is good). However, in DAG combine, it'll find a better chain that does not include the CopyFromReg (it seems it can always skip a CopyFromReg to find better memory chains).

That CopyFromReg then becomes unused and DCE'd. The x86 backend then won't emit an FP stack pop (fstp %st(0))) for the FP result, which is bad...

Updating the root creates a fake use of the CopyFromReg chain, which prevents it from being optimized out (and hence keeps the pop).

This is only really an issue for calls with multiple results (since for single result calls if the result was unused, the node would have been DCE'd before the call expansion)

The logic here is unconditional -- the chain is updated the same way callback or not (and I think the legalization is a no-op). However, vector legalization keeps track of the legalized nodes. Since the new root node is not one of the results it does not get automatically legalized, and if not explicitly legalized we'll eventually hit an assert.

I realized I can slightly tweak the DAG to avoid the need for the callback, so I've done that now 🙂

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

llvm/include/llvm/CodeGen/SelectionDAG.h

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

sdesmalen-arm

I don't know if there's any better to suggest for the CopyFromReg that is optimised away when it has no uses, but it doesn't seem to break or regress anything either. So overall LGTM.

MacDue · 2024-11-06T11:05:51Z

I don't know if there's any better to suggest for the CopyFromReg that is optimised away when it has no uses, but it doesn't seem to break or regress anything either. So overall LGTM.

Given that the workaround here has been in the repo for about a year, I think it should be safe 🙂

MacDue requested review from arsenm, efriedma-quic and sdesmalen-arm November 4, 2024 13:43

llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Nov 4, 2024

MacDue commented Nov 4, 2024

View reviewed changes

Update more tests

b092e2b

llvmbot added the backend:PowerPC label Nov 4, 2024

arsenm reviewed Nov 4, 2024

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp Outdated Show resolved Hide resolved

Use callback

d234e52

arsenm reviewed Nov 4, 2024

View reviewed changes

Remove callback

b86b178

sdesmalen-arm reviewed Nov 5, 2024

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Outdated Show resolved Hide resolved

llvm/include/llvm/CodeGen/SelectionDAG.h Show resolved Hide resolved

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Outdated Show resolved Hide resolved

Fixups

4638ce5

sdesmalen-arm reviewed Nov 5, 2024

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Outdated Show resolved Hide resolved

Make root update conditional

e620edf

sdesmalen-arm approved these changes Nov 5, 2024

View reviewed changes

MacDue merged commit ea6b8fa into llvm:main Nov 6, 2024
8 checks passed

MacDue deleted the multi_result_libcall_expand branch November 6, 2024 11:06

[SDAG] Merge multiple-result libcall expansion into DAG.expandMultipleResultFPLibCall() #114792

[SDAG] Merge multiple-result libcall expansion into DAG.expandMultipleResultFPLibCall() #114792

Uh oh!

Conversation

MacDue commented Nov 4, 2024

Uh oh!

llvmbot commented Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MacDue Nov 4, 2024

Choose a reason for hiding this comment

Uh oh!

MacDue Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MacDue Nov 4, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

arsenm Nov 4, 2024

Choose a reason for hiding this comment

Uh oh!

MacDue Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MacDue Nov 4, 2024

Choose a reason for hiding this comment

Uh oh!

MacDue Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sdesmalen-arm left a comment

Choose a reason for hiding this comment

Uh oh!

MacDue commented Nov 6, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

llvmbot commented Nov 4, 2024 •

edited

Loading

MacDue Nov 4, 2024 •

edited

Loading

MacDue Nov 4, 2024 •

edited

Loading

MacDue Nov 5, 2024 •

edited

Loading