-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[SDAG] Merge multiple-result libcall expansion into DAG.expandMultipleResultFPLibCall() #114792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…eResultFPLibCall() This merges the logic for expanding both FFREXP and FSINCOS into one method `DAG.expandMultipleResultFPLibCall(). This reduces duplication and also allows FFREXP to benefit from the stack slot elimination implemented for FSINCOS. This method will also be used in future to implement more multiple-result intrinsics (such as modf and sincospi).
|
@llvm/pr-subscribers-backend-powerpc @llvm/pr-subscribers-llvm-selectiondag Author: Benjamin Maxwell (MacDue) ChangesThis merges the logic for expanding both FFREXP and FSINCOS into one method Full diff: https://github.com/llvm/llvm-project/pull/114792.diff 5 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index e03773f46ae092..9035aa3ea31278 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -34,6 +34,7 @@
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Metadata.h"
+#include "llvm/IR/RuntimeLibcalls.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/ArrayRecycler.h"
#include "llvm/Support/CodeGen.h"
@@ -1595,8 +1596,14 @@ class SelectionDAG {
SDValue getPartialReduceAdd(SDLoc DL, EVT ReducedTy, SDValue Op1,
SDValue Op2);
- /// Expand the specified \c ISD::FSINCOS node as the Legalize pass would.
- bool expandFSINCOS(SDNode *Node, SmallVectorImpl<SDValue> &Results);
+ /// Expands a node with multiple results to an FP or vector libcall. The
+ /// libcall is expected to take all the operands of the \p Node followed by
+ /// output pointers for each of the results. \p CallRetResNo can be optionally
+ /// set to indicate that one of the results comes from the libcall's return
+ /// value.
+ bool expandMultipleResultFPLibCall(RTLIB::Libcall LC, SDNode *Node,
+ SmallVectorImpl<SDValue> &Results,
+ std::optional<unsigned> CallRetResNo = {});
/// Expand the specified \c ISD::VAARG node as the Legalize pass would.
SDValue expandVAArg(SDNode *Node);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 142774ef4f2e40..6c160e6e90a1fb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -132,7 +132,6 @@ class SelectionDAGLegalize {
TargetLowering::ArgListTy &&Args, bool isSigned);
std::pair<SDValue, SDValue> ExpandLibCall(RTLIB::Libcall LC, SDNode *Node, bool isSigned);
- void ExpandFrexpLibCall(SDNode *Node, SmallVectorImpl<SDValue> &Results);
void ExpandFPLibCall(SDNode *Node, RTLIB::Libcall LC,
SmallVectorImpl<SDValue> &Results);
void ExpandFPLibCall(SDNode *Node, RTLIB::Libcall Call_F32,
@@ -2144,47 +2143,6 @@ std::pair<SDValue, SDValue> SelectionDAGLegalize::ExpandLibCall(RTLIB::Libcall L
return ExpandLibCall(LC, Node, std::move(Args), isSigned);
}
-void SelectionDAGLegalize::ExpandFrexpLibCall(
- SDNode *Node, SmallVectorImpl<SDValue> &Results) {
- SDLoc dl(Node);
- EVT VT = Node->getValueType(0);
- EVT ExpVT = Node->getValueType(1);
-
- SDValue FPOp = Node->getOperand(0);
-
- EVT ArgVT = FPOp.getValueType();
- Type *ArgTy = ArgVT.getTypeForEVT(*DAG.getContext());
-
- TargetLowering::ArgListEntry FPArgEntry;
- FPArgEntry.Node = FPOp;
- FPArgEntry.Ty = ArgTy;
-
- SDValue StackSlot = DAG.CreateStackTemporary(ExpVT);
- TargetLowering::ArgListEntry PtrArgEntry;
- PtrArgEntry.Node = StackSlot;
- PtrArgEntry.Ty = PointerType::get(*DAG.getContext(),
- DAG.getDataLayout().getAllocaAddrSpace());
-
- TargetLowering::ArgListTy Args = {FPArgEntry, PtrArgEntry};
-
- RTLIB::Libcall LC = RTLIB::getFREXP(VT);
- auto [Call, Chain] = ExpandLibCall(LC, Node, std::move(Args), false);
-
- // FIXME: Get type of int for libcall declaration and cast
-
- int FrameIdx = cast<FrameIndexSDNode>(StackSlot)->getIndex();
- auto PtrInfo =
- MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FrameIdx);
-
- SDValue LoadExp = DAG.getLoad(ExpVT, dl, Chain, StackSlot, PtrInfo);
- SDValue OutputChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
- LoadExp.getValue(1), DAG.getRoot());
- DAG.setRoot(OutputChain);
-
- Results.push_back(Call);
- Results.push_back(LoadExp);
-}
-
void SelectionDAGLegalize::ExpandFPLibCall(SDNode* Node,
RTLIB::Libcall LC,
SmallVectorImpl<SDValue> &Results) {
@@ -4562,10 +4520,11 @@ void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) {
ExpandFPLibCall(Node, RTLIB::TANH_F32, RTLIB::TANH_F64, RTLIB::TANH_F80,
RTLIB::TANH_F128, RTLIB::TANH_PPCF128, Results);
break;
- case ISD::FSINCOS:
- // Expand into sincos libcall.
- (void)DAG.expandFSINCOS(Node, Results);
+ case ISD::FSINCOS: {
+ RTLIB::Libcall LC = RTLIB::getFSINCOS(Node->getValueType(0));
+ DAG.expandMultipleResultFPLibCall(LC, Node, Results);
break;
+ }
case ISD::FLOG:
case ISD::STRICT_FLOG:
ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64, RTLIB::LOG_F80,
@@ -4649,7 +4608,8 @@ void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) {
RTLIB::LDEXP_F128, RTLIB::LDEXP_PPCF128, Results);
break;
case ISD::FFREXP: {
- ExpandFrexpLibCall(Node, Results);
+ RTLIB::Libcall LC = RTLIB::getFREXP(Node->getValueType(0));
+ DAG.expandMultipleResultFPLibCall(LC, Node, Results, /*CallRetResNo=*/0);
break;
}
case ISD::FPOWI:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index 8403c98545187a..6630357410a99b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -27,6 +27,7 @@
//===----------------------------------------------------------------------===//
#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/VectorUtils.h"
@@ -270,6 +271,15 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
DenseMap<SDValue, SDValue>::iterator I = LegalizedNodes.find(Op);
if (I != LegalizedNodes.end()) return I->second;
+ // Handle legalizing the root if it changes.
+ auto FixupRoot = make_scope_exit([&, OldRoot = DAG.getRoot()] {
+ SDValue Root = DAG.getRoot();
+ if (Root != OldRoot) {
+ if (SDValue LegalRoot = LegalizeOp(Root))
+ DAG.setRoot(LegalRoot);
+ }
+ });
+
// Legalize the operands
SmallVector<SDValue, 8> Ops;
for (const SDValue &Oper : Op->op_values())
@@ -1192,11 +1202,13 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
return;
break;
- case ISD::FSINCOS:
- if (DAG.expandFSINCOS(Node, Results))
+ case ISD::FSINCOS: {
+ RTLIB::Libcall LC =
+ RTLIB::getFSINCOS(Node->getValueType(0).getVectorElementType());
+ if (DAG.expandMultipleResultFPLibCall(LC, Node, Results))
return;
-
break;
+ }
case ISD::VECTOR_COMPRESS:
Results.push_back(TLI.expandVECTOR_COMPRESS(Node, DAG));
return;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 166b6dbf46db87..610e274caaafe3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -2484,13 +2484,12 @@ SDValue SelectionDAG::getPartialReduceAdd(SDLoc DL, EVT ReducedTy, SDValue Op1,
return Subvectors[0];
}
-bool SelectionDAG::expandFSINCOS(SDNode *Node,
- SmallVectorImpl<SDValue> &Results) {
+bool SelectionDAG::expandMultipleResultFPLibCall(
+ RTLIB::Libcall LC, SDNode *Node, SmallVectorImpl<SDValue> &Results,
+ std::optional<unsigned> CallRetResNo) {
+ LLVMContext &Ctx = *getContext();
EVT VT = Node->getValueType(0);
- LLVMContext *Ctx = getContext();
- Type *Ty = VT.getTypeForEVT(*Ctx);
- RTLIB::Libcall LC =
- RTLIB::getFSINCOS(VT.isVector() ? VT.getVectorElementType() : VT);
+ unsigned NumResults = Node->getNumValues();
const char *LCName = TLI->getLibcallName(LC);
if (!LC || !LCName)
@@ -2506,6 +2505,7 @@ bool SelectionDAG::expandFSINCOS(SDNode *Node,
return nullptr;
};
+ // For vector types, we must find a vector mapping for the libcall.
VecDesc const *VD = nullptr;
if (VT.isVector() && !(VD = getVecDesc()))
return false;
@@ -2513,71 +2513,103 @@ bool SelectionDAG::expandFSINCOS(SDNode *Node,
// Find users of the node that store the results (and share input chains). The
// destination pointers can be used instead of creating stack allocations.
SDValue StoresInChain{};
- std::array<StoreSDNode *, 2> ResultStores = {nullptr};
+ SmallVector<StoreSDNode *, 2> ResultStores(NumResults);
for (SDNode *User : Node->uses()) {
if (!ISD::isNormalStore(User))
continue;
auto *ST = cast<StoreSDNode>(User);
- if (!ST->isSimple() || ST->getAddressSpace() != 0 ||
- ST->getAlign() < getDataLayout().getABITypeAlign(Ty->getScalarType()) ||
+ SDValue StoreValue = ST->getValue();
+ unsigned ResNo = StoreValue.getResNo();
+ Type *StoreType = StoreValue.getValueType().getTypeForEVT(Ctx);
+ if (CallRetResNo == ResNo || !ST->isSimple() ||
+ ST->getAddressSpace() != 0 ||
+ ST->getAlign() <
+ getDataLayout().getABITypeAlign(StoreType->getScalarType()) ||
(StoresInChain && ST->getChain() != StoresInChain) ||
Node->isPredecessorOf(ST->getChain().getNode()))
continue;
- ResultStores[ST->getValue().getResNo()] = ST;
+ ResultStores[ResNo] = ST;
StoresInChain = ST->getChain();
}
TargetLowering::ArgListTy Args;
TargetLowering::ArgListEntry Entry{};
- // Pass the argument.
- Entry.Node = Node->getOperand(0);
- Entry.Ty = Ty;
- Args.push_back(Entry);
+ // Pass the arguments.
+ for (const SDValue &Op : Node->op_values()) {
+ EVT ArgVT = Op.getValueType();
+ Type *ArgTy = ArgVT.getTypeForEVT(Ctx);
+ Entry.Node = Node->getOperand(0);
+ Entry.Ty = ArgTy;
+ Args.push_back(Entry);
+ }
- // Pass the output pointers for sin and cos.
- SmallVector<SDValue, 2> ResultPtrs{};
- for (StoreSDNode *ST : ResultStores) {
- SDValue ResultPtr = ST ? ST->getBasePtr() : CreateStackTemporary(VT);
+ // Pass the output pointers.
+ SmallVector<SDValue, 2> ResultPtrs(NumResults);
+ for (auto [ResNo, ST] : llvm::enumerate(ResultStores)) {
+ if (ResNo == CallRetResNo)
+ continue;
+ EVT ResVT = Node->getValueType(ResNo);
+ SDValue ResultPtr = ST ? ST->getBasePtr() : CreateStackTemporary(ResVT);
Entry.Node = ResultPtr;
- Entry.Ty = PointerType::getUnqual(Ty->getContext());
+ Entry.Ty = PointerType::getUnqual(Ctx);
+ ResultPtrs[ResNo] = ResultPtr;
Args.push_back(Entry);
- ResultPtrs.push_back(ResultPtr);
}
SDLoc DL(Node);
+ // Pass the vector mask (if required).
if (VD && VD->isMasked()) {
- EVT MaskVT = TLI->getSetCCResultType(getDataLayout(), *Ctx, VT);
+ EVT MaskVT = TLI->getSetCCResultType(getDataLayout(), Ctx, VT);
Entry.Node = getBoolConstant(true, DL, MaskVT, VT);
- Entry.Ty = MaskVT.getTypeForEVT(*Ctx);
+ Entry.Ty = MaskVT.getTypeForEVT(Ctx);
Args.push_back(Entry);
}
+ Type *RetType = CallRetResNo.has_value()
+ ? Node->getValueType(*CallRetResNo).getTypeForEVT(Ctx)
+ : Type::getVoidTy(Ctx);
SDValue InChain = StoresInChain ? StoresInChain : getEntryNode();
SDValue Callee = getExternalSymbol(VD ? VD->getVectorFnName().data() : LCName,
TLI->getPointerTy(getDataLayout()));
TargetLowering::CallLoweringInfo CLI(*this);
CLI.setDebugLoc(DL).setChain(InChain).setLibCallee(
- TLI->getLibcallCallingConv(LC), Type::getVoidTy(*Ctx), Callee,
- std::move(Args));
+ TLI->getLibcallCallingConv(LC), RetType, Callee, std::move(Args));
- auto [Call, OutChain] = TLI->LowerCallTo(CLI);
+ auto [Call, CallChain] = TLI->LowerCallTo(CLI);
for (auto [ResNo, ResultPtr] : llvm::enumerate(ResultPtrs)) {
+ if (ResNo == CallRetResNo) {
+ Results.push_back(Call);
+ continue;
+ }
MachinePointerInfo PtrInfo;
if (StoreSDNode *ST = ResultStores[ResNo]) {
// Replace store with the library call.
- ReplaceAllUsesOfValueWith(SDValue(ST, 0), OutChain);
+ ReplaceAllUsesOfValueWith(SDValue(ST, 0), CallChain);
PtrInfo = ST->getPointerInfo();
} else {
PtrInfo = MachinePointerInfo::getFixedStack(
getMachineFunction(), cast<FrameIndexSDNode>(ResultPtr)->getIndex());
}
- SDValue LoadResult = getLoad(VT, DL, OutChain, ResultPtr, PtrInfo);
+ SDValue LoadResult =
+ getLoad(Node->getValueType(ResNo), DL, CallChain, ResultPtr, PtrInfo);
Results.push_back(LoadResult);
}
+ // FIXME: Find a way to avoid updating the root. This is needed for x86, which
+ // uses a floating-point stack. If (for example) the node to be expanded has
+ // two results one floating-point which is returned by the call, and one
+ // integer result, set returned via an output pointer. If only the integer
+ // result is used then the `CopyFromReg` for the FP result may be optimized
+ // out. This prevents an FP stack pop from being emitted for it. Setting the
+ // root like this ensures there will be a use of the `CopyFromReg` chain, and
+ // ensures the FP pop will be emitted.
+ SDValue OutputChain =
+ getNode(ISD::TokenFactor, DL, MVT::Other, getRoot(), CallChain);
+ setRoot(OutputChain);
+
return true;
}
diff --git a/llvm/test/CodeGen/X86/llvm.frexp.ll b/llvm/test/CodeGen/X86/llvm.frexp.ll
index cd560ad627de4c..96de34519556d0 100644
--- a/llvm/test/CodeGen/X86/llvm.frexp.ll
+++ b/llvm/test/CodeGen/X86/llvm.frexp.ll
@@ -325,28 +325,27 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) {
;
; WIN32-LABEL: test_frexp_v4f32_v4i32:
; WIN32: # %bb.0:
-; WIN32-NEXT: pushl %edi
; WIN32-NEXT: pushl %esi
-; WIN32-NEXT: subl $60, %esp
+; WIN32-NEXT: subl $44, %esp
; WIN32-NEXT: movl {{[0-9]+}}(%esp), %esi
-; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT: leal 24(%esi), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstpl {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Folded Spill
-; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT: leal 20(%esi), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstpl {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Folded Spill
-; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT: leal 16(%esi), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
-; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT: leal 28(%esi), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
@@ -361,22 +360,13 @@ define { <4 x float>, <4 x i32> } @test_frexp_v4f32_v4i32(<4 x float> %a) {
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
-; WIN32-NEXT: movl {{[0-9]+}}(%esp), %eax
-; WIN32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; WIN32-NEXT: movl {{[0-9]+}}(%esp), %edx
-; WIN32-NEXT: movl {{[0-9]+}}(%esp), %edi
-; WIN32-NEXT: movl %edi, 28(%esi)
-; WIN32-NEXT: movl %edx, 24(%esi)
-; WIN32-NEXT: movl %ecx, 20(%esi)
-; WIN32-NEXT: movl %eax, 16(%esi)
; WIN32-NEXT: fstps 12(%esi)
; WIN32-NEXT: fstps 8(%esi)
; WIN32-NEXT: fstps 4(%esi)
; WIN32-NEXT: fstps (%esi)
; WIN32-NEXT: movl %esi, %eax
-; WIN32-NEXT: addl $60, %esp
+; WIN32-NEXT: addl $44, %esp
; WIN32-NEXT: popl %esi
-; WIN32-NEXT: popl %edi
; WIN32-NEXT: retl
%result = call { <4 x float>, <4 x i32> } @llvm.frexp.v4f32.v4i32(<4 x float> %a)
ret { <4 x float>, <4 x i32> } %result
@@ -499,46 +489,35 @@ define <4 x i32> @test_frexp_v4f32_v4i32_only_use_exp(<4 x float> %a) {
;
; WIN32-LABEL: test_frexp_v4f32_v4i32_only_use_exp:
; WIN32: # %bb.0:
-; WIN32-NEXT: pushl %edi
; WIN32-NEXT: pushl %esi
-; WIN32-NEXT: subl $28, %esp
+; WIN32-NEXT: subl $12, %esp
; WIN32-NEXT: movl {{[0-9]+}}(%esp), %esi
-; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT: leal 8(%esi), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstp %st(0)
-; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT: leal 4(%esi), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstp %st(0)
-; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
+; WIN32-NEXT: leal 12(%esi), %eax
; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstp %st(0)
-; WIN32-NEXT: leal {{[0-9]+}}(%esp), %eax
-; WIN32-NEXT: movl %eax, {{[0-9]+}}(%esp)
+; WIN32-NEXT: movl %esi, {{[0-9]+}}(%esp)
; WIN32-NEXT: flds {{[0-9]+}}(%esp)
; WIN32-NEXT: fstpl (%esp)
; WIN32-NEXT: calll _frexp
; WIN32-NEXT: fstp %st(0)
-; WIN32-NEXT: movl {{[0-9]+}}(%esp), %eax
-; WIN32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; WIN32-NEXT: movl {{[0-9]+}}(%esp), %edx
-; WIN32-NEXT: movl {{[0-9]+}}(%esp), %edi
-; WIN32-NEXT: movl %edi, 12(%esi)
-; WIN32-NEXT: movl %edx, 8(%esi)
-; WIN32-NEXT: movl %ecx, 4(%esi)
-; WIN32-NEXT: movl %eax, (%esi)
; WIN32-NEXT: movl %esi, %eax
-; WIN32-NEXT: addl $28, %esp
+; WIN32-NEXT: addl $12, %esp
; WIN32-NEXT: popl %esi
-; WIN32-NEXT: popl %edi
; WIN32-NEXT: retl
%result = call { <4 x float>, <4 x i32> } @llvm.frexp.v4f32.v4i32(<4 x float> %a)
%result.1 = extractvalue { <4 x float>, <4 x i32> } %result, 1
|
| // FIXME: Find a way to avoid updating the root. This is needed for x86, which | ||
| // uses a floating-point stack. If (for example) the node to be expanded has | ||
| // two results one floating-point which is returned by the call, and one | ||
| // integer result, set returned via an output pointer. If only the integer | ||
| // result is used then the `CopyFromReg` for the FP result may be optimized | ||
| // out. This prevents an FP stack pop from being emitted for it. Setting the | ||
| // root like this ensures there will be a use of the `CopyFromReg` chain, and | ||
| // ensures the FP pop will be emitted. | ||
| SDValue OutputChain = | ||
| getNode(ISD::TokenFactor, DL, MVT::Other, getRoot(), CallChain); | ||
| setRoot(OutputChain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic came from the frexp expansion I added the FIXME/explanation as I was initially quite confused about why this was necessary. This required making the small change to LegalizeVectorOps.cpp included in the patch.
| ; WIN32-NEXT: movl {{[0-9]+}}(%esp), %eax | ||
| ; WIN32-NEXT: movl {{[0-9]+}}(%esp), %ecx | ||
| ; WIN32-NEXT: movl {{[0-9]+}}(%esp), %edx | ||
| ; WIN32-NEXT: movl {{[0-9]+}}(%esp), %edi | ||
| ; WIN32-NEXT: movl %edi, 28(%esi) | ||
| ; WIN32-NEXT: movl %edx, 24(%esi) | ||
| ; WIN32-NEXT: movl %ecx, 20(%esi) | ||
| ; WIN32-NEXT: movl %eax, 16(%esi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, this is just removing some redundant stack slots + copies, though I'm less familiar with x86 assembly. Previously, it went stack slot -> memory pointed at by %esi. Now it just directly uses the %esi pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(the same appears to be the case for the other targets updated)
| // FIXME: Find a way to avoid updating the root. This is needed for x86, which | ||
| // uses a floating-point stack. If (for example) the node to be expanded has | ||
| // two results one floating-point which is returned by the call, and one | ||
| // integer result, set returned via an output pointer. If only the integer | ||
| // result is used then the `CopyFromReg` for the FP result may be optimized | ||
| // out. This prevents an FP stack pop from being emitted for it. Setting the | ||
| // root like this ensures there will be a use of the `CopyFromReg` chain, and | ||
| // ensures the FP pop will be emitted. | ||
| SDValue OutputChain = | ||
| getNode(ISD::TokenFactor, DL, MVT::Other, getRoot(), CallChain); | ||
| setRoot(LegalizeOp ? LegalizeOp(OutputChain) : OutputChain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this, and the explanation sounds like it's working around something else broken. There shouldn't be any need for a callback. The chain logic should be unconditional, both of these cases have the same properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This workaround was added for the frexp expansion in 1d92b68. I added the FIXME as I think there should be a better solution here, but I could not come up with anything decent 😅
The result chain from a frexp expansion is something like f80,ch,glue = CopyFromReg t23, Register:f80 $fp0 (for x86). That corresponds to the FP result, and the integer result comes from an output pointer.
For the broken cases (fixed in that commit), the FP result is unused. The load of the integer result takes its in-chain from the CopyFromReg (which is good). However, in DAG combine, it'll find a better chain that does not include the CopyFromReg (it seems it can always skip a CopyFromReg to find better memory chains).
That CopyFromReg then becomes unused and DCE'd. The x86 backend then won't emit an FP stack pop (fstp %st(0))) for the FP result, which is bad...
Updating the root creates a fake use of the CopyFromReg chain, which prevents it from being optimized out (and hence keeps the pop).
This is only really an issue for calls with multiple results (since for single result calls if the result was unused, the node would have been DCE'd before the call expansion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here is unconditional -- the chain is updated the same way callback or not (and I think the legalization is a no-op). However, vector legalization keeps track of the legalized nodes. Since the new root node is not one of the results it does not get automatically legalized, and if not explicitly legalized we'll eventually hit an assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized I can slightly tweak the DAG to avoid the need for the callback, so I've done that now 🙂
sdesmalen-arm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if there's any better to suggest for the CopyFromReg that is optimised away when it has no uses, but it doesn't seem to break or regress anything either. So overall LGTM.
Given that the workaround here has been in the repo for about a year, I think it should be safe 🙂 |
This merges the logic for expanding both FFREXP and FSINCOS into one method
DAG.expandMultipleResultFPLibCall(). This reduces duplication and also allows FFREXP to benefit from the stack slot elimination implemented for FSINCOS. This method will also be used in future to implement more multiple-result intrinsics (such as modf and sincospi).