Skip to content

Commit bcee0ee

Browse files
authored
[SDAG] Fix deferring constrained function calls (#153029)
Selection DAG has a more sophisticated execution order representation than the simple sequence used in IR, so building the DAG can take into account specific properties of the nodes to better express possible parallelism. The existing implementation does this for constrained function calls, some of them are considered as independent, which can potentially improve the generated code. However this mechanism incorrectly implies that the calls with exception behavior 'ebIgnore' cannot raise floating-point exception. The purpose of this change is to fix the implementation. In the current implementation, constrained function calls don't immediately update the DAG root. Instead, the DAG builder collects their output chains and flushes them when the root is required. Constrained function calls cannot be moved across calls of external functions and intrinsics that access floating-point environment, they work as barriers. Between the barriers, constrained function calls can be reordered, they may be considered independent from viewpoint of raising exceptions. For strictfp functions this is possible only if floating-point trapping is disabled. This change introduces a new restriction - the calls with default exception handling cannot not be moved between strictfp function calls. Otherwise the exceptions raised by such call can disturb the expected exception sequence. It means that constrained function calls with strict exception behavior act as barriers for the calls with non-strict behavior and vice versa. Effectively it means that the entire sequence of constrained calls in IR is split into "strict" and "non-strict" regions, in which restrictions on the order of constrained calls are relaxed, but move from one region to another is not allowed. It agrees with the representation of strictfp code in high-level languages. For example, C/C++ strictfp code correspond to blocks where pragma `STDC FENV_ACCESS ON` is in effect, this restriction should help preserving the intended semantics. When floating-point exception trapping is enabled, constrained intrinsics with 'ebStrict' cannot be reordered, their sequence must be identical to the original source order. The current implementation does not distinguish between strictfp modes with trapping and without it. This change make assumption that the trapping is disabled. It is not correct in the general case, but is compatible with the existing implementation.
1 parent 6a0f392 commit bcee0ee

File tree

3 files changed

+108
-49
lines changed

3 files changed

+108
-49
lines changed

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Lines changed: 65 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -1162,6 +1162,43 @@ SDValue SelectionDAGBuilder::getMemoryRoot() {
11621162
return updateRoot(PendingLoads);
11631163
}
11641164

1165+
SDValue SelectionDAGBuilder::getFPOperationRoot(fp::ExceptionBehavior EB) {
1166+
// If the new exception behavior differs from that of the pending
1167+
// ones, chain up them and update the root.
1168+
switch (EB) {
1169+
case fp::ExceptionBehavior::ebMayTrap:
1170+
case fp::ExceptionBehavior::ebIgnore:
1171+
// Floating-point exceptions produced by such operations are not intended
1172+
// to be observed, so the sequence of these operations does not need to be
1173+
// preserved.
1174+
//
1175+
// They however must not be mixed with the instructions that have strict
1176+
// exception behavior. Placing an operation with 'ebIgnore' behavior between
1177+
// 'ebStrict' operations could distort the observed exception behavior.
1178+
if (!PendingConstrainedFPStrict.empty()) {
1179+
assert(PendingConstrainedFP.empty());
1180+
updateRoot(PendingConstrainedFPStrict);
1181+
}
1182+
break;
1183+
case fp::ExceptionBehavior::ebStrict:
1184+
// Floating-point exception produced by these operations may be observed, so
1185+
// they must be correctly chained. If trapping on FP exceptions is
1186+
// disabled, the exceptions can be observed only by functions that read
1187+
// exception flags, like 'llvm.get_fpenv' or 'fetestexcept'. It means that
1188+
// the order of operations is not significant between barriers.
1189+
//
1190+
// If trapping is enabled, each operation becomes an implicit observation
1191+
// point, so the operations must be sequenced according their original
1192+
// source order.
1193+
if (!PendingConstrainedFP.empty()) {
1194+
assert(PendingConstrainedFPStrict.empty());
1195+
updateRoot(PendingConstrainedFP);
1196+
}
1197+
// TODO: Add support for trapping-enabled scenarios.
1198+
}
1199+
return DAG.getRoot();
1200+
}
1201+
11651202
SDValue SelectionDAGBuilder::getRoot() {
11661203
// Chain up all pending constrained intrinsics together with all
11671204
// pending loads, by simply appending them to PendingLoads and
@@ -8298,49 +8335,47 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
82988335
}
82998336
}
83008337

8338+
void SelectionDAGBuilder::pushFPOpOutChain(SDValue Result,
8339+
fp::ExceptionBehavior EB) {
8340+
assert(Result.getNode()->getNumValues() == 2);
8341+
SDValue OutChain = Result.getValue(1);
8342+
assert(OutChain.getValueType() == MVT::Other);
8343+
8344+
// Instead of updating the root immediately, push the produced chain to the
8345+
// appropriate list, deferring the update until the root is requested. In this
8346+
// case, the nodes from the lists are chained using TokenFactor, indicating
8347+
// that the operations are independent.
8348+
//
8349+
// In particular, the root is updated before any call that might access the
8350+
// floating-point environment, except for constrained intrinsics.
8351+
switch (EB) {
8352+
case fp::ExceptionBehavior::ebMayTrap:
8353+
case fp::ExceptionBehavior::ebIgnore:
8354+
PendingConstrainedFP.push_back(OutChain);
8355+
break;
8356+
case fp::ExceptionBehavior::ebStrict:
8357+
PendingConstrainedFPStrict.push_back(OutChain);
8358+
break;
8359+
}
8360+
}
8361+
83018362
void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
83028363
const ConstrainedFPIntrinsic &FPI) {
83038364
SDLoc sdl = getCurSDLoc();
83048365

83058366
// We do not need to serialize constrained FP intrinsics against
83068367
// each other or against (nonvolatile) loads, so they can be
83078368
// chained like loads.
8308-
SDValue Chain = DAG.getRoot();
8369+
fp::ExceptionBehavior EB = *FPI.getExceptionBehavior();
8370+
SDValue Chain = getFPOperationRoot(EB);
83098371
SmallVector<SDValue, 4> Opers;
83108372
Opers.push_back(Chain);
83118373
for (unsigned I = 0, E = FPI.getNonMetadataArgCount(); I != E; ++I)
83128374
Opers.push_back(getValue(FPI.getArgOperand(I)));
83138375

8314-
auto pushOutChain = [this](SDValue Result, fp::ExceptionBehavior EB) {
8315-
assert(Result.getNode()->getNumValues() == 2);
8316-
8317-
// Push node to the appropriate list so that future instructions can be
8318-
// chained up correctly.
8319-
SDValue OutChain = Result.getValue(1);
8320-
switch (EB) {
8321-
case fp::ExceptionBehavior::ebIgnore:
8322-
// The only reason why ebIgnore nodes still need to be chained is that
8323-
// they might depend on the current rounding mode, and therefore must
8324-
// not be moved across instruction that may change that mode.
8325-
[[fallthrough]];
8326-
case fp::ExceptionBehavior::ebMayTrap:
8327-
// These must not be moved across calls or instructions that may change
8328-
// floating-point exception masks.
8329-
PendingConstrainedFP.push_back(OutChain);
8330-
break;
8331-
case fp::ExceptionBehavior::ebStrict:
8332-
// These must not be moved across calls or instructions that may change
8333-
// floating-point exception masks or read floating-point exception flags.
8334-
// In addition, they cannot be optimized out even if unused.
8335-
PendingConstrainedFPStrict.push_back(OutChain);
8336-
break;
8337-
}
8338-
};
8339-
83408376
const TargetLowering &TLI = DAG.getTargetLoweringInfo();
83418377
EVT VT = TLI.getValueType(DAG.getDataLayout(), FPI.getType());
83428378
SDVTList VTs = DAG.getVTList(VT, MVT::Other);
8343-
fp::ExceptionBehavior EB = *FPI.getExceptionBehavior();
83448379

83458380
SDNodeFlags Flags;
83468381
if (EB == fp::ExceptionBehavior::ebIgnore)
@@ -8364,7 +8399,7 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
83648399
!TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT)) {
83658400
Opers.pop_back();
83668401
SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, sdl, VTs, Opers, Flags);
8367-
pushOutChain(Mul, EB);
8402+
pushFPOpOutChain(Mul, EB);
83688403
Opcode = ISD::STRICT_FADD;
83698404
Opers.clear();
83708405
Opers.push_back(Mul.getValue(1));
@@ -8395,7 +8430,7 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
83958430
}
83968431

83978432
SDValue Result = DAG.getNode(Opcode, sdl, VTs, Opers, Flags);
8398-
pushOutChain(Result, EB);
8433+
pushFPOpOutChain(Result, EB);
83998434

84008435
SDValue FPResult = Result.getValue(0);
84018436
setValue(&FPI, FPResult);

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -195,6 +195,11 @@ class SelectionDAGBuilder {
195195
/// Update root to include all chains from the Pending list.
196196
SDValue updateRoot(SmallVectorImpl<SDValue> &Pending);
197197

198+
/// Given a node representing a floating-point operation and its specified
199+
/// exception behavior, this either updates the root or stores the node in
200+
/// a list to be added to chains latter.
201+
void pushFPOpOutChain(SDValue Result, fp::ExceptionBehavior EB);
202+
198203
/// A unique monotonically increasing number used to order the SDNodes we
199204
/// create.
200205
unsigned SDNodeOrder;
@@ -300,6 +305,13 @@ class SelectionDAGBuilder {
300305
/// memory node that may need to be ordered after any prior load instructions.
301306
SDValue getMemoryRoot();
302307

308+
/// Return the current virtual root of the Selection DAG, flushing
309+
/// PendingConstrainedFP or PendingConstrainedFPStrict items if the new
310+
/// exception behavior (specified by \p EB) differs from that of the pending
311+
/// instructions. This must be done before emitting constrained FP operation
312+
/// call.
313+
SDValue getFPOperationRoot(fp::ExceptionBehavior EB);
314+
303315
/// Similar to getMemoryRoot, but also flushes PendingConstrainedFP(Strict)
304316
/// items. This must be done before emitting any call other any other node
305317
/// that may need to be ordered after FP instructions due to other side

llvm/test/CodeGen/X86/fp-intrinsics-flags.ll

Lines changed: 31 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -100,40 +100,52 @@ entry:
100100
ret i32 %result
101101
}
102102

103-
; These 2 divs only differ in their exception behavior and will be CSEd. Make
104-
; sure the nofpexcept flag is not set on the combined node.
103+
; These 4 divs only differ in their exception behavior. They form two groups,
104+
; whithin each the constrained functions have the same exception hehavior and
105+
; may be CSE'd. Instructions with different exception behavior belong to
106+
; different groups, they have different chain argument and cannot be CSE'd.
105107
define void @binop_cse(double %a, double %b, ptr %x, ptr %y) #0 {
106108
entry:
107109
; CHECK-LABEL: name: binop_cse
108-
; CHECK: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0)
109-
; CHECK: [[MOV32rm1:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.1, align 16)
110-
; CHECK: [[MOVSDrm_alt:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.3, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.3, align 16)
111-
; CHECK: %3:fr64 = DIVSDrm [[MOVSDrm_alt]], %fixed-stack.2, 1, $noreg, 0, $noreg, implicit $mxcsr :: (load (s64) from %fixed-stack.2)
112-
; CHECK: MOVSDmr killed [[MOV32rm1]], 1, $noreg, 0, $noreg, %3 :: (store (s64) into %ir.x, align 4)
113-
; CHECK: MOVSDmr killed [[MOV32rm]], 1, $noreg, 0, $noreg, %3 :: (store (s64) into %ir.y, align 4)
110+
; CHECK: [[Y:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0)
111+
; CHECK: [[X:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.1, align 16)
112+
; CHECK: [[B:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.2, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.2)
113+
; CHECK: [[A:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.3, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.3, align 16)
114+
; CHECK: [[DIV0:%[0-9]+]]:fr64 = DIVSDrr [[A]], [[B]], implicit $mxcsr
115+
; CHECK: [[DIV1:%[0-9]+]]:fr64 = nofpexcept DIVSDrr [[A]], [[B]], implicit $mxcsr
116+
; CHECK: MOVSDmr killed [[X]], 1, $noreg, 0, $noreg, [[DIV1]] :: (store (s64) into %ir.x, align 4)
117+
; CHECK: MOVSDmr killed [[Y]], 1, $noreg, 0, $noreg, [[DIV1]] :: (store (s64) into %ir.y, align 4)
114118
; CHECK: RET 0
115119
%div = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
120+
%div1 = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
116121
%div2 = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
117-
store double %div, ptr %x
118-
store double %div2, ptr %y
122+
%div3 = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
123+
store double %div2, ptr %x
124+
store double %div3, ptr %y
119125
ret void
120126
}
121127

122-
; These 2 sitofps only differ in their exception behavior and will be CSEd. Make
123-
; sure the nofpexcept flag is not set on the combined node.
128+
; These 4 divs only differ in their exception behavior. They form two groups,
129+
; whithin each the constrained functions have the same exception hehavior and
130+
; may be CSE'd. Instructions with different exception behavior belong to
131+
; different groups, they have different chain argument and cannot be CSE'd.
124132
define void @sitofp_cse(i32 %a, ptr %x, ptr %y) #0 {
125133
entry:
126134
; CHECK-LABEL: name: sitofp_cse
127-
; CHECK: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 8)
128-
; CHECK: [[MOV32rm1:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.1)
129-
; CHECK: %2:fr64 = CVTSI2SDrm %fixed-stack.2, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.2, align 16)
130-
; CHECK: MOVSDmr killed [[MOV32rm1]], 1, $noreg, 0, $noreg, %2 :: (store (s64) into %ir.x, align 4)
131-
; CHECK: MOVSDmr killed [[MOV32rm]], 1, $noreg, 0, $noreg, %2 :: (store (s64) into %ir.y, align 4)
135+
; CHECK: [[Y:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 8)
136+
; CHECK: [[X:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.1)
137+
; CHECK: [[A:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.2, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.2, align 16)
138+
; CHECK: [[CVT0:%[0-9]+]]:fr64 = CVTSI2SDrr [[A]]
139+
; CHECK: [[CVT1:%[0-9]+]]:fr64 = nofpexcept CVTSI2SDrr [[A]]
140+
; CHECK: MOVSDmr killed [[X]], 1, $noreg, 0, $noreg, [[CVT1]] :: (store (s64) into %ir.x, align 4)
141+
; CHECK: MOVSDmr killed [[Y]], 1, $noreg, 0, $noreg, [[CVT1]] :: (store (s64) into %ir.y, align 4)
132142
; CHECK: RET 0
133143
%result = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
144+
%result1 = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
134145
%result2 = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
135-
store double %result, ptr %x
136-
store double %result2, ptr %y
146+
%result3 = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
147+
store double %result2, ptr %x
148+
store double %result3, ptr %y
137149
ret void
138150
}
139151

0 commit comments

Comments
 (0)