Skip to content
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 65 additions & 30 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1164,6 +1164,43 @@ SDValue SelectionDAGBuilder::getMemoryRoot() {
return updateRoot(PendingLoads);
}

SDValue SelectionDAGBuilder::getFPOperationRoot(fp::ExceptionBehavior EB) {
// If the new exception behavior differs from that of the pending
// ones, chain up them and update the root.
switch (EB) {
case fp::ExceptionBehavior::ebMayTrap:
case fp::ExceptionBehavior::ebIgnore:
// Floating-point exceptions produced by such operations are not intended
// to be observed, so the sequence of these operations does not need to be
// preserved.
//
// They however must not be mixed with the instructions that have strict
// exception behavior. Placing an operation with 'ebIgnore' behavior between
// 'ebStrict' operations could distort the observed exception behavior.
if (!PendingConstrainedFPStrict.empty()) {
assert(PendingConstrainedFP.empty());
updateRoot(PendingConstrainedFPStrict);
}
break;
case fp::ExceptionBehavior::ebStrict:
// Floating-point exception produced by these operations may be observed, so
// they must be correctly chained. If trapping on FP exceptions is
// disabled, the exceptions can be observed only by functions that read
// exception flags, like 'llvm.get_fpenv' or 'fetestexcept'. It means that
// the order of operations is not significant between barriers.
//
// If trapping is enabled, each operation becomes an implicit observation
// point, so the operations must be sequenced according their original
// source order.
if (!PendingConstrainedFP.empty()) {
assert(PendingConstrainedFPStrict.empty());
updateRoot(PendingConstrainedFP);
}
// TODO: Add support for trapping-enabled scenarios.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we 100% certain that this change won't create cases where instructions expecting traps to be off will be moved into a chunk of instructions that run with FP traps on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If some part of a function runs with traps enabled and the other - with traps disabled, somewhere between them must be a function call, that changes FP control register. Such function is either an external function or an intrinsic with flag IntrInaccessibleMemOnly. Both are barriers for FP instructions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If some part of a function runs with traps enabled and the other - with traps disabled, somewhere between them must be a function call, that changes FP control register. Such function is either an external function or an intrinsic with flag IntrInaccessibleMemOnly. Both are barriers for FP instructions.

Is this specific to SelectionDAG? I thought we didn't have barriers for FP instructions. And since we can have multiple function calls in a basic block, it's possible for traps to be enabled and disabled in different parts of the same basic block. Thus my question about certainty around movement of instructions.

}
return DAG.getRoot();
}

SDValue SelectionDAGBuilder::getRoot() {
// Chain up all pending constrained intrinsics together with all
// pending loads, by simply appending them to PendingLoads and
Expand Down Expand Up @@ -8280,49 +8317,47 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
}
}

void SelectionDAGBuilder::pushFPOpOutChain(SDValue Result,
fp::ExceptionBehavior EB) {
assert(Result.getNode()->getNumValues() == 2);
SDValue OutChain = Result.getValue(1);
assert(OutChain.getValueType() == MVT::Other);

// Instead of updating the root immediately, push the produced chain to the
// appropriate list, deferring the update until the root is requested. In this
// case, the nodes from the lists are chained using TokenFactor, indicating
// that the operations are independent.
//
// In particular, the root is updated before any call that might access the
// floating-point environment, except for constrained intrinsics.
switch (EB) {
case fp::ExceptionBehavior::ebMayTrap:
case fp::ExceptionBehavior::ebIgnore:
PendingConstrainedFP.push_back(OutChain);
break;
case fp::ExceptionBehavior::ebStrict:
PendingConstrainedFPStrict.push_back(OutChain);
break;
}
}

void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
const ConstrainedFPIntrinsic &FPI) {
SDLoc sdl = getCurSDLoc();

// We do not need to serialize constrained FP intrinsics against
// each other or against (nonvolatile) loads, so they can be
// chained like loads.
SDValue Chain = DAG.getRoot();
fp::ExceptionBehavior EB = *FPI.getExceptionBehavior();
SDValue Chain = getFPOperationRoot(EB);
SmallVector<SDValue, 4> Opers;
Opers.push_back(Chain);
for (unsigned I = 0, E = FPI.getNonMetadataArgCount(); I != E; ++I)
Opers.push_back(getValue(FPI.getArgOperand(I)));

auto pushOutChain = [this](SDValue Result, fp::ExceptionBehavior EB) {
assert(Result.getNode()->getNumValues() == 2);

// Push node to the appropriate list so that future instructions can be
// chained up correctly.
SDValue OutChain = Result.getValue(1);
switch (EB) {
case fp::ExceptionBehavior::ebIgnore:
// The only reason why ebIgnore nodes still need to be chained is that
// they might depend on the current rounding mode, and therefore must
// not be moved across instruction that may change that mode.
[[fallthrough]];
case fp::ExceptionBehavior::ebMayTrap:
// These must not be moved across calls or instructions that may change
// floating-point exception masks.
PendingConstrainedFP.push_back(OutChain);
break;
case fp::ExceptionBehavior::ebStrict:
// These must not be moved across calls or instructions that may change
// floating-point exception masks or read floating-point exception flags.
// In addition, they cannot be optimized out even if unused.
PendingConstrainedFPStrict.push_back(OutChain);
break;
}
};

const TargetLowering &TLI = DAG.getTargetLoweringInfo();
EVT VT = TLI.getValueType(DAG.getDataLayout(), FPI.getType());
SDVTList VTs = DAG.getVTList(VT, MVT::Other);
fp::ExceptionBehavior EB = *FPI.getExceptionBehavior();

SDNodeFlags Flags;
if (EB == fp::ExceptionBehavior::ebIgnore)
Expand All @@ -8346,7 +8381,7 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
!TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT)) {
Opers.pop_back();
SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, sdl, VTs, Opers, Flags);
pushOutChain(Mul, EB);
pushFPOpOutChain(Mul, EB);
Opcode = ISD::STRICT_FADD;
Opers.clear();
Opers.push_back(Mul.getValue(1));
Expand Down Expand Up @@ -8377,7 +8412,7 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
}

SDValue Result = DAG.getNode(Opcode, sdl, VTs, Opers, Flags);
pushOutChain(Result, EB);
pushFPOpOutChain(Result, EB);

SDValue FPResult = Result.getValue(0);
setValue(&FPI, FPResult);
Expand Down
12 changes: 12 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
Original file line number Diff line number Diff line change
Expand Up @@ -195,6 +195,11 @@ class SelectionDAGBuilder {
/// Update root to include all chains from the Pending list.
SDValue updateRoot(SmallVectorImpl<SDValue> &Pending);

/// Given a node representing a floating-point operation and its specified
/// exception behavior, this either updates the root or stores the node in
/// a list to be added to chains latter.
void pushFPOpOutChain(SDValue Result, fp::ExceptionBehavior EB);

/// A unique monotonically increasing number used to order the SDNodes we
/// create.
unsigned SDNodeOrder;
Expand Down Expand Up @@ -300,6 +305,13 @@ class SelectionDAGBuilder {
/// memory node that may need to be ordered after any prior load instructions.
SDValue getMemoryRoot();

/// Return the current virtual root of the Selection DAG, flushing
/// PendingConstrainedFP or PendingConstrainedFPStrict items if the new
/// exception behavior (specified by \p EB) differs from that of the pending
/// instructions. This must be done before emitting constrained FP operation
/// call.
SDValue getFPOperationRoot(fp::ExceptionBehavior EB);

/// Similar to getMemoryRoot, but also flushes PendingConstrainedFP(Strict)
/// items. This must be done before emitting any call other any other node
/// that may need to be ordered after FP instructions due to other side
Expand Down
37 changes: 0 additions & 37 deletions llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
Original file line number Diff line number Diff line change
Expand Up @@ -100,43 +100,6 @@ entry:
ret i32 %result
}

; These 2 divs only differ in their exception behavior and will be CSEd. Make
; sure the nofpexcept flag is not set on the combined node.
define void @binop_cse(double %a, double %b, ptr %x, ptr %y) #0 {
entry:
; CHECK-LABEL: name: binop_cse
; CHECK: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0)
; CHECK: [[MOV32rm1:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.1, align 16)
; CHECK: [[MOVSDrm_alt:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.3, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.3, align 16)
; CHECK: %3:fr64 = DIVSDrm [[MOVSDrm_alt]], %fixed-stack.2, 1, $noreg, 0, $noreg, implicit $mxcsr :: (load (s64) from %fixed-stack.2)
; CHECK: MOVSDmr killed [[MOV32rm1]], 1, $noreg, 0, $noreg, %3 :: (store (s64) into %ir.x, align 4)
; CHECK: MOVSDmr killed [[MOV32rm]], 1, $noreg, 0, $noreg, %3 :: (store (s64) into %ir.y, align 4)
; CHECK: RET 0
%div = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
%div2 = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
store double %div, ptr %x
store double %div2, ptr %y
ret void
}

; These 2 sitofps only differ in their exception behavior and will be CSEd. Make
; sure the nofpexcept flag is not set on the combined node.
define void @sitofp_cse(i32 %a, ptr %x, ptr %y) #0 {
entry:
; CHECK-LABEL: name: sitofp_cse
; CHECK: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 8)
; CHECK: [[MOV32rm1:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.1)
; CHECK: %2:fr64 = CVTSI2SDrm %fixed-stack.2, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.2, align 16)
; CHECK: MOVSDmr killed [[MOV32rm1]], 1, $noreg, 0, $noreg, %2 :: (store (s64) into %ir.x, align 4)
; CHECK: MOVSDmr killed [[MOV32rm]], 1, $noreg, 0, $noreg, %2 :: (store (s64) into %ir.y, align 4)
; CHECK: RET 0
%result = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
%result2 = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
store double %result, ptr %x
store double %result2, ptr %y
ret void
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue with this test? I'd hate for us to lose test coverage because a bug was discovered. Wouldn't a FIXME and a good comment be better?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put back the tests and slightly modified them to check changes introduced by this patch. Previously the two operation in the test were merged because the nodes were identical, due to the same value of chain argument. If the instructions are sequenced, which this patch implements, the chain arguments become different and the nodes are not merged.

It is not an error that the nodes were merged previously, but it was made using CSE mechanism. With sequenced node it does not work. Merging such nodes require another mechanism, probably at IR level.

attributes #0 = { strictfp }

declare double @llvm.experimental.constrained.sitofp.f64.i8(i8, metadata, metadata)
Expand Down
Loading