-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[SimplifyCFG] Use range check in simplifyBranchOnICmpChain if possible #165105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-llvm-transforms Author: Yingwei Zheng (dtcxzyw) ChangesIn Note: I cannot guarantee that such an infinite loop cannot be encountered after this patch. FWICT, we don't have a method to efficiently detect this bug for CFG optimizations (For peephole optimizations, see Termination-Checking for LLVM Peephole Optimizations). Closes #165088. Full diff: https://github.com/llvm/llvm-project/pull/165105.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index d831c2737e5f8..50272ddc46699 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -5228,32 +5228,53 @@ bool SimplifyCFGOpt::simplifyBranchOnICmpChain(BranchInst *BI,
CompVal, DL.getIntPtrType(CompVal->getType()), "magicptr");
}
- // Create the new switch instruction now.
- SwitchInst *New = Builder.CreateSwitch(CompVal, DefaultBB, Values.size());
- if (HasProfile) {
- // We know the weight of the default case. We don't know the weight of the
- // other cases, but rather than completely lose profiling info, we split
- // the remaining probability equally over them.
- SmallVector<uint32_t> NewWeights(Values.size() + 1);
- NewWeights[0] = BranchWeights[1]; // this is the default, and we swapped if
- // TrueWhenEqual.
- for (auto &V : drop_begin(NewWeights))
- V = BranchWeights[0] / Values.size();
- setBranchWeights(*New, NewWeights, /*IsExpected=*/false);
- }
-
- // Add all of the 'cases' to the switch instruction.
- for (ConstantInt *Val : Values)
- New->addCase(Val, EdgeBB);
-
- // We added edges from PI to the EdgeBB. As such, if there were any
- // PHI nodes in EdgeBB, they need entries to be added corresponding to
- // the number of edges added.
- for (BasicBlock::iterator BBI = EdgeBB->begin(); isa<PHINode>(BBI); ++BBI) {
- PHINode *PN = cast<PHINode>(BBI);
- Value *InVal = PN->getIncomingValueForBlock(BB);
- for (unsigned i = 0, e = Values.size() - 1; i != e; ++i)
- PN->addIncoming(InVal, BB);
+ // Check if we can represent the values can be represented as a contiguous
+ // range. If so, we use a range check + conditional branch instead of a
+ // switch.
+ if (Values.front()->getValue() - Values.back()->getValue() ==
+ Values.size() - 1) {
+ ConstantRange RangeToCheck = ConstantRange::getNonEmpty(
+ Values.back()->getValue(), Values.front()->getValue() + 1);
+ APInt Offset, RHS;
+ ICmpInst::Predicate Pred;
+ RangeToCheck.getEquivalentICmp(Pred, RHS, Offset);
+ Value *X = CompVal;
+ if (!Offset.isZero())
+ X = Builder.CreateAdd(X, ConstantInt::get(CompVal->getType(), Offset));
+ Value *Cond =
+ Builder.CreateICmp(Pred, X, ConstantInt::get(CompVal->getType(), RHS));
+ BranchInst *NewBI = Builder.CreateCondBr(Cond, EdgeBB, DefaultBB);
+ if (HasProfile)
+ setBranchWeights(*NewBI, BranchWeights, /*IsExpected=*/false);
+ // We don't need to update PHI nodes since we don't add any new edges.
+ } else {
+ // Create the new switch instruction now.
+ SwitchInst *New = Builder.CreateSwitch(CompVal, DefaultBB, Values.size());
+ if (HasProfile) {
+ // We know the weight of the default case. We don't know the weight of the
+ // other cases, but rather than completely lose profiling info, we split
+ // the remaining probability equally over them.
+ SmallVector<uint32_t> NewWeights(Values.size() + 1);
+ NewWeights[0] = BranchWeights[1]; // this is the default, and we swapped
+ // if TrueWhenEqual.
+ for (auto &V : drop_begin(NewWeights))
+ V = BranchWeights[0] / Values.size();
+ setBranchWeights(*New, NewWeights, /*IsExpected=*/false);
+ }
+
+ // Add all of the 'cases' to the switch instruction.
+ for (ConstantInt *Val : Values)
+ New->addCase(Val, EdgeBB);
+
+ // We added edges from PI to the EdgeBB. As such, if there were any
+ // PHI nodes in EdgeBB, they need entries to be added corresponding to
+ // the number of edges added.
+ for (BasicBlock::iterator BBI = EdgeBB->begin(); isa<PHINode>(BBI); ++BBI) {
+ PHINode *PN = cast<PHINode>(BBI);
+ Value *InVal = PN->getIncomingValueForBlock(BB);
+ for (unsigned i = 0, e = Values.size() - 1; i != e; ++i)
+ PN->addIncoming(InVal, BB);
+ }
}
// Erase the old branch instruction.
diff --git a/llvm/test/Transforms/SimplifyCFG/pr165088.ll b/llvm/test/Transforms/SimplifyCFG/pr165088.ll
new file mode 100644
index 0000000000000..4514a1927b586
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/pr165088.ll
@@ -0,0 +1,186 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt -S -passes="simplifycfg<switch-range-to-icmp>" < %s | FileCheck %s
+
+; Avoid getting stuck in the cycle pr165088_cycle_[1-4].
+
+define void @pr165088_cycle_1(i8 %x) {
+; CHECK-LABEL: define void @pr165088_cycle_1(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[TMP0:%.*]] = icmp ult i8 [[X]], 2
+; CHECK-NEXT: br i1 [[TMP0]], label %[[BLOCK2:.*]], label %[[BLOCK3:.*]]
+; CHECK: [[BLOCK1:.*]]:
+; CHECK-NEXT: [[COND2:%.*]] = icmp ugt i8 [[X]], 1
+; CHECK-NEXT: br i1 [[COND2]], label %[[BLOCK3]], label %[[BLOCK2]]
+; CHECK: [[BLOCK2]]:
+; CHECK-NEXT: br label %[[BLOCK3]]
+; CHECK: [[BLOCK3]]:
+; CHECK-NEXT: [[COND3:%.*]] = icmp eq i8 [[X]], 0
+; CHECK-NEXT: br i1 [[COND3]], label %[[EXIT:.*]], label %[[BLOCK1]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ %switch = icmp uge i8 %x, 2
+ %cond1 = icmp ugt i8 %x, 1
+ %or.cond = and i1 %switch, %cond1
+ br i1 %or.cond, label %block3, label %block2
+
+block1:
+ %cond2 = icmp ugt i8 %x, 1
+ br i1 %cond2, label %block3, label %block2
+
+block2:
+ br label %block3
+
+block3:
+ %cond3 = icmp eq i8 %x, 0
+ br i1 %cond3, label %exit, label %block1
+
+exit:
+ ret void
+}
+
+define void @pr165088_cycle_2(i8 %x) {
+; CHECK-LABEL: define void @pr165088_cycle_2(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[SWITCH:%.*]] = icmp ult i8 [[X]], 2
+; CHECK-NEXT: br i1 [[SWITCH]], label %[[BLOCK2:.*]], label %[[BLOCK3:.*]]
+; CHECK: [[BLOCK1:.*]]:
+; CHECK-NEXT: [[COND2:%.*]] = icmp ugt i8 [[X]], 1
+; CHECK-NEXT: br i1 [[COND2]], label %[[BLOCK3]], label %[[BLOCK2]]
+; CHECK: [[BLOCK2]]:
+; CHECK-NEXT: br label %[[BLOCK3]]
+; CHECK: [[BLOCK3]]:
+; CHECK-NEXT: [[COND3:%.*]] = icmp eq i8 [[X]], 0
+; CHECK-NEXT: br i1 [[COND3]], label %[[EXIT:.*]], label %[[BLOCK1]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ switch i8 %x, label %block3 [
+ i8 1, label %block2
+ i8 0, label %block2
+ ]
+
+block1: ; preds = %block3
+ %cond2 = icmp ugt i8 %x, 1
+ br i1 %cond2, label %block3, label %block2
+
+block2: ; preds = %entry, %entry, %block1
+ br label %block3
+
+block3: ; preds = %entry, %block2, %block1
+ %cond3 = icmp eq i8 %x, 0
+ br i1 %cond3, label %exit, label %block1
+
+exit: ; preds = %block3
+ ret void
+}
+
+define void @pr165088_cycle_3(i8 %x) {
+; CHECK-LABEL: define void @pr165088_cycle_3(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br label %[[BLOCK3:.*]]
+; CHECK: [[BLOCK3]]:
+; CHECK-NEXT: [[COND3:%.*]] = icmp eq i8 [[X]], 0
+; CHECK-NEXT: br i1 [[COND3]], label %[[EXIT:.*]], label %[[BLOCK3]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ switch i8 %x, label %block1 [
+ i8 1, label %block2
+ i8 0, label %block2
+ ]
+
+block1: ; preds = %entry, %block3
+ %cond2 = icmp ugt i8 %x, 1
+ br i1 %cond2, label %block3, label %block2
+
+block2: ; preds = %entry, %entry, %block1
+ br label %block3
+
+block3: ; preds = %block2, %block1
+ %cond3 = icmp eq i8 %x, 0
+ br i1 %cond3, label %exit, label %block1
+
+exit: ; preds = %block3
+ ret void
+}
+
+define void @pr165088_cycle_4(i8 %x) {
+; CHECK-LABEL: define void @pr165088_cycle_4(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[TMP0:%.*]] = icmp ult i8 [[X]], 2
+; CHECK-NEXT: br i1 [[TMP0]], label %[[BLOCK2:.*]], label %[[BLOCK3:.*]]
+; CHECK: [[BLOCK1:.*]]:
+; CHECK-NEXT: [[COND2_OLD:%.*]] = icmp ugt i8 [[X]], 1
+; CHECK-NEXT: br i1 [[COND2_OLD]], label %[[BLOCK3]], label %[[BLOCK2]]
+; CHECK: [[BLOCK2]]:
+; CHECK-NEXT: br label %[[BLOCK3]]
+; CHECK: [[BLOCK3]]:
+; CHECK-NEXT: [[COND3:%.*]] = icmp eq i8 [[X]], 0
+; CHECK-NEXT: br i1 [[COND3]], label %[[EXIT:.*]], label %[[BLOCK1]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ %switch = icmp ult i8 %x, 2
+ br i1 %switch, label %block2, label %block1
+
+block1: ; preds = %entry, %block3
+ %cond2 = icmp ugt i8 %x, 1
+ br i1 %cond2, label %block3, label %block2
+
+block2: ; preds = %entry, %block1
+ br label %block3
+
+block3: ; preds = %block2, %block1
+ %cond3 = icmp eq i8 %x, 0
+ br i1 %cond3, label %exit, label %block1
+
+exit: ; preds = %block3
+ ret void
+}
+
+define void @pr165088_original(i8 %x) {
+; CHECK-LABEL: define void @pr165088_original(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[TMP0:%.*]] = icmp ult i8 [[X]], 2
+; CHECK-NEXT: br i1 [[TMP0]], label %[[BLOCK2:.*]], label %[[BLOCK3:.*]]
+; CHECK: [[BLOCK1:.*]]:
+; CHECK-NEXT: [[COND3_OLD_OLD:%.*]] = icmp ugt i8 [[X]], 1
+; CHECK-NEXT: br i1 [[COND3_OLD_OLD]], label %[[BLOCK3]], label %[[BLOCK2]]
+; CHECK: [[BLOCK2]]:
+; CHECK-NEXT: br label %[[BLOCK3]]
+; CHECK: [[BLOCK3]]:
+; CHECK-NEXT: [[COND4:%.*]] = icmp eq i8 [[X]], 0
+; CHECK-NEXT: br i1 [[COND4]], label %[[EXIT:.*]], label %[[BLOCK1]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ %cond = icmp ne i8 %x, 0
+ %cond3 = icmp ne i8 %x, 0
+ %or.cond = and i1 %cond, %cond3
+ br i1 %or.cond, label %block3, label %block2
+
+block1: ; preds = %block3
+ %cond3.old = icmp ugt i8 %x, 1
+ br i1 %cond3.old, label %block3, label %block2
+
+block2: ; preds = %block1, %entry
+ br label %block3
+
+block3: ; preds = %block2, %block1, %entry
+ %cond4 = icmp eq i8 %x, 0
+ br i1 %cond4, label %exit, label %block1
+
+exit: ; preds = %block3
+ ret void
+}
|
andjo403
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
| Value *InVal = PN->getIncomingValueForBlock(BB); | ||
| for (unsigned i = 0, e = Values.size() - 1; i != e; ++i) | ||
| PN->addIncoming(InVal, BB); | ||
| // Check if we can represent the values can be represented as a contiguous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Check if we can represent the values can be represented as a contiguous | |
| // Check if we can represent the values as a contiguous |
| for (BasicBlock::iterator BBI = EdgeBB->begin(); isa<PHINode>(BBI); ++BBI) { | ||
| PHINode *PN = cast<PHINode>(BBI); | ||
| Value *InVal = PN->getIncomingValueForBlock(BB); | ||
| for (unsigned i = 0, e = Values.size() - 1; i != e; ++i) | ||
| PN->addIncoming(InVal, BB); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While at it, I think this should be replaceable with addPredecessorToBlock(EdgeBB, BB, BB)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to call addPredecessorToBlock |Values| - 1 times :( getIncomingValueForBlock may become a performance bottleneck as we cannot reuse the result.
|
Can you please add to the PR description what the previously cycling transforms were? |
Done. |
nikic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
This also fixed #58521. Thanks! |
llvm#165105) In `simplifyBranchOnICmpChain`, if we can merge the comparisons into a range check, use a conditional branch instead. This change also breaks the cycle found in llvm#165088. Closes llvm#165088. Detailed description of the cycle: ``` define void @pr165088_cycle_1(i8 %x) { entry: %switch = icmp uge i8 %x, 2 %cond1 = icmp ugt i8 %x, 1 %or.cond = and i1 %switch, %cond1 br i1 %or.cond, label %block3, label %block2 block1: %cond2 = icmp ugt i8 %x, 1 br i1 %cond2, label %block3, label %block2 block2: br label %block3 block3: %cond3 = icmp eq i8 %x, 0 br i1 %cond3, label %exit, label %block1 exit: ret void } ``` `simplifyBranchOnICmpChain` folds the branch in `entry` to a switch. Then we get: ``` entry: switch i8 %x, label %block3 [ i8 1, label %block2 i8 0, label %block2 ] ... ``` `performValueComparisonIntoPredecessorFolding` redirects the default target `block3` into `block1` because `%x` is never zero. ``` entry: switch i8 %x, label %block1 [ i8 1, label %block2 i8 0, label %block2 ] ... ``` Then `turnSwitchRangeIntoICmp` will convert the switch back into a branch on icmp. ``` entry: %switch = icmp ult i8 %x, 2 br i1 %switch, label %block2, label %block1 ... ``` Since `block1` and `block2` share the same successor `block3`, `performBranchToCommonDestFolding` merges the conditions of `entry` and `block1`, resulting in the original pattern again.
llvm#165105) In `simplifyBranchOnICmpChain`, if we can merge the comparisons into a range check, use a conditional branch instead. This change also breaks the cycle found in llvm#165088. Closes llvm#165088. Detailed description of the cycle: ``` define void @pr165088_cycle_1(i8 %x) { entry: %switch = icmp uge i8 %x, 2 %cond1 = icmp ugt i8 %x, 1 %or.cond = and i1 %switch, %cond1 br i1 %or.cond, label %block3, label %block2 block1: %cond2 = icmp ugt i8 %x, 1 br i1 %cond2, label %block3, label %block2 block2: br label %block3 block3: %cond3 = icmp eq i8 %x, 0 br i1 %cond3, label %exit, label %block1 exit: ret void } ``` `simplifyBranchOnICmpChain` folds the branch in `entry` to a switch. Then we get: ``` entry: switch i8 %x, label %block3 [ i8 1, label %block2 i8 0, label %block2 ] ... ``` `performValueComparisonIntoPredecessorFolding` redirects the default target `block3` into `block1` because `%x` is never zero. ``` entry: switch i8 %x, label %block1 [ i8 1, label %block2 i8 0, label %block2 ] ... ``` Then `turnSwitchRangeIntoICmp` will convert the switch back into a branch on icmp. ``` entry: %switch = icmp ult i8 %x, 2 br i1 %switch, label %block2, label %block1 ... ``` Since `block1` and `block2` share the same successor `block3`, `performBranchToCommonDestFolding` merges the conditions of `entry` and `block1`, resulting in the original pattern again.
|
Hello, |
In
simplifyBranchOnICmpChain, if we can merge the comparisons into a range check, use a conditional branch instead. This change also breaks the cycle found in #165088.Note: I cannot guarantee that such an infinite loop cannot be encountered after this patch. FWICT, we don't have a method to efficiently detect this bug for CFG optimizations (For peephole optimizations, see Termination-Checking for LLVM Peephole Optimizations).
Closes #165088.
Detailed description of the cycle:
simplifyBranchOnICmpChainfolds the branch inentryto a switch. Then we get:performValueComparisonIntoPredecessorFoldingredirects the default targetblock3intoblock1because%xis never zero.Then
turnSwitchRangeIntoICmpwill convert the switch back into a branch on icmp.Since
block1andblock2share the same successorblock3,performBranchToCommonDestFoldingmerges the conditions ofentryandblock1, resulting in the original pattern again.