-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[InstCombinePHI] Enhance PHI CSE to remove redundant phis #163453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6432743
to
306b429
Compare
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: Congzhe (CongzheUalberta) ChangesEnhanced PHI CSE to eliminate redundant PHIs, which could clean up the IR and open up opportunities for other passes such as loop vectorization. Motivation:Given the following range() function,
The IR that is right before loop vectorization is shown below. Here
llvm trunk is not able to vectorize it because there are redundant phi ( Those redundant instructions just act exactly the same as How the redundant phi was generated: It was initially introduced by GVN that did load-in-loop-pre, which partially eliminated the load of Compiler explorer: https://godbolt.org/z/f4ncn3Kjo Full diff: https://github.com/llvm/llvm-project/pull/163453.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp b/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
index 9815644f5f43d..e736e89a3a146 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
@@ -1621,11 +1621,90 @@ Instruction *InstCombinerImpl::visitPHINode(PHINode &PN) {
// Note that even though we've just canonicalized this PHI, due to the
// worklist visitation order, there are no guarantess that *every* PHI
// has been canonicalized, so we can't just compare operands ranges.
- if (!PN.isIdenticalToWhenDefined(&IdenticalPN))
- continue;
- // Just use that PHI instead then.
- ++NumPHICSEs;
- return replaceInstUsesWith(PN, &IdenticalPN);
+ if (PN.isIdenticalToWhenDefined(&IdenticalPN)) {
+ // Just use that PHI instead then.
+ ++NumPHICSEs;
+ return replaceInstUsesWith(PN, &IdenticalPN);
+ }
+
+ // Look for the following pattern and do PHI CSE to clean up the
+ // redundant %phi. Here %phi, %1 and %phi.next perform the same
+ // functionality as %identicalPhi and hence %phi can be eliminated.
+ //
+ // BB1:
+ // %identicalPhi = phi [ X, %BB0 ], [ %identicalPhi.next, %BB1 ]
+ // %phi = phi [ X, %BB0 ], [ %phi.next, %BB1 ]
+ // ...
+ // %identicalPhi.next = select %cmp, %val, %identicalPhi
+ // %1 = select %cmp2, %identicalPhi, float %phi
+ // %phi.next = select %cmp, %val, %1
+ //
+ // Prove that %phi and %identicalPhi are the same by induction:
+ //
+ // Base case: Both %phi and %identicalPhi are equal on entry to the loop.
+ // Inductive case:
+ // Suppose %phi and %identicalPhi are equal at iteration i.
+ // We look at their values at iteration i+1 which are %phi.next and
+ // %identicalPhi.next. They would have become different only when %cmp is
+ // false and the corresponding values %1 and %identicalPhi differ.
+ //
+ // The only condition when %1 and %identicalPh could differ is when %cmp2
+ // is false and %1 is %phi, which contradicts our inductive hypothesis
+ // that %phi and %identicalPhi are equal. Thus %phi and %identicalPhi are
+ // always equal at iteration i+1.
+
+ if (PN.getNumIncomingValues() == 2 && PN.getNumUses() == 1) {
+ unsigned diffVals = 0;
+ unsigned diffValIdx = 0;
+ // Check that only the backedge incoming value is different.
+ for (unsigned i = 0; i < 2; i++) {
+ if (PN.getIncomingValue(i) != IdenticalPN.getIncomingValue(i)) {
+ diffVals++;
+ diffValIdx = i;
+ }
+ }
+ BasicBlock *CurBB = PN.getParent();
+ if (diffVals == 2 || PN.getIncomingBlock(diffValIdx) != CurBB)
+ continue;
+ // Now check that the backedge incoming values are two select
+ // instructions that are in the same BB, and have the same condition,
+ // true value.
+ auto *Val = PN.getIncomingValue(diffValIdx);
+ auto *IdenticalVal = IdenticalPN.getIncomingValue(diffValIdx);
+ if (!isa<SelectInst>(Val) || !isa<SelectInst>(IdenticalVal))
+ continue;
+
+ auto *SI = cast<SelectInst>(Val);
+ auto *IdenticalSI = cast<SelectInst>(IdenticalVal);
+ if (SI->getParent() != CurBB || IdenticalSI->getParent() != CurBB)
+ continue;
+ if (SI->getCondition() != IdenticalSI->getCondition() ||
+ SI->getTrueValue() != IdenticalSI->getTrueValue())
+ continue;
+
+ // Now check that the false values, i.e., %1 and %identicalPhi,
+ // are essentially the same value within the same BB.
+ auto SameSelAndPhi = [&](SelectInst *SI, PHINode *IdenticalPN,
+ PHINode *PN) {
+ if (SI->getTrueValue() == IdenticalPN) {
+ return SI->getFalseValue() == PN;
+ }
+ return false;
+ };
+ auto *FalseVal = SI->getFalseValue();
+ auto *IdenticalSIFalseVal =
+ dyn_cast<PHINode>(IdenticalSI->getFalseValue());
+ if (!isa<SelectInst>(FalseVal) || !IdenticalSIFalseVal ||
+ IdenticalSIFalseVal != &IdenticalPN)
+ continue;
+ auto *FalseValSI = cast<SelectInst>(FalseVal);
+ if (FalseValSI->getParent() != CurBB ||
+ !SameSelAndPhi(FalseValSI, &IdenticalPN, &PN))
+ continue;
+
+ ++NumPHICSEs;
+ return replaceInstUsesWith(PN, &IdenticalPN);
+ }
}
// If this is an integer PHI and we know that it has an illegal type, see if
diff --git a/llvm/test/Transforms/InstCombine/enhanced-phi-cse.ll b/llvm/test/Transforms/InstCombine/enhanced-phi-cse.ll
new file mode 100644
index 0000000000000..ae589b7450465
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/enhanced-phi-cse.ll
@@ -0,0 +1,61 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -S -passes=instcombine -instcombine-enhanced-phi-cse=true | FileCheck %s
+@A = extern_weak global float, align 4
+
+; %phi.to.remove acts the same as %v1, and can be eliminated with PHI CSE.
+define void @enhanced_phi_cse(ptr %m, ptr %n, i32 %count) {
+; CHECK-LABEL: @enhanced_phi_cse(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[FOR_BODY:%.*]]
+; CHECK: for.body:
+; CHECK-NEXT: [[V0:%.*]] = phi float [ 0x4415AF1D80000000, [[ENTRY:%.*]] ], [ [[V0_1:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[V1:%.*]] = phi float [ 0xC415AF1D80000000, [[ENTRY]] ], [ [[V1_1:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INC_I:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[Q:%.*]] = phi ptr [ [[M:%.*]], [[ENTRY]] ], [ [[Q_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[C:%.*]] = phi ptr [ [[N:%.*]], [[ENTRY]] ], [ [[C_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[Q_LOAD:%.*]] = load float, ptr [[Q]], align 4
+; CHECK-NEXT: [[C_LOAD:%.*]] = load float, ptr [[C]], align 4
+; CHECK-NEXT: [[SUB:%.*]] = fsub float [[Q_LOAD]], [[C_LOAD]]
+; CHECK-NEXT: [[CMP1:%.*]] = fcmp olt float [[SUB]], [[V0]]
+; CHECK-NEXT: [[V0_1]] = select i1 [[CMP1]], float [[SUB]], float [[V0]]
+; CHECK-NEXT: [[CMP2:%.*]] = fcmp ogt float [[SUB]], [[V1]]
+; CHECK-NEXT: [[V1_1]] = select i1 [[CMP2]], float [[SUB]], float [[V1]]
+; CHECK-NEXT: [[INC_I]] = add nuw nsw i32 [[I]], 1
+; CHECK-NEXT: [[Q_NEXT]] = getelementptr inbounds float, ptr [[Q]], i64 1
+; CHECK-NEXT: [[C_NEXT]] = getelementptr inbounds float, ptr [[C]], i64 1
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC_I]], [[COUNT:%.*]]
+; CHECK-NEXT: br i1 [[EXITCOND]], label [[EXIT:%.*]], label [[FOR_BODY]]
+; CHECK: exit:
+; CHECK-NEXT: store float [[V1_1]], ptr @A, align 4
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %for.body
+
+for.body: ; preds = %entry, %for.body
+ %v0 = phi float [ 0x4415AF1D80000000, %entry ], [ %v0.1, %for.body ]
+ %v1 = phi float [ 0xC415AF1D80000000, %entry ], [ %v1.1, %for.body ]
+ %phi.to.remove = phi float [ 0xC415AF1D80000000, %entry ], [ %phi.to.remove.next, %for.body ]
+ %i = phi i32 [ 0, %entry ], [ %inc.i, %for.body ]
+ %q = phi ptr [ %m, %entry ], [ %q.next, %for.body ]
+ %c = phi ptr [ %n, %entry ], [ %c.next, %for.body ]
+ %q.load = load float, ptr %q
+ %c.load = load float, ptr %c
+ %sub = fsub float %q.load, %c.load
+ %cmp1 = fcmp olt float %sub, %v0
+ %v0.1 = select i1 %cmp1, float %sub, float %v0
+ %same.as.v1 = select i1 %cmp1, float %v1, float %phi.to.remove
+ %cmp2 = fcmp ogt float %sub, %same.as.v1
+ %v1.1 = select i1 %cmp2, float %sub, float %v1
+ %phi.to.remove.next = select i1 %cmp2, float %sub, float %same.as.v1
+ %inc.i = add nuw nsw i32 %i, 1
+ %q.next = getelementptr inbounds float, ptr %q, i64 1
+ %c.next = getelementptr inbounds float, ptr %c, i64 1
+ %exitcond = icmp eq i32 %inc.i, %count
+ br i1 %exitcond, label %exit, label %for.body
+
+exit:
+ %vl.1.lcssa = phi float [ %v1.1, %for.body ]
+ store float %vl.1.lcssa, ptr @A
+ ret void
+}
|
Enhanced PHI CSE to eliminate redundant PHIs, which could clean up the IR and open up opportunities for other passes such as loop vectorization.
5843a60
to
4715789
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect that this fold should be implemented in simplifySelectInst
and simplify %1 = select %cmp2, %identicalPhi, %phi -> %identicalPhi
. It is not a strong requirement, so it's up to you.
// that %phi and %identicalPhi are equal. Thus %phi and %identicalPhi are | ||
// always equal at iteration i+1. | ||
|
||
if (PN.getNumIncomingValues() == 2 && PN.getNumUses() == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for the one-use checking here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is supposed to check for the target pattern that is the redundant cycle of phi (%phi.to.remove) and two select instructions (%phi.to.remove.next, %same.as.v1). Here and for the rest of my replies I'm using the notion of those variables (%phi.to.remove
, %phi.to.remove.next
, %same.as.v1
) from the motivative IR in the description which is the same as enhanced_phi_cse() in the test file.
I've now improved the code and I'm checking SI->getNumUses() == 1
on line 1866 instead, to make sure that %phi.to.remove.next
(which had been checked to be a select) has only one use that is the backedge incoming value of %phi.to.remove
. Hope it is more readable now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said before, we can do the same fold by simplifying %1 = select %cmp2, %identicalPhi, %phi -> %identicalPhi
. Since we never check the uses of instructions in InstSimplify, I don't think it is necessary in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about it more and I do agree with you that the one-use checking is not necessary in this case. I've now deleted the one-use checking. Thanks for pointing it out!
} | ||
} | ||
BasicBlock *CurBB = PN.getParent(); | ||
if (diffVals == 2 || PN.getIncomingBlock(diffValIdx) != CurBB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The header and latch block can be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this patch I limited it to the case where the loop is a single BB, hence redundant cycle of phi and selects are in the same BB and control flow equivalent. With multiple BB and control flow divergence, it would become more complex and not very straightforward to prove that they are purely redundant. Therefore I'd like to make it work for the single BB case, and then I could possibly extend it to multi-BB scenarios. Does it sound reasonable to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please provide a counterexample to demonstrate that removing this constraint may lead to a crash or miscompilation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought more about it and I think you are right. I could not come up with a counterexample that would break my code when this constraint was removed - it would work correctly without the single BB constraint. I've now removed the constraint and simplified the code. Thanks for this comment.
ff369b4
to
78bfbef
Compare
78bfbef
to
fbc4a5f
Compare
Crash reproducer:
|
1e9166c
to
4cfc7bf
Compare
Thanks for the case. I've also checked other failures in the last CI, they occurred because with I've fixed it by using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that it is more natural to handle this pattern in simplifySelectInst
:
bool isSimplifierIdenticalPHI(PHINode &PHI, PHINode &IdenticalPHI) {
if (PHI.getParent() != IdenticalPHI.getParent())
return false;
// Check incoming values
...
// Check next values
...
}
Value *simplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal,
const SimplifyQuery &Q, unsigned MaxRecurse) {
...
if (auto *TruePHI = dyn_cast<PHINode>(TrueVal)) {
if (auto *FalsePHI = dyn_cast<PHINode>(FalseVal)) {
if (isSimplifierIdenticalPHI(*TruePHI, *FalsePHI))
return FalseVal;
if (isSimplifierIdenticalPHI(*TruePHI, *FalsePHI))
return FalseVal;
if (isSimplifierIdenticalPHI(*FalsePHI, *TruePHI))
return TrueVal;
}
}
return nullptr;
}
if (!isa<SelectInst>(Val) || !isa<SelectInst>(IdenticalVal)) | ||
continue; | ||
|
||
auto *SI = cast<SelectInst>(Val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use dyn_cast
instead of isa + cast
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've used dyn_cast
now.
continue; | ||
if (cast<PHINode>(IdenticalSIOtherVal) != &IdenticalPN) | ||
continue; | ||
auto *SIOtherValAsSel = cast<SelectInst>(SIOtherVal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use dyn_cast
instead of isa + cast
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've used dyn_cast
now.
}; | ||
if (!isa<SelectInst>(SIOtherVal) || !isa<PHINode>(IdenticalSIOtherVal)) | ||
continue; | ||
if (cast<PHINode>(IdenticalSIOtherVal) != &IdenticalPN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (cast<PHINode>(IdenticalSIOtherVal) != &IdenticalPN) | |
if (IdenticalSIOtherVal != &IdenticalPN) |
Cast doesn't change the pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed accordingly.
if (SI->getTrueValue() == IdenticalPN) { | ||
return SI->getFalseValue() == PN; | ||
} | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (SI->getTrueValue() == IdenticalPN) { | |
return SI->getFalseValue() == PN; | |
} | |
return false; | |
return SI->getTrueValue() == IdenticalPN && SI->getFalseValue() == PN; |
We don't need a lambda function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the lambda function.
// ... | ||
// %identicalPhi.next = select %cmp, %val, %identicalPhi | ||
// (or select %cmp, %identicalPhi, %val) | ||
// %1 = select %cmp2, %identicalPhi, %phi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also add a commuted test with %1 = select %cmp2, %phi, %identicalPhi
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Twe more test cases added that reflect the commuted test, i.e., select_with_identical_phi_3()
and select_with_identical_phi_4()
.
I have updated the patch so that the feature is implemented in simplifySelectInst now. Aslo added two more test cases. |
fc743e2
to
ee823bc
Compare
// The only condition when %1 and %identicalPh could differ is when %cmp2 | ||
// is false and %1 is %phi, which contradicts our inductive hypothesis | ||
// that %phi and %identicalPhi are equal. Thus %phi and %identicalPhi are | ||
// always equal at iteration i+1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ///
for header comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've used ///
now.
if (SI->getCondition() != IdenticalSI->getCondition() || | ||
(SI->getTrueValue() != IdenticalSI->getTrueValue() && | ||
SI->getFalseValue() != IdenticalSI->getFalseValue())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (SI->getCondition() != IdenticalSI->getCondition() || | |
(SI->getTrueValue() != IdenticalSI->getTrueValue() && | |
SI->getFalseValue() != IdenticalSI->getFalseValue())) | |
if (SI->getCondition() != IdenticalSI->getCondition()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated accordingly.
if (SI->getTrueValue() == IdenticalSI->getTrueValue()) { | ||
SIOtherVal = dyn_cast<SelectInst>(SI->getFalseValue()); | ||
IdenticalSIOtherVal = IdenticalSI->getFalseValue(); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else { | |
} else if (SI->getFalseValue() == IdenticalSI->getFalseValue()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated accordingly.
} else { | ||
SIOtherVal = dyn_cast<SelectInst>(SI->getTrueValue()); | ||
IdenticalSIOtherVal = IdenticalSI->getTrueValue(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | |
} else { | |
return false; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
} | ||
|
||
/// Look for the following pattern and simplify %1 to %identicalPhi. | ||
/// Here %phi, %1 and %phi.next perform the same functionality as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to give %1
a meaningful name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed %1
to %to_fold
.
Thanks a lot for the review! |
Enhanced PHI CSE to eliminate redundant PHIs, which could clean up the IR and open up opportunities for other passes such as loop vectorization.
Motivation:
Given the following range() function,
The IR that is right before loop vectorization is shown below. Here
range()
is inlined into its caller and becomes the single BBfor.body
.llvm trunk is not able to vectorize it because there are redundant phi (
%phi.to.remove
) and redundant select instructions (%phi.to.remove.next
,%same.as.v1
).Those redundant instructions just act exactly the same as
%v1
, hence they are purely redundant and should be eliminated.This patch identifies the redundant phi and eliminates it, as a result the loop could get vectorized and it could improve one of our internal workloads by 10%.
How the redundant phi was generated:
It was initially introduced by GVN that did load-in-loop-pre, which partially eliminated the load of
%v1
and introduced in one of its predecessors this load%.pre = load float, ptr %v1
.%.pre
eventually became the redundant phi that was not cleaned up.Compiler explorer: https://godbolt.org/z/f4ncn3Kjo
Please refer to the IR before vectorization on main(), and IR before and after GVN on range().