-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[InstCombine] Push freeze through non-recurrence PHIs #157678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[InstCombine] Push freeze through non-recurrence PHIs #157678
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Cullen Rhodes (c-rhodes) Changes#157112 adds a test '@fold_phi_noundef_start_value' where we can't currently remove the freeze, even though it's possible. The two places in instcombine that deal with freezes are Therefore, I've updated 'pushFreezeToPreventPoisonFromPropagating' to support pushing the freeze to the incoming PHI values, as long as the BB of the frozen PHI does not dominate the BB of the incoming value(s). This fixes the test case added in #157112 and the other test changes look sensible, although perhaps I'm missing some edge cases (?). The 'match(U.get(), m_Undef())' check for undef PHI incoming value looks necessary to catch the case covered by '@two_undef' test in freeze-phi.ll. Without this check the undef becomes zero which doesnt seem right. Full diff: https://github.com/llvm/llvm-project/pull/157678.diff 4 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index a74f292524b4d..4c562c5500de1 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -5042,10 +5042,18 @@ InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating(FreezeInst &OrigFI) {
// Op1.fr = Freeze(Op1)
// ... = Inst(Op1.fr, NonPoisonOps...)
- auto CanPushFreeze = [](Value *V) {
- if (!isa<Instruction>(V) || isa<PHINode>(V))
+ auto CanPushFreeze = [this](Value *V) {
+ if (!isa<Instruction>(V))
return false;
+ if (auto *PN = dyn_cast<PHINode>(V)) {
+ if (llvm::any_of(PN->incoming_values(), [this, &PN](Use &U) {
+ return DT.dominates(PN->getParent(), PN->getIncomingBlock(U)) ||
+ match(U.get(), m_Undef());
+ }))
+ return false;
+ }
+
// We can't push the freeze through an instruction which can itself create
// poison. If the only source of new poison is flags, we can simply
// strip them (since we know the only use is the freeze and nothing can
@@ -5070,7 +5078,10 @@ InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating(FreezeInst &OrigFI) {
return nullptr;
auto *UserI = cast<Instruction>(U->getUser());
- Builder.SetInsertPoint(UserI);
+ if (auto *PN = dyn_cast<PHINode>(UserI))
+ Builder.SetInsertPoint(PN->getIncomingBlock(*U)->getTerminator());
+ else
+ Builder.SetInsertPoint(UserI);
Value *Frozen = Builder.CreateFreeze(V, V->getName() + ".fr");
U->set(Frozen);
continue;
diff --git a/llvm/test/Transforms/InstCombine/freeze-landingpad.ll b/llvm/test/Transforms/InstCombine/freeze-landingpad.ll
index 29167958c857f..259808c046490 100644
--- a/llvm/test/Transforms/InstCombine/freeze-landingpad.ll
+++ b/llvm/test/Transforms/InstCombine/freeze-landingpad.ll
@@ -14,14 +14,13 @@ define i32 @propagate_freeze_in_landingpad() personality ptr null {
; CHECK-NEXT: [[RES1:%.*]] = invoke i32 @foo()
; CHECK-NEXT: to label [[NORMAL_RETURN]] unwind label [[EXCEPTIONAL_RETURN]]
; CHECK: normal_return:
-; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[X]], 1
+; CHECK-NEXT: [[INC]] = add i32 [[X]], 1
; CHECK-NEXT: br label [[INVOKE_BB1]]
; CHECK: exceptional_return:
; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[X]], [[INVOKE_BB1]] ], [ 0, [[INVOKE_BB2]] ]
; CHECK-NEXT: [[LANDING_PAD:%.*]] = landingpad { ptr, i32 }
; CHECK-NEXT: cleanup
-; CHECK-NEXT: [[FR:%.*]] = freeze i32 [[PHI]]
-; CHECK-NEXT: [[RES:%.*]] = shl i32 [[FR]], 1
+; CHECK-NEXT: [[RES:%.*]] = shl i32 [[PHI]], 1
; CHECK-NEXT: ret i32 [[RES]]
;
entry:
diff --git a/llvm/test/Transforms/InstCombine/freeze-phi.ll b/llvm/test/Transforms/InstCombine/freeze-phi.ll
index cdc9a5efe5933..098e6c5f6cdbb 100644
--- a/llvm/test/Transforms/InstCombine/freeze-phi.ll
+++ b/llvm/test/Transforms/InstCombine/freeze-phi.ll
@@ -94,13 +94,14 @@ define i32 @two(i1 %cond, i32 %x, i32 %x2) {
; CHECK-LABEL: @two(
; CHECK-NEXT: br i1 [[COND:%.*]], label [[A:%.*]], label [[B:%.*]]
; CHECK: A:
+; CHECK-NEXT: [[X_FR:%.*]] = freeze i32 [[X:%.*]]
; CHECK-NEXT: br label [[C:%.*]]
; CHECK: B:
+; CHECK-NEXT: [[X2_FR:%.*]] = freeze i32 [[X2:%.*]]
; CHECK-NEXT: br label [[C]]
; CHECK: C:
-; CHECK-NEXT: [[Y:%.*]] = phi i32 [ [[X:%.*]], [[A]] ], [ [[X2:%.*]], [[B]] ]
-; CHECK-NEXT: [[Y_FR:%.*]] = freeze i32 [[Y]]
-; CHECK-NEXT: ret i32 [[Y_FR]]
+; CHECK-NEXT: [[Y:%.*]] = phi i32 [ [[X_FR]], [[A]] ], [ [[X2_FR]], [[B]] ]
+; CHECK-NEXT: ret i32 [[Y]]
;
br i1 %cond, label %A, label %B
A:
diff --git a/llvm/test/Transforms/InstCombine/freeze.ll b/llvm/test/Transforms/InstCombine/freeze.ll
index af5cb0c75537b..91e89b3b5a332 100644
--- a/llvm/test/Transforms/InstCombine/freeze.ll
+++ b/llvm/test/Transforms/InstCombine/freeze.ll
@@ -269,15 +269,15 @@ define void @freeze_dominated_uses_catchswitch(i1 %c, i32 %x) personality ptr @_
; CHECK-NEXT: invoke void @use_i32(i32 0)
; CHECK-NEXT: to label %[[CLEANUP:.*]] unwind label %[[CATCH_DISPATCH:.*]]
; CHECK: [[IF_ELSE]]:
+; CHECK-NEXT: [[X_FR:%.*]] = freeze i32 [[X]]
; CHECK-NEXT: invoke void @use_i32(i32 1)
; CHECK-NEXT: to label %[[CLEANUP]] unwind label %[[CATCH_DISPATCH]]
; CHECK: [[CATCH_DISPATCH]]:
-; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 0, %[[IF_THEN]] ], [ [[X]], %[[IF_ELSE]] ]
+; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 0, %[[IF_THEN]] ], [ [[X_FR]], %[[IF_ELSE]] ]
; CHECK-NEXT: [[CS:%.*]] = catchswitch within none [label %[[CATCH:.*]], label %catch2] unwind to caller
; CHECK: [[CATCH]]:
; CHECK-NEXT: [[CP:%.*]] = catchpad within [[CS]] [ptr null, i32 64, ptr null]
-; CHECK-NEXT: [[PHI_FREEZE:%.*]] = freeze i32 [[PHI]]
-; CHECK-NEXT: call void @use_i32(i32 [[PHI_FREEZE]]) [ "funclet"(token [[CP]]) ]
+; CHECK-NEXT: call void @use_i32(i32 [[PHI]]) [ "funclet"(token [[CP]]) ]
; CHECK-NEXT: unreachable
; CHECK: [[CATCH2:.*:]]
; CHECK-NEXT: [[CP2:%.*]] = catchpad within [[CS]] [ptr null, i32 64, ptr null]
@@ -384,15 +384,16 @@ define i32 @freeze_phi_followed_by_phi(i1 %c, i32 %y, i32 %z) {
; CHECK-LABEL: define i32 @freeze_phi_followed_by_phi(
; CHECK-SAME: i1 [[C:%.*]], i32 [[Y:%.*]], i32 [[Z:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: [[Z_FR:%.*]] = freeze i32 [[Z]]
+; CHECK-NEXT: [[Y_FR:%.*]] = freeze i32 [[Y]]
; CHECK-NEXT: br i1 [[C]], label %[[IF:.*]], label %[[JOIN:.*]]
; CHECK: [[IF]]:
; CHECK-NEXT: br label %[[JOIN]]
; CHECK: [[JOIN]]:
-; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[Y]], %[[IF]] ], [ [[Z]], %[[ENTRY]] ]
-; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[Z]], %[[IF]] ], [ [[Y]], %[[ENTRY]] ]
-; CHECK-NEXT: [[FR:%.*]] = freeze i32 [[X]]
-; CHECK-NEXT: call void @use_i32(i32 [[FR]])
-; CHECK-NEXT: call void @use_i32(i32 [[FR]])
+; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[Y_FR]], %[[IF]] ], [ [[Z_FR]], %[[ENTRY]] ]
+; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[Z_FR]], %[[IF]] ], [ [[Y_FR]], %[[ENTRY]] ]
+; CHECK-NEXT: call void @use_i32(i32 [[X]])
+; CHECK-NEXT: call void @use_i32(i32 [[X]])
; CHECK-NEXT: ret i32 [[PHI]]
;
entry:
@@ -1141,8 +1142,7 @@ exit:
}
; We can remove this freeze as the incoming values to the PHI have the same
-; well-defined start value and the GEP can't produce poison, but this is
-; currently unsupported.
+; well-defined start value and the GEP can't produce poison.
define void @fold_phi_noundef_start_value(ptr noundef %init, i1 %cond.0, i1 %cond.1, i1 %cond.2) {
; CHECK-LABEL: define void @fold_phi_noundef_start_value(
; CHECK-SAME: ptr noundef [[INIT:%.*]], i1 [[COND_0:%.*]], i1 [[COND_1:%.*]], i1 [[COND_2:%.*]]) {
@@ -1156,8 +1156,7 @@ define void @fold_phi_noundef_start_value(ptr noundef %init, i1 %cond.0, i1 %con
; CHECK-NEXT: br label %[[LOOP_LATCH]]
; CHECK: [[LOOP_LATCH]]:
; CHECK-NEXT: [[IV_2:%.*]] = phi ptr [ [[IV_0]], %[[LOOP]] ], [ [[IV_1]], %[[IF_ELSE]] ]
-; CHECK-NEXT: [[IV_2_FR:%.*]] = freeze ptr [[IV_2]]
-; CHECK-NEXT: [[IV_2_FR_INT:%.*]] = ptrtoint ptr [[IV_2_FR]] to i64
+; CHECK-NEXT: [[IV_2_FR_INT:%.*]] = ptrtoint ptr [[IV_2]] to i64
; CHECK-NEXT: [[IV_0_INT:%.*]] = ptrtoint ptr [[IV_0]] to i64
; CHECK-NEXT: [[IDX:%.*]] = sub i64 [[IV_0_INT]], [[IV_2_FR_INT]]
; CHECK-NEXT: [[IV_0_NEXT]] = getelementptr i8, ptr [[IV_0]], i64 [[IDX]]
|
return false; | ||
|
||
if (auto *PN = dyn_cast<PHINode>(V)) { | ||
if (llvm::any_of(PN->incoming_values(), [this, &PN](Use &U) { | ||
return DT.dominates(PN->getParent(), PN->getIncomingBlock(U)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check doesn't work for irreducible cycles. But we have an isBackEdge() helper in InstCombine for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a check for isBackEdge()
. Although unlike the other edge cases you pointed out I couldn't write a test case that triggered an issue for this one.
You can test this locally with the following command:git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/lib/Transforms/InstCombine/InstructionCombining.cpp llvm/test/Transforms/InstCombine/freeze-landingpad.ll llvm/test/Transforms/InstCombine/freeze-phi.ll llvm/test/Transforms/InstCombine/freeze.ll The following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields In tests, avoid using For example, this is considered a bad practice: define void @fn() {
...
br i1 undef, ...
} Please use the following instead: define void @fn(i1 %cond) {
...
br i1 %cond, ...
} Please refer to the Undefined Behavior Manual for more information. |
ping |
@zyw-bot csmith-fuzz |
// invalidating the iterator. We simply don't support this case, but it | ||
// could be handled if there's a use case. | ||
if (isBackEdge(InBB, BB) || !VisitedBBs.insert(InBB).second || | ||
match(U.get(), m_Undef())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in theory we could push the freeze to an undef argument, right? I noticed that this IR:
define i64 @foo() {
entry:
%res = freeze i64 undef
ret i64 %res
}
gets optimised by instcombine to just
define i64 @foo() {
entry:
ret i64 0
}
but this could be done in a later patch if we see a need for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without this check freeze is pushed to %x
in this test:
define i32 @two_undef(i8 %cond, i32 %x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushing it through a freeze argument is correct, it's just likely to be non-profitable right now. I think it makes sense to skip that case.
@@ -5070,7 +5093,10 @@ InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating(FreezeInst &OrigFI) { | |||
return nullptr; | |||
|
|||
auto *UserI = cast<Instruction>(U->getUser()); | |||
Builder.SetInsertPoint(UserI); | |||
if (auto *PN = dyn_cast<PHINode>(UserI)) | |||
Builder.SetInsertPoint(PN->getIncomingBlock(*U)->getTerminator()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure why this change is needed. Before this PR CanPushFreeze
always returned false for PHI nodes and the existing Builder.SetInsertPoint(UserI);
seemed to work perfectly fine. All that happens now is that we sometimes return false if it's a PHI node.
Is this fixing an existing bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the existing Builder.SetInsertPoint(UserI);
hasn't been removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I'm missing something, but before your patch CanPushFreeze always returned false for PHI nodes which meant we were calling Builder.SetInsertPoint(UserI);
for PHI nodes.
However, even in places where we still return false for PHI nodes same as before we're now calling Builder.SetInsertPoint(PN->getIncomingBlock(*U)->getTerminator());
. For example, it looks like you've changed the insertion point for PHI nodes with undef
as incoming value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that CanPushFreeze() is called on the use, while this is checking for the user being a phi node. Previously we could not have ended up with a phi user, as that would imply that we have pushed through a phi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok I think I understand now. So basically if CanPushFreeze returns true for a PHI node we fall through to code below that adds the operands of the PHI to the worklist. Then if CanPushFreeze returns false for one of those operands we'll end up asking for the user, which will be the PHI.
PR llvm#157112 adds a test '@fold_phi_noundef_start_value' where we can't currently remove the freeze, even though it's possible. The two places in instcombine that deal with freezes are 'pushFreezeToPreventPoisonFromPropagating' and 'foldFreezeIntoRecurrence'. The former doesn't support PHIs at all and the latter is restricted to recurrences. It doesn't consider this test a recurrence as the BB of the frozen PHI node '%loop.latch' does not dominate either of the BBs ('%loop', '%if.else') of the incoming values. Therefore, I've updated 'pushFreezeToPreventPoisonFromPropagating' to support pushing the freeze to the incoming PHI values, as long as the BB of the frozen PHI *does not* dominate the BB of the incoming value(s). This fixes the test case added in llvm#157112 and the other test changes look sensible, although perhaps I'm missing some edge cases (?). The 'match(U.get(), m_Undef())' check for undef PHI incoming value looks necessary to catch the case covered by '@two_undef' test in freeze-phi.ll. Without this check the undef becomes zero which doesnt seem right.
- remove unused %cond.2 from tests - bail on backedge from BB of incoming value of phi to BB of freeze - bail if incoming value of phi is from invoke and add a test - bail if phi has multiple values from same predecessor and test
This reverts commit 790e7ede98c32b4a8c2f0fa69a2817edfc63e475.
Whilst doing some further testing I found 527.cam4_r in SPEC2017 is hanging with this patch. I see:
is repeated when dumping OrigFI/OrigUse in index 6004feb1d885..be983ed63daf 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -5026,6 +5026,9 @@ InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating(FreezeInst &OrigFI) {
if (U == OrigUse)
return nullptr;
+ if (match(V, m_Poison()))
+ return nullptr;
+
auto *UserI = cast<Instruction>(U->getUser());
if (auto *PN = dyn_cast<PHINode>(UserI))
Builder.SetInsertPoint(PN->getIncomingBlock(*U)->getTerminator()); which fixes it, but still trying to write a test case. I did isolate the function in cam4 and attempt to reduce it will llvm-reduce, but just get testcases that take exactly N seconds to compile. I also collected data on the # of freezes in SPEC2017 (Ofast, LTO) after this patch and there's a ~13% increase. #154336 also caused an increase of ~27%. Although as previously mentioned #154336 (comment), I'm not convinced this is a particularly useful metric, but just wanted to share this. |
e1ef7e0
to
05d8a51
Compare
I've rebased and added a fix for the compile-time issue in spec2017 527.cam4_r, although I'm unable to write a test for it. This IR hits the fix I've added: https://godbolt.org/z/KrTcWroqr, initially I bailed out here but it caused a regression, so I changed it to not freeze the poison value instead. |
I don't understand the fix you have applied. Doesn't this mean that we will now fail to freeze direct poison operands? Can you provide some more context on what causes the compile-time issue? |
The tradeoff here is basically between having less freeze or having more freeze in positions where it has less impact. That the total number of freezes increases is expected. |
Sorry I admit it's not ideal, but I'm at a bit of a loss as to what to do about it. I'm seeing lots of freezes being created on direct poison operands, e.g.
and the 10 second one only has 31 more lines than the 5 second one and looking at the diff quite a few more PHIs: https://gist.github.com/c-rhodes/9042f77580a24df6dfcd849b55d5de64 but I'm struggling to isolate the specific issue or create a small reproducer. The previous godbolt link i shared is from llvm-reduce, but the loops and control-flow is gone. |
Sharing some findings:
although I'm not sure why that's the case or if it's sensible. |
@nikic any ideas here? Not sure if this is an acceptable fix. |
I don't think so. This probably just fixes it by accident, by changing worklist order or something. Something that may be worth doing (also independently of this PR) is teaching isGuaranteedNotToBePoison about the splat pattern. It includes two poison values, but only the poison-ness of the splatted value actually matters (plus some complications due to FMF probably). Doing this might reduce the number of freezes we generate and the eliminate again. Though it doesn't really address whatever the underlying problem here is. |
I think the problem here is similar to what I ran into with the original change in #154336, just for phi nodes. For cyclic phis, we only push the freeze to that phi, and then it will get pushed through in a separate transform. This kind of incrementally pushing freeze (instead of freezing the whole graph) ends up being slow in some cases, though I can't say I fully understand what causes it in this specific case. Probably if we want to push through freeze we need to integrate the recurrence and non-recurrence cases... |
Do you mean perhaps doing this as part of #155638 or going even further and integrating |
Integrating both functions is what I had in mind, but I'm not sure how hard that would be... |
#157112 adds a test '@fold_phi_noundef_start_value' where we can't currently remove the freeze, even though it's possible. The two places in instcombine that deal with freezes are
'pushFreezeToPreventPoisonFromPropagating' and
'foldFreezeIntoRecurrence'. The former doesn't support PHIs at all and the latter is restricted to recurrences. It doesn't consider this test a recurrence as the BB of the frozen PHI node '%loop.latch' does not dominate either of the BBs ('%loop', '%if.else') of the incoming values.
Therefore, I've updated 'pushFreezeToPreventPoisonFromPropagating' to support pushing the freeze to the incoming PHI values, as long as the BB of the frozen PHI does not dominate the BB of the incoming value(s). This fixes the test case added in #157112 and the other test changes look sensible, although perhaps I'm missing some edge cases (?).
The 'match(U.get(), m_Undef())' check for undef PHI incoming value looks necessary to catch the case covered by '@two_undef' test in freeze-phi.ll. Without this check the undef becomes zero which doesnt seem right.