-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[LoopInterchange] Also look at lcssa phis in outer loop latch block #160889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Sjoerd Meijer (sjoerdmeijer) ChangesThis deals with a corner case of LCSSA phi nodes in the outer loop latch block: the loop was in LCSSA form, some transformations can come along (e.g. unswitch) and create an empty block:
Interchange then brings it in LCSSA form again and we get:
Which means that we have a chain of LCSSA phi nodes from %new.cond.lcssa to %old.cond.lcssa. The problem is that interchange can reoder blocks BB4 and BB5 placing the use before the def if we don't check this. The observation is that %old.cond.lcssa is unused, so instead of moving and renaming these phi nodes, just delete it if it's trivially dead. If it isn't trivially dead, it is handled elsewhere. The loop should still be in LCSSA form, and if it isn't, formLCSSARecursively is called after the interchange rewrite. Fixes #160068 Full diff: https://github.com/llvm/llvm-project/pull/160889.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index 28ae4f0a0aad9..e42d82d1533e1 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -44,6 +44,7 @@
#include "llvm/Transforms/Scalar/LoopPassManager.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/LoopUtils.h"
+#include "llvm/Transforms/Utils/Local.h"
#include <cassert>
#include <utility>
#include <vector>
@@ -1837,6 +1838,38 @@ static void moveLCSSAPhis(BasicBlock *InnerExit, BasicBlock *InnerHeader,
for (PHINode *P : LcssaInnerLatch)
P->moveBefore(InnerExit->getFirstNonPHIIt());
+ // This deals with a corner case of LCSSA phi nodes in the outer loop latch
+ // block: the loop was in LCSSA form, some transformations can come along
+ // (e.g. unswitch) and create an empty block:
+ //
+ // BB4:
+ // br label %BB5
+ // BB5:
+ // %old.cond.lcssa = phi i16 [ %cond, %BB4 ]
+ // br outer.header
+ //
+ // Interchange then brings it in LCSSA form again and we get:
+ //
+ // BB4:
+ // %new.cond.lcssa = phi i16 [ %cond, %BB3 ]
+ // br label %BB5
+ // BB5:
+ // %old.cond.lcssa = phi i16 [ %new.cond.lcssa, %BB4 ]
+ //
+ // Which means that we have a chain of LCSSA phi nodes from %new.cond.lcssa
+ // to %old.cond.lcssa. The problem is that interchange can reoder blocks BB4
+ // and BB5 placing the use before the def if we don't check this. The
+ // observation is that %old.cond.lcssa is unused, so instead of moving and
+ // renaming these phi nodes, just delete it if it's trivially dead. If it
+ // isn't trivially dead, it is handled above. The loop should still be in
+ // LCSSA form, and if it isn't, formLCSSARecursively is called after the
+ // interchange rewrite.
+ SmallVector<PHINode *, 8> LcssaOuterLatch(
+ llvm::make_pointer_range(OuterLatch->phis()));
+ for (PHINode *P : LcssaOuterLatch)
+ if (isInstructionTriviallyDead(P))
+ P->eraseFromParent();
+
// Deal with LCSSA PHI nodes in the loop nest exit block. For PHIs that have
// incoming values defined in the outer loop, we have to add a new PHI
// in the inner loop latch, which became the exit block of the outer loop,
diff --git a/llvm/test/Transforms/LoopInterchange/lcssa-phi-outer-latch.ll b/llvm/test/Transforms/LoopInterchange/lcssa-phi-outer-latch.ll
new file mode 100644
index 0000000000000..482db85fe33e8
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/lcssa-phi-outer-latch.ll
@@ -0,0 +1,75 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --prefix-filecheck-ir-name SJM --version 6
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 -verify-dom-info -verify-loop-info -verify-scev -verify-loop-lcssa -S | FileCheck %s
+
+; This test is checking that blocks BB4 and BB5, where BB4 is the exit
+; block of the inner loop and BB5 the latch of the outer loop, correctly
+; deal with the phi-node use-def chain %new.cond.lcssa -> %old.cond.lcssa.
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+define i16 @main() {
+; CHECK-LABEL: define i16 @main() {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br label %[[BB2_PREHEADER:.*]]
+; CHECK: [[BB1_PREHEADER:.*]]:
+; CHECK-NEXT: br label %[[SJMBB1:.*]]
+; CHECK: [[SJMBB1]]:
+; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], %[[BB5:.*]] ], [ 1, %[[BB1_PREHEADER]] ]
+; CHECK-NEXT: br label %[[BB2_SPLIT:.*]]
+; CHECK: [[BB2_PREHEADER]]:
+; CHECK-NEXT: br label %[[SJMBB2:.*]]
+; CHECK: [[SJMBB2]]:
+; CHECK-NEXT: [[J:%.*]] = phi i16 [ [[TMP1:%.*]], %[[BB3_SPLIT:.*]] ], [ 0, %[[BB2_PREHEADER]] ]
+; CHECK-NEXT: br label %[[BB1_PREHEADER]]
+; CHECK: [[BB2_SPLIT]]:
+; CHECK-NEXT: [[ARRAYIDX_US_US:%.*]] = getelementptr i16, ptr null, i16 [[J]]
+; CHECK-NEXT: [[TMP0:%.*]] = load i16, ptr [[ARRAYIDX_US_US]], align 1
+; CHECK-NEXT: [[COND:%.*]] = select i1 false, i16 0, i16 0
+; CHECK-NEXT: br label %[[SJMBB3:.*]]
+; CHECK: [[SJMBB3]]:
+; CHECK-NEXT: [[J_NEXT:%.*]] = add i16 [[J]], 1
+; CHECK-NEXT: br label %[[SJMBB4:.*]]
+; CHECK: [[BB3_SPLIT]]:
+; CHECK-NEXT: [[NEW_COND_LCSSA:%.*]] = phi i16 [ [[COND]], %[[BB5]] ]
+; CHECK-NEXT: [[TMP1]] = add i16 [[J]], 1
+; CHECK-NEXT: br i1 true, label %[[EXIT:.*]], label %[[SJMBB2]]
+; CHECK: [[SJMBB4]]:
+; CHECK-NEXT: br label %[[BB5]]
+; CHECK: [[BB5]]:
+; CHECK-NEXT: [[I_NEXT]] = add i64 [[I]], 1
+; CHECK-NEXT: [[CMP286_US:%.*]] = icmp ugt i64 [[I]], 0
+; CHECK-NEXT: br i1 [[CMP286_US]], label %[[SJMBB1]], label %[[BB3_SPLIT]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret i16 0
+;
+entry:
+ br label %BB1
+
+BB1:
+ %i = phi i64 [ 1, %entry ], [ %i.next, %BB5 ]
+ br label %BB2
+
+BB2:
+ %j = phi i16 [ 0, %BB1 ], [ %j.next, %BB3 ]
+ %arrayidx.us.us = getelementptr i16, ptr null, i16 %j
+ %0 = load i16, ptr %arrayidx.us.us, align 1
+ %cond = select i1 false, i16 0, i16 0
+ br label %BB3
+
+BB3:
+ %j.next = add i16 %j, 1
+ br i1 true, label %BB4, label %BB2
+
+BB4:
+ %new.cond.lcssa = phi i16 [ %cond, %BB3 ]
+ br label %BB5
+
+BB5:
+ %old.cond.lcssa = phi i16 [ %new.cond.lcssa, %BB4 ]
+ %i.next = add i64 %i, 1
+ %cmp286.us = icmp ugt i64 %i, 0
+ br i1 %cmp286.us, label %BB1, label %exit
+
+exit:
+ ret i16 0
+}
|
You can test this locally with the following command:git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Transforms/Scalar/LoopInterchange.cpp
View the diff from clang-format here.diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index b4003f8cd..30d4b03e8 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -43,8 +43,8 @@
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
-#include "llvm/Transforms/Utils/LoopUtils.h"
#include "llvm/Transforms/Utils/Local.h"
+#include "llvm/Transforms/Utils/LoopUtils.h"
#include <cassert>
#include <utility>
#include <vector>
|
This deals with a corner case of LCSSA phi nodes in the outer loop latch block: the loop was in LCSSA form, some transformations can come along (e.g. unswitch) and create an empty block: BB4: br label %BB5 BB5: %old.cond.lcssa = phi i16 [ %cond, %BB4 ] br outer.header Interchange then brings it in LCSSA form again and we get: BB4: %new.cond.lcssa = phi i16 [ %cond, %BB3 ] br label %BB5 BB5: %old.cond.lcssa = phi i16 [ %new.cond.lcssa, %BB4 ] Which means that we have a chain of LCSSA phi nodes from %new.cond.lcssa to %old.cond.lcssa. The problem is that interchange can reoder blocks BB4 and BB5 placing the use before the def if we don't check this. The observation is that %old.cond.lcssa is unused, so instead of moving and renaming these phi nodes, just delete it if it's trivially dead. If it isn't trivially dead, it is handled elsewhere. The loop should still be in LCSSA form, and if it isn't, formLCSSARecursively is called after the interchange rewrite. Fixes llvm#160068
70503bc
to
18e6679
Compare
entry: | ||
br label %BB1 | ||
|
||
BB1: | ||
%i = phi i64 [ 1, %entry ], [ %i.next, %BB5 ] | ||
br label %BB2 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have more descriptive names for the blocks and variables to make the test easier to read, e.g.,
entry:
...
outer.header: ; BB1
...
inner.header: ; BB2
...
inner.latch: ; BB3
...
outer.body: ; BB4
...
outer.latch: ; BB5
...
exit:
...
The same would apply for the variables.
%arrayidx.us.us = getelementptr i16, ptr null, i16 %j | ||
%0 = load i16, ptr %arrayidx.us.us, align 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this always UB?
%j = phi i16 [ 0, %BB1 ], [ %j.next, %BB3 ] | ||
%arrayidx.us.us = getelementptr i16, ptr null, i16 %j | ||
%0 = load i16, ptr %arrayidx.us.us, align 1 | ||
%cond = select i1 false, i16 0, i16 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to define an arbitrary value, it might be better to use freeze i16 poison
.
// BB5: | ||
// %old.cond.lcssa = phi i16 [ %cond, %BB4 ] | ||
// br outer.header | ||
// | ||
// Interchange then brings it in LCSSA form again and we get: | ||
// | ||
// BB4: | ||
// %new.cond.lcssa = phi i16 [ %cond, %BB3 ] | ||
// br label %BB5 | ||
// BB5: | ||
// %old.cond.lcssa = phi i16 [ %new.cond.lcssa, %BB4 ] | ||
// | ||
// Which means that we have a chain of LCSSA phi nodes from %new.cond.lcssa | ||
// to %old.cond.lcssa. The problem is that interchange can reoder blocks BB4 | ||
// and BB5 placing the use before the def if we don't check this. The | ||
// observation is that %old.cond.lcssa is unused, so instead of moving and | ||
// renaming these phi nodes, just delete it if it's trivially dead. If it | ||
// isn't trivially dead, it is handled above. The loop should still be in | ||
// LCSSA form, and if it isn't, formLCSSARecursively is called after the | ||
// interchange rewrite. | ||
SmallVector<PHINode *, 8> LcssaOuterLatch( | ||
llvm::make_pointer_range(OuterLatch->phis())); | ||
for (PHINode *P : LcssaOuterLatch) | ||
if (isInstructionTriviallyDead(P)) | ||
P->eraseFromParent(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure, but what happens if the phi node isn't dead? For example, in this case, if %old.cond.lcssa
is used inside BB5, would interchange still generate ill-formed IR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review.
It's a good point. I was claiming they will then be handled by the other checks. But let me go back and look at some examples and double check, and I guess at least some asserts are required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If LoopInterchange has to build LCSSA form again anyway, have you considered cleaning this up at the same time? Then to be on the save side, do not any PHI nodes in OuterLoopLatch
unless InnerLoopExit == OuterLoopLatch
(like there is already a check InnerLoopPreHeader != OuterLoopHeader
in tightlyNested
). In addition to the case where %old.cond.lcssa
is not trivally dead, I could imagine other dubious patterns, such as a PHI node referencing itself.
SmallVector<PHINode *, 8> LcssaOuterLatch( | ||
llvm::make_pointer_range(OuterLatch->phis())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider iterating over OuterLatch->phis()
, and only adding those instructions to the list to be erased. That's the more established pattern.
I was just looking at this again to address earlier comments, and noticed your comments, thanks @Meinersbur !
Are you suggesting that instead of moving LCSSA nodes around, delete (all of) them, because interchange brings it back into LCSSA anyway? Maybe this is all related, and I will admit that I am struggling with fixing this. The reason is that the whole CFG is rewired, new blocks are created, and a lot of instructions are moved around including these lcssa phis. And it's difficult to see what guarantees what, and like you said, I have doubts about other patterns. So, if the suggestion is to not move all these lcssa phis around, maybe that helps. |
Not removing all nodes, but only look for single-input PHI nodes that are not in a loop's exit block. A reason to not do it would be it is more computationally expensive (another iteration over all BBs), but would result in fewer special cases in LoopInterchange itself. So it might be worth the trade-off. A PHI in the OuterLatch (if != InnerExit) would not be need to be considered tightly nested, the additional pass would just exist to normalize IR output from other passes, like a mini-SimplifyCFG. |
outer loop latch blocks that are not exit blocks.
I think I've addressed all comments. I've introduced a new helper that is now more intentional and explicit about checking non-exit blocks and single-input phis. I've added an assert there that I haven't managed to trigger in testing. I have changed the test case and made that a bit more interesting: the problematic phi was trivially dead before, but now it has a user in the exit block, which shows that we simplify the phi-chain and replace all uses. |
This deals with a corner case of LCSSA phi nodes in the outer loop latch block: the loop was in LCSSA form, some transformations can come along (e.g. unswitch) and create an empty block:
Interchange then brings it in LCSSA form again and we get:
Which means that we have a chain of LCSSA phi nodes from %new.cond.lcssa to %old.cond.lcssa. The problem is that interchange can reoder blocks BB4 and BB5 placing the use before the def if we don't check this. The observation is that %old.cond.lcssa is unused, so instead of moving and renaming these phi nodes, just delete it if it's trivially dead. If it isn't trivially dead, it is handled elsewhere. The loop should still be in LCSSA form, and if it isn't, formLCSSARecursively is called after the interchange rewrite.
Fixes #160068