@sjoerdmeijer sjoerdmeijer commented Sep 26, 2025

This deals with a corner case of LCSSA phi nodes in the outer loop latch block: the loop was in LCSSA form, some transformations can come along (e.g. unswitch) and create an empty block:

 BB4:
   br label %BB5
 BB5:
   %old.cond.lcssa = phi i16 [ %cond, %BB4 ]
   br outer.header

Interchange then brings it in LCSSA form again and we get:

 BB4:
   %new.cond.lcssa = phi i16 [ %cond, %BB3 ]
   br label %BB5
 BB5:
   %old.cond.lcssa = phi i16 [ %new.cond.lcssa, %BB4 ]

This means that we have a chain of LCSSA phi nodes from %new.cond.lcssa to %old.cond.lcssa. The problem is that interchange can reorder blocks BB4 and BB5, placing the use before the def if we don't check for this. The solution is to simplify LCSSA phis and remove them from non-exit blocks if they are single-input phi nodes.
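
To make the use-before-def hazard concrete, here is a rough standalone C++ sketch. It is a toy under stated assumptions: `ToyBlock`, `defsPrecedeUses`, and `demoReordering` are hypothetical names rather than LLVM APIs, and a straight-line block order stands in for LLVM's real dominance check.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model (not LLVM's IR): a block reads some values and
// defines others. A single-input phi reads its incoming value and defines
// its own result.
struct ToyBlock {
  std::string Name;
  std::vector<std::string> Uses; // values read in this block
  std::vector<std::string> Defs; // values defined in this block
};

// Returns true if, walking the blocks in the given straight-line order,
// every use is preceded by its definition. Values in LiveIn count as
// defined before the first block.
bool defsPrecedeUses(const std::vector<ToyBlock> &Order,
                     const std::set<std::string> &LiveIn) {
  std::set<std::string> Defined = LiveIn;
  for (const ToyBlock &BB : Order) {
    for (const std::string &U : BB.Uses)
      if (!Defined.count(U))
        return false; // a use was reached before its def
    for (const std::string &D : BB.Defs)
      Defined.insert(D);
  }
  return true;
}

// Usage: the original BB4 -> BB5 order is fine, the swapped order is not.
inline void demoReordering() {
  ToyBlock BB4{"BB4", {"cond"}, {"new.cond.lcssa"}};
  ToyBlock BB5{"BB5", {"new.cond.lcssa"}, {"old.cond.lcssa"}};
  assert(defsPrecedeUses({BB4, BB5}, {"cond"}));
  assert(!defsPrecedeUses({BB5, BB4}, {"cond"}));
}
```

In this model, once BB5 runs before BB4, %old.cond.lcssa reads %new.cond.lcssa before it exists, which is the shape of the "Instruction does not dominate all uses!" failure from the linked issue.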

Fixes #160068

llvmbot commented Sep 26, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Sjoerd Meijer (sjoerdmeijer)

Changes

This deals with a corner case of LCSSA phi nodes in the outer loop latch block: the loop was in LCSSA form, some transformations can come along (e.g. unswitch) and create an empty block:

 BB4:
   br label %BB5
 BB5:
   %old.cond.lcssa = phi i16 [ %cond, %BB4 ]
   br outer.header

Interchange then brings it in LCSSA form again and we get:

 BB4:
   %new.cond.lcssa = phi i16 [ %cond, %BB3 ]
   br label %BB5
 BB5:
   %old.cond.lcssa = phi i16 [ %new.cond.lcssa, %BB4 ]

This means that we have a chain of LCSSA phi nodes from %new.cond.lcssa to %old.cond.lcssa. The problem is that interchange can reorder blocks BB4 and BB5, placing the use before the def if we don't check for this. The observation is that %old.cond.lcssa is unused, so instead of moving and renaming these phi nodes, we just delete the phi if it is trivially dead. If it isn't trivially dead, it is handled elsewhere. The loop should still be in LCSSA form, and if it isn't, formLCSSARecursively is called after the interchange rewrite.

Fixes #160068


Full diff: https://github.com/llvm/llvm-project/pull/160889.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+33)
  • (added) llvm/test/Transforms/LoopInterchange/lcssa-phi-outer-latch.ll (+75)
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index 28ae4f0a0aad9..e42d82d1533e1 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -44,6 +44,7 @@
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/LoopUtils.h"
+#include "llvm/Transforms/Utils/Local.h"
 #include <cassert>
 #include <utility>
 #include <vector>
@@ -1837,6 +1838,38 @@ static void moveLCSSAPhis(BasicBlock *InnerExit, BasicBlock *InnerHeader,
   for (PHINode *P : LcssaInnerLatch)
     P->moveBefore(InnerExit->getFirstNonPHIIt());
 
+  // This deals with a corner case of LCSSA phi nodes in the outer loop latch
+  // block: the loop was in LCSSA form, some transformations can come along
+  // (e.g. unswitch) and create an empty block:
+  //
+  //   BB4:
+  //     br label %BB5
+  //   BB5:
+  //     %old.cond.lcssa = phi i16 [ %cond, %BB4 ]
+  //     br outer.header
+  //
+  // Interchange then brings it in LCSSA form again and we get:
+  //
+  //   BB4:
+  //     %new.cond.lcssa = phi i16 [ %cond, %BB3 ]
+  //     br label %BB5
+  //   BB5:
+  //     %old.cond.lcssa = phi i16 [ %new.cond.lcssa, %BB4 ]
+  //
+  // Which means that we have a chain of LCSSA phi nodes from %new.cond.lcssa
+  // to %old.cond.lcssa. The problem is that interchange can reorder blocks BB4
+  // and BB5 placing the use before the def if we don't check this. The
+  // observation is that %old.cond.lcssa is unused, so instead of moving and
+  // renaming these phi nodes, just delete it if it's trivially dead. If it
+  // isn't trivially dead, it is handled above. The loop should still be in
+  // LCSSA form, and if it isn't, formLCSSARecursively is called after the
+  // interchange rewrite.
+  SmallVector<PHINode *, 8> LcssaOuterLatch(
+      llvm::make_pointer_range(OuterLatch->phis()));
+  for (PHINode *P : LcssaOuterLatch)
+    if (isInstructionTriviallyDead(P))
+      P->eraseFromParent();
+
   // Deal with LCSSA PHI nodes in the loop nest exit block. For PHIs that have
   // incoming values defined in the outer loop, we have to add a new PHI
   // in the inner loop latch, which became the exit block of the outer loop,
diff --git a/llvm/test/Transforms/LoopInterchange/lcssa-phi-outer-latch.ll b/llvm/test/Transforms/LoopInterchange/lcssa-phi-outer-latch.ll
new file mode 100644
index 0000000000000..482db85fe33e8
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/lcssa-phi-outer-latch.ll
@@ -0,0 +1,75 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --prefix-filecheck-ir-name SJM --version 6
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 -verify-dom-info -verify-loop-info -verify-scev -verify-loop-lcssa -S | FileCheck %s
+
+; This test is checking that blocks BB4 and BB5, where BB4 is the exit
+; block of the inner loop and BB5 the latch of the outer loop, correctly
+; deal with the phi-node use-def chain %new.cond.lcssa -> %old.cond.lcssa.
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+define i16 @main() {
+; CHECK-LABEL: define i16 @main() {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    br label %[[BB2_PREHEADER:.*]]
+; CHECK:       [[BB1_PREHEADER:.*]]:
+; CHECK-NEXT:    br label %[[SJMBB1:.*]]
+; CHECK:       [[SJMBB1]]:
+; CHECK-NEXT:    [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], %[[BB5:.*]] ], [ 1, %[[BB1_PREHEADER]] ]
+; CHECK-NEXT:    br label %[[BB2_SPLIT:.*]]
+; CHECK:       [[BB2_PREHEADER]]:
+; CHECK-NEXT:    br label %[[SJMBB2:.*]]
+; CHECK:       [[SJMBB2]]:
+; CHECK-NEXT:    [[J:%.*]] = phi i16 [ [[TMP1:%.*]], %[[BB3_SPLIT:.*]] ], [ 0, %[[BB2_PREHEADER]] ]
+; CHECK-NEXT:    br label %[[BB1_PREHEADER]]
+; CHECK:       [[BB2_SPLIT]]:
+; CHECK-NEXT:    [[ARRAYIDX_US_US:%.*]] = getelementptr i16, ptr null, i16 [[J]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load i16, ptr [[ARRAYIDX_US_US]], align 1
+; CHECK-NEXT:    [[COND:%.*]] = select i1 false, i16 0, i16 0
+; CHECK-NEXT:    br label %[[SJMBB3:.*]]
+; CHECK:       [[SJMBB3]]:
+; CHECK-NEXT:    [[J_NEXT:%.*]] = add i16 [[J]], 1
+; CHECK-NEXT:    br label %[[SJMBB4:.*]]
+; CHECK:       [[BB3_SPLIT]]:
+; CHECK-NEXT:    [[NEW_COND_LCSSA:%.*]] = phi i16 [ [[COND]], %[[BB5]] ]
+; CHECK-NEXT:    [[TMP1]] = add i16 [[J]], 1
+; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SJMBB2]]
+; CHECK:       [[SJMBB4]]:
+; CHECK-NEXT:    br label %[[BB5]]
+; CHECK:       [[BB5]]:
+; CHECK-NEXT:    [[I_NEXT]] = add i64 [[I]], 1
+; CHECK-NEXT:    [[CMP286_US:%.*]] = icmp ugt i64 [[I]], 0
+; CHECK-NEXT:    br i1 [[CMP286_US]], label %[[SJMBB1]], label %[[BB3_SPLIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i16 0
+;
+entry:
+  br label %BB1
+
+BB1:
+  %i = phi i64 [ 1, %entry ], [ %i.next, %BB5 ]
+  br label %BB2
+
+BB2:
+  %j = phi i16 [ 0, %BB1 ], [ %j.next, %BB3 ]
+  %arrayidx.us.us = getelementptr i16, ptr null, i16 %j
+  %0 = load i16, ptr %arrayidx.us.us, align 1
+  %cond = select i1 false, i16 0, i16 0
+  br label %BB3
+
+BB3:
+  %j.next = add i16 %j, 1
+  br i1 true, label %BB4, label %BB2
+
+BB4:
+  %new.cond.lcssa = phi i16 [ %cond, %BB3 ]
+  br label %BB5
+
+BB5:
+  %old.cond.lcssa = phi i16 [ %new.cond.lcssa, %BB4 ]
+  %i.next = add i64 %i, 1
+  %cmp286.us = icmp ugt i64 %i, 0
+  br i1 %cmp286.us, label %BB1, label %exit
+
+exit:
+  ret i16 0
+}

github-actions bot commented Sep 26, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 54 to 55
%arrayidx.us.us = getelementptr i16, ptr null, i16 %j
%0 = load i16, ptr %arrayidx.us.us, align 1
Contributor

Isn't this always UB?

Member

UB in tests is unfortunately common, but LoopInterchange does not care. You could replace null in the getelementptr with a function argument or a global.

%j = phi i16 [ 0, %BB1 ], [ %j.next, %BB3 ]
%arrayidx.us.us = getelementptr i16, ptr null, i16 %j
%0 = load i16, ptr %arrayidx.us.us, align 1
%cond = select i1 false, i16 0, i16 0
Contributor

If you want to define an arbitrary value, it might be better to use freeze i16 poison.

Collaborator Author

Yeah, interchange doesn't care, it's the reproducer from the bug report, and it's short.
But yeah, it's no effort to read from a function argument, so will do that and change this accordingly.

Member

> Yeah, interchange doesn't care,

Future DA might care, when checking AliasAnalysis on whether the base pointers themselves alias.

Comment on lines 1847 to 1872
  //   BB5:
  //     %old.cond.lcssa = phi i16 [ %cond, %BB4 ]
  //     br outer.header
  //
  // Interchange then brings it in LCSSA form again and we get:
  //
  //   BB4:
  //     %new.cond.lcssa = phi i16 [ %cond, %BB3 ]
  //     br label %BB5
  //   BB5:
  //     %old.cond.lcssa = phi i16 [ %new.cond.lcssa, %BB4 ]
  //
  // Which means that we have a chain of LCSSA phi nodes from %new.cond.lcssa
  // to %old.cond.lcssa. The problem is that interchange can reorder blocks BB4
  // and BB5 placing the use before the def if we don't check this. The
  // observation is that %old.cond.lcssa is unused, so instead of moving and
  // renaming these phi nodes, just delete it if it's trivially dead. If it
  // isn't trivially dead, it is handled above. The loop should still be in
  // LCSSA form, and if it isn't, formLCSSARecursively is called after the
  // interchange rewrite.
  SmallVector<PHINode *, 8> LcssaOuterLatch(
      llvm::make_pointer_range(OuterLatch->phis()));
  for (PHINode *P : LcssaOuterLatch)
    if (isInstructionTriviallyDead(P))
      P->eraseFromParent();

Contributor

I'm not entirely sure, but what happens if the phi node isn't dead? For example, in this case, if %old.cond.lcssa is used inside BB5, would interchange still generate ill-formed IR?

Collaborator Author

Thanks for the review.

It's a good point. I was claiming they will then be handled by the other checks. But let me go back and look at some examples and double check, and I guess at least some asserts are required here.

Member

@Meinersbur Meinersbur left a comment

If LoopInterchange has to build LCSSA form again anyway, have you considered cleaning this up at the same time? Then, to be on the safe side, do not allow any PHI nodes in OuterLoopLatch unless InnerLoopExit == OuterLoopLatch (like the existing check InnerLoopPreHeader != OuterLoopHeader in tightlyNested). In addition to the case where %old.cond.lcssa is not trivially dead, I could imagine other dubious patterns, such as a PHI node referencing itself.

Comment on lines 1867 to 1868
SmallVector<PHINode *, 8> LcssaOuterLatch(
llvm::make_pointer_range(OuterLatch->phis()));
Member

Consider iterating over OuterLatch->phis(), and only adding those instructions to the list to be erased. That's the more established pattern.
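
The collect-then-erase pattern mentioned here can be sketched with standard containers instead of LLVM types: walk the block once, record only the instructions that should go, and erase them in a second step so the first traversal never runs over a container that is being modified. `ToyInst`, `eraseDeadInsts`, and `demoCollectThenErase` are hypothetical names for illustration; in the patch the analogous pieces are `OuterLatch->phis()`, `isInstructionTriviallyDead`, and `eraseFromParent`.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction: a name plus a use count.
struct ToyInst {
  std::string Name;
  unsigned NumUses;
};

// Erases all instructions without uses and returns how many were erased.
std::size_t eraseDeadInsts(std::list<ToyInst> &Block) {
  // First pass: only remember what to erase, do not modify the block.
  std::vector<std::list<ToyInst>::iterator> ToErase;
  for (auto It = Block.begin(); It != Block.end(); ++It)
    if (It->NumUses == 0)
      ToErase.push_back(It);

  // Second pass: erase. Erasing from a std::list leaves the iterators to
  // the other elements valid, so the recorded iterators stay usable.
  for (auto It : ToErase)
    Block.erase(It);
  return ToErase.size();
}

// Usage: the unused phi is removed, the live increment stays.
inline void demoCollectThenErase() {
  std::list<ToyInst> BB5 = {{"old.cond.lcssa", 0}, {"i.next", 2}};
  assert(eraseDeadInsts(BB5) == 1);
  assert(BB5.front().Name == "i.next");
}
```

Compared with copying every phi into a vector up front, this keeps the temporary list limited to the instructions that are actually going to be erased.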

sjoerdmeijer commented Oct 2, 2025

I was just looking at this again to address earlier comments, and noticed your comments, thanks @Meinersbur !
Just a quick question about your question to check if I understand:

> If LoopInterchange has to build LCSSA form again anyway, have you considered cleaning this up at the same time?

Are you suggesting that instead of moving LCSSA nodes around, delete (all of) them, because interchange brings it back into LCSSA anyway? Maybe this is all related, and I will admit that I am struggling with fixing this. The reason is that the whole CFG is rewired, new blocks are created, and a lot of instructions are moved around including these lcssa phis. And it's difficult to see what guarantees what, and like you said, I have doubts about other patterns. So, if the suggestion is to not move all these lcssa phis around, maybe that helps.

@Meinersbur
Member

Not removing all nodes, but only look for single-input PHI nodes that are not in a loop's exit block. A reason to not do it would be it is more computationally expensive (another iteration over all BBs), but would result in fewer special cases in LoopInterchange itself. So it might be worth the trade-off.

A PHI in the OuterLatch (if != InnerExit) would not need to be considered tightly nested; the additional pass would just exist to normalize IR output from other passes, like a mini-SimplifyCFG.
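
A minimal sketch of the normalization described here, assuming a toy IR in which values are plain names. `ToyPhi`, `ToyFunction`, `simplifySingleInputPhis`, and `demoPhiChain` are hypothetical and are not the helper from the patch; the sketch only shows the idea of forwarding a single-input phi outside the exit block to its one incoming value and then dropping it. Skipping self-referencing phis is an extra assumption made here to keep the toy well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR, not LLVM's: a phi has a name, a parent block, and a
// list of incoming value names.
struct ToyPhi {
  std::string Name;
  std::string Block;
  std::vector<std::string> Incoming;
};

struct ToyFunction {
  std::vector<ToyPhi> Phis;
  // Non-phi users, modelled as user name -> operand name.
  std::map<std::string, std::string> Users;
};

// Removes every single-input phi that is not in ExitBlock and rewrites its
// uses to the phi's only incoming value. Self-referencing phis are left
// alone. Returns the number of phis removed.
std::size_t simplifySingleInputPhis(ToyFunction &F,
                                    const std::string &ExitBlock) {
  std::size_t Removed = 0;
  for (std::size_t I = 0; I < F.Phis.size();) {
    const ToyPhi P = F.Phis[I]; // copy: the vector is modified below
    if (P.Block == ExitBlock || P.Incoming.size() != 1 ||
        P.Incoming[0] == P.Name) {
      ++I;
      continue;
    }
    const std::string &Repl = P.Incoming[0];
    // Replace all uses of the phi, both in other phis and in plain users.
    for (ToyPhi &Other : F.Phis)
      for (std::string &In : Other.Incoming)
        if (In == P.Name)
          In = Repl;
    for (auto &KV : F.Users)
      if (KV.second == P.Name)
        KV.second = Repl;
    F.Phis.erase(F.Phis.begin() + static_cast<std::ptrdiff_t>(I));
    ++Removed;
  }
  return Removed;
}

// Usage: the chain %new.cond.lcssa -> %old.cond.lcssa collapses, and the
// user of the old phi now reads the phi in the exit block directly.
inline void demoPhiChain() {
  ToyFunction F;
  F.Phis.push_back({"new.cond.lcssa", "BB4", {"cond"}});
  F.Phis.push_back({"old.cond.lcssa", "BB5", {"new.cond.lcssa"}});
  F.Users["ret"] = "old.cond.lcssa";
  assert(simplifySingleInputPhis(F, "BB4") == 1);
  assert(F.Users["ret"] == "new.cond.lcssa");
}
```

The phi in the exit block (BB4 in the demo) is kept on purpose, since that is the one LCSSA form actually requires.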

@sjoerdmeijer
Collaborator Author

I think I've addressed all comments. I've introduced a new helper that is now more intentional and explicit about checking non-exit blocks and single-input phis. I've added an assert there that I haven't managed to trigger in testing. I have changed the test case and made that a bit more interesting: the problematic phi was trivially dead before, but now it has a user in the exit block, which shows that we simplify the phi-chain and replace all uses.

kasuga-fj commented Oct 14, 2025

If I understand @Meinersbur 's comments correctly, they’re suggesting looking for single-input PHIs and bailing out if there’s one in a BB which is not InnerExit. So I imagined adding something like the following to tightlyNested.

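// Bail out if a single-input PHI sits in an outer-loop block that is neither
// part of the inner loop nor the inner loop's exit block.
for (BasicBlock *BB : OuterLoop->blocks()) {
  if (InnerLoop->contains(BB))
    continue;
  if (BB == InnerExit)
    continue;

  for (PHINode &Phi : BB->phis())
    if (Phi.getNumIncomingValues() == 1)
      return false;
}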

I'm not entirely sure if we need to check all BBs. However, such special cases would not be common in practice, so I agree that it's reasonable to just bail out early rather than trying to apply loop-interchange.

Member

@Meinersbur Meinersbur left a comment

@sjoerdmeijer 👍 This is what I meant. You can add an assertion in moveLCSSAPhis ensuring that all those BBs are really empty.

@kasuga-fj We cannot just bail out in moveLCSSAPhis when some changes have already been applied. If we check beforehand, we can also just remove the redundant PHI.

LGTM, modulo the null dereference; future DA might not like that. Also consider the (remaining) remarks from @kasuga-fj before landing.

InnerLatch->replacePhiUsesWith(InnerLatch, OuterLatch);
}

// This deals with a corner case when a LCSSA phi node appears in a non-exit
Member

Suggested change
- // This deals with a corner case when a LCSSA phi node appears in a non-exit
+ /// This deals with a corner case when a LCSSA phi node appears in a non-exit

Contributor

This comment seems to only explain the problem, but not how to address it. It may be better to describe what this function performs.

Collaborator Author

The problem description is important: it is not obvious and very subtle, so it is worth explaining. I don't think one would get all of that from reading the implementation of a small helper function. The implementation of this helper function, together with a few comments, describes what the function performs. But don't get me wrong, I don't disagree with your statement, so I will add a sentence to these comments to make the solution explicit.

Contributor

I agree that describing the problem is important. My suggestion wasn't to remove it, but to add a comment explaining what this function does. So, the current state aligns with what I intended.

Contributor

@kasuga-fj kasuga-fj left a comment

> @kasuga-fj We cannot just bail out in moveLCSSAPhis when some changes have already been applied. If we check beforehand, we can also just remove the redundant PHI.

I was imagining an implementation that adds an additional check (or replaces redundant PHIs) to tightlyNested, since you also mentioned a PHI which refers to itself. That said, the current approach also seems reasonable.

Comment on lines +1930 to +1932

simplifyLCSSAPhis(OuterLoop, InnerLoop);

Contributor

nit: is this the right place to call this function? I think it might be better to call this from moveLCSSAPhis directly.

Collaborator Author

I was also wondering about this, and I have experimented with where to put it. But between this place and moveLCSSAPhis, the CFG is massively rewired and is sometimes in a funny state (i.e. it is under construction). I wanted to do this relatively early, when the CFG is still relatively stable, before we start turning things around, adding and splitting blocks, and moving things around, which is here. Thus, I think it is fine here, and overall it doesn't matter that much.

Contributor

I thought moveLCSSAPhis is calling adjustLoopBranches, but it's actually the other way around. Never mind.

Collaborator Author

@sjoerdmeijer sjoerdmeijer left a comment

> I was imagining an implementation that adds an additional check (or replaces redundant PHIs) to tightlyNested, since you also mentioned a PHI which refers to itself. That said, the current approach also seems reasonable.

I think that adding the code snippet and check you suggested earlier to tightlyNested would result in a regression, i.e., the loop of the added test case would be classified as not tightly nested. I thus think the current solution is fine, but we can revise this later should that be necessary.

Thanks a lot @kasuga-fj and @Meinersbur for reviewing!
I saw an LGTM earlier. I have addressed the minor comments: removed the undefined behaviour from the test, changed the comments slightly, and replied to some queries. I will let this sit for a day and merge it tomorrow if there are no further objections.

Member

@Meinersbur Meinersbur left a comment

Consider fixing the formatting with clang-format and adding an assert to moveLCSSAPhis that checks whether all BBs from LcssaInnerExit to OuterLatch are really empty.

Contributor

@kasuga-fj kasuga-fj left a comment

LGTM

@sjoerdmeijer
Collaborator Author

I found a problem while testing this. I will follow up here and add a reproducer and fix.

Contributor

@kasuga-fj kasuga-fj left a comment

I think it would be better to update the PR title.

@sjoerdmeijer
Collaborator Author

False alarm. I did find an issue, but it is an unrelated new one, #163954, which I will fix next. I will land this on Monday so that I can better deal with any fallout.

@sjoerdmeijer sjoerdmeijer changed the title from "[LoopInterchange] Also look at lcssa phis in outer loop latch block" to "[LoopInterchange] Add simplifyLCSSAPhis: remove phi from non-exit bb" on Oct 17, 2025
@sjoerdmeijer sjoerdmeijer merged commit b90a8d3 into llvm:main Oct 20, 2025
9 of 10 checks passed
@sjoerdmeijer sjoerdmeijer deleted the interchange-lcssa-phis branch October 20, 2025 09:23
llvm-ci commented Oct 20, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/29348

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Driver/debug-options.c' FAILED ********************
Exit Code: 127

Command Output (stdout):
--
# RUN: at line 4
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang -### -c -g /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu 2>&1              | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang '-###' -c -g /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# note: command had no output on stdout or stderr
# RUN: at line 6
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang -### -c -g2 /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu 2>&1              | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang '-###' -c -g2 /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# note: command had no output on stdout or stderr
# RUN: at line 8
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang -### -c -g3 /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu 2>&1              | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang '-###' -c -g3 /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# note: command had no output on stdout or stderr
# RUN: at line 10
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang -### -c -ggdb /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu 2>&1              | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang '-###' -c -ggdb /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# note: command had no output on stdout or stderr
# RUN: at line 12
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang -### -c -ggdb1 /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu 2>&1              | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=GLTO_ONLY -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang '-###' -c -ggdb1 /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=GLTO_ONLY -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# note: command had no output on stdout or stderr
# RUN: at line 14
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang -### -c -ggdb3 /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu 2>&1              | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang '-###' -c -ggdb3 /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_GDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# note: command had no output on stdout or stderr
# RUN: at line 16
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang -### -c -glldb /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu 2>&1              | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_STANDALONE -check-prefix=G_LLDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang '-###' -c -glldb /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_STANDALONE -check-prefix=G_LLDB /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# note: command had no output on stdout or stderr
# RUN: at line 18
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang -### -c -gsce /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu 2>&1              | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=G_LIMITED -check-prefix=G_SCE /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/clang '-###' -c -gsce /Users/buildbot/buildbot-root/llvm-project/clang/test/Driver/debug-options.c -target x86_64-linux-gnu
...

Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
…lvm#160889)

aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…lvm#160889)

Successfully merging this pull request may close these issues.

opt -passes=loop-interchange fails with "Instruction does not dominate all uses!"

5 participants