Skip to content

Conversation

CongzheUalberta
Copy link
Contributor

@CongzheUalberta CongzheUalberta commented Oct 14, 2025

Enhanced PHI CSE to eliminate redundant PHIs, which could clean up the IR and open up opportunities for other passes such as loop vectorization.

Motivation:

Given the following range() function,

void range(float *q, float *c, float &vl, float &vr)
{
    vl = +1e20;
    vr = -1e20;    
    for (int i = 0; i < 128; i++) {
        float tmp = (*q) - (*c);
        if (tmp < vl)
            vl = tmp;
        if (tmp > vr)
            vr = tmp;
        q++;
        c++;
    }
    return;
}

The IR that is right before loop vectorization is shown below. Here range() is inlined into its caller and becomes the single BB for.body.

for.body:                                    ; preds = %entry, %for.body
  %v0 = phi float [ 0x4415AF1D80000000, %entry ], [ %v0.1, %for.body ]
  %v1 = phi float [ 0xC415AF1D80000000, %entry ], [ %v1.1, %for.body ]
  %phi.to.remove = phi float [ 0xC415AF1D80000000, %entry ], [ %phi.to.remove.next, %for.body ]  (<= redundant, needs clean-up)
  %i = phi i32 [ 0, %entry ], [ %inc.i, %for.body ]
  %q = phi ptr [ %m, %entry ], [ %q.next, %for.body ]
  %c = phi ptr [ %n, %entry ], [ %c.next, %for.body ]
  %q.load = load float, ptr %q
  %c.load = load float, ptr %c
  %sub = fsub float %q.load, %c.load
  %cmp1 = fcmp olt float %sub, %v0
  %v0.1 = select i1 %cmp1, float %sub, float %v0
  %same.as.v1 = select i1 %cmp1, float %v1, float %phi.to.remove  (<= redundant, needs clean-up)
  %cmp2 = fcmp ogt float  %sub, %same.as.v1
  %v1.1 = select i1 %cmp2, float %sub, float %v1
  %phi.to.remove.next = select i1 %cmp2, float %sub, float %same.as.v1  (<= redundant, needs clean-up)
  %inc.i = add nuw nsw i32 %i, 1
  %q.next = getelementptr inbounds float, ptr %q, i64 1
  %c.next = getelementptr inbounds float, ptr %c, i64 1
  %exitcond = icmp eq i32 %inc.i, %count
  br i1 %exitcond, label %exit, label %for.body

llvm trunk is not able to vectorize it because there are redundant phi (%phi.to.remove) and redundant select instructions (%phi.to.remove.next, %same.as.v1).

Those redundant instructions just act exactly the same as %v1, hence they are purely redundant and should be eliminated.
This patch identifies the redundant phi and eliminates it, as a result the loop could get vectorized and it could improve one of our internal workloads by 10%.


How the redundant phi was generated:

It was initially introduced by GVN that did load-in-loop-pre, which partially eliminated the load of %v1 and introduced in one of its predecessors this load %.pre = load float, ptr %v1. %.pre eventually became the redundant phi that was not cleaned up.

Compiler explorer: https://godbolt.org/z/f4ncn3Kjo
Please refer to the IR before vectorization on main(), and IR before and after GVN on range().

@CongzheUalberta CongzheUalberta marked this pull request as ready for review October 14, 2025 22:10
@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Oct 14, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Congzhe (CongzheUalberta)

Changes

Enhanced PHI CSE to eliminate redundant PHIs, which could clean up the IR and open up opportunities for other passes such as loop vectorization.

Motivation:

Given the following range() function,

void range(float *q, float *c, float &amp;vl, float &amp;vr)
{
    vl = +1e20;
    vr = -1e20;    
    for (int i = 0; i &lt; 128; i++) {
        float tmp = (*q) - (*c);
        if (tmp &lt; vl)
            vl = tmp;
        if (tmp &gt; vr)
            vr = tmp;
        q++;
        c++;
    }
    return;
}

The IR that is right before loop vectorization is shown below. Here range() is inlined into its caller and becomes the single BB for.body.

for.body:                                    ; preds = %entry, %for.body
  %v0 = phi float [ 0x4415AF1D80000000, %entry ], [ %v0.1, %for.body ]
  %v1 = phi float [ 0xC415AF1D80000000, %entry ], [ %v1.1, %for.body ]
  %phi.to.remove = phi float [ 0xC415AF1D80000000, %entry ], [ %phi.to.remove.next, %for.body ]  (&lt;= redundant, needs clean-up)
  %i = phi i32 [ 0, %entry ], [ %inc.i, %for.body ]
  %q = phi ptr [ %m, %entry ], [ %q.next, %for.body ]
  %c = phi ptr [ %n, %entry ], [ %c.next, %for.body ]
  %q.load = load float, ptr %q
  %c.load = load float, ptr %c
  %sub = fsub float %q.load, %c.load
  %cmp1 = fcmp olt float %sub, %v0
  %v0.1 = select i1 %cmp1, float %sub, float %v0
  %same.as.v1 = select i1 %cmp1, float %v1, float %phi.to.remove  (&lt;= redundant, needs clean-up)
  %cmp2 = fcmp ogt float  %sub, %same.as.v1
  %v1.1 = select i1 %cmp2, float %sub, float %v1
  %phi.to.remove.next = select i1 %cmp2, float %sub, float %same.as.v1  (&lt;= redundant, needs clean-up)
  %inc.i = add nuw nsw i32 %i, 1
  %q.next = getelementptr inbounds float, ptr %q, i64 1
  %c.next = getelementptr inbounds float, ptr %c, i64 1
  %exitcond = icmp eq i32 %inc.i, %count
  br i1 %exitcond, label %exit, label %for.body

llvm trunk is not able to vectorize it because there are redundant phi (%phi.to.remove) and redundant select instructions (%phi.to.remove.next, %same.as.v1).

Those redundant instructions just act exactly the same as %v1, hence they are purely redundant and should be eliminated.
This patch identifies the redundant phi and eliminates it, as a result the loop could get vectorized and performance could get improved.

How the redundant phi was generated:

It was initially introduced by GVN that did load-in-loop-pre, which partially eliminated the load of %v1 and introduced in one of its predecessors this load %.pre = load float, ptr %v1. %.pre eventually became the redundant phi that was not cleaned up.

Compiler explorer: https://godbolt.org/z/f4ncn3Kjo
Please refer to the IR before vectorization on main(), and IR before and after GVN on range().


Full diff: https://github.com/llvm/llvm-project/pull/163453.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp (+84-5)
  • (added) llvm/test/Transforms/InstCombine/enhanced-phi-cse.ll (+61)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp b/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
index 9815644f5f43d..e736e89a3a146 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
@@ -1621,11 +1621,90 @@ Instruction *InstCombinerImpl::visitPHINode(PHINode &PN) {
     // Note that even though we've just canonicalized this PHI, due to the
     // worklist visitation order, there are no guarantess that *every* PHI
     // has been canonicalized, so we can't just compare operands ranges.
-    if (!PN.isIdenticalToWhenDefined(&IdenticalPN))
-      continue;
-    // Just use that PHI instead then.
-    ++NumPHICSEs;
-    return replaceInstUsesWith(PN, &IdenticalPN);
+    if (PN.isIdenticalToWhenDefined(&IdenticalPN)) {
+      // Just use that PHI instead then.
+      ++NumPHICSEs;
+      return replaceInstUsesWith(PN, &IdenticalPN);
+    }
+
+    // Look for the following pattern and do PHI CSE to clean up the
+    // redundant %phi. Here %phi, %1 and %phi.next perform the same
+    // functionality as %identicalPhi and hence %phi can be eliminated.
+    //
+    // BB1:
+    //   %identicalPhi = phi [ X, %BB0 ], [ %identicalPhi.next, %BB1 ]
+    //   %phi = phi [ X, %BB0 ], [ %phi.next, %BB1 ]
+    //   ...
+    //   %identicalPhi.next = select %cmp, %val, %identicalPhi
+    //   %1 = select %cmp2, %identicalPhi, float %phi
+    //   %phi.next = select %cmp, %val, %1
+    //
+    // Prove that %phi and %identicalPhi are the same by induction:
+    //
+    // Base case: Both %phi and %identicalPhi are equal on entry to the loop.
+    // Inductive case:
+    // Suppose %phi and %identicalPhi are equal at iteration i.
+    // We look at their values at iteration i+1 which are %phi.next and
+    // %identicalPhi.next. They would have become different only when %cmp is
+    // false and the corresponding values %1 and %identicalPhi differ.
+    //
+    // The only condition when %1 and %identicalPh could differ is when %cmp2
+    // is false and %1 is %phi, which contradicts our inductive hypothesis
+    // that %phi and %identicalPhi are equal. Thus %phi and %identicalPhi are
+    // always equal at iteration i+1.
+
+    if (PN.getNumIncomingValues() == 2 && PN.getNumUses() == 1) {
+      unsigned diffVals = 0;
+      unsigned diffValIdx = 0;
+      // Check that only the backedge incoming value is different.
+      for (unsigned i = 0; i < 2; i++) {
+        if (PN.getIncomingValue(i) != IdenticalPN.getIncomingValue(i)) {
+          diffVals++;
+          diffValIdx = i;
+        }
+      }
+      BasicBlock *CurBB = PN.getParent();
+      if (diffVals == 2 || PN.getIncomingBlock(diffValIdx) != CurBB)
+        continue;
+      // Now check that the backedge incoming values are two select
+      // instructions that are in the same BB, and have the same condition,
+      // true value.
+      auto *Val = PN.getIncomingValue(diffValIdx);
+      auto *IdenticalVal = IdenticalPN.getIncomingValue(diffValIdx);
+      if (!isa<SelectInst>(Val) || !isa<SelectInst>(IdenticalVal))
+        continue;
+
+      auto *SI = cast<SelectInst>(Val);
+      auto *IdenticalSI = cast<SelectInst>(IdenticalVal);
+      if (SI->getParent() != CurBB || IdenticalSI->getParent() != CurBB)
+        continue;
+      if (SI->getCondition() != IdenticalSI->getCondition() ||
+          SI->getTrueValue() != IdenticalSI->getTrueValue())
+        continue;
+
+      // Now check that the false values, i.e., %1 and %identicalPhi,
+      // are essentially the same value within the same BB.
+      auto SameSelAndPhi = [&](SelectInst *SI, PHINode *IdenticalPN,
+                               PHINode *PN) {
+        if (SI->getTrueValue() == IdenticalPN) {
+          return SI->getFalseValue() == PN;
+        }
+        return false;
+      };
+      auto *FalseVal = SI->getFalseValue();
+      auto *IdenticalSIFalseVal =
+          dyn_cast<PHINode>(IdenticalSI->getFalseValue());
+      if (!isa<SelectInst>(FalseVal) || !IdenticalSIFalseVal ||
+          IdenticalSIFalseVal != &IdenticalPN)
+        continue;
+      auto *FalseValSI = cast<SelectInst>(FalseVal);
+      if (FalseValSI->getParent() != CurBB ||
+          !SameSelAndPhi(FalseValSI, &IdenticalPN, &PN))
+        continue;
+
+      ++NumPHICSEs;
+      return replaceInstUsesWith(PN, &IdenticalPN);
+    }
   }
 
   // If this is an integer PHI and we know that it has an illegal type, see if
diff --git a/llvm/test/Transforms/InstCombine/enhanced-phi-cse.ll b/llvm/test/Transforms/InstCombine/enhanced-phi-cse.ll
new file mode 100644
index 0000000000000..ae589b7450465
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/enhanced-phi-cse.ll
@@ -0,0 +1,61 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -S -passes=instcombine -instcombine-enhanced-phi-cse=true | FileCheck %s
+@A = extern_weak global float, align 4
+
+; %phi.to.remove acts the same as %v1, and can be eliminated with PHI CSE.
+define void @enhanced_phi_cse(ptr %m, ptr %n, i32 %count) {
+; CHECK-LABEL: @enhanced_phi_cse(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[V0:%.*]] = phi float [ 0x4415AF1D80000000, [[ENTRY:%.*]] ], [ [[V0_1:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[V1:%.*]] = phi float [ 0xC415AF1D80000000, [[ENTRY]] ], [ [[V1_1:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[I:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INC_I:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[Q:%.*]] = phi ptr [ [[M:%.*]], [[ENTRY]] ], [ [[Q_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[C:%.*]] = phi ptr [ [[N:%.*]], [[ENTRY]] ], [ [[C_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[Q_LOAD:%.*]] = load float, ptr [[Q]], align 4
+; CHECK-NEXT:    [[C_LOAD:%.*]] = load float, ptr [[C]], align 4
+; CHECK-NEXT:    [[SUB:%.*]] = fsub float [[Q_LOAD]], [[C_LOAD]]
+; CHECK-NEXT:    [[CMP1:%.*]] = fcmp olt float [[SUB]], [[V0]]
+; CHECK-NEXT:    [[V0_1]] = select i1 [[CMP1]], float [[SUB]], float [[V0]]
+; CHECK-NEXT:    [[CMP2:%.*]] = fcmp ogt float [[SUB]], [[V1]]
+; CHECK-NEXT:    [[V1_1]] = select i1 [[CMP2]], float [[SUB]], float [[V1]]
+; CHECK-NEXT:    [[INC_I]] = add nuw nsw i32 [[I]], 1
+; CHECK-NEXT:    [[Q_NEXT]] = getelementptr inbounds float, ptr [[Q]], i64 1
+; CHECK-NEXT:    [[C_NEXT]] = getelementptr inbounds float, ptr [[C]], i64 1
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i32 [[INC_I]], [[COUNT:%.*]]
+; CHECK-NEXT:    br i1 [[EXITCOND]], label [[EXIT:%.*]], label [[FOR_BODY]]
+; CHECK:       exit:
+; CHECK-NEXT:    store float [[V1_1]], ptr @A, align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:                                    ; preds = %entry, %for.body
+  %v0 = phi float [ 0x4415AF1D80000000, %entry ], [ %v0.1, %for.body ]
+  %v1 = phi float [ 0xC415AF1D80000000, %entry ], [ %v1.1, %for.body ]
+  %phi.to.remove = phi float [ 0xC415AF1D80000000, %entry ], [ %phi.to.remove.next, %for.body ]
+  %i = phi i32 [ 0, %entry ], [ %inc.i, %for.body ]
+  %q = phi ptr [ %m, %entry ], [ %q.next, %for.body ]
+  %c = phi ptr [ %n, %entry ], [ %c.next, %for.body ]
+  %q.load = load float, ptr %q
+  %c.load = load float, ptr %c
+  %sub = fsub float %q.load, %c.load
+  %cmp1 = fcmp olt float %sub, %v0
+  %v0.1 = select i1 %cmp1, float %sub, float %v0
+  %same.as.v1 = select i1 %cmp1, float %v1, float %phi.to.remove
+  %cmp2 = fcmp ogt float  %sub, %same.as.v1
+  %v1.1 = select i1 %cmp2, float %sub, float %v1
+  %phi.to.remove.next = select i1 %cmp2, float %sub, float %same.as.v1
+  %inc.i = add nuw nsw i32 %i, 1
+  %q.next = getelementptr inbounds float, ptr %q, i64 1
+  %c.next = getelementptr inbounds float, ptr %c, i64 1
+  %exitcond = icmp eq i32 %inc.i, %count
+  br i1 %exitcond, label %exit, label %for.body
+
+exit:
+  %vl.1.lcssa = phi float [ %v1.1, %for.body ]
+  store float %vl.1.lcssa, ptr @A
+  ret void
+}

Enhanced PHI CSE to eliminate redundant PHIs, which could clean up
the IR and open up opportunities for other passes such as loop
vectorization.
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect that this fold should be implemented in simplifySelectInst and simplify %1 = select %cmp2, %identicalPhi, %phi -> %identicalPhi. It is not a strong requirement, so it's up to you.

// that %phi and %identicalPhi are equal. Thus %phi and %identicalPhi are
// always equal at iteration i+1.

if (PN.getNumIncomingValues() == 2 && PN.getNumUses() == 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for the one-use checking here?

Copy link
Contributor Author

@CongzheUalberta CongzheUalberta Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is supposed to check for the target pattern that is the redundant cycle of phi (%phi.to.remove) and two select instructions (%phi.to.remove.next, %same.as.v1). Here and for the rest of my replies I'm using the notion of those variables (%phi.to.remove, %phi.to.remove.next, %same.as.v1) from the motivative IR in the description which is the same as enhanced_phi_cse() in the test file.

I've now improved the code and I'm checking SI->getNumUses() == 1 on line 1866 instead, to make sure that %phi.to.remove.next (which had been checked to be a select) has only one use that is the backedge incoming value of %phi.to.remove. Hope it is more readable now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said before, we can do the same fold by simplifying %1 = select %cmp2, %identicalPhi, %phi -> %identicalPhi. Since we never check the uses of instructions in InstSimplify, I don't think it is necessary in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about it more and I do agree with you that the one-use checking is not necessary in this case. I've now deleted the one-use checking. Thanks for pointing it out!

}
}
BasicBlock *CurBB = PN.getParent();
if (diffVals == 2 || PN.getIncomingBlock(diffValIdx) != CurBB)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header and latch block can be different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this patch I limited it to the case where the loop is a single BB, hence redundant cycle of phi and selects are in the same BB and control flow equivalent. With multiple BB and control flow divergence, it would become more complex and not very straightforward to prove that they are purely redundant. Therefore I'd like to make it work for the single BB case, and then I could possibly extend it to multi-BB scenarios. Does it sound reasonable to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide a counterexample to demonstrate that removing this constraint may lead to a crash or miscompilation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought more about it and I think you are right. I could not come up with a counterexample that would break my code when this constraint was removed - it would work correctly without the single BB constraint. I've now removed the constraint and simplified the code. Thanks for this comment.

@dtcxzyw
Copy link
Member

dtcxzyw commented Oct 17, 2025

Crash reproducer:

; bin/opt -passes=instcombine test.ll -S
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define fastcc i32 @vduse_queue_check_inflights() {
._crit_edge:
  br label %.lr.ph54

.lr.ph54:
  %0 = phi ptr [ null, %._crit_edge ], [ %3, %.lr.ph54 ]
  %1 = phi ptr [ %3, %.lr.ph54 ], [ null, %._crit_edge ]
  %2 = load i8, ptr %1, align 8
  %.not51 = icmp eq i8 %2, 0
  %3 = select i1 %.not51, ptr %0, ptr null
  br label %.lr.ph54
}
opt: /home/dtcxzyw/WorkSpace/Projects/compilers/llvm-project/llvm/include/llvm/IR/Instructions.h:2817: llvm::Value* llvm::PHINode::getIncomingValueForBlock(const llvm::BasicBlock*) const: Assertion `Idx >= 0 && "Invalid basic block argument!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
Stack dump:
0.      Program arguments: bin/opt -passes=instcombine reduced.ll -S
1.      Running pass "function(instcombine<max-iterations=1;verify-fixpoint>)" on module "reduced.ll"
2.      Running pass "instcombine<max-iterations=1;verify-fixpoint>" on function "vduse_queue_check_inflights"
 #0 0x000071702d854192 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMSupport.so.22.0git+0x254192)
 #1 0x000071702d85075f llvm::sys::RunSignalHandlers() (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMSupport.so.22.0git+0x25075f)
 #2 0x000071702d8508ac SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x000071702d245330 (/lib/x86_64-linux-gnu/libc.so.6+0x45330)
 #4 0x000071702d29eb2c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x000071702d29eb2c __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x000071702d29eb2c pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x000071702d24527e raise ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x000071702d2288ff abort ./stdlib/abort.c:81:7
 #9 0x000071702d22881b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x000071702d23b517 (/lib/x86_64-linux-gnu/libc.so.6+0x3b517)
#11 0x0000717027c38c4a llvm::PHINode::getIncomingValueForBlock(llvm::BasicBlock const*) const (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMInstCombine.so.22.0git+0x38c4a)
#12 0x0000717027d5bfa9 llvm::InstCombinerImpl::visitPHINode(llvm::PHINode&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMInstCombine.so.22.0git+0x15bfa9)
#13 0x0000717027c68ff8 llvm::InstCombinerImpl::run() (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMInstCombine.so.22.0git+0x68ff8)
#14 0x0000717027c6a1c2 combineInstructionsOverFunction(llvm::Function&, llvm::InstructionWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::BranchProbabilityInfo*, llvm::ProfileSummaryInfo*, llvm::InstCombineOptions const&) InstructionCombining.cpp:0:0
#15 0x0000717027c6b204 llvm::InstCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMInstCombine.so.22.0git+0x6b204)
#16 0x000071702a3a8ee5 llvm::detail::PassModel<llvm::Function, llvm::InstCombinePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libPolly.so.22.0git+0x1a8ee5)
#17 0x0000717026b2c299 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMCore.so.22.0git+0x32c299)
#18 0x000071702cadd335 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMX86CodeGen.so.22.0git+0xdd335)
#19 0x0000717026b2a722 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMCore.so.22.0git+0x32a722)
#20 0x000071702dad5285 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMOptDriver.so.22.0git+0x20285)
#21 0x0000717026b2af4d llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMCore.so.22.0git+0x32af4d)
#22 0x000071702dae2261 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool, bool) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMOptDriver.so.22.0git+0x2d261)
#23 0x000071702daed436 optMain (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMOptDriver.so.22.0git+0x38436)
#24 0x000071702d22a1ca __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#25 0x000071702d22a28b call_init ./csu/../csu/libc-start.c:128:20
#26 0x000071702d22a28b __libc_start_main ./csu/../csu/libc-start.c:347:5
#27 0x0000622cb7c80095 _start (bin/opt+0x1095)
Aborted (core dumped)

@CongzheUalberta
Copy link
Contributor Author

Crash reproducer:

; bin/opt -passes=instcombine test.ll -S
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define fastcc i32 @vduse_queue_check_inflights() {
._crit_edge:
  br label %.lr.ph54

.lr.ph54:
  %0 = phi ptr [ null, %._crit_edge ], [ %3, %.lr.ph54 ]
  %1 = phi ptr [ %3, %.lr.ph54 ], [ null, %._crit_edge ]
  %2 = load i8, ptr %1, align 8
  %.not51 = icmp eq i8 %2, 0
  %3 = select i1 %.not51, ptr %0, ptr null
  br label %.lr.ph54
}
opt: /home/dtcxzyw/WorkSpace/Projects/compilers/llvm-project/llvm/include/llvm/IR/Instructions.h:2817: llvm::Value* llvm::PHINode::getIncomingValueForBlock(const llvm::BasicBlock*) const: Assertion `Idx >= 0 && "Invalid basic block argument!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
Stack dump:
0.      Program arguments: bin/opt -passes=instcombine reduced.ll -S
1.      Running pass "function(instcombine<max-iterations=1;verify-fixpoint>)" on module "reduced.ll"
2.      Running pass "instcombine<max-iterations=1;verify-fixpoint>" on function "vduse_queue_check_inflights"
 #0 0x000071702d854192 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMSupport.so.22.0git+0x254192)
 #1 0x000071702d85075f llvm::sys::RunSignalHandlers() (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMSupport.so.22.0git+0x25075f)
 #2 0x000071702d8508ac SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x000071702d245330 (/lib/x86_64-linux-gnu/libc.so.6+0x45330)
 #4 0x000071702d29eb2c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x000071702d29eb2c __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x000071702d29eb2c pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x000071702d24527e raise ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x000071702d2288ff abort ./stdlib/abort.c:81:7
 #9 0x000071702d22881b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x000071702d23b517 (/lib/x86_64-linux-gnu/libc.so.6+0x3b517)
#11 0x0000717027c38c4a llvm::PHINode::getIncomingValueForBlock(llvm::BasicBlock const*) const (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMInstCombine.so.22.0git+0x38c4a)
#12 0x0000717027d5bfa9 llvm::InstCombinerImpl::visitPHINode(llvm::PHINode&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMInstCombine.so.22.0git+0x15bfa9)
#13 0x0000717027c68ff8 llvm::InstCombinerImpl::run() (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMInstCombine.so.22.0git+0x68ff8)
#14 0x0000717027c6a1c2 combineInstructionsOverFunction(llvm::Function&, llvm::InstructionWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::BranchProbabilityInfo*, llvm::ProfileSummaryInfo*, llvm::InstCombineOptions const&) InstructionCombining.cpp:0:0
#15 0x0000717027c6b204 llvm::InstCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMInstCombine.so.22.0git+0x6b204)
#16 0x000071702a3a8ee5 llvm::detail::PassModel<llvm::Function, llvm::InstCombinePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libPolly.so.22.0git+0x1a8ee5)
#17 0x0000717026b2c299 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMCore.so.22.0git+0x32c299)
#18 0x000071702cadd335 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMX86CodeGen.so.22.0git+0xdd335)
#19 0x0000717026b2a722 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMCore.so.22.0git+0x32a722)
#20 0x000071702dad5285 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMOptDriver.so.22.0git+0x20285)
#21 0x0000717026b2af4d llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMCore.so.22.0git+0x32af4d)
#22 0x000071702dae2261 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool, bool) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMOptDriver.so.22.0git+0x2d261)
#23 0x000071702daed436 optMain (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMOptDriver.so.22.0git+0x38436)
#24 0x000071702d22a1ca __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#25 0x000071702d22a28b call_init ./csu/../csu/libc-start.c:128:20
#26 0x000071702d22a28b __libc_start_main ./csu/../csu/libc-start.c:347:5
#27 0x0000622cb7c80095 _start (bin/opt+0x1095)
Aborted (core dumped)

Thanks for the case.

I've also checked other failures in the last CI, they occurred because with %0 = phi ptr [ null, %._crit_edge ], [ %3, %.lr.ph54 ] and %1 = phi ptr [ %3, %.lr.ph54 ], [ null, %._crit_edge ], DiffVals actually is 0 and this case slipped into my code since I did not explicitly have if (DiffVals == 0) continue.

I've fixed it by using if (DiffVals != 1) continue because the target pattern of this patch is explicitly that one of the two incoming values of phis differ. CI has passed, and I've also checked that your example IR works fine.

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that it is more natural to handle this pattern in simplifySelectInst:

bool isSimplifierIdenticalPHI(PHINode &PHI, PHINode &IdenticalPHI) {
  if (PHI.getParent() != IdenticalPHI.getParent())
    return false;
  // Check incoming values
  ...
  // Check next values
  ...
}
Value *simplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal,
                                 const SimplifyQuery &Q, unsigned MaxRecurse) {
  ...

  if (auto *TruePHI = dyn_cast<PHINode>(TrueVal)) {
    if (auto *FalsePHI = dyn_cast<PHINode>(FalseVal)) {
    if (isSimplifierIdenticalPHI(*TruePHI, *FalsePHI))
      return FalseVal;
    if (isSimplifierIdenticalPHI(*TruePHI, *FalsePHI))
      return FalseVal;
    if (isSimplifierIdenticalPHI(*FalsePHI, *TruePHI))
      return TrueVal;
  }
  }
  return nullptr;
}

if (!isa<SelectInst>(Val) || !isa<SelectInst>(IdenticalVal))
continue;

auto *SI = cast<SelectInst>(Val);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use dyn_cast instead of isa + cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've used dyn_cast now.

continue;
if (cast<PHINode>(IdenticalSIOtherVal) != &IdenticalPN)
continue;
auto *SIOtherValAsSel = cast<SelectInst>(SIOtherVal);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use dyn_cast instead of isa + cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've used dyn_cast now.

};
if (!isa<SelectInst>(SIOtherVal) || !isa<PHINode>(IdenticalSIOtherVal))
continue;
if (cast<PHINode>(IdenticalSIOtherVal) != &IdenticalPN)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (cast<PHINode>(IdenticalSIOtherVal) != &IdenticalPN)
if (IdenticalSIOtherVal != &IdenticalPN)

Cast doesn't change the pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed accordingly.

Comment on lines 1703 to 1706
if (SI->getTrueValue() == IdenticalPN) {
return SI->getFalseValue() == PN;
}
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (SI->getTrueValue() == IdenticalPN) {
return SI->getFalseValue() == PN;
}
return false;
return SI->getTrueValue() == IdenticalPN && SI->getFalseValue() == PN;

We don't need a lambda function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the lambda function.

// ...
// %identicalPhi.next = select %cmp, %val, %identicalPhi
// (or select %cmp, %identicalPhi, %val)
// %1 = select %cmp2, %identicalPhi, %phi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add a commuted test with %1 = select %cmp2, %phi, %identicalPhi?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Twe more test cases added that reflect the commuted test, i.e., select_with_identical_phi_3() and select_with_identical_phi_4().

@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Oct 20, 2025
@CongzheUalberta
Copy link
Contributor Author

CongzheUalberta commented Oct 20, 2025

I find that it is more natural to handle this pattern in simplifySelectInst:

bool isSimplifierIdenticalPHI(PHINode &PHI, PHINode &IdenticalPHI) {
  if (PHI.getParent() != IdenticalPHI.getParent())
    return false;
  // Check incoming values
  ...
  // Check next values
  ...
}
Value *simplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal,
                                 const SimplifyQuery &Q, unsigned MaxRecurse) {
  ...

  if (auto *TruePHI = dyn_cast<PHINode>(TrueVal)) {
    if (auto *FalsePHI = dyn_cast<PHINode>(FalseVal)) {
    if (isSimplifierIdenticalPHI(*TruePHI, *FalsePHI))
      return FalseVal;
    if (isSimplifierIdenticalPHI(*TruePHI, *FalsePHI))
      return FalseVal;
    if (isSimplifierIdenticalPHI(*FalsePHI, *TruePHI))
      return TrueVal;
  }
  }
  return nullptr;
}

I have updated the patch so that the feature is implemented in simplifySelectInst now. Aslo added two more test cases.

// The only condition when %1 and %identicalPh could differ is when %cmp2
// is false and %1 is %phi, which contradicts our inductive hypothesis
// that %phi and %identicalPhi are equal. Thus %phi and %identicalPhi are
// always equal at iteration i+1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use /// for header comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've used /// now.

Comment on lines 4925 to 4927
if (SI->getCondition() != IdenticalSI->getCondition() ||
(SI->getTrueValue() != IdenticalSI->getTrueValue() &&
SI->getFalseValue() != IdenticalSI->getFalseValue()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (SI->getCondition() != IdenticalSI->getCondition() ||
(SI->getTrueValue() != IdenticalSI->getTrueValue() &&
SI->getFalseValue() != IdenticalSI->getFalseValue()))
if (SI->getCondition() != IdenticalSI->getCondition())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated accordingly.

if (SI->getTrueValue() == IdenticalSI->getTrueValue()) {
SIOtherVal = dyn_cast<SelectInst>(SI->getFalseValue());
IdenticalSIOtherVal = IdenticalSI->getFalseValue();
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else {
} else if (SI->getFalseValue() == IdenticalSI->getFalseValue()) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated accordingly.

} else {
SIOtherVal = dyn_cast<SelectInst>(SI->getTrueValue());
IdenticalSIOtherVal = IdenticalSI->getTrueValue();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
} else {
return false;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated accordingly.

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

}

/// Look for the following pattern and simplify %1 to %identicalPhi.
/// Here %phi, %1 and %phi.next perform the same functionality as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to give %1 a meaningful name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed %1 to %to_fold.

@CongzheUalberta
Copy link
Contributor Author

LG

Thanks a lot for the review!

@CongzheUalberta CongzheUalberta merged commit 9a9fbbb into llvm:main Oct 21, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:analysis Includes value tracking, cost tables and constant folding llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants