Merged
65 commits
f413520
[LoopPeel] Fix branch weights' effect on block frequencies
jdenny-ornl Mar 19, 2025
f821eeb
Run update_test_checks.py on a test
jdenny-ornl Mar 26, 2025
af8ec56
Fix typo
jdenny-ornl Apr 4, 2025
a0264ad
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Apr 8, 2025
fd29a49
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Apr 9, 2025
6303177
Document new metadata
jdenny-ornl Apr 10, 2025
bbd0e95
Improve LangRef.rst entry
jdenny-ornl May 1, 2025
715cb0a
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl May 5, 2025
67fa67d
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Jun 10, 2025
37ce859
Update fixmes
jdenny-ornl Jun 16, 2025
4337dcd
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Jun 16, 2025
5193158
Update test for AArch4, which I did not build before
jdenny-ornl Jun 17, 2025
bbd2f22
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Jul 10, 2025
b23f467
Run update script on test changed by merge from main
jdenny-ornl Jul 10, 2025
13d1fbb
[PGO] Add `llvm.loop.estimated_trip_count` metadata
jdenny-ornl Jul 15, 2025
e250cfc
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Jul 15, 2025
859b84d
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Jul 15, 2025
db5920a
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Jul 21, 2025
47fbe85
Add PGOEstimateTripCounts in more cases
jdenny-ornl Jul 21, 2025
f8097fb
Add unused initialization
jdenny-ornl Jul 21, 2025
7b27203
Simplify some test changes
jdenny-ornl Jul 22, 2025
4c4669a
Extend verify pass to cover new metadata
jdenny-ornl Jul 24, 2025
0f40efd
Fix test for some builds
jdenny-ornl Jul 24, 2025
2791a1c
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Jul 24, 2025
6148922
Apply some small reviewer suggestions
jdenny-ornl Jul 24, 2025
3f6a91a
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Jul 24, 2025
e5a0a26
Update for merge from pgo-estimated-trip-count
jdenny-ornl Jul 24, 2025
3a49b43
Attempt to fix windows pre-commit CI
jdenny-ornl Jul 24, 2025
c283ebe
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Jul 24, 2025
2f7daa8
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Jul 28, 2025
ecbf6e0
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Jul 28, 2025
c627fc5
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Aug 7, 2025
f1fa8d9
Run update script on new test from last merge
jdenny-ornl Aug 7, 2025
38ace1e
Reapply 3a18fe33f0763cd9276c99c276448412100f6270
jdenny-ornl Aug 7, 2025
92ddaa0
Convert to function pass, avoid needless pass invalidation
jdenny-ornl Aug 8, 2025
a3e0d72
Fix layering violation
jdenny-ornl Aug 8, 2025
67f22cd
Apply clang-format
jdenny-ornl Aug 8, 2025
f0ff2e2
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Aug 9, 2025
69fe051
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Aug 9, 2025
e7eb1fe
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Aug 13, 2025
e4f68c3
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Aug 13, 2025
0973ab3
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Aug 18, 2025
680bdc2
Remove PGOEstimateTripCountsPass and no-value form of metadata
jdenny-ornl Aug 18, 2025
83531b3
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Aug 19, 2025
59cd184
Fix case where nested loops share latch
jdenny-ornl Aug 19, 2025
47051ce
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Aug 19, 2025
5d00250
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Aug 25, 2025
5719779
Remove redundant code
jdenny-ornl Aug 25, 2025
98cab7b
Clarify recent comments some
jdenny-ornl Aug 25, 2025
3cbe07d
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Aug 25, 2025
b3831b6
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Sep 1, 2025
59ab013
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Sep 2, 2025
12ce70e
[LoopUnroll] Skip remainder loop guard if skip unrolled loop
jdenny-ornl Sep 2, 2025
415cb8f
Improve comments
jdenny-ornl Sep 3, 2025
b8aed9b
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Sep 9, 2025
5c9e43e
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Sep 9, 2025
cc3283d
Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard
jdenny-ornl Sep 9, 2025
83ac767
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Sep 15, 2025
1f81310
Empty commit to try to restart pre-commit CI
jdenny-ornl Sep 15, 2025
2382fbd
Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard
jdenny-ornl Sep 15, 2025
04c8ade
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Sep 22, 2025
9b80c13
Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard
jdenny-ornl Sep 22, 2025
a1a5460
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Sep 30, 2025
df9cf8c
Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard
jdenny-ornl Sep 30, 2025
4353f1f
Merge branch 'main' into skip-unroll-epilog-guard
jdenny-ornl Oct 6, 2025
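Commit 12ce70e ("[LoopUnroll] Skip remainder loop guard if skip unrolled loop") and the ConnectEpilog comment in the diff below describe the new control flow for an epilog remainder. The standalone C++ sketch below models that flow to show why the `llvm.assume` added in EpilogPreHeader is sound. It is a hypothetical model, not LLVM code: `simulateEpilogUnroll`, `EpilogRun`, and the induction variable `IV` are names invented here. It also assumes, consistent with the visible `Count - 1` comparison in UnrollRuntimeLoopRemainder, that the preheader branches directly to EpilogPreHeader when the trip count is less than Count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM code) of a loop runtime-unrolled by Count with
// an epilog remainder loop. Each original iteration adds 1 to IV.
struct EpilogRun {
  uint64_t FinalIV = 0;              // IV value when the exit is reached
  bool EnteredFromPreHeader = false; // unrolled loop skipped entirely
  bool EnteredFromNewExit = false;   // lcmp.mod guard sent us to the epilog
  bool AssumeHolds = true;           // extra iters != 0 whenever epilog entered
};

EpilogRun simulateEpilogUnroll(uint64_t Start, uint64_t TripCount,
                               uint64_t Count) {
  assert(TripCount >= 1 && "trip count is backedge-taken count + 1");
  assert(Count >= 2 && "runtime unrolling needs Count >= 2");
  EpilogRun R;
  const uint64_t ModVal = TripCount % Count; // "extra iters"

  uint64_t IV = Start;
  uint64_t EpilInit = 0; // models the ".epil.init" PHI; set on both paths
  // Assumption: with fewer than Count iterations, the preheader skips the
  // unrolled loop and goes straight to EpilogPreHeader (no epilog guard).
  if (TripCount < Count) {
    R.EnteredFromPreHeader = true;
    EpilInit = IV; // incoming value from PreHeader
  } else {
    // Unrolling loop: each trip executes Count copies of the body.
    for (uint64_t Done = 0; Done + Count <= TripCount; Done += Count)
      IV += Count;
    const uint64_t Unr = IV; // models the ".unr" PHI in NewExit (from Latch)
    // NewExit: branch around the epilog if there are no extra iters.
    if (ModVal == 0) {
      R.FinalIV = Unr;
      return R;
    }
    R.EnteredFromNewExit = true;
    EpilInit = Unr; // incoming value from NewExit
  }
  // EpilogPreHeader: the patch adds llvm.assume(ModVal != 0) here.
  R.AssumeHolds = ModVal != 0;
  // Epilog loop: run the remaining ModVal iterations one at a time.
  IV = EpilInit;
  for (uint64_t I = 0; I < ModVal; ++I)
    IV += 1;
  R.FinalIV = IV;
  return R;
}
```

For example, `simulateEpilogUnroll(10, 7, 4)` enters the epilog from NewExit and ends with `FinalIV == 17`, while `simulateEpilogUnroll(10, 3, 4)` enters it from PreHeader and ends with `FinalIV == 13`. On both paths the assumption holds: from NewExit because the guard was taken, and from PreHeader because `TripCount < Count` makes `TripCount % Count` equal to `TripCount`, which is at least 1.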
92 changes: 59 additions & 33 deletions llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
@@ -201,18 +201,27 @@ static void ConnectProlog(Loop *L, Value *BECount, unsigned Count,
/// unroll count is non-zero.
///
/// This function performs the following:
/// - Update PHI nodes at the unrolling loop exit and epilog loop exit
/// - Create PHI nodes at the unrolling loop exit to combine
/// values that exit the unrolling loop code and jump around it.
/// - Update PHI nodes at the epilog loop exit
/// - Create PHI nodes at the unrolling loop exit and epilog preheader to
/// combine values that exit the unrolling loop code and jump around it.
/// - Update PHI operands in the epilog loop by the new PHI nodes
/// - Branch around the epilog loop if extra iters (ModVal) is zero.
/// - At the unrolling loop exit, branch around the epilog loop if extra iters
/// (ModVal) is zero.
/// - At the epilog preheader, add an llvm.assume call that extra iters is
/// non-zero. If the unrolling loop exit is the predecessor, the above new
/// branch guarantees that assumption. If the unrolling loop preheader is the
/// predecessor, then the required first iteration from the original loop has
/// yet to be executed, so it must be executed in the epilog loop. If we
/// later unroll the epilog loop, that llvm.assume call somehow enables
/// ScalarEvolution to compute an epilog loop maximum trip count, which enables
/// eliminating the branch at the end of the final unrolled epilog iteration.
///
static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
BasicBlock *Exit, BasicBlock *PreHeader,
BasicBlock *EpilogPreHeader, BasicBlock *NewPreHeader,
ValueToValueMapTy &VMap, DominatorTree *DT,
LoopInfo *LI, bool PreserveLCSSA, ScalarEvolution &SE,
unsigned Count) {
unsigned Count, AssumptionCache &AC) {
BasicBlock *Latch = L->getLoopLatch();
assert(Latch && "Loop must have a latch");
BasicBlock *EpilogLatch = cast<BasicBlock>(VMap[Latch]);
@@ -231,7 +240,7 @@ static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
// EpilogLatch
// Exit (EpilogPN)

// Update PHI nodes at NewExit and Exit.
// Update PHI nodes at Exit.
for (PHINode &PN : NewExit->phis()) {
// PN should be used in another PHI located in Exit block as
// Exit was split by SplitBlockPredecessors into Exit and NewExit
@@ -246,15 +255,11 @@ static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
// epilogue edges have already been added.
//
// There is EpilogPreHeader incoming block instead of NewExit as
// NewExit was spilt 1 more time to get EpilogPreHeader.
// NewExit was split 1 more time to get EpilogPreHeader.
assert(PN.hasOneUse() && "The phi should have 1 use");
PHINode *EpilogPN = cast<PHINode>(PN.use_begin()->getUser());
assert(EpilogPN->getParent() == Exit && "EpilogPN should be in Exit block");

// Add incoming PreHeader from branch around the Loop
PN.addIncoming(PoisonValue::get(PN.getType()), PreHeader);
SE.forgetValue(&PN);

Value *V = PN.getIncomingValueForBlock(Latch);
Instruction *I = dyn_cast<Instruction>(V);
if (I && L->contains(I))
@@ -271,35 +276,52 @@ static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
NewExit);
// Now PHIs should look like:
// NewExit:
// PN = PHI [I, Latch], [poison, PreHeader]
// PN = PHI [I, Latch]
// ...
// Exit:
// EpilogPN = PHI [PN, NewExit], [VMap[I], EpilogLatch]
}

// Create PHI nodes at NewExit (from the unrolling loop Latch and PreHeader).
// Update corresponding PHI nodes in epilog loop.
// Create PHI nodes at NewExit (from the unrolling loop Latch) and at
// EpilogPreHeader (from PreHeader and NewExit). Update corresponding PHI
// nodes in epilog loop.
for (BasicBlock *Succ : successors(Latch)) {
// Skip this as we already updated phis in exit blocks.
if (!L->contains(Succ))
continue;

// Succ here appears to always be just L->getHeader(). Otherwise, how do we
// know its corresponding epilog block (from VMap) is EpilogHeader and thus
// EpilogPreHeader is the right incoming block for VPN, as set below?
// TODO: Can we thus avoid the enclosing loop over successors?
assert(Succ == L->getHeader() &&
"Expect the only in-loop successor of latch to be the loop header");

for (PHINode &PN : Succ->phis()) {
// Add new PHI nodes to the loop exit block and update epilog
// PHIs with the new PHI values.
PHINode *NewPN = PHINode::Create(PN.getType(), 2, PN.getName() + ".unr");
NewPN->insertBefore(NewExit->getFirstNonPHIIt());
// Adding a value to the new PHI node from the unrolling loop preheader.
NewPN->addIncoming(PN.getIncomingValueForBlock(NewPreHeader), PreHeader);
// Adding a value to the new PHI node from the unrolling loop latch.
NewPN->addIncoming(PN.getIncomingValueForBlock(Latch), Latch);
// Add new PHI nodes to the loop exit block.
PHINode *NewPN0 = PHINode::Create(PN.getType(), /*NumReservedValues=*/1,
PN.getName() + ".unr");
NewPN0->insertBefore(NewExit->getFirstNonPHIIt());
// Add value to the new PHI node from the unrolling loop latch.
NewPN0->addIncoming(PN.getIncomingValueForBlock(Latch), Latch);

// Add new PHI nodes to EpilogPreHeader.
PHINode *NewPN1 = PHINode::Create(PN.getType(), /*NumReservedValues=*/2,
PN.getName() + ".epil.init");
NewPN1->insertBefore(EpilogPreHeader->getFirstNonPHIIt());
// Add value to the new PHI node from the unrolling loop preheader.
NewPN1->addIncoming(PN.getIncomingValueForBlock(NewPreHeader), PreHeader);
// Add value to the new PHI node from the epilog loop guard.
NewPN1->addIncoming(NewPN0, NewExit);

// Update the existing PHI node operand with the value from the new PHI
// node. Corresponding instruction in epilog loop should be PHI.
PHINode *VPN = cast<PHINode>(VMap[&PN]);
VPN->setIncomingValueForBlock(EpilogPreHeader, NewPN);
VPN->setIncomingValueForBlock(EpilogPreHeader, NewPN1);
}
}

// In NewExit, branch around the epilog loop if no extra iters.
Instruction *InsertPt = NewExit->getTerminator();
IRBuilder<> B(InsertPt);
Value *BrLoopExit = B.CreateIsNotNull(ModVal, "lcmp.mod");
@@ -308,7 +330,7 @@ static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));
SplitBlockPredecessors(Exit, Preds, ".epilog-lcssa", DT, LI, nullptr,
PreserveLCSSA);
// Add the branch to the exit block (around the unrolling loop)
// Add the branch to the exit block (around the epilog loop)
MDNode *BranchWeights = nullptr;
if (hasBranchWeightMD(*Latch->getTerminator())) {
// Assume equal distribution in interval [0, Count).
@@ -322,10 +344,11 @@ static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
DT->changeImmediateDominator(Exit, NewDom);
}

// Split the main loop exit to maintain canonicalization guarantees.
SmallVector<BasicBlock*, 4> NewExitPreds{Latch};
SplitBlockPredecessors(NewExit, NewExitPreds, ".loopexit", DT, LI, nullptr,
PreserveLCSSA);
// In EpilogPreHeader, assume extra iters is non-zero.
IRBuilder<> B2(EpilogPreHeader, EpilogPreHeader->getFirstNonPHIIt());
Value *ModIsNotNull = B2.CreateIsNotNull(ModVal, "lcmp.mod");
AssumeInst *AI = cast<AssumeInst>(B2.CreateAssumption(ModIsNotNull));
AC.registerAssumption(AI);
}

/// Create a clone of the blocks in a loop and connect them together. A new
@@ -795,7 +818,8 @@ bool llvm::UnrollRuntimeLoopRemainder(
ConstantInt::get(BECount->getType(),
Count - 1)) :
B.CreateIsNotNull(ModVal, "lcmp.mod");
BasicBlock *RemainderLoop = UseEpilogRemainder ? NewExit : PrologPreHeader;
BasicBlock *RemainderLoop =
UseEpilogRemainder ? EpilogPreHeader : PrologPreHeader;
BasicBlock *UnrollingLoop = UseEpilogRemainder ? NewPreHeader : PrologExit;
// Branch to either remainder (extra iterations) loop or unrolling loop.
MDNode *BranchWeights = nullptr;
@@ -808,7 +832,7 @@ bool llvm::UnrollRuntimeLoopRemainder(
PreHeaderBR->eraseFromParent();
if (DT) {
if (UseEpilogRemainder)
DT->changeImmediateDominator(NewExit, PreHeader);
DT->changeImmediateDominator(EpilogPreHeader, PreHeader);
else
DT->changeImmediateDominator(PrologExit, PreHeader);
}
@@ -880,7 +904,8 @@ bool llvm::UnrollRuntimeLoopRemainder(
// from both the original loop and the remainder code reaching the exit
// blocks. While the IDom of these exit blocks were from the original loop,
// now the IDom is the preheader (which decides whether the original loop or
// remainder code should run).
// remainder code should run) unless the block still has just the original
// predecessor (such as NewExit in the case of an epilog remainder).
if (DT && !L->getExitingBlock()) {
SmallVector<BasicBlock *, 16> ChildrenToUpdate;
// NB! We have to examine the dom children of all loop blocks, not just
@@ -891,7 +916,8 @@ bool llvm::UnrollRuntimeLoopRemainder(
auto *DomNodeBB = DT->getNode(BB);
for (auto *DomChild : DomNodeBB->children()) {
auto *DomChildBB = DomChild->getBlock();
if (!L->contains(LI->getLoopFor(DomChildBB)))
if (!L->contains(LI->getLoopFor(DomChildBB)) &&
DomChildBB->getUniquePredecessor() != BB)
ChildrenToUpdate.push_back(DomChildBB);
}
}
@@ -930,7 +956,7 @@ bool llvm::UnrollRuntimeLoopRemainder(
// Connect the epilog code to the original loop and update the
// PHI functions.
ConnectEpilog(L, ModVal, NewExit, LatchExit, PreHeader, EpilogPreHeader,
NewPreHeader, VMap, DT, LI, PreserveLCSSA, *SE, Count);
NewPreHeader, VMap, DT, LI, PreserveLCSSA, *SE, Count, *AC);

// Update counter in loop for unrolling.
// Use an incrementing IV. Pre-incr/post-incr is backedge/trip count.
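The dominator-tree change in UnrollRuntimeLoopRemainder above narrows which dominator-tree children of loop blocks get re-parented to the preheader when the loop has multiple exiting blocks. The toy C++ model below applies the same filter: an out-of-loop child of a loop block BB is moved only if BB is not its unique predecessor, since a block reachable only from BB is still immediately dominated by BB (as with NewExit in the epilog case). `ToyCFG`, `uniquePredIs`, `childrenToUpdate`, and `fixIDoms` are hypothetical names for illustration; this is not LLVM's `DominatorTree` or `LoopInfo` API.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model, not LLVM's DominatorTree/LoopInfo API.
using Block = std::string;

struct ToyCFG {
  std::map<Block, std::vector<Block>> Preds; // predecessor edges per block
  std::map<Block, Block> IDom;               // immediate dominator per block
  std::set<Block> LoopBlocks;                // blocks of the unrolled loop L
};

// True if every predecessor edge of Child comes from BB, mirroring
// "BasicBlock::getUniquePredecessor() == BB" (multiple edges allowed).
bool uniquePredIs(const ToyCFG &G, const Block &Child, const Block &BB) {
  auto It = G.Preds.find(Child);
  if (It == G.Preds.end() || It->second.empty())
    return false;
  return std::all_of(It->second.begin(), It->second.end(),
                     [&](const Block &P) { return P == BB; });
}

// Collect out-of-loop dom children of loop blocks that need a new IDom.
std::vector<Block> childrenToUpdate(const ToyCFG &G) {
  std::vector<Block> Result;
  for (const auto &[Child, Dom] : G.IDom) {
    bool DomInLoop = G.LoopBlocks.count(Dom) != 0;
    bool ChildInLoop = G.LoopBlocks.count(Child) != 0;
    if (DomInLoop && !ChildInLoop && !uniquePredIs(G, Child, Dom))
      Result.push_back(Child);
  }
  return Result;
}

// Re-parent the collected children to the preheader, which now decides
// whether the unrolled loop or the remainder code runs.
void fixIDoms(ToyCFG &G, const Block &PreHeader) {
  for (const Block &Child : childrenToUpdate(G))
    G.IDom[Child] = PreHeader;
}
```

In a toy CFG where `newexit` is reached only from `latch` and `other.exit` is reached from both `body` and `body.epil`, only `other.exit` is re-parented to the preheader; `newexit` keeps `latch` as its immediate dominator.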
@@ -5,7 +5,7 @@
;; Check atoms are remapped for runtime unrolling.

; CHECK: for.body.epil:
; CHECK-NEXT: store i64 %indvars.iv.unr, ptr %p, align 4, !dbg [[G2R1:!.*]]
; CHECK-NEXT: store i64 %indvars.iv.epil.init, ptr %p, align 4, !dbg [[G2R1:!.*]]

; CHECK: for.body.epil.1:
; CHECK-NEXT: store i64 %indvars.iv.next.epil, ptr %p, align 4, !dbg [[G3R1:!.*]]
8 changes: 4 additions & 4 deletions llvm/test/Transforms/HardwareLoops/ARM/structure.ll
@@ -321,10 +321,10 @@ for.inc: ; preds = %sw.bb, %sw.bb1, %fo
; CHECK-UNROLL-NOT: dls
; CHECK-UNROLL: [[LOOP:.LBB[0-9_]+]]: @ %for.body
; CHECK-UNROLL: le lr, [[LOOP]]
; CHECK-UNROLL: wls lr, r12, [[EXIT:.LBB[0-9_]+]]
; CHECK-UNROLL: dls lr, r12
; CHECK-UNROLL: [[EPIL:.LBB[0-9_]+]]:
; CHECK-UNROLL: le lr, [[EPIL]]
; CHECK-UNROLL-NEXT: [[EXIT]]
; CHECK-UNROLL-NEXT: {{\.LBB[0-9_]+}}: @ %for.cond.cleanup

define void @unroll_inc_int(ptr nocapture %a, ptr nocapture readonly %b, ptr nocapture readonly %c, i32 %N) {
entry:
@@ -357,10 +357,10 @@ for.body:
; CHECK-UNROLL-NOT: dls
; CHECK-UNROLL: [[LOOP:.LBB[0-9_]+]]: @ %for.body
; CHECK-UNROLL: le lr, [[LOOP]]
; CHECK-UNROLL: wls lr, r12, [[EPIL_EXIT:.LBB[0-9_]+]]
; CHECK-UNROLL: dls lr, r12
; CHECK-UNROLL: [[EPIL:.LBB[0-9_]+]]:
; CHECK-UNROLL: le lr, [[EPIL]]
; CHECK-UNROLL: [[EPIL_EXIT]]:
; CHECK-UNROLL: {{\.LBB[0-9_]+}}: @ %for.cond.cleanup
; CHECK-UNROLL: pop
define void @unroll_inc_unsigned(ptr nocapture %a, ptr nocapture readonly %b, ptr nocapture readonly %c, i32 %N) {
entry: