Merged
Changes from 83 commits
Commits
94 commits
f413520
[LoopPeel] Fix branch weights' effect on block frequencies
jdenny-ornl Mar 19, 2025
f821eeb
Run update_test_checks.py on a test
jdenny-ornl Mar 26, 2025
af8ec56
Fix typo
jdenny-ornl Apr 4, 2025
a0264ad
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Apr 8, 2025
fd29a49
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Apr 9, 2025
6303177
Document new metadata
jdenny-ornl Apr 10, 2025
bbd0e95
Improve LangRef.rst entry
jdenny-ornl May 1, 2025
715cb0a
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl May 5, 2025
67fa67d
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Jun 10, 2025
37ce859
Update fixmes
jdenny-ornl Jun 16, 2025
4337dcd
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Jun 16, 2025
5193158
Update test for AArch64, which I did not build before
jdenny-ornl Jun 17, 2025
bbd2f22
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Jul 10, 2025
b23f467
Run update script on test changed by merge from main
jdenny-ornl Jul 10, 2025
13d1fbb
[PGO] Add `llvm.loop.estimated_trip_count` metadata
jdenny-ornl Jul 15, 2025
e250cfc
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Jul 15, 2025
859b84d
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Jul 15, 2025
db5920a
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Jul 21, 2025
47fbe85
Add PGOEstimateTripCounts in more cases
jdenny-ornl Jul 21, 2025
f8097fb
Add unused initialization
jdenny-ornl Jul 21, 2025
7b27203
Simplify some test changes
jdenny-ornl Jul 22, 2025
4c4669a
Extend verify pass to cover new metadata
jdenny-ornl Jul 24, 2025
0f40efd
Fix test for some builds
jdenny-ornl Jul 24, 2025
2791a1c
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Jul 24, 2025
6148922
Apply some small reviewer suggestions
jdenny-ornl Jul 24, 2025
3f6a91a
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Jul 24, 2025
e5a0a26
Update for merge from pgo-estimated-trip-count
jdenny-ornl Jul 24, 2025
3a49b43
Attempt to fix windows pre-commit CI
jdenny-ornl Jul 24, 2025
c283ebe
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Jul 24, 2025
2f7daa8
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Jul 28, 2025
ecbf6e0
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Jul 28, 2025
c627fc5
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Aug 7, 2025
f1fa8d9
Run update script on new test from last merge
jdenny-ornl Aug 7, 2025
38ace1e
Reapply 3a18fe33f0763cd9276c99c276448412100f6270
jdenny-ornl Aug 7, 2025
92ddaa0
Convert to function pass, avoid needless pass invalidation
jdenny-ornl Aug 8, 2025
a3e0d72
Fix layering violation
jdenny-ornl Aug 8, 2025
67f22cd
Apply clang-format
jdenny-ornl Aug 8, 2025
f0ff2e2
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Aug 9, 2025
69fe051
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Aug 9, 2025
e7eb1fe
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Aug 13, 2025
e4f68c3
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Aug 13, 2025
0973ab3
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Aug 18, 2025
680bdc2
Remove PGOEstimateTripCountsPass and no-value form of metadata
jdenny-ornl Aug 18, 2025
83531b3
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Aug 19, 2025
59cd184
Fix case where nested loops share latch
jdenny-ornl Aug 19, 2025
47051ce
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Aug 19, 2025
5d00250
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Aug 25, 2025
5719779
Remove redundant code
jdenny-ornl Aug 25, 2025
98cab7b
Clarify recent comments some
jdenny-ornl Aug 25, 2025
3cbe07d
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Aug 25, 2025
b3831b6
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Sep 1, 2025
59ab013
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Sep 2, 2025
12ce70e
[LoopUnroll] Skip remainder loop guard if skip unrolled loop
jdenny-ornl Sep 2, 2025
415cb8f
Improve comments
jdenny-ornl Sep 3, 2025
b8aed9b
Merge branch 'main' into pgo-estimated-trip-count
jdenny-ornl Sep 9, 2025
5c9e43e
Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights
jdenny-ornl Sep 9, 2025
cc3283d
Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard
jdenny-ornl Sep 9, 2025
75a8df6
[LoopUnroll] Fix block frequencies when no runtime
jdenny-ornl Sep 9, 2025
83ac767
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Sep 15, 2025
1f81310
Empty commit to try to restart pre-commit CI
jdenny-ornl Sep 15, 2025
2382fbd
Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard
jdenny-ornl Sep 15, 2025
967f8a1
Merge branch 'skip-unroll-epilog-guard' into fix-blockfreq-unroll-no-…
jdenny-ornl Sep 15, 2025
2897e64
Improve some code comments
jdenny-ornl Sep 16, 2025
5a99593
[LoopUnroll] Fix block frequencies for epilogue
jdenny-ornl Sep 16, 2025
a95ebd1
Revert accidental change from right before pushing
jdenny-ornl Sep 16, 2025
876e055
Remove xfails
jdenny-ornl Sep 17, 2025
eb00930
Merge branch 'fix-blockfreq-unroll-no-runtime' into fix-blockfreq-unr…
jdenny-ornl Sep 17, 2025
04c8ade
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Sep 22, 2025
9b80c13
Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard
jdenny-ornl Sep 22, 2025
9215f47
Merge branch 'skip-unroll-epilog-guard' into fix-blockfreq-unroll-no-…
jdenny-ornl Sep 22, 2025
712b282
Merge branch 'fix-blockfreq-unroll-no-runtime' into fix-blockfreq-unr…
jdenny-ornl Sep 22, 2025
d85294e
Use BranchProbability
jdenny-ornl Sep 25, 2025
72c633f
Drop includes no longer needed
jdenny-ornl Sep 25, 2025
dc1f5f1
Clean up new LoopUtils comments
jdenny-ornl Sep 26, 2025
a1a5460
Merge branch 'main' into fix-peel-branch-weights
jdenny-ornl Sep 30, 2025
df9cf8c
Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard
jdenny-ornl Sep 30, 2025
99c95b1
Merge branch 'skip-unroll-epilog-guard' into fix-blockfreq-unroll-no-…
jdenny-ornl Sep 30, 2025
84e5223
Merge branch 'fix-blockfreq-unroll-no-runtime' into fix-blockfreq-unr…
jdenny-ornl Sep 30, 2025
4353f1f
Merge branch 'main' into skip-unroll-epilog-guard
jdenny-ornl Oct 6, 2025
f66ae02
Merge branch 'skip-unroll-epilog-guard' into fix-blockfreq-unroll-no-…
jdenny-ornl Oct 6, 2025
0eeada8
Merge branch 'fix-blockfreq-unroll-no-runtime' into fix-blockfreq-unr…
jdenny-ornl Oct 6, 2025
22fdacf
Merge branch 'main' into users/jdenny-ornl/fix-blockfreq-unroll-no-ru…
jdenny-ornl Oct 7, 2025
e57014c
Merge branch 'users/jdenny-ornl/fix-blockfreq-unroll-no-runtime' into…
jdenny-ornl Oct 7, 2025
f625d45
Merge branch 'main' into fix-blockfreq-unroll-no-runtime
jdenny-ornl Oct 13, 2025
a5a4a45
Merge branch 'fix-blockfreq-unroll-no-runtime' into fix-blockfreq-unr…
jdenny-ornl Oct 13, 2025
943e95e
`+= 1` -> `++`
jdenny-ornl Oct 14, 2025
1e2307e
Use setBranchWeights
jdenny-ornl Oct 14, 2025
662e60f
Merge branch 'main' into fix-blockfreq-unroll-no-runtime
jdenny-ornl Oct 22, 2025
2ca587b
Merge branch 'fix-blockfreq-unroll-no-runtime' into fix-blockfreq-unr…
jdenny-ornl Oct 22, 2025
093ad2a
Merge branch 'main' into fix-blockfreq-unroll-no-runtime
jdenny-ornl Oct 29, 2025
21c4f9c
Merge branch 'fix-blockfreq-unroll-no-runtime' into fix-blockfreq-unr…
jdenny-ornl Oct 29, 2025
663cf2b
Merge branch 'main' into fix-blockfreq-unroll-no-runtime
jdenny-ornl Oct 30, 2025
5ed2dec
Merge branch 'fix-blockfreq-unroll-no-runtime' into fix-blockfreq-unr…
jdenny-ornl Oct 30, 2025
b95a9e2
Merge branch 'main' into users/jdenny-ornl/fix-blockfreq-unroll-epilogue
jdenny-ornl Oct 31, 2025
3 changes: 3 additions & 0 deletions llvm/include/llvm/Support/BranchProbability.h
@@ -97,6 +97,9 @@ class BranchProbability {
/// \return \c Num divided by \c this.
LLVM_ABI uint64_t scaleByInverse(uint64_t Num) const;

/// Compute pow(Probability, N).
BranchProbability pow(unsigned N) const;

BranchProbability &operator+=(BranchProbability RHS) {
assert(N != UnknownN && RHS.N != UnknownN &&
"Unknown probability cannot participate in arithmetics.");
34 changes: 34 additions & 0 deletions llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -365,6 +365,40 @@ LLVM_ABI bool setLoopEstimatedTripCount(
Loop *L, unsigned EstimatedTripCount,
std::optional<unsigned> EstimatedLoopInvocationWeight = std::nullopt);

/// Based on branch weight metadata, return either:
/// - An unknown probability if the implementation is unable to handle the loop
/// form of \p L (e.g., \p L must have a latch block that controls the loop
/// exit).
/// - The probability \c P that, at the end of any iteration, the latch of \p L
/// will start another iteration such that `1 - P` is the probability of
/// exiting the loop.
BranchProbability getLoopProbability(Loop *L);

/// Set branch weight metadata for the latch of \p L to indicate that, at the
/// end of any iteration, \p P and `1 - P` are the probabilities of starting
/// another iteration and exiting the loop, respectively. Return false if the
/// implementation is unable to handle the loop form of \p L (e.g., \p L must
/// have a latch block that controls the loop exit). Otherwise, return true.
bool setLoopProbability(Loop *L, BranchProbability P);

/// Based on branch weight metadata, return either:
/// - An unknown probability if the implementation cannot extract the
/// probability (e.g., \p B must have exactly two target labels, so it must be
/// a conditional branch).
/// - The probability \c P that control flows from \p B to its first target
/// label such that `1 - P` is the probability of control flowing to its
/// second target label, or vice-versa if \p ForFirstTarget is false.
BranchProbability getBranchProbability(BranchInst *B, bool ForFirstTarget);

/// Set branch weight metadata for \p B to indicate that \p P and `1 - P` are
/// the probabilities of control flowing to its first and second target labels,
/// respectively, or vice-versa if \p ForFirstTarget is false. Return false if
/// the implementation cannot set the probability (e.g., \p B must have exactly
/// two target labels, so it must be a conditional branch). Otherwise, return
/// true.
bool setBranchProbability(BranchInst *B, BranchProbability P,
bool ForFirstTarget);
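The relationship these declarations describe, between a pair of branch weights and a probability `P` with complement `1 - P`, can be sketched outside LLVM. The helpers below are hypothetical (`probFromWeights` and `weightsFromProb` are not LLVM APIs), use `double` instead of `BranchProbability`, and assume a fixed weight total of 2^31 when converting back:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper, not an LLVM API: probability of taking the first
// target (or the second, if ForFirstTarget is false) given two weights.
// Returns std::nullopt when both weights are zero, which plays the role of
// the "unknown probability" result described in the comments above.
inline std::optional<double> probFromWeights(uint32_t First, uint32_t Second,
                                             bool ForFirstTarget) {
  uint64_t Total = static_cast<uint64_t>(First) + Second;
  if (Total == 0)
    return std::nullopt;
  double PFirst = static_cast<double>(First) / static_cast<double>(Total);
  return ForFirstTarget ? PFirst : 1.0 - PFirst;
}

// Hypothetical inverse: split an assumed fixed total between the two targets
// so that the first target gets probability P and the second gets 1 - P.
inline std::pair<uint32_t, uint32_t> weightsFromProb(double P) {
  assert(P >= 0.0 && P <= 1.0 && "P must be a probability");
  const uint32_t Total = 1u << 31; // assumed total, chosen for illustration
  uint32_t First = static_cast<uint32_t>(std::llround(P * Total));
  return {First, Total - First};
}
```

For example, weights `(3, 1)` give 0.75 for the first target and 0.25 for the second, and converting 0.75 back yields weights in the same 3:1 ratio.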

/// Check inner loop (L) backedge count is known to be invariant on all
/// iterations of its outer loop. If the loop has no parent, this is trivially
/// true.
4 changes: 3 additions & 1 deletion llvm/include/llvm/Transforms/Utils/UnrollLoop.h
@@ -97,7 +97,9 @@ LLVM_ABI bool UnrollRuntimeLoopRemainder(
LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
const TargetTransformInfo *TTI, bool PreserveLCSSA,
unsigned SCEVExpansionBudget, bool RuntimeUnrollMultiExit,
Loop **ResultLoop = nullptr);
Loop **ResultLoop = nullptr,
std::optional<unsigned> OriginalTripCount = std::nullopt,
BranchProbability OriginalLoopProb = BranchProbability::getUnknown());

LLVM_ABI LoopUnrollResult UnrollAndJamLoop(
Loop *L, unsigned Count, unsigned TripCount, unsigned TripMultiple,
7 changes: 7 additions & 0 deletions llvm/lib/Support/BranchProbability.cpp
@@ -111,3 +111,10 @@ uint64_t BranchProbability::scale(uint64_t Num) const {
uint64_t BranchProbability::scaleByInverse(uint64_t Num) const {
return ::scale<0>(Num, D, N);
}

BranchProbability BranchProbability::pow(unsigned N) const {
BranchProbability Res = BranchProbability::getOne();
Member

some guards on N for overflow? like a LLVM_DEBUG maybe, and/or assert.

Collaborator Author

What is an example that would cause overflow here, and why would the operator*= used here not already handle it? Keep in mind that BranchProbability represents a value from 0 to 1, so the result should always be a value that is also in that range.

for (unsigned I = 0; I < N; ++I)
Res *= *this;
return Res;
}
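The point raised in the review thread, that repeatedly multiplying a value in [0, 1] by itself stays in [0, 1], can be checked with a small stand-alone model. This is only a sketch: `FixedProb` and `powProb` are hypothetical names, not LLVM's `BranchProbability`, and the 2^31 denominator and round-to-nearest multiplication are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point probability: the value is Num / Denom.
// This is a simplified stand-in, not LLVM's BranchProbability class.
struct FixedProb {
  static constexpr uint32_t Denom = 1u << 31; // assumed denominator
  uint32_t Num;

  // Multiply two probabilities, rounding to nearest. Both numerators are at
  // most Denom, so the 64-bit product is at most 2^62 and cannot overflow,
  // and the rounded quotient is at most Denom.
  FixedProb operator*(FixedProb RHS) const {
    assert(Num <= Denom && RHS.Num <= Denom && "operands must be in [0, 1]");
    uint64_t Prod = static_cast<uint64_t>(Num) * RHS.Num;
    return FixedProb{static_cast<uint32_t>((Prod + Denom / 2) / Denom)};
  }
};

// Raise P to the N-th power by repeated multiplication, starting from 1.
inline FixedProb powProb(FixedProb P, unsigned N) {
  FixedProb Res{FixedProb::Denom};
  for (unsigned I = 0; I < N; ++I)
    Res = Res * P;
  return Res;
}
```

Under these assumptions, `powProb` of one half with `N = 3` gives one eighth, and a probability of exactly 1 is unchanged for any `N`. The cost is linear in `N`, which matches the loop in the diff.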
59 changes: 47 additions & 12 deletions llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -499,9 +499,9 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,

const unsigned MaxTripCount = SE->getSmallConstantMaxTripCount(L);
const bool MaxOrZero = SE->isBackedgeTakenCountMaxOrZero(L);
unsigned EstimatedLoopInvocationWeight = 0;
std::optional<unsigned> OriginalTripCount =
llvm::getLoopEstimatedTripCount(L, &EstimatedLoopInvocationWeight);
llvm::getLoopEstimatedTripCount(L);
BranchProbability OriginalLoopProb = llvm::getLoopProbability(L);

// Effectively "DCE" unrolled iterations that are beyond the max tripcount
// and will never be executed.
@@ -592,11 +592,11 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
: isEpilogProfitable(L);

if (ULO.Runtime &&
!UnrollRuntimeLoopRemainder(L, ULO.Count, ULO.AllowExpensiveTripCount,
EpilogProfitability, ULO.UnrollRemainder,
ULO.ForgetAllSCEV, LI, SE, DT, AC, TTI,
PreserveLCSSA, ULO.SCEVExpansionBudget,
ULO.RuntimeUnrollMultiExit, RemainderLoop)) {
!UnrollRuntimeLoopRemainder(
L, ULO.Count, ULO.AllowExpensiveTripCount, EpilogProfitability,
ULO.UnrollRemainder, ULO.ForgetAllSCEV, LI, SE, DT, AC, TTI,
PreserveLCSSA, ULO.SCEVExpansionBudget, ULO.RuntimeUnrollMultiExit,
RemainderLoop, OriginalTripCount, OriginalLoopProb)) {
if (ULO.Force)
ULO.Runtime = false;
else {
@@ -1131,11 +1131,46 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
LI->erase(L);
// We shouldn't try to use `L` anymore.
L = nullptr;
} else if (OriginalTripCount) {
// Update the trip count. Note that the remainder has already logic
// computing it in `UnrollRuntimeLoopRemainder`.
setLoopEstimatedTripCount(L, *OriginalTripCount / ULO.Count,
EstimatedLoopInvocationWeight);
} else {
// Update metadata for the loop's branch weights and estimated trip count:
// - If ULO.Runtime, UnrollRuntimeLoopRemainder sets the guard branch
// weights, latch branch weights, and estimated trip count of the
// remainder loop it creates. It also sets the branch weights for the
// unrolled loop guard it creates. The branch weights for the unrolled
// loop latch are adjusted below. FIXME: Handle prologue loops.
// - Otherwise, if unrolled loop iteration latches become unconditional,
// branch weights are adjusted above. FIXME: Actually handle such
// unconditional latches.
// - Otherwise, the original loop's branch weights are correct for the
// unrolled loop, so do not adjust them.
// - In all cases, the unrolled loop's estimated trip count is set below.
//
// As an example of the last case, consider what happens if the unroll count
// is 4 for a loop with an estimated trip count of 10 when we do not create
// a remainder loop and all iterations' latches remain conditional. Each
// unrolled iteration's latch still has the same probability of exiting the
// loop as it did when in the original loop, and thus it should still have
// the same branch weights. Each unrolled iteration's non-zero probability
// of exiting already appropriately reduces the probability of reaching the
// remaining iterations just as it did in the original loop. Trying to also
// adjust the branch weights of the final unrolled iteration's latch (i.e.,
// the backedge for the unrolled loop as a whole) to reflect its new trip
// count of 3 will erroneously further reduce its block frequencies.
// However, in case an analysis later needs to estimate the trip count of
// the unrolled loop as a whole without considering the branch weights for
// each unrolled iteration's latch within it, we store the new trip count as
// separate metadata.
if (!OriginalLoopProb.isUnknown() && ULO.Runtime && EpilogProfitability) {
// Where p is always the probability of executing at least 1 more
// iteration, the probability for at least n more iterations is p^n.
setLoopProbability(L, OriginalLoopProb.pow(ULO.Count));
}
if (OriginalTripCount) {
unsigned NewTripCount = *OriginalTripCount / ULO.Count;
if (!ULO.Runtime && *OriginalTripCount % ULO.Count)
NewTripCount += 1;
Member

nit: ++NewTripCount?

Collaborator Author

Thanks. Done. I was thinking in Python, I guess.

setLoopEstimatedTripCount(L, NewTripCount);
}
}
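The arithmetic in this hunk can be reproduced in isolation. The sketch below uses hypothetical helper names (`unrolledTripCount`, `unrolledLatchProb`) and `double` rather than `BranchProbability`; it only mirrors the integer division, the round-up when no remainder loop exists, and the p^n rule quoted in the comments.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper mirroring the trip-count update: integer division by
// the unroll count, rounded up when there is no runtime remainder loop to
// absorb the leftover iterations.
inline unsigned unrolledTripCount(unsigned OriginalTripCount, unsigned Count,
                                  bool Runtime) {
  assert(Count > 0 && "unroll count must be positive");
  unsigned NewTripCount = OriginalTripCount / Count;
  if (!Runtime && OriginalTripCount % Count != 0)
    ++NewTripCount;
  return NewTripCount;
}

// Hypothetical helper: if P is the probability of running at least one more
// original iteration, P^Count is the probability of running at least Count
// more, i.e. of starting another pass through the unrolled body.
inline double unrolledLatchProb(double P, unsigned Count) {
  assert(P >= 0.0 && P <= 1.0 && "P must be a probability");
  return std::pow(P, static_cast<double>(Count));
}
```

With the example from the comment (estimated trip count 10, unroll count 4, no remainder loop), `unrolledTripCount(10, 4, false)` gives 3; with a runtime remainder loop it gives 2, and the remaining `10 % 4 = 2` iterations belong to the remainder loop.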

// LoopInfo should not be valid, confirm that.
100 changes: 82 additions & 18 deletions llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
@@ -40,6 +40,7 @@
#include "llvm/Transforms/Utils/LoopUtils.h"
#include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
#include "llvm/Transforms/Utils/UnrollLoop.h"
#include <cmath>

using namespace llvm;

@@ -195,6 +196,21 @@ static void ConnectProlog(Loop *L, Value *BECount, unsigned Count,
}
}

/// Assume, due to our position in the remainder loop or its guard, anywhere
/// from 0 to \p N more iterations can possibly execute. Among such cases in
/// the original loop (with loop probability \p OriginalLoopProb), what is the
/// probability of executing at least one more iteration?
static BranchProbability
probOfNextInRemainder(BranchProbability OriginalLoopProb, unsigned N) {
// Each of these variables holds the original loop's probability that the
// number of iterations it will execute is some m in the specified range.
BranchProbability ProbOne = OriginalLoopProb; // 1 <= m
BranchProbability ProbTooMany = ProbOne.pow(N + 1); // N + 1 <= m
BranchProbability ProbNotTooMany = ProbTooMany.getCompl(); // 0 <= m <= N
BranchProbability ProbOneNotTooMany = ProbOne - ProbTooMany; // 1 <= m <= N
return ProbOneNotTooMany / ProbNotTooMany;
}
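The conditional probability computed by `probOfNextInRemainder` can be written in closed form as (p - p^(N+1)) / (1 - p^(N+1)), where p is the original loop probability. A minimal floating-point sketch follows; the name `probOfNextSketch` is hypothetical, and it assumes p < 1 so that the denominator is nonzero.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical double-based version of the conditional probability:
// P(at least 1 more iteration | at most N more iterations).
inline double probOfNextSketch(double P, unsigned N) {
  assert(P >= 0.0 && P < 1.0 && "P == 1 would make the denominator zero");
  double ProbTooMany = std::pow(P, static_cast<double>(N + 1)); // N + 1 <= m
  double ProbNotTooMany = 1.0 - ProbTooMany;                    // 0 <= m <= N
  double ProbOneNotTooMany = P - ProbTooMany;                   // 1 <= m <= N
  return ProbOneNotTooMany / ProbNotTooMany;
}
```

For p = 0.5 and N = 1, the cases are m = 0 (probability 0.5) and m = 1 (probability 0.25), so the result is 0.25 / 0.75 = 1/3. For N = 0 no further iteration is possible and the result is 0.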

/// Connect the unrolling epilog code to the original loop.
/// The unrolling epilog code contains code to execute the
/// 'extra' iterations if the run-time trip count modulo the
@@ -221,7 +237,8 @@ static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
BasicBlock *EpilogPreHeader, BasicBlock *NewPreHeader,
ValueToValueMapTy &VMap, DominatorTree *DT,
LoopInfo *LI, bool PreserveLCSSA, ScalarEvolution &SE,
unsigned Count, AssumptionCache &AC) {
unsigned Count, AssumptionCache &AC,
BranchProbability OriginalLoopProb) {
BasicBlock *Latch = L->getLoopLatch();
assert(Latch && "Loop must have a latch");
BasicBlock *EpilogLatch = cast<BasicBlock>(VMap[Latch]);
@@ -332,12 +349,19 @@ static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
PreserveLCSSA);
// Add the branch to the exit block (around the epilog loop)
MDNode *BranchWeights = nullptr;
if (hasBranchWeightMD(*Latch->getTerminator())) {
if (OriginalLoopProb.isUnknown() &&
hasBranchWeightMD(*Latch->getTerminator())) {
// Assume equal distribution in interval [0, Count).
MDBuilder MDB(B.getContext());
BranchWeights = MDB.createBranchWeights(1, Count - 1);
}
B.CreateCondBr(BrLoopExit, EpilogPreHeader, Exit, BranchWeights);
BranchInst *RemainderLoopGuard =
B.CreateCondBr(BrLoopExit, EpilogPreHeader, Exit, BranchWeights);
if (!OriginalLoopProb.isUnknown()) {
setBranchProbability(RemainderLoopGuard,
probOfNextInRemainder(OriginalLoopProb, Count - 1),
/*ForFirstTarget=*/true);
}
InsertPt->eraseFromParent();
if (DT) {
auto *NewDom = DT->findNearestCommonDominator(Exit, NewExit);
@@ -357,14 +381,15 @@ static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
/// The cloned blocks should be inserted between InsertTop and InsertBot.
/// InsertTop should be new preheader, InsertBot new loop exit.
/// Returns the new cloned loop that is created.
static Loop *
CloneLoopBlocks(Loop *L, Value *NewIter, const bool UseEpilogRemainder,
const bool UnrollRemainder,
BasicBlock *InsertTop,
BasicBlock *InsertBot, BasicBlock *Preheader,
static Loop *CloneLoopBlocks(Loop *L, Value *NewIter,
const bool UseEpilogRemainder,
const bool UnrollRemainder, BasicBlock *InsertTop,
BasicBlock *InsertBot, BasicBlock *Preheader,
std::vector<BasicBlock *> &NewBlocks,
LoopBlocksDFS &LoopBlocks, ValueToValueMapTy &VMap,
DominatorTree *DT, LoopInfo *LI, unsigned Count) {
DominatorTree *DT, LoopInfo *LI, unsigned Count,
std::optional<unsigned> OriginalTripCount,
BranchProbability OriginalLoopProb) {
StringRef suffix = UseEpilogRemainder ? "epil" : "prol";
BasicBlock *Header = L->getHeader();
BasicBlock *Latch = L->getLoopLatch();
@@ -419,7 +444,8 @@ CloneLoopBlocks(Loop *L, Value *NewIter, const bool UseEpilogRemainder,
Builder.CreateAdd(NewIdx, One, NewIdx->getName() + ".next");
Value *IdxCmp = Builder.CreateICmpNE(IdxNext, NewIter, NewIdx->getName() + ".cmp");
MDNode *BranchWeights = nullptr;
if (hasBranchWeightMD(*LatchBR)) {
if ((OriginalLoopProb.isUnknown() || !UseEpilogRemainder) &&
hasBranchWeightMD(*LatchBR)) {
uint32_t ExitWeight;
uint32_t BackEdgeWeight;
if (Count >= 3) {
@@ -437,7 +463,29 @@ CloneLoopBlocks(Loop *L, Value *NewIter, const bool UseEpilogRemainder,
MDBuilder MDB(Builder.getContext());
BranchWeights = MDB.createBranchWeights(BackEdgeWeight, ExitWeight);
}
Builder.CreateCondBr(IdxCmp, FirstLoopBB, InsertBot, BranchWeights);
BranchInst *RemainderLoopLatch =
Builder.CreateCondBr(IdxCmp, FirstLoopBB, InsertBot, BranchWeights);
if (!OriginalLoopProb.isUnknown() && UseEpilogRemainder) {
// Compute the total frequency of the original loop body from the
// remainder iterations. Once we've reached them, the first of them
// always executes, so its frequency and probability are 1.
double FreqRemIters = 1;
if (Count > 2) {
BranchProbability ProbReaching = BranchProbability::getOne();
for (unsigned N = Count - 2; N >= 1; --N) {
ProbReaching *= probOfNextInRemainder(OriginalLoopProb, N);
FreqRemIters += double(ProbReaching.getNumerator()) /
ProbReaching.getDenominator();
}
}
// Solve for the loop probability that would produce that frequency.
// Sum(i=0..inf)(Prob^i) = 1/(1-Prob) = FreqRemIters.
double ProbDouble = 1 - 1 / FreqRemIters;
BranchProbability Prob = BranchProbability::getBranchProbability(
std::round(ProbDouble * BranchProbability::getDenominator()),
BranchProbability::getDenominator());
setBranchProbability(RemainderLoopLatch, Prob, /*ForFirstTarget=*/true);
}
NewIdx->addIncoming(Zero, InsertTop);
NewIdx->addIncoming(IdxNext, NewBB);
LatchBR->eraseFromParent();
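The remainder-latch computation in this hunk has two steps: sum the expected frequency of the remainder iterations, then solve the geometric series 1 / (1 - Prob) = FreqRemIters for a single latch probability. The sketch below reproduces both steps with plain `double` arithmetic. The names `probOfNextInRem` and `remainderLatchProbSketch` are hypothetical, and the code assumes p < 1 and an unroll count of at least 2.

```cpp
#include <cassert>
#include <cmath>

// P(at least 1 more iteration | at most N more iterations), in doubles.
inline double probOfNextInRem(double P, unsigned N) {
  double ProbTooMany = std::pow(P, static_cast<double>(N + 1));
  return (P - ProbTooMany) / (1.0 - ProbTooMany);
}

// Hypothetical double-based version of the remainder-latch probability.
inline double remainderLatchProbSketch(double P, unsigned Count) {
  assert(P >= 0.0 && P < 1.0 && Count >= 2);
  // The first remainder iteration always executes once the loop is reached.
  double FreqRemIters = 1.0;
  double ProbReaching = 1.0;
  // Each later iteration is reached only if every earlier latch continued.
  // For Count == 2 the loop body never runs because N starts at 0.
  for (unsigned N = Count - 2; N >= 1; --N) {
    ProbReaching *= probOfNextInRem(P, N);
    FreqRemIters += ProbReaching;
  }
  // Solve 1 / (1 - Prob) = FreqRemIters for Prob.
  return 1.0 - 1.0 / FreqRemIters;
}
```

For p = 0.5 and an unroll count of 3, the second remainder iteration is reached with probability 1/3, so the total frequency is 4/3 and the latch probability is 1 - 3/4 = 0.25. With an unroll count of 2 the remainder loop runs at most once and the result is 0.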
@@ -469,6 +517,8 @@ CloneLoopBlocks(Loop *L, Value *NewIter, const bool UseEpilogRemainder,

std::optional<MDNode *> NewLoopID = makeFollowupLoopID(
LoopID, {LLVMLoopUnrollFollowupAll, LLVMLoopUnrollFollowupRemainder});
if (OriginalTripCount && UseEpilogRemainder)
setLoopEstimatedTripCount(NewLoop, *OriginalTripCount % Count);
if (NewLoopID) {
NewLoop->setLoopID(*NewLoopID);

@@ -603,7 +653,8 @@ bool llvm::UnrollRuntimeLoopRemainder(
LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
const TargetTransformInfo *TTI, bool PreserveLCSSA,
unsigned SCEVExpansionBudget, bool RuntimeUnrollMultiExit,
Loop **ResultLoop) {
Loop **ResultLoop, std::optional<unsigned> OriginalTripCount,
BranchProbability OriginalLoopProb) {
LLVM_DEBUG(dbgs() << "Trying runtime unrolling on Loop: \n");
LLVM_DEBUG(L->dump());
LLVM_DEBUG(UseEpilogRemainder ? dbgs() << "Using epilog remainder.\n"
@@ -823,12 +874,23 @@ bool llvm::UnrollRuntimeLoopRemainder(
BasicBlock *UnrollingLoop = UseEpilogRemainder ? NewPreHeader : PrologExit;
// Branch to either remainder (extra iterations) loop or unrolling loop.
MDNode *BranchWeights = nullptr;
if (hasBranchWeightMD(*Latch->getTerminator())) {
if ((OriginalLoopProb.isUnknown() || !UseEpilogRemainder) &&
hasBranchWeightMD(*Latch->getTerminator())) {
// Assume loop is nearly always entered.
MDBuilder MDB(B.getContext());
BranchWeights = MDB.createBranchWeights(EpilogHeaderWeights);
}
B.CreateCondBr(BranchVal, RemainderLoop, UnrollingLoop, BranchWeights);
BranchInst *UnrollingLoopGuard =
B.CreateCondBr(BranchVal, RemainderLoop, UnrollingLoop, BranchWeights);
if (!OriginalLoopProb.isUnknown() && UseEpilogRemainder) {
// The original loop's first iteration always happens. Compute the
// probability of the original loop executing Count-1 iterations after that
// to complete the first iteration of the unrolled loop.
BranchProbability ProbOne = OriginalLoopProb;
BranchProbability ProbRest = ProbOne.pow(Count - 1);
setBranchProbability(UnrollingLoopGuard, ProbRest,
/*ForFirstTarget=*/false);
}
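The guard probability set here follows from the same p^n rule: the first original iteration is certain, so entering the unrolled loop requires `Count - 1` further iterations. The sketch below is a hypothetical illustration (`GuardProbs` and `unrolledGuardProbs` are not LLVM names) that shows how the two branch targets split the probability when the value is assigned to the second target, as `/*ForFirstTarget=*/false` indicates.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: probabilities of the guard's two targets.
struct GuardProbs {
  double ToRemainder; // first target of the guard branch in the diff
  double ToUnrolled;  // second target of the guard branch in the diff
};

// Hypothetical helper: the first original iteration always happens, so the
// unrolled loop is entered with probability P^(Count - 1).
inline GuardProbs unrolledGuardProbs(double P, unsigned Count) {
  assert(Count >= 1 && P >= 0.0 && P <= 1.0);
  double ProbRest = std::pow(P, static_cast<double>(Count - 1));
  return GuardProbs{1.0 - ProbRest, ProbRest};
}
```

For p = 0.5 and an unroll count of 4, the unrolled loop is entered with probability 0.125 and the remainder path is taken with probability 0.875.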
PreHeaderBR->eraseFromParent();
if (DT) {
if (UseEpilogRemainder)
@@ -855,9 +917,10 @@ bool llvm::UnrollRuntimeLoopRemainder(
// iterations. This function adds the appropriate CFG connections.
BasicBlock *InsertBot = UseEpilogRemainder ? LatchExit : PrologExit;
BasicBlock *InsertTop = UseEpilogRemainder ? EpilogPreHeader : PrologPreHeader;
Loop *remainderLoop = CloneLoopBlocks(
L, ModVal, UseEpilogRemainder, UnrollRemainder, InsertTop, InsertBot,
NewPreHeader, NewBlocks, LoopBlocks, VMap, DT, LI, Count);
Loop *remainderLoop =
CloneLoopBlocks(L, ModVal, UseEpilogRemainder, UnrollRemainder, InsertTop,
InsertBot, NewPreHeader, NewBlocks, LoopBlocks, VMap, DT,
LI, Count, OriginalTripCount, OriginalLoopProb);

// Insert the cloned blocks into the function.
F->splice(InsertBot->getIterator(), F, NewBlocks[0]->getIterator(), F->end());
@@ -956,7 +1019,8 @@ bool llvm::UnrollRuntimeLoopRemainder(
// Connect the epilog code to the original loop and update the
// PHI functions.
ConnectEpilog(L, ModVal, NewExit, LatchExit, PreHeader, EpilogPreHeader,
NewPreHeader, VMap, DT, LI, PreserveLCSSA, *SE, Count, *AC);
NewPreHeader, VMap, DT, LI, PreserveLCSSA, *SE, Count, *AC,
OriginalLoopProb);

// Update counter in loop for unrolling.
// Use an incrementing IV. Pre-incr/post-incr is backedge/trip count.