[LoopUnroll] Fix block frequencies when no runtime #157754

jdenny-ornl · 2025-09-09T21:49:34Z

This patch implements the LoopUnroll changes discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785) and is thus another step in addressing issue #135812.

In summary, for the case of partial loop unrolling without a remainder loop, this patch changes LoopUnroll to:

- Maintain branch weights consistently with the original loop for the sake of preserving the total frequency of the original loop body.
- Store the new estimated trip count in the `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758 ([PGO] Add llvm.loop.estimated_trip_count metadata).
- Correct the new estimated trip count (e.g., 3 instead of 2) when the original estimated trip count (e.g., 10) divided by the unroll count (e.g., 4) leaves a remainder (e.g., 2).
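The correction in the last item amounts to rounding the division up instead of down. A minimal Python sketch, assuming a hypothetical helper name (`unrolled_estimated_trip_count` is not an LLVM API) and assuming that, without a remainder loop, a final unrolled iteration that executes only some of its body copies still counts as one iteration of the unrolled loop:

```python
def unrolled_estimated_trip_count(orig_trip_count: int, unroll_count: int) -> int:
    """Hypothetical helper (not an LLVM API): estimated trip count of a loop
    partially unrolled by `unroll_count` with no remainder loop.

    A final unrolled iteration that runs only some of its copies of the body
    still counts as one iteration, so the result is rounded up, not down.
    """
    if orig_trip_count < 0 or unroll_count <= 0:
        raise ValueError("need orig_trip_count >= 0 and unroll_count > 0")
    # Integer ceiling division: ceil(a / b) == (a + b - 1) // b for b > 0.
    return (orig_trip_count + unroll_count - 1) // unroll_count


if __name__ == "__main__":
    print(unrolled_estimated_trip_count(10, 4))  # 3; floor division would give 2
    print(unrolled_estimated_trip_count(8, 4))   # 2; no remainder, no rounding
```

With an estimated trip count of 10 and an unroll count of 4, the unrolled loop runs two full iterations plus a third that exits partway through, which is why 3 rather than 2 is the better estimate.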

There are loop unrolling cases this patch does not fully fix, such as partial unrolling with a remainder loop and complete unrolling, and there are two associated tests whose branch weights this patch adversely affects. They will be addressed in future patches that should land with this patch.

For example:

```
declare void @f(i32)

define void @test(i32 %n) {
entry:
  br label %do.body

do.body:
  %i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %inc = add i32 %i, 1
  call void @f(i32 %i)
  %c = icmp sge i32 %inc, %n
  br i1 %c, label %do.end, label %do.body, !prof !0

do.end:
  ret void
}

!0 = !{!"branch_weights", i32 1, i32 9}
```

Given those branch weights, once any loop iteration is actually reached, the probability of the loop exiting at the iteration's end is 1/(1+9). That is, on average the loop exits after 10 iterations and thus has an estimated trip count of 10. `opt -passes='print<block-freq>'` shows that 10 is indeed the frequency of the loop body:

```
Printing analysis results of BFI for function 'test':
block-frequency-info: test
 - entry: float = 1.0, int = 1801439852625920
 - do.body: float = 10.0, int = 18014398509481984
 - do.end: float = 1.0, int = 1801439852625920
```

Key observation: The frequency of reaching any particular iteration is less than for the previous iteration because the previous iteration has a non-zero probability of exiting the loop. This observation holds even though every loop iteration, once actually reached, has exactly the same probability of exiting and thus exactly the same branch weights.

Now we use `opt -unroll-force-peel-count=2 -passes=loop-unroll` to peel 2 iterations and insert them before the remaining loop. We expect the key observation above not to change, but, without this patch, it does. The block frequency becomes 1.0 for the first iteration, 0.9 for the second, and 6.4 for the main loop body. Again, a decreasing frequency is expected, but it decreases too much: the total frequency of the original loop body becomes 8.3.
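The 1.0, 0.9, 6.4, and 8.3 figures can be reproduced with a simplified model of block-frequency propagation, sketched below in Python. It is only an approximation of what BFI computes, under stated assumptions: each latch branch is the only exit of its iteration, the remaining loop's expected iteration count once entered is (exit + continue) / exit, and `peeled_body_frequencies` is a hypothetical name. The inputs are the weights that peeling produced without this patch: 1:9 and 1:8 for the two peeled iterations and 1:7 for the remaining loop.

```python
from fractions import Fraction
from typing import List, Tuple

Weights = Tuple[int, int]  # (exit weight, continue weight) of one latch branch


def peeled_body_frequencies(
    peeled: List[Weights], remaining: Weights
) -> Tuple[List[Fraction], Fraction]:
    """Hypothetical model (not BFI itself): frequency of each peeled copy of the
    loop body and of the remaining loop's body, per execution of the entry block."""
    reach = Fraction(1)  # probability of reaching the next copy of the body
    peeled_freqs = []
    for exit_w, cont_w in peeled:
        peeled_freqs.append(reach)
        reach *= Fraction(cont_w, exit_w + cont_w)  # took the "continue" edge
    exit_w, cont_w = remaining
    # Once entered, the remaining loop runs (exit + continue) / exit times on average.
    return peeled_freqs, reach * Fraction(exit_w + cont_w, exit_w)


if __name__ == "__main__":
    # Weights after peeling 2 iterations without this patch: 1:9, 1:8, then 1:7.
    freqs, body = peeled_body_frequencies([(1, 9), (1, 8)], (1, 7))
    print([float(f) for f in freqs], float(body))  # [1.0, 0.9] 6.4
    print(float(sum(freqs) + body))                # 8.3
```

Calling it with no peeled iterations and weights 1:9 returns a body frequency of 10, matching the BFI output for the original loop.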
The new branch weights reveal the problem:

```
!0 = !{!"branch_weights", i32 1, i32 9}
!1 = !{!"branch_weights", i32 1, i32 8}
!2 = !{!"branch_weights", i32 1, i32 7}
```

The exit probability is now 1/10 for the first peeled iteration, 1/9 for the second, and 1/8 for the remaining loop iterations. It seems this behavior is trying to ensure a decreasing block frequency. However, as the key observation above shows for the original loop, that happens correctly without decreasing the branch weights across iterations.

This patch changes the peeling implementation not to decrease the branch weights across loop iterations so that the frequency for every iteration is the same as it was in the original loop. The total frequency of the loop body, summed across all its occurrences, thus remains 10 after peeling.

Unfortunately, that change means a later analysis cannot accurately estimate the trip count of the remaining loop while examining the remaining loop in isolation without considering the probability of actually reaching it. For that purpose, this patch stores the new trip count as separate metadata named `llvm.loop.estimated_trip_count` and extends `llvm::getLoopEstimatedTripCount` to prefer it, if present, over branch weights.

An alternative fix is for `llvm::getLoopEstimatedTripCount` to subtract the `llvm.loop.peeled.count` metadata from the trip count estimated by a loop's branch weights. However, there might be other loop transformations that still corrupt block frequencies in a similar manner and require a similar fix. `llvm.loop.estimated_trip_count` is intended to provide a general way to store estimated trip counts when branch weights cannot directly store them.

This patch introduces several fixme comments that need to be addressed before it can land.
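With every copy keeping the original 1:9 weights, the same kind of arithmetic gives a total of 10. A minimal Python sketch under a simplified model (assumption: iteration k, counting from 0, is reached with probability (9/10)^k). It also illustrates why the remaining loop's weights, read in isolation, no longer give its trip count; taking 10 − 2 = 8 as the remaining loop's trip count follows the subtraction described in the alternative fix above and is an assumption about the value that ends up in `llvm.loop.estimated_trip_count`.

```python
from fractions import Fraction

# Latch branch weights of the original loop (from the example): exit 1, continue 9.
EXIT_W, CONT_W = 1, 9
P_CONTINUE = Fraction(CONT_W, EXIT_W + CONT_W)  # 9/10
ORIG_TRIP_COUNT = (EXIT_W + CONT_W) // EXIT_W   # 10
PEEL_COUNT = 2

# Peeled copies keep the original weights, so copy k is reached with
# probability P_CONTINUE**k, just like iteration k of the original loop.
peeled = [P_CONTINUE**k for k in range(PEEL_COUNT)]  # 1, 9/10

# The remaining loop is entered with probability P_CONTINUE**PEEL_COUNT and,
# because its weights are still 1:9, runs 10 times on average once entered.
remaining_entry = P_CONTINUE**PEEL_COUNT            # 81/100
remaining_body = remaining_entry * ORIG_TRIP_COUNT  # 81/10

print(float(sum(peeled) + remaining_body))  # 10.0, same as the original loop

# Assumption: if the original loop runs its estimated 10 iterations, the
# remaining loop runs 10 - 2 = 8 of them, yet its 1:9 weights alone still
# imply 10, so the smaller count has to be recorded separately.
remaining_trip_count = ORIG_TRIP_COUNT - PEEL_COUNT
print(remaining_trip_count)  # 8
```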

This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As [suggested in the RFC comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4), it adds the new metadata to all loops at the time of profile ingestion and estimates each trip count from the loop's `branch_weights` metadata. As suggested in the PR #128785 review (comment), it does so via `PGOEstimateTripCountsPass`, which creates the new metadata for the loop but omits the value if it cannot estimate a trip count due to the loop's form.

An important observation not previously discussed is that `PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count, but later passes can transform the loop in a way that makes it possible. Currently, such passes do not necessarily update the metadata, but eventually that should be fixed. Until then, if the new metadata has no value, `llvm::getLoopEstimatedTripCount` disregards it and tries again to estimate the trip count from the loop's `branch_weights` metadata.
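The lookup order described above can be sketched as a small Python model. This is not LLVM's C++ API: `LoopModel` and `estimated_trip_count` are hypothetical names, the metadata is modeled as plain fields, and the branch-weight estimate (continue weight divided by exit weight, rounded to nearest, plus one for the final iteration) is an assumption about the rounding `llvm::getLoopEstimatedTripCount` uses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LoopModel:
    """Hypothetical stand-in for the metadata attached to a loop (not LLVM types)."""
    # True if llvm.loop.estimated_trip_count is attached at all.
    has_estimated_trip_count_md: bool = False
    # Its value; None models metadata that was created without a value.
    estimated_trip_count_value: Optional[int] = None
    # Latch branch_weights as (exit weight, continue weight), if any.
    latch_weights: Optional[Tuple[int, int]] = None


def estimated_trip_count(loop: LoopModel) -> Optional[int]:
    # 1. Prefer llvm.loop.estimated_trip_count when it carries a value.
    if loop.has_estimated_trip_count_md and loop.estimated_trip_count_value is not None:
        return loop.estimated_trip_count_value
    # 2. Otherwise (no metadata, or metadata without a value), fall back to
    #    estimating from the latch branch weights.
    if loop.latch_weights is None:
        return None
    exit_w, cont_w = loop.latch_weights
    if exit_w == 0:
        return None  # according to its weights, the loop never exits
    # Assumed rounding: continue/exit rounded to nearest, plus the final iteration.
    return (cont_w + exit_w // 2) // exit_w + 1


if __name__ == "__main__":
    print(estimated_trip_count(LoopModel(True, 8, (1, 9))))     # 8: metadata wins
    print(estimated_trip_count(LoopModel(True, None, (1, 9))))  # 10: no value, fall back
    print(estimated_trip_count(LoopModel()))                    # None: nothing to go on
```

Keeping "metadata present without a value" distinct from "metadata absent" mirrors the text: the pass records that it looked at the loop, but a later query still consults the branch weights.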

That's PR #128785, which is now a parent PR. First, remove a TODO that's now documented, more generally than for LoopPeel, in plenty of other places. Second, update LoopPeel's `setLoopEstimatedTripCount` call to drop a now-redundant argument that eventually won't be supported.

As another step in issue #135812, this patch fixes block frequencies for partial loop unrolling with an epilogue remainder loop. It does not fully handle the case when the epilogue loop itself is unrolled. That will be handled in the next patch. For the guard and latch of each of the unrolled loop and epilogue loop, this patch sets branch weights derived directly from the original loop latch branch weights. The total frequency of the original loop body, summed across all its occurrences in the unrolled loop and epilogue loop, is the same as in the original loop. This patch also sets `llvm.loop.estimated_trip_count` for the epilogue loop instead of relying on the epilogue's latch branch weights to imply it. This patch removes the XFAIL directives that PR #157754 added to the test suite.
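Why the total body frequency can be preserved follows from how runtime unrolling with an epilogue splits the iterations. A minimal Python sketch, assuming the actual trip count equals its estimate; `split_iterations` is a hypothetical name, and the sketch does not reproduce the branch weights the patch actually computes. With trip count N and unroll count U, the unrolled loop runs N // U times, each executing U copies of the body, and the epilogue runs the N % U leftover iterations, so the body still executes N times in total.

```python
from typing import Tuple


def split_iterations(trip_count: int, unroll_count: int) -> Tuple[int, int]:
    """Hypothetical helper: (unrolled-loop iterations, epilogue iterations)
    for runtime unrolling with an epilogue remainder loop."""
    if trip_count < 0 or unroll_count <= 0:
        raise ValueError("need trip_count >= 0 and unroll_count > 0")
    # When trip_count < unroll_count, the guard skips the unrolled loop entirely.
    return trip_count // unroll_count, trip_count % unroll_count


if __name__ == "__main__":
    trip_count, unroll_count = 10, 4
    main_iters, epilogue_iters = split_iterations(trip_count, unroll_count)
    print(main_iters, epilogue_iters)  # 2 2
    # Each unrolled iteration runs unroll_count copies of the original body.
    print(main_iters * unroll_count + epilogue_iters)  # 10
```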

arsenm · 2025-09-17T11:18:44Z

`llvm/test/Transforms/LoopUnroll/runtime-loop-branchweight.ll`

```diff
@@ -1,4 +1,5 @@
 ; RUN: opt < %s -S -passes=loop-unroll -unroll-runtime=true -unroll-count=4 | FileCheck %s
+; XFAIL: *
```


No xfailing tests?

jdenny-ornl · 2025-09-17

See the last paragraph of #157754 (comment).

arsenm · 2025-09-17

I wouldn't use xfail for this; check the actual baseline content and update it in the next patches.

jdenny-ornl · 2025-09-17

> check the actual baseline content

Sorry, what do you mean?

jdenny-ornl · 2025-09-17

I removed the xfails and made the tests pass. Let me know whether that's what you had in mind.


jdenny-ornl · 2025-10-08T15:29:04Z

ping

jdenny-ornl · 2025-10-10T14:41:44Z

Thanks. As we discussed, I will land this with other patches later.

As another step in issue #135812, this patch fixes block frequencies for partial loop unrolling with an epilogue remainder loop. It does not fully handle the case when the epilogue loop itself is unrolled. That will be handled in the next patch. For the guard and latch of each of the unrolled loop and epilogue loop, this patch sets branch weights derived directly from the original loop latch branch weights. The total frequency of the original loop body, summed across all its occurrences in the unrolled loop and epilogue loop, is the same as in the original loop. This patch also sets `llvm.loop.estimated_trip_count` for the epilogue loop instead of relying on the epilogue's latch branch weights to imply it. This patch fixes branch weights in tests that PR #157754 adversely affected.

jdenny-ornl added 30 commits March 19, 2025 16:19

Run update_test_checks.py on a test

f821eeb

Fix typo

af8ec56

Merge branch 'main' into fix-peel-branch-weights

a0264ad

Merge branch 'main' into fix-peel-branch-weights

fd29a49

Document new metadata

6303177

Improve LangRef.rst entry

bbd0e95

Merge branch 'main' into fix-peel-branch-weights

715cb0a

Merge branch 'main' into fix-peel-branch-weights

67fa67d

Update fixmes

37ce859

Extending beyond the limitations of `getExpectedExitLoopLatchBranch` is a possible improvement for the future, not an urgent fixme. No one has pointed out code that computes estimated trip counts without using `llvm::getLoopEstimatedTripCount`.

Merge branch 'main' into fix-peel-branch-weights

4337dcd

Update test for AArch64, which I did not build before

5193158

Merge branch 'main' into fix-peel-branch-weights

bbd2f22

Run update script on test changed by merge from main

b23f467

The update adds this PR's new metadata.

Merge branch 'main' into fix-peel-branch-weights

e250cfc

Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights

859b84d

Merge branch 'main' into pgo-estimated-trip-count

db5920a

Add PGOEstimateTripCounts in more cases

47fbe85

Add unused initialization

f8097fb

Simplify some test changes

7b27203

Extend verify pass to cover new metadata

4c4669a

Fix test for some builds

0f40efd

Somehow, on some of my builds, `llvm::` prefixes are dropped from some symbol names in the printed pass list.

Merge branch 'main' into pgo-estimated-trip-count

2791a1c

Apply some small reviewer suggestions

6148922

Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights

3f6a91a

Attempt to fix windows pre-commit CI

3a49b43

Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights

c283ebe

Merge branch 'main' into pgo-estimated-trip-count

2f7daa8

jdenny-ornl added 4 commits September 15, 2025 12:36

Empty commit to try to restart pre-commit CI

1f81310

Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard

2382fbd

Merge branch 'skip-unroll-epilog-guard' into fix-blockfreq-unroll-no-runtime

967f8a1

Improve some code comments

2897e64

jdenny-ornl mentioned this pull request Sep 16, 2025

[LoopUnroll] Fix block frequencies for epilogue #159163

Merged

arsenm reviewed Sep 17, 2025

jdenny-ornl added 9 commits September 17, 2025 11:24

Remove xfails

876e055

Merge branch 'main' into fix-peel-branch-weights

04c8ade

Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard

9b80c13

Merge branch 'skip-unroll-epilog-guard' into fix-blockfreq-unroll-no-runtime

9215f47

Merge branch 'main' into fix-peel-branch-weights

a1a5460

Merge branch 'fix-peel-branch-weights' into skip-unroll-epilog-guard

df9cf8c

Merge branch 'skip-unroll-epilog-guard' into fix-blockfreq-unroll-no-runtime

99c95b1

Merge branch 'main' into skip-unroll-epilog-guard

4353f1f

Merge branch 'skip-unroll-epilog-guard' into fix-blockfreq-unroll-no-runtime

f66ae02

Base automatically changed from users/jdenny-ornl/skip-unroll-epilog-guard to main October 7, 2025 14:45

Merge branch 'main' into users/jdenny-ornl/fix-blockfreq-unroll-no-runtime

22fdacf

arsenm approved these changes Oct 10, 2025

jdenny-ornl added 4 commits October 13, 2025 13:07

Merge branch 'main' into fix-blockfreq-unroll-no-runtime

f625d45

Merge branch 'main' into fix-blockfreq-unroll-no-runtime

662e60f

Merge branch 'main' into fix-blockfreq-unroll-no-runtime

093ad2a

Merge branch 'main' into fix-blockfreq-unroll-no-runtime

663cf2b

jdenny-ornl merged commit 24557cc into main Oct 31, 2025
13 of 14 checks passed

jdenny-ornl deleted the users/jdenny-ornl/fix-blockfreq-unroll-no-runtime branch October 31, 2025 14:44

4 participants
