-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[LoopPeel] Fix branch weights' effect on block frequencies #128785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
be2ad30
to
cec331a
Compare
cec331a
to
843b4cf
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
843b4cf
to
9d72e03
Compare
For example: ``` declare void @f(i32) define void @test(i32 %n) { entry: br label %do.body do.body: %i = phi i32 [ 0, %entry ], [ %inc, %do.body ] %inc = add i32 %i, 1 call void @f(i32 %i) %c = icmp sge i32 %inc, %n br i1 %c, label %do.end, label %do.body, !prof !0 do.end: ret void } !0 = !{!"branch_weights", i32 1, i32 9} ``` Given those branch weights, once any loop iteration is actually reached, the probability of the loop exiting at the iteration's end is 1/(1+9). That is, the loop is likely to exit every 10 iterations and thus has an estimated trip count of 10. `opt -passes='print<block-freq>'` shows that 10 is indeed the frequency of the loop body: ``` Printing analysis results of BFI for function 'test': block-frequency-info: test - entry: float = 1.0, int = 1801439852625920 - do.body: float = 10.0, int = 18014398509481984 - do.end: float = 1.0, int = 1801439852625920 ``` Key Observation: The frequency of reaching any particular iteration is less than for the previous iteration because the previous iteration has a non-zero probability of exiting the loop. This observation holds even though every loop iteration, once actually reached, has exactly the same probability of exiting and thus exactly the same branch weights. Now we use `opt -unroll-force-peel-count=2 -passes=loop-unroll` to peel 2 iterations and insert them before the remaining loop. We expect the key observation above not to change, but it does under the implementation without this patch. The block frequency becomes 1.0 for the first iteration, 0.9 for the second, and 6.4 for the main loop body. Again, a decreasing frequency is expected, but it decreases too much: the total frequency of the original loop body becomes 8.3. The new branch weights reveal the problem: ``` !0 = !{!"branch_weights", i32 1, i32 9} !1 = !{!"branch_weights", i32 1, i32 8} !2 = !{!"branch_weights", i32 1, i32 7} ``` The exit probability is now 1/10 for the first peeled iteration, 1/9 for the second, and 1/8 for the remaining loop iterations. It seems this behavior is trying to ensure a decreasing block frequency. However, as in the key observation above for the original loop, that happens correctly without decreasing the branch weights across iterations. This patch changes the peeling implementation not to decrease the branch weights across loop iterations so that the frequency for every iteration is the same as it was in the original loop. The total frequency of the loop body, summed across all its occurrences, thus remains 10 after peeling. Unfortunately, that change means a later analysis cannot accurately estimate the trip count of the remaining loop while examining the remaining loop in isolation without considering the probability of actually reaching it. For that purpose, this patch stores the new trip count as separate metadata named `llvm.loop.estimated_trip_count` and extends `llvm::getLoopEstimatedTripCount` to prefer it, if present, over branch weights. An alternative fix is for `llvm::getLoopEstimatedTripCount` to subtract the `llvm.loop.peeled.count` metadata from the trip count estimated by a loop's branch weights. However, there might be other loop transformations that still corrupt block frequencies in a similar manner and require a similar fix. `llvm.loop.estimated_trip_count` is intended to provide a general way to store estimated trip counts when branch weights cannot directly store them. This patch introduces several fixme comments that need to be addressed before it can land.
9d72e03
to
f413520
Compare
Extending beyond the limitations of `getExpectedExitLoopLatchBranch` is a possible improvement for the future not an urgent fixme. No one has pointed out code that computes estimated trip counts without using `llvm::getLoopEstimatedTripCount`.
|
judging from reactiong on the RFC it seems others fine with the new metadata. Then no objections from my side. |
No, I didn't. |
ea9addb
to
83ac767
Compare
I think CI has gone awry. |
PR #152775 seems to have landed successfully. This PR is ready for review again. |
ping |
// FIXME: The previous updateBranchWeights implementation had this | ||
// comment: | ||
// | ||
// Don't set the probability of taking the edge from latch to loop header | ||
// to less than 1:1 ratio (meaning Weight should not be lower than | ||
// SubWeight), as this could significantly reduce the loop's hotness, | ||
// which would be incorrect in the case of underestimating the trip count. | ||
// | ||
// See e8d5db206c2f commit log for further discussion. That seems to | ||
// suggest that we should avoid ever setting a trip count of < 2 here | ||
// (equal chance of continuing and exiting means the loop will likely | ||
// continue once and then exit once). Or is keeping the original branch | ||
// weights already a sufficient improvement for whatever analysis cares | ||
// about this case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After many months, there have been no remarks on this fixme comment. I am inclined to just remove the comment.
@mtrofin Thanks for accepting. Is it ok with you if I just remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have decided to move forward with removing the fixme and merging the PR. If there are objections after the merge, I can revert or fix.
I added the fixme to this PR because I do not understand the importance of the associated prior comments from others. There has been no response to the fixme since I posted the PR seven months ago, so I assume it is not important going forward.
[LoopPeel] Fix branch weights' effect on block frequencies This patch implements the LoopPeel changes discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). In summary, a loop's latch block can have branch weight metadata that encodes an estimated trip count that is derived from application profile data. Initially, the loop body's block frequencies agree with the estimated trip count, as expected. However, sometimes loop transformations adjust those branch weights in a way that correctly maintains the estimated trip count but that corrupts the block frequencies. This patch addresses that problem in LoopPeel, which it changes to: - Maintain branch weights consistently with the original loop for the sake of preserving the total frequency of the original loop body. - Store the new estimated trip count in the `llvm.loop.estimated_trip_count` metadata, introduced by PR llvm#148758.
[LoopPeel] Fix branch weights' effect on block frequencies
This patch implements the LoopPeel changes discussed in [RFC] Fix Loop Transformations to Preserve Block Frequencies.
In summary, a loop's latch block can have branch weight metadata that encodes an estimated trip count that is derived from application profile data. Initially, the loop body's block frequencies agree with the estimated trip count, as expected. However, sometimes loop transformations adjust those branch weights in a way that correctly maintains the estimated trip count but that corrupts the block frequencies. This patch addresses that problem in LoopPeel, which it changes to:
llvm.loop.estimated_trip_count
metadata, introduced by PR [PGO] Addllvm.loop.estimated_trip_count
metadata #148758.