Commit 24557cc

[LoopUnroll] Fix block frequencies when no runtime (#157754)
This patch implements the LoopUnroll changes discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785) and is thus another step in addressing issue #135812.

In summary, for the case of partial loop unrolling without a remainder loop, this patch changes LoopUnroll to:

- Maintain branch weights consistently with the original loop for the sake of preserving the total frequency of the original loop body.
- Store the new estimated trip count in the `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758.
- Correct the new estimated trip count (e.g., 3 instead of 2) when the original estimated trip count (e.g., 10) divided by the unroll count (e.g., 4) leaves a remainder (e.g., 2).

There are loop unrolling cases this patch does not fully fix, such as partial unrolling with a remainder loop and complete unrolling, and there are two associated tests whose branch weights this patch adversely affects. They will be addressed in future patches that should land with this patch.
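
The trip count correction described in the summary amounts to rounding up when no remainder loop absorbs the leftover iterations. The following is a minimal standalone sketch of that arithmetic, not the code in LoopUnroll.cpp: the function name `newEstimatedTripCount` and the parameter `hasRuntimeRemainder` (standing in for `ULO.Runtime`) are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): estimated trip count of the unrolled
// loop, given the original estimate and the unroll count.
unsigned newEstimatedTripCount(unsigned originalTripCount,
                               unsigned unrollCount,
                               bool hasRuntimeRemainder) {
  assert(unrollCount > 0 && "unroll count must be positive");
  unsigned newTripCount = originalTripCount / unrollCount;
  // Without a remainder loop, leftover original iterations still run inside
  // one more (partially executed) trip through the unrolled body, so round up.
  if (!hasRuntimeRemainder && originalTripCount % unrollCount != 0)
    newTripCount += 1;
  return newTripCount;
}
```

With the numbers from the commit message, `newEstimatedTripCount(10, 4, false)` gives 3, whereas plain integer division would give 2.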
1 parent e72876a commit 24557cc

5 files changed: +108 -8 lines changed

llvm/lib/Transforms/Utils/LoopUnroll.cpp

Lines changed: 33 additions & 6 deletions
@@ -499,9 +499,8 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
 
   const unsigned MaxTripCount = SE->getSmallConstantMaxTripCount(L);
   const bool MaxOrZero = SE->isBackedgeTakenCountMaxOrZero(L);
-  unsigned EstimatedLoopInvocationWeight = 0;
   std::optional<unsigned> OriginalTripCount =
-      llvm::getLoopEstimatedTripCount(L, &EstimatedLoopInvocationWeight);
+      llvm::getLoopEstimatedTripCount(L);
 
   // Effectively "DCE" unrolled iterations that are beyond the max tripcount
   // and will never be executed.
@@ -1131,10 +1130,38 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
     // We shouldn't try to use `L` anymore.
     L = nullptr;
   } else if (OriginalTripCount) {
-    // Update the trip count. Note that the remainder has already logic
-    // computing it in `UnrollRuntimeLoopRemainder`.
-    setLoopEstimatedTripCount(L, *OriginalTripCount / ULO.Count,
-                              EstimatedLoopInvocationWeight);
+    // Update metadata for the loop's branch weights and estimated trip count:
+    // - If ULO.Runtime, UnrollRuntimeLoopRemainder sets the guard branch
+    //   weights, latch branch weights, and estimated trip count of the
+    //   remainder loop it creates. It also sets the branch weights for the
+    //   unrolled loop guard it creates. The branch weights for the unrolled
+    //   loop latch are adjusted below. FIXME: Actually handle ULO.Runtime.
+    // - Otherwise, if unrolled loop iteration latches become unconditional,
+    //   branch weights are adjusted above. FIXME: Actually handle such
+    //   unconditional latches.
+    // - Otherwise, the original loop's branch weights are correct for the
+    //   unrolled loop, so do not adjust them.
+    // - In all cases, the unrolled loop's estimated trip count is set below.
+    //
+    // As an example of the last case, consider what happens if the unroll count
+    // is 4 for a loop with an estimated trip count of 10 when we do not create
+    // a remainder loop and all iterations' latches remain conditional. Each
+    // unrolled iteration's latch still has the same probability of exiting the
+    // loop as it did when in the original loop, and thus it should still have
+    // the same branch weights. Each unrolled iteration's non-zero probability
+    // of exiting already appropriately reduces the probability of reaching the
+    // remaining iterations just as it did in the original loop. Trying to also
+    // adjust the branch weights of the final unrolled iteration's latch (i.e.,
+    // the backedge for the unrolled loop as a whole) to reflect its new trip
+    // count of 3 will erroneously further reduce its block frequencies.
+    // However, in case an analysis later needs to estimate the trip count of
+    // the unrolled loop as a whole without considering the branch weights for
+    // each unrolled iteration's latch within it, we store the new trip count as
+    // separate metadata.
+    unsigned NewTripCount = *OriginalTripCount / ULO.Count;
+    if (!ULO.Runtime && *OriginalTripCount % ULO.Count)
+      NewTripCount += 1;
+    setLoopEstimatedTripCount(L, NewTripCount);
   }
 
   // LoopInfo should not be valid, confirm that.
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+; Test branch weight metadata, estimated trip count metadata, and block
+; frequencies after partial loop unrolling without -unroll-runtime.
+
+; RUN: opt < %s -S -passes='print<block-freq>' 2>&1 | \
+; RUN: FileCheck -check-prefix=CHECK %s
+
+; The -implicit-check-not options make sure that no additional labels or calls
+; to @f show up.
+; RUN: opt < %s -S -passes='loop-unroll,print<block-freq>' \
+; RUN: -unroll-count=4 2>&1 | \
+; RUN: FileCheck %s -check-prefix=CHECK-UR \
+; RUN: -implicit-check-not='{{^( *- )?[^ ;]*:}}' \
+; RUN: -implicit-check-not='call void @f'
+
+; CHECK: block-frequency-info: test
+; CHECK: do.body: float = 10.0,
+
+; The sum should still be ~10.
+;
+; CHECK-UR: block-frequency-info: test
+; CHECK-UR: - [[ENTRY:.*]]:
+; CHECK-UR: - [[DO_BODY:.*]]: float = 2.9078,
+; CHECK-UR: - [[DO_BODY_1:.*]]: float = 2.617,
+; CHECK-UR: - [[DO_BODY_2:.*]]: float = 2.3553,
+; CHECK-UR: - [[DO_BODY_3:.*]]: float = 2.1198,
+; CHECK-UR: - [[DO_END:.*]]:
+
+declare void @f(i32)
+
+define void @test(i32 %n) {
+; CHECK-UR-LABEL: define void @test(i32 %{{.*}}) {
+; CHECK-UR: [[ENTRY]]:
+; CHECK-UR: br label %[[DO_BODY]]
+; CHECK-UR: [[DO_BODY]]:
+; CHECK-UR: call void @f
+; CHECK-UR: br i1 %{{.*}}, label %[[DO_END]], label %[[DO_BODY_1]], !prof ![[#PROF:]]
+; CHECK-UR: [[DO_BODY_1]]:
+; CHECK-UR: call void @f
+; CHECK-UR: br i1 %{{.*}}, label %[[DO_END]], label %[[DO_BODY_2]], !prof ![[#PROF]]
+; CHECK-UR: [[DO_BODY_2]]:
+; CHECK-UR: call void @f
+; CHECK-UR: br i1 %{{.*}}, label %[[DO_END]], label %[[DO_BODY_3]], !prof ![[#PROF]]
+; CHECK-UR: [[DO_BODY_3]]:
+; CHECK-UR: call void @f
+; CHECK-UR: br i1 %{{.*}}, label %[[DO_END]], label %[[DO_BODY]], !prof ![[#PROF]], !llvm.loop ![[#LOOP_UR_LATCH:]]
+; CHECK-UR: [[DO_END]]:
+; CHECK-UR: ret void
+
+entry:
+  br label %do.body
+
+do.body:
+  %i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
+  %inc = add i32 %i, 1
+  call void @f(i32 %i)
+  %c = icmp sge i32 %inc, %n
+  br i1 %c, label %do.end, label %do.body, !prof !0
+
+do.end:
+  ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 9}
+
+; CHECK-UR: ![[#PROF]] = !{!"branch_weights", i32 1, i32 9}
+; CHECK-UR: ![[#LOOP_UR_LATCH]] = distinct !{![[#LOOP_UR_LATCH]], ![[#LOOP_UR_TC:]], ![[#DISABLE:]]}
+; CHECK-UR: ![[#LOOP_UR_TC]] = !{!"llvm.loop.estimated_trip_count", i32 3}
+; CHECK-UR: ![[#DISABLE]] = !{!"llvm.loop.unroll.disable"}
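
The expected `CHECK-UR` frequencies in this test can be reproduced by hand with a simplified model. The sketch below assumes the loop is entered with frequency 1 and that each unrolled latch exits with the probability implied by its branch weights (1 out of 1 + 9 here). It is an illustration of the arithmetic only, not LLVM's BlockFrequencyInfo implementation, and `unrolledBodyFrequencies` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not an LLVM API): frequency of each unrolled copy of
// the loop body when every latch keeps the original branch weights.
std::vector<double> unrolledBodyFrequencies(unsigned exitWeight,
                                            unsigned continueWeight,
                                            unsigned unrollCount) {
  assert(exitWeight > 0 && unrollCount > 0);
  // Probability that a single latch continues to the next iteration.
  const double stay =
      static_cast<double>(continueWeight) /
      (static_cast<double>(exitWeight) + static_cast<double>(continueWeight));
  // Probability of passing all latches, i.e. of taking the unrolled backedge.
  const double backedge = std::pow(stay, static_cast<double>(unrollCount));
  // The first copy satisfies: freq = 1 (entry) + backedge * freq.
  double freq = 1.0 / (1.0 - backedge);
  std::vector<double> freqs;
  for (unsigned i = 0; i < unrollCount; ++i) {
    freqs.push_back(freq);
    freq *= stay; // Each later copy is reached only if the previous latch stays.
  }
  return freqs;
}
```

Under these assumptions, `unrolledBodyFrequencies(1, 9, 4)` yields approximately 2.9078, 2.6170, 2.3553, and 2.1198, which sum to 10: the unchanged per-latch weights already preserve the original body's total frequency.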

llvm/test/Transforms/LoopUnroll/runtime-loop-branchweight.ll

Lines changed: 4 additions & 1 deletion
@@ -6,7 +6,10 @@
 ; CHECK: br i1 [[COND1:%.*]], label %for.end.loopexit.unr-lcssa, label %for.body, !prof ![[#PROF:]], !llvm.loop ![[#LOOP:]]
 ; CHECK-LABEL: for.body.epil:
 ; CHECK: br i1 [[COND2:%.*]], label %for.body.epil, label %for.end.loopexit.epilog-lcssa, !prof ![[#PROF2:]], !llvm.loop ![[#LOOP2:]]
-; CHECK: ![[#PROF]] = !{!"branch_weights", i32 1, i32 2499}
+
+; FIXME: These branch weights are incorrect and should not be merged into main
+; until PR #159163, which fixes them.
+; CHECK: ![[#PROF]] = !{!"branch_weights", i32 1, i32 9999}
 ; CHECK: ![[#PROF2]] = !{!"branch_weights", i32 1, i32 1}
 
 define i3 @test(ptr %a, i3 %n) {

llvm/test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll

Lines changed: 3 additions & 1 deletion
@@ -60,5 +60,7 @@ loop.end:
 !1 = !{!"function_entry_count", i64 1}
 !2 = !{!"branch_weights", i32 1, i32 1000}
 
-; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 124}
+; FIXME: These branch weights are incorrect and should not be merged into main
+; until PR #159163, which fixes them.
+; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 1000}
 ; CHECK: [[PROF1]] = !{!"branch_weights", i32 3, i32 1}
