Skip to content

Commit c8e2f43

Browse files
[MemProf] Select largest of matching contexts from profile (#165338)
We aren't currently deduplicating contexts that are identical or nearly identical (differing inline frame information) when generating the profile. When we have multiple identical contexts we end up conservatively marking it as non-cold, even if some are much smaller in terms of bytes allocated. This was causing us to lose sight of a very large cold context, because we had a small non-cold one that only differed in the inlining (which we don't consider when matching as the inlining could change or be incomplete at that point in compilation). Likely the smaller one was from binary with much smaller usage and therefore not yet detected as cold. Deduplicate the alloc contexts for a function before applying the profile, selecting the largest one, or conservatively selecting the non-cold one if they are the same size. This caused a minor difference to an existing test (memprof_loop_unroll.ll), which now only gets one message for the duplicate context instead of 2. While here, convert to the text version of the profile.
1 parent deb54ba commit c8e2f43

File tree

3 files changed

+182
-17
lines changed

3 files changed

+182
-17
lines changed

llvm/lib/Transforms/Instrumentation/MemProfUse.cpp

Lines changed: 28 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -127,15 +127,19 @@ static uint64_t computeStackId(const memprof::Frame &Frame) {
127127
return computeStackId(Frame.Function, Frame.LineOffset, Frame.Column);
128128
}
129129

130+
static AllocationType getAllocType(const AllocationInfo *AllocInfo) {
131+
return getAllocType(AllocInfo->Info.getTotalLifetimeAccessDensity(),
132+
AllocInfo->Info.getAllocCount(),
133+
AllocInfo->Info.getTotalLifetime());
134+
}
135+
130136
static AllocationType addCallStack(CallStackTrie &AllocTrie,
131137
const AllocationInfo *AllocInfo,
132138
uint64_t FullStackId) {
133139
SmallVector<uint64_t> StackIds;
134140
for (const auto &StackFrame : AllocInfo->CallStack)
135141
StackIds.push_back(computeStackId(StackFrame));
136-
auto AllocType = getAllocType(AllocInfo->Info.getTotalLifetimeAccessDensity(),
137-
AllocInfo->Info.getAllocCount(),
138-
AllocInfo->Info.getTotalLifetime());
142+
auto AllocType = getAllocType(AllocInfo);
139143
std::vector<ContextTotalSize> ContextSizeInfo;
140144
if (recordContextSizeInfoForAnalysis()) {
141145
auto TotalSize = AllocInfo->Info.getTotalSize();
@@ -405,22 +409,39 @@ handleAllocSite(Instruction &I, CallBase *CI,
405409
const std::set<const AllocationInfo *> &AllocInfoSet,
406410
std::map<std::pair<uint64_t, unsigned>, AllocMatchInfo>
407411
&FullStackIdToAllocMatchInfo) {
412+
// TODO: Remove this once the profile creation logic deduplicates contexts
413+
// that are the same other than the IsInlineFrame bool. Until then, keep the
414+
// largest.
415+
DenseMap<uint64_t, const AllocationInfo *> UniqueFullContextIdAllocInfo;
416+
for (auto *AllocInfo : AllocInfoSet) {
417+
auto FullStackId = computeFullStackId(AllocInfo->CallStack);
418+
auto [It, Inserted] =
419+
UniqueFullContextIdAllocInfo.insert({FullStackId, AllocInfo});
420+
// If inserted entry, done.
421+
if (Inserted)
422+
continue;
423+
// Keep the larger one, or the noncold one if they are the same size.
424+
auto CurSize = It->second->Info.getTotalSize();
425+
auto NewSize = AllocInfo->Info.getTotalSize();
426+
if ((CurSize > NewSize) ||
427+
(CurSize == NewSize &&
428+
getAllocType(AllocInfo) != AllocationType::NotCold))
429+
continue;
430+
It->second = AllocInfo;
431+
}
408432
// We may match this instruction's location list to multiple MIB
409433
// contexts. Add them to a Trie specialized for trimming the contexts to
410434
// the minimal needed to disambiguate contexts with unique behavior.
411435
CallStackTrie AllocTrie(&ORE, MaxColdSize);
412436
uint64_t TotalSize = 0;
413437
uint64_t TotalColdSize = 0;
414-
for (auto *AllocInfo : AllocInfoSet) {
438+
for (auto &[FullStackId, AllocInfo] : UniqueFullContextIdAllocInfo) {
415439
// Check the full inlined call stack against this one.
416440
// If we found and thus matched all frames on the call, include
417441
// this MIB.
418442
if (stackFrameIncludesInlinedCallStack(AllocInfo->CallStack,
419443
InlinedCallStack)) {
420444
NumOfMemProfMatchedAllocContexts++;
421-
uint64_t FullStackId = 0;
422-
if (ClPrintMemProfMatchInfo || recordContextSizeInfoForAnalysis())
423-
FullStackId = computeFullStackId(AllocInfo->CallStack);
424445
auto AllocType = addCallStack(AllocTrie, AllocInfo, FullStackId);
425446
TotalSize += AllocInfo->Info.getTotalSize();
426447
if (AllocType == AllocationType::Cold)
Lines changed: 118 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,118 @@
1+
;; Tests that the compiler ignores smaller contexts that differ only in the
2+
;; IsInlineFrame bool. These map to the same full context id internally, as we
3+
;; ignore the inline frame status which may differ in feedback compiles.
4+
;; Presumably this happens when profiles collected from different binaries are
5+
;; merged. If we didn't pick the largest we would default them all to noncold.
6+
7+
;; Avoid failures on big-endian systems that can't read the profile properly
8+
; REQUIRES: x86_64-linux
9+
10+
;; Generate the profile and the IR.
11+
; RUN: split-file %s %t
12+
13+
;; Generate indexed profile
14+
; RUN: llvm-profdata merge %t/memprof_diff_inline.yaml -o %t.memprofdata
15+
16+
; RUN: opt < %t/memprof_diff_inline.ll -passes='memprof-use<profile-filename=%t.memprofdata>' -S -memprof-report-hinted-sizes -memprof-print-match-info 2>&1 | FileCheck %s --check-prefixes=MEMPROF
17+
18+
; MEMPROF: MemProf notcold context with id 10194276560488437434 has total profiled size 200 is matched with 1 frames
19+
; MEMPROF: MemProf cold context with id 16342802530253093571 has total profiled size 10000 is matched with 1 frames
20+
21+
;--- memprof_diff_inline.yaml
22+
---
23+
HeapProfileRecords:
24+
- GUID: _Z3foov
25+
AllocSites:
26+
# Small non-cold, full context id 16342802530253093571, should ignore
27+
- Callstack:
28+
- { Function: _Z3foov, LineOffset: 1, Column: 10, IsInlineFrame: false }
29+
- { Function: _Z4foo2v, LineOffset: 1, Column: 10, IsInlineFrame: false }
30+
- { Function: _Z3barv, LineOffset: 1, Column: 10, IsInlineFrame: false }
31+
- { Function: main, LineOffset: 8, Column: 13, IsInlineFrame: false }
32+
MemInfoBlock:
33+
AllocCount: 1
34+
TotalSize: 10
35+
TotalLifetime: 0
36+
TotalLifetimeAccessDensity: 20000
37+
# Large cold, full context id 16342802530253093571, should keep
38+
- Callstack:
39+
- { Function: _Z3foov, LineOffset: 1, Column: 10, IsInlineFrame: false }
40+
- { Function: _Z4foo2v, LineOffset: 1, Column: 10, IsInlineFrame: true }
41+
- { Function: _Z3barv, LineOffset: 1, Column: 10, IsInlineFrame: false }
42+
- { Function: main, LineOffset: 8, Column: 13, IsInlineFrame: false }
43+
MemInfoBlock:
44+
AllocCount: 1
45+
TotalSize: 10000
46+
TotalLifetime: 200000
47+
TotalLifetimeAccessDensity: 0
48+
# Small non-cold, full context id 16342802530253093571, should ignore
49+
- Callstack:
50+
- { Function: _Z3foov, LineOffset: 1, Column: 10, IsInlineFrame: false }
51+
- { Function: _Z4foo2v, LineOffset: 1, Column: 10, IsInlineFrame: false }
52+
- { Function: _Z3barv, LineOffset: 1, Column: 10, IsInlineFrame: true }
53+
- { Function: main, LineOffset: 8, Column: 13, IsInlineFrame: false }
54+
MemInfoBlock:
55+
AllocCount: 1
56+
TotalSize: 100
57+
TotalLifetime: 0
58+
TotalLifetimeAccessDensity: 20000
59+
# Small non-cold, full context id 10194276560488437434
60+
- Callstack:
61+
- { Function: _Z3foov, LineOffset: 1, Column: 10, IsInlineFrame: false }
62+
- { Function: _Z4foo2v, LineOffset: 1, Column: 10, IsInlineFrame: false }
63+
- { Function: _Z3barv, LineOffset: 1, Column: 10, IsInlineFrame: false }
64+
- { Function: main, LineOffset: 9, Column: 13, IsInlineFrame: false }
65+
MemInfoBlock:
66+
AllocCount: 1
67+
TotalSize: 200
68+
TotalLifetime: 0
69+
TotalLifetimeAccessDensity: 20000
70+
CallSites: []
71+
...
72+
;--- memprof_diff_inline.ll
73+
; ModuleID = 'memprof_diff_inline.cc'
74+
source_filename = "memprof_diff_inline.cc"
75+
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
76+
target triple = "x86_64-unknown-linux-gnu"
77+
78+
%"struct.std::nothrow_t" = type { i8 }
79+
80+
@_ZSt7nothrow = external global %"struct.std::nothrow_t", align 1
81+
82+
define dso_local noundef ptr @_Z3foov() !dbg !10 {
83+
entry:
84+
; MEMPROF: call {{.*}} @_Znwm{{.*}} !memprof ![[M1:[0-9]+]], !callsite ![[C1:[0-9]+]]
85+
%call = call noalias noundef align 32 ptr @_Znwm(i64 noundef 32) #6, !dbg !13
86+
ret ptr %call
87+
}
88+
89+
declare noundef ptr @_Znwm(i64 noundef)
90+
91+
attributes #6 = { builtin allocsize(0) }
92+
93+
; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]]}
94+
95+
; MEMPROF: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"notcold", ![[CONTEXTSIZE1:[0-9]+]]}
96+
; MEMPROF: ![[STACK1]] = !{i64 2732490490862098848, i64 8467819354083268568, i64 9086428284934609951, i64 2061451396820446691}
97+
;; Full context id 10194276560488437434 == -8252467513221114182
98+
; MEMPROF: ![[CONTEXTSIZE1]] = !{i64 -8252467513221114182, i64 200}
99+
100+
; MEMPROF: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"cold", ![[CONTEXTSIZE2:[0-9]+]]}
101+
; MEMPROF: ![[STACK2]] = !{i64 2732490490862098848, i64 8467819354083268568, i64 9086428284934609951, i64 -5747251260480066785}
102+
;; Full context id 16342802530253093571 == -2103941543456458045
103+
;; We should have kept the large (cold) one.
104+
; MEMPROF: ![[CONTEXTSIZE2]] = !{i64 -2103941543456458045, i64 10000}
105+
106+
; MEMPROF: ![[C1]] = !{i64 2732490490862098848}
107+
108+
!llvm.dbg.cu = !{!0}
109+
!llvm.module.flags = !{!2, !3}
110+
111+
!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 15.0.0 (https://github.com/llvm/llvm-project.git 6cbe6284d1f0a088b5c6482ae27b738f03d82fe7)", isOptimized: false, runtimeVersion: 0, emissionKind: LineTablesOnly, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
112+
!1 = !DIFile(filename: "memprof.cc", directory: "/usr/local/google/home/tejohnson/llvm/tmp", checksumkind: CSK_MD5, checksum: "e8c40ebe4b21776b4d60e9632cbc13c2")
113+
!2 = !{i32 7, !"Dwarf Version", i32 5}
114+
!3 = !{i32 2, !"Debug Info Version", i32 3}
115+
!10 = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov", scope: !1, file: !1, line: 4, type: !11, scopeLine: 4, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !12)
116+
!11 = !DISubroutineType(types: !12)
117+
!12 = !{}
118+
!13 = !DILocation(line: 5, column: 10, scope: !10)

llvm/test/Transforms/PGOProfile/memprof_loop_unroll.ll

Lines changed: 36 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -4,24 +4,50 @@
44
;; Avoid failures on big-endian systems that can't read the profile properly
55
; REQUIRES: x86_64-linux
66

7-
;; TODO: Use text profile inputs once that is available for memprof.
8-
;; # To update the Inputs below, run Inputs/update_memprof_inputs.sh.
9-
;; # To generate below LLVM IR for use in matching.
10-
;; $ clang++ -gmlt -fdebug-info-for-profiling -S %S/Inputs/memprof_loop_unroll_b.cc -emit-llvm
7+
; Generate the profile and the IR.
8+
; RUN: split-file %s %t
9+
10+
;; Generate indexed profile
11+
; RUN: llvm-profdata merge %t/memprof_loop_unroll.yaml -o %t.memprofdata
1112

12-
; RUN: llvm-profdata merge %S/Inputs/memprof_loop_unroll.memprofraw --profiled-binary %S/Inputs/memprof_loop_unroll.exe -o %t.memprofdata
1313
;; Set the minimum lifetime threshold to 0 to ensure that one context is
1414
;; considered cold (the other will be notcold).
15-
; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -S -memprof-report-hinted-sizes -memprof-ave-lifetime-cold-threshold=0 2>&1 | FileCheck %s
15+
; RUN: opt < %t/memprof_loop_unroll.ll -passes='memprof-use<profile-filename=%t.memprofdata>' -S -memprof-report-hinted-sizes -memprof-ave-lifetime-cold-threshold=0 2>&1 | FileCheck %s
1616

17-
;; Conservatively annotate as not cold. We get two messages as there are two
18-
;; unrolled copies of the allocation.
19-
; CHECK: MemProf hinting: Total size for full allocation context hash {{.*}} and indistinguishable alloc type notcold: 4
20-
; CHECK: MemProf hinting: Total size for full allocation context hash {{.*}} and indistinguishable alloc type notcold: 4
17+
;; Conservatively annotate as not cold.
18+
; CHECK: MemProf hinting: Total size for full allocation context hash {{.*}} and single alloc type notcold: 4
2119
; CHECK: call {{.*}} @_Znam{{.*}} #[[ATTR:[0-9]+]]
2220
; CHECK: attributes #[[ATTR]] = { builtin allocsize(0) "memprof"="notcold" }
2321
; CHECK-NOT: stackIds: ()
2422

23+
;--- memprof_loop_unroll.yaml
24+
---
25+
HeapProfileRecords:
26+
- GUID: 0x7f8d88fcc70a347b
27+
AllocSites:
28+
- Callstack:
29+
- { Function: 0x7f8d88fcc70a347b, LineOffset: 2, Column: 16, IsInlineFrame: false }
30+
- { Function: 0xdb956436e78dd5fa, LineOffset: 1, Column: 5, IsInlineFrame: false }
31+
MemInfoBlock:
32+
AllocCount: 1
33+
TotalSize: 4
34+
TotalLifetime: 2
35+
TotalLifetimeAccessDensity: 12500000000
36+
- Callstack:
37+
- { Function: 0x7f8d88fcc70a347b, LineOffset: 2, Column: 16, IsInlineFrame: false }
38+
- { Function: 0xdb956436e78dd5fa, LineOffset: 1, Column: 5, IsInlineFrame: false }
39+
MemInfoBlock:
40+
AllocCount: 1
41+
TotalSize: 4
42+
TotalLifetime: 2
43+
TotalLifetimeAccessDensity: 0
44+
- GUID: 0xdb956436e78dd5fa
45+
CallSites:
46+
- Frames:
47+
- { Function: 0xdb956436e78dd5fa, LineOffset: 1, Column: 5, IsInlineFrame: false }
48+
...
49+
50+
;--- memprof_loop_unroll.ll
2551
; ModuleID = 'memprof_loop_unroll_b.cc'
2652
source_filename = "memprof_loop_unroll_b.cc"
2753
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"

0 commit comments

Comments
 (0)