Commit 97e7802
[memprof] Speed up caller-callee pair extraction
We know that the MemProf profile has a lot of duplicate call stacks. Extracting caller-callee pairs from a call stack we've seen before is wasted effort.

This patch makes the extraction more efficient by first building a work list of linear call stack IDs (the set of starting positions in the radix tree array) and then extracting caller-callee pairs from each call stack in the work list.

We implement the work list as a bit vector because we expect the work list to be dense in the range [0, RadixTreeSize). Also, we want the set insertion to be cheap.

Without this patch, it takes 25 seconds to extract caller-callee pairs from a large MemProf profile. This patch shortens that to 4 seconds.
1 parent d119d43 commit 97e7802

2 files changed

+18
-1
lines changed

llvm/include/llvm/ProfileData/InstrProfReader.h

Lines changed: 2 additions & 0 deletions
@@ -683,6 +683,8 @@ class IndexedMemProfReader {
   const unsigned char *FrameBase = nullptr;
   /// The starting address of the call stack array.
   const unsigned char *CallStackBase = nullptr;
+  // The number of elements in the radix tree array.
+  unsigned RadixTreeSize = 0;
 
   Error deserializeV012(const unsigned char *Start, const unsigned char *Ptr,
                         uint64_t FirstWord);

llvm/lib/ProfileData/InstrProfReader.cpp

Lines changed: 16 additions & 1 deletion
@@ -1303,6 +1303,10 @@ Error IndexedMemProfReader::deserializeV3(const unsigned char *Start,
   FrameBase = Ptr;
   CallStackBase = Start + CallStackPayloadOffset;
 
+  // Compute the number of elements in the radix tree array.
+  RadixTreeSize = (RecordPayloadOffset - CallStackPayloadOffset) /
+                  sizeof(memprof::LinearFrameId);
+
   // Now initialize the table reader with a pointer into data buffer.
   MemProfRecordTable.reset(MemProfRecordHashTable::Create(
       /*Buckets=*/Start + RecordTableOffset,
@@ -1674,11 +1678,22 @@ IndexedMemProfReader::getMemProfCallerCalleePairs() const {
   memprof::LinearFrameIdConverter FrameIdConv(FrameBase);
   memprof::CallerCalleePairExtractor Extractor(CallStackBase, FrameIdConv);
 
+  // The set of linear call stack IDs that we need to traverse from. We expect
+  // the set to be dense, so we use a BitVector.
+  BitVector Worklist(RadixTreeSize);
+
+  // Collect the set of linear call stack IDs. Since we expect a lot of
+  // duplicates, we first collect them in the form of a bit vector before
+  // processing them.
   for (const memprof::IndexedMemProfRecord &IndexedRecord :
        MemProfRecordTable->data())
     for (const memprof::IndexedAllocationInfo &IndexedAI :
          IndexedRecord.AllocSites)
-      Extractor(IndexedAI.CSId);
+      Worklist.set(IndexedAI.CSId);
+
+  // Collect caller-callee pairs for each linear call stack ID in Worklist.
+  for (unsigned CS : Worklist.set_bits())
+    Extractor(CS);
 
   DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>> Pairs =
       std::move(Extractor.CallerCalleePairs);
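
The deduplication step in this patch can be illustrated outside LLVM. The following is a minimal standalone sketch, not the patch's code: it assumes `std::vector<bool>` as a stand-in for `llvm::BitVector`, and the names `CountingExtractor` and `forEachUniqueId` are hypothetical helpers introduced only for illustration. It shows that marking IDs in a bit vector first means the expensive visitor runs once per distinct ID, in ascending order, no matter how many duplicates appear in the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the expensive per-call-stack work (the role
// played by CallerCalleePairExtractor in the patch). It only records
// which IDs it was invoked on and how many times.
struct CountingExtractor {
  std::size_t Calls = 0;
  std::vector<uint32_t> Visited;
  void operator()(uint32_t CSId) {
    ++Calls;
    Visited.push_back(CSId);
  }
};

// Mark every ID in a bit vector of size UniverseSize, then invoke Visit
// once per set bit in ascending order. Insertion is a single bit write,
// so duplicates cost almost nothing.
template <typename Fn>
void forEachUniqueId(const std::vector<uint32_t> &Ids,
                     std::size_t UniverseSize, Fn &Visit) {
  std::vector<bool> Worklist(UniverseSize, false);
  for (uint32_t Id : Ids) {
    assert(Id < UniverseSize && "ID must fall inside the bit vector");
    Worklist[Id] = true;
  }
  for (std::size_t I = 0; I < UniverseSize; ++I)
    if (Worklist[I])
      Visit(static_cast<uint32_t>(I));
}
```

With the input IDs `{5, 2, 5, 5, 2, 7}` and a universe size of 8, the visitor is invoked three times rather than six. A hash set would also deduplicate, but a bit vector suits the case described in the commit message, where the IDs are expected to be dense in a known range.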
