Skip to content

Conversation

@kazutakahirata
Copy link
Contributor

This patch adds InstrProfWriter::addMemProfData, which adds the
complete MemProf profile (frames, call stacks, and records) to the
writer context.

Without this function, functions like loadInput in llvm-profdata.cpp
and InstrProfWriter::mergeRecordsFromWriter must add one item (frame,
call stack, or record) at a time. The new function std::moves the
entire MemProf profile to the writer context if the destination is
empty, which is the common use case. Otherwise, we fall back to
adding one item at a time behind the scene.

Here are a couple of reasons why we should add this function:

  • We've had a bug where we forgot to add one of the three data
    structures (frames, call stacks, and records) to the writer context,
    resulting in a nearly empty indexed profile. We should always
    package the three data structures together, especially on API
    boundaries.

  • We expose a little too much of the MemProf detail to
    InstrProfWriter. I'd like to gradually transform
    InstrProfReader/Writer to entities managing buffers (sequences of
    bytes), with actual serialization/deserialization left to external
    classes. We already do some of this in InstrProfReader, where
    InstrProfReader "contracts out" to IndexedMemProfReader to handle
    MemProf details.

I am not changing loadInput or InstrProfWriter::mergeRecordsFromWriter
for now because MemProfReader uses DenseMap for frames and call
stacks, whereas MemProfData uses MapVector. I'll resolve these
mismatches in subsequent patches.

This patch adds InstrProfWriter::addMemProfData, which adds the
complete MemProf profile (frames, call stacks, and records) to the
writer context.

Without this function, functions like loadInput in llvm-profdata.cpp
and InstrProfWriter::mergeRecordsFromWriter must add one item (frame,
call stack, or record) at a time.  The new function std::moves the
entire MemProf profile to the writer context if the destination is
empty, which is the common use case.  Otherwise, we fall back to
adding one item at a time behind the scene.

Here are a couple of reasons why we should add this function:

- We've had a bug where we forgot to add one of the three data
  structures (frames, call stacks, and records) to the writer context,
  resulting in a nearly empty indexed profile.  We should always
  package the three data structures together, especially on API
  boundaries.

- We expose a little too much of the MemProf detail to
  InstrProfWriter.  I'd like to gradually transform
  InstrProfReader/Writer to entities managing buffers (sequences of
  bytes), with actual serialization/deserialization left to external
  classes.  We already do some of this in InstrProfReader, where
  InstrProfReader "contracts out" to IndexedMemProfReader to handle
  MemProf details.

I am not changing loadInput or InstrProfWriter::mergeRecordsFromWriter
for now because MemProfReader uses DenseMap for frames and call
stacks, whereas MemProfData uses MapVector.  I'll resolve these
mismatches in subsequent patches.
@llvmbot llvmbot added the PGO Profile Guided Optimizations label Nov 17, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 17, 2024

@llvm/pr-subscribers-pgo

Author: Kazu Hirata (kazutakahirata)

Changes

This patch adds InstrProfWriter::addMemProfData, which adds the
complete MemProf profile (frames, call stacks, and records) to the
writer context.

Without this function, functions like loadInput in llvm-profdata.cpp
and InstrProfWriter::mergeRecordsFromWriter must add one item (frame,
call stack, or record) at a time. The new function std::moves the
entire MemProf profile to the writer context if the destination is
empty, which is the common use case. Otherwise, we fall back to
adding one item at a time behind the scene.

Here are a couple of reasons why we should add this function:

  • We've had a bug where we forgot to add one of the three data
    structures (frames, call stacks, and records) to the writer context,
    resulting in a nearly empty indexed profile. We should always
    package the three data structures together, especially on API
    boundaries.

  • We expose a little too much of the MemProf detail to
    InstrProfWriter. I'd like to gradually transform
    InstrProfReader/Writer to entities managing buffers (sequences of
    bytes), with actual serialization/deserialization left to external
    classes. We already do some of this in InstrProfReader, where
    InstrProfReader "contracts out" to IndexedMemProfReader to handle
    MemProf details.

I am not changing loadInput or InstrProfWriter::mergeRecordsFromWriter
for now because MemProfReader uses DenseMap for frames and call
stacks, whereas MemProfData uses MapVector. I'll resolve these
mismatches in subsequent patches.


Full diff: https://github.com/llvm/llvm-project/pull/116528.diff

3 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/InstrProfWriter.h (+4)
  • (modified) llvm/lib/ProfileData/InstrProfWriter.cpp (+25)
  • (modified) llvm/unittests/ProfileData/InstrProfTest.cpp (+51-61)
diff --git a/llvm/include/llvm/ProfileData/InstrProfWriter.h b/llvm/include/llvm/ProfileData/InstrProfWriter.h
index 199e565bead044..fa30926c662587 100644
--- a/llvm/include/llvm/ProfileData/InstrProfWriter.h
+++ b/llvm/include/llvm/ProfileData/InstrProfWriter.h
@@ -130,6 +130,10 @@ class InstrProfWriter {
                            const llvm::SmallVector<memprof::FrameId> &CallStack,
                            function_ref<void(Error)> Warn);
 
+  /// Add the entire MemProfData \p Incoming to the writer context.
+  bool addMemProfData(memprof::IndexedMemProfData Incoming,
+                      function_ref<void(Error)> Warn);
+
   // Add a binary id to the binary ids list.
   void addBinaryIds(ArrayRef<llvm::object::BuildID> BIs);
 
diff --git a/llvm/lib/ProfileData/InstrProfWriter.cpp b/llvm/lib/ProfileData/InstrProfWriter.cpp
index 47f463541d8ef4..f2b15a28178014 100644
--- a/llvm/lib/ProfileData/InstrProfWriter.cpp
+++ b/llvm/lib/ProfileData/InstrProfWriter.cpp
@@ -350,6 +350,31 @@ bool InstrProfWriter::addMemProfCallStack(
   return true;
 }
 
+bool InstrProfWriter::addMemProfData(memprof::IndexedMemProfData Incoming,
+                                     function_ref<void(Error)> Warn) {
+  if (MemProfData.Frames.empty())
+    MemProfData.Frames = std::move(Incoming.Frames);
+  else
+    for (const auto &[Id, F] : Incoming.Frames)
+      if (addMemProfFrame(Id, F, Warn))
+        return false;
+
+  if (MemProfData.CallStacks.empty())
+    MemProfData.CallStacks = std::move(Incoming.CallStacks);
+  else
+    for (const auto &[CSId, CS] : Incoming.CallStacks)
+      if (addMemProfCallStack(CSId, CS, Warn))
+        return false;
+
+  if (MemProfData.Records.empty())
+    MemProfData.Records = std::move(Incoming.Records);
+  else
+    for (const auto &[GUID, Record] : Incoming.Records)
+      addMemProfRecord(GUID, Record);
+
+  return true;
+}
+
 void InstrProfWriter::addBinaryIds(ArrayRef<llvm::object::BuildID> BIs) {
   llvm::append_range(BinaryIds, BIs);
 }
diff --git a/llvm/unittests/ProfileData/InstrProfTest.cpp b/llvm/unittests/ProfileData/InstrProfTest.cpp
index 582efad531bf74..b9f244104c65cf 100644
--- a/llvm/unittests/ProfileData/InstrProfTest.cpp
+++ b/llvm/unittests/ProfileData/InstrProfTest.cpp
@@ -21,6 +21,7 @@
 #include "llvm/Testing/Support/Error.h"
 #include "gtest/gtest.h"
 #include <cstdarg>
+#include <initializer_list>
 #include <optional>
 
 using namespace llvm;
@@ -348,10 +349,10 @@ TEST_F(InstrProfTest, test_merge_traces_sampled) {
 using ::llvm::memprof::IndexedMemProfRecord;
 using ::llvm::memprof::MemInfoBlock;
 using FrameIdMapTy =
-    llvm::DenseMap<::llvm::memprof::FrameId, ::llvm::memprof::Frame>;
+    llvm::MapVector<::llvm::memprof::FrameId, ::llvm::memprof::Frame>;
 using CallStackIdMapTy =
-    llvm::DenseMap<::llvm::memprof::CallStackId,
-                   ::llvm::SmallVector<::llvm::memprof::FrameId>>;
+    llvm::MapVector<::llvm::memprof::CallStackId,
+                    ::llvm::SmallVector<::llvm::memprof::FrameId>>;
 
 static FrameIdMapTy getFrameMapping() {
   FrameIdMapTy Mapping;
@@ -467,11 +468,11 @@ TEST_F(InstrProfTest, test_memprof_v0) {
       /*CallSiteFrames=*/{
           {4, 5},
       });
-  const FrameIdMapTy IdToFrameMap = getFrameMapping();
-  for (const auto &I : IdToFrameMap) {
-    Writer.addMemProfFrame(I.first, I.getSecond(), Err);
-  }
-  Writer.addMemProfRecord(/*Id=*/0x9999, IndexedMR);
+
+  memprof::IndexedMemProfData MemProfData;
+  MemProfData.Frames = getFrameMapping();
+  MemProfData.Records.try_emplace(0x9999, IndexedMR);
+  Writer.addMemProfData(MemProfData, Err);
 
   auto Profile = Writer.writeBuffer();
   readProfile(std::move(Profile));
@@ -482,8 +483,8 @@ TEST_F(InstrProfTest, test_memprof_v0) {
 
   std::optional<memprof::FrameId> LastUnmappedFrameId;
   auto IdToFrameCallback = [&](const memprof::FrameId Id) {
-    auto Iter = IdToFrameMap.find(Id);
-    if (Iter == IdToFrameMap.end()) {
+    auto Iter = MemProfData.Frames.find(Id);
+    if (Iter == MemProfData.Frames.end()) {
       LastUnmappedFrameId = Id;
       return memprof::Frame(0, 0, 0, false);
     }
@@ -508,15 +509,11 @@ TEST_F(InstrProfTest, test_memprof_v2_full_schema) {
   const IndexedMemProfRecord IndexedMR = makeRecordV2(
       /*AllocFrames=*/{0x111, 0x222},
       /*CallSiteFrames=*/{0x333}, MIB, memprof::getFullSchema());
-  const FrameIdMapTy IdToFrameMap = getFrameMapping();
-  const auto CSIdToCallStackMap = getCallStackMapping();
-  for (const auto &I : IdToFrameMap) {
-    Writer.addMemProfFrame(I.first, I.getSecond(), Err);
-  }
-  for (const auto &I : CSIdToCallStackMap) {
-    Writer.addMemProfCallStack(I.first, I.getSecond(), Err);
-  }
-  Writer.addMemProfRecord(/*Id=*/0x9999, IndexedMR);
+  memprof::IndexedMemProfData MemProfData;
+  MemProfData.Frames = getFrameMapping();
+  MemProfData.CallStacks = getCallStackMapping();
+  MemProfData.Records.try_emplace(0x9999, IndexedMR);
+  Writer.addMemProfData(MemProfData, Err);
 
   auto Profile = Writer.writeBuffer();
   readProfile(std::move(Profile));
@@ -525,9 +522,10 @@ TEST_F(InstrProfTest, test_memprof_v2_full_schema) {
   ASSERT_THAT_ERROR(RecordOr.takeError(), Succeeded());
   const memprof::MemProfRecord &Record = RecordOr.get();
 
-  memprof::FrameIdConverter<decltype(IdToFrameMap)> FrameIdConv(IdToFrameMap);
-  memprof::CallStackIdConverter<decltype(CSIdToCallStackMap)> CSIdConv(
-      CSIdToCallStackMap, FrameIdConv);
+  memprof::FrameIdConverter<decltype(MemProfData.Frames)> FrameIdConv(
+      MemProfData.Frames);
+  memprof::CallStackIdConverter<decltype(MemProfData.CallStacks)> CSIdConv(
+      MemProfData.CallStacks, FrameIdConv);
 
   const ::llvm::memprof::MemProfRecord WantRecord =
       IndexedMR.toMemProfRecord(CSIdConv);
@@ -550,15 +548,11 @@ TEST_F(InstrProfTest, test_memprof_v2_partial_schema) {
   const IndexedMemProfRecord IndexedMR = makeRecordV2(
       /*AllocFrames=*/{0x111, 0x222},
       /*CallSiteFrames=*/{0x333}, MIB, memprof::getHotColdSchema());
-  const FrameIdMapTy IdToFrameMap = getFrameMapping();
-  const auto CSIdToCallStackMap = getCallStackMapping();
-  for (const auto &I : IdToFrameMap) {
-    Writer.addMemProfFrame(I.first, I.getSecond(), Err);
-  }
-  for (const auto &I : CSIdToCallStackMap) {
-    Writer.addMemProfCallStack(I.first, I.getSecond(), Err);
-  }
-  Writer.addMemProfRecord(/*Id=*/0x9999, IndexedMR);
+  memprof::IndexedMemProfData MemProfData;
+  MemProfData.Frames = getFrameMapping();
+  MemProfData.CallStacks = getCallStackMapping();
+  MemProfData.Records.try_emplace(0x9999, IndexedMR);
+  Writer.addMemProfData(MemProfData, Err);
 
   auto Profile = Writer.writeBuffer();
   readProfile(std::move(Profile));
@@ -567,9 +561,10 @@ TEST_F(InstrProfTest, test_memprof_v2_partial_schema) {
   ASSERT_THAT_ERROR(RecordOr.takeError(), Succeeded());
   const memprof::MemProfRecord &Record = RecordOr.get();
 
-  memprof::FrameIdConverter<decltype(IdToFrameMap)> FrameIdConv(IdToFrameMap);
-  memprof::CallStackIdConverter<decltype(CSIdToCallStackMap)> CSIdConv(
-      CSIdToCallStackMap, FrameIdConv);
+  memprof::FrameIdConverter<decltype(MemProfData.Frames)> FrameIdConv(
+      MemProfData.Frames);
+  memprof::CallStackIdConverter<decltype(MemProfData.CallStacks)> CSIdConv(
+      MemProfData.CallStacks, FrameIdConv);
 
   const ::llvm::memprof::MemProfRecord WantRecord =
       IndexedMR.toMemProfRecord(CSIdConv);
@@ -601,23 +596,21 @@ TEST_F(InstrProfTest, test_caller_callee_pairs) {
   //       Line: 7, Column: 8
   //         new(...)
 
-  const std::pair<memprof::FrameId, memprof::Frame> Frames[] = {
-      {0, {0x123, 1, 2, false}},
-      {1, {0x234, 3, 4, true}},
-      {2, {0x123, 5, 6, false}},
-      {3, {0x345, 7, 8, true}}};
-  for (const auto &[FrameId, Frame] : Frames)
-    Writer.addMemProfFrame(FrameId, Frame, Err);
-
-  const std::pair<memprof::CallStackId, SmallVector<memprof::FrameId>>
-      CallStacks[] = {{0x111, {1, 0}}, {0x222, {3, 2}}};
-  for (const auto &[CSId, CallStack] : CallStacks)
-    Writer.addMemProfCallStack(CSId, CallStack, Err);
-
   const IndexedMemProfRecord IndexedMR = makeRecordV2(
       /*AllocFrames=*/{0x111, 0x222},
       /*CallSiteFrames=*/{}, MIB, memprof::getHotColdSchema());
-  Writer.addMemProfRecord(/*Id=*/0x9999, IndexedMR);
+
+  memprof::IndexedMemProfData MemProfData;
+  MemProfData.Frames.try_emplace(0, 0x123, 1, 2, false);
+  MemProfData.Frames.try_emplace(1, 0x234, 3, 4, true);
+  MemProfData.Frames.try_emplace(2, 0x123, 5, 6, false);
+  MemProfData.Frames.try_emplace(3, 0x345, 7, 8, true);
+  MemProfData.CallStacks.try_emplace(
+      0x111, std::initializer_list<memprof::FrameId>{1, 0});
+  MemProfData.CallStacks.try_emplace(
+      0x222, std::initializer_list<memprof::FrameId>{3, 2});
+  MemProfData.Records.try_emplace(0x9999, IndexedMR);
+  Writer.addMemProfData(MemProfData, Err);
 
   auto Profile = Writer.writeBuffer();
   readProfile(std::move(Profile));
@@ -681,19 +674,15 @@ TEST_F(InstrProfTest, test_memprof_merge) {
   ASSERT_THAT_ERROR(Writer2.mergeProfileKind(InstrProfKind::MemProf),
                     Succeeded());
 
-  const FrameIdMapTy IdToFrameMap = getFrameMapping();
-  for (const auto &I : IdToFrameMap) {
-    Writer2.addMemProfFrame(I.first, I.getSecond(), Err);
-  }
-
-  const auto CSIdToCallStackMap = getCallStackMapping();
-  for (const auto &[CSId, CallStack] : CSIdToCallStackMap)
-    Writer2.addMemProfCallStack(CSId, CallStack, Err);
-
   const IndexedMemProfRecord IndexedMR = makeRecordV2(
       /*AllocFrames=*/{0x111, 0x222},
       /*CallSiteFrames=*/{}, makePartialMIB(), memprof::getHotColdSchema());
-  Writer2.addMemProfRecord(/*Id=*/0x9999, IndexedMR);
+
+  memprof::IndexedMemProfData MemProfData;
+  MemProfData.Frames = getFrameMapping();
+  MemProfData.CallStacks = getCallStackMapping();
+  MemProfData.Records.try_emplace(0x9999, IndexedMR);
+  Writer2.addMemProfData(MemProfData, Err);
 
   ASSERT_THAT_ERROR(Writer.mergeProfileKind(Writer2.getProfileKind()),
                     Succeeded());
@@ -714,9 +703,10 @@ TEST_F(InstrProfTest, test_memprof_merge) {
 
   std::optional<memprof::FrameId> LastUnmappedFrameId;
 
-  memprof::FrameIdConverter<decltype(IdToFrameMap)> FrameIdConv(IdToFrameMap);
-  memprof::CallStackIdConverter<decltype(CSIdToCallStackMap)> CSIdConv(
-      CSIdToCallStackMap, FrameIdConv);
+  memprof::FrameIdConverter<decltype(MemProfData.Frames)> FrameIdConv(
+      MemProfData.Frames);
+  memprof::CallStackIdConverter<decltype(MemProfData.CallStacks)> CSIdConv(
+      MemProfData.CallStacks, FrameIdConv);
 
   const ::llvm::memprof::MemProfRecord WantRecord =
       IndexedMR.toMemProfRecord(CSIdConv);

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but a question/suggestion below

@kazutakahirata kazutakahirata merged commit 6bf8f08 into llvm:main Nov 18, 2024
6 of 8 checks passed
@kazutakahirata kazutakahirata deleted the memprof_org_addMemProfData branch November 18, 2024 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PGO Profile Guided Optimizations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants