Conversation

@teresajohnson
Contributor

Improve the information printed when -memprof-report-hinted-sizes is
enabled. Now print the full context hash computed from the original
profile, similar to what we do when reporting matching statistics. This
will make it easier to correlate with the profile.

Note that the full context hash must be computed at profile match time
and saved in the metadata and summary, because we may trim the context
during matching when it isn't needed for distinguishing hotness.
Similarly, due to the context trimming, each MIB may correspond to more than
one (full context id, total size) pair, so the metadata and summary now
record a list of these pairs per MIB.

Remove the old aggregate size from the metadata and summary support.
Another change from the prior support is that we no longer write the
size information into the combined index for the LTO backends, which
don't use it; this reduces unnecessary bloat in distributed index files.

@llvmbot llvmbot added PGO Profile Guided Optimizations LTO Link time optimization (regular/full LTO or ThinLTO) llvm:ir llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Oct 31, 2024
@llvmbot
Member

llvmbot commented Oct 31, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-pgo
@llvm/pr-subscribers-lto

@llvm/pr-subscribers-llvm-analysis

Author: Teresa Johnson (teresajohnson)

Changes


Patch is 49.87 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114465.diff

19 Files Affected:

  • (modified) llvm/include/llvm/Analysis/MemoryProfileInfo.h (+19-8)
  • (modified) llvm/include/llvm/Bitcode/LLVMBitCodes.h (+8-2)
  • (modified) llvm/include/llvm/IR/ModuleSummaryIndex.h (+67-9)
  • (modified) llvm/lib/Analysis/MemoryProfileInfo.cpp (+64-26)
  • (modified) llvm/lib/Analysis/ModuleSummaryAnalysis.cpp (+25-8)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp (+1)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+34-21)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+32-8)
  • (modified) llvm/lib/IR/Verifier.cpp (+29-8)
  • (modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+44-22)
  • (modified) llvm/lib/Transforms/Instrumentation/MemProfiler.cpp (+12-6)
  • (modified) llvm/test/ThinLTO/X86/memprof-aliased-location1.ll (+2-2)
  • (modified) llvm/test/ThinLTO/X86/memprof-aliased-location2.ll (+2-2)
  • (modified) llvm/test/ThinLTO/X86/memprof-basic.ll (+8-4)
  • (modified) llvm/test/Transforms/MemProfContextDisambiguation/aliased-location1.ll (+2-2)
  • (modified) llvm/test/Transforms/MemProfContextDisambiguation/aliased-location2.ll (+2-2)
  • (modified) llvm/test/Transforms/MemProfContextDisambiguation/basic.ll (+8-4)
  • (modified) llvm/test/Transforms/PGOProfile/memprof.ll (+18-11)
  • (modified) llvm/test/Verifier/memprof-metadata-bad.ll (+1-1)
diff --git a/llvm/include/llvm/Analysis/MemoryProfileInfo.h b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
index edbce706953d18..55889c841b283e 100644
--- a/llvm/include/llvm/Analysis/MemoryProfileInfo.h
+++ b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
@@ -28,16 +28,17 @@ AllocationType getAllocType(uint64_t TotalLifetimeAccessDensity,
 /// the resulting metadata node.
 MDNode *buildCallstackMetadata(ArrayRef<uint64_t> CallStack, LLVMContext &Ctx);
 
+/// Build metadata from the provided list of full stack id and profiled size, to
+/// use when reporting of hinted sizes is enabled.
+MDNode *buildContextSizeMetadata(ArrayRef<ContextTotalSize> ContextSizeInfo,
+                                 LLVMContext &Ctx);
+
 /// Returns the stack node from an MIB metadata node.
 MDNode *getMIBStackNode(const MDNode *MIB);
 
 /// Returns the allocation type from an MIB metadata node.
 AllocationType getMIBAllocType(const MDNode *MIB);
 
-/// Returns the total size from an MIB metadata node, or 0 if it was not
-/// recorded.
-uint64_t getMIBTotalSize(const MDNode *MIB);
-
 /// Returns the string to use in attributes with the given type.
 std::string getAllocTypeAttributeString(AllocationType Type);
 
@@ -55,11 +56,15 @@ class CallStackTrie {
     // Allocation types for call context sharing the context prefix at this
     // node.
     uint8_t AllocTypes;
-    uint64_t TotalSize;
+    // If the user has requested reporting of hinted sizes, keep track of the
+    // associated full stack id and profiled sizes. Can have more than one
+    // after trimming (e.g. when building from metadata). This is only placed on
+    // the last (root-most) trie node for each allocation context.
+    std::vector<ContextTotalSize> ContextSizeInfo;
     // Map of caller stack id to the corresponding child Trie node.
     std::map<uint64_t, CallStackTrieNode *> Callers;
-    CallStackTrieNode(AllocationType Type, uint64_t TotalSize)
-        : AllocTypes(static_cast<uint8_t>(Type)), TotalSize(TotalSize) {}
+    CallStackTrieNode(AllocationType Type)
+        : AllocTypes(static_cast<uint8_t>(Type)) {}
   };
 
   // The node for the allocation at the root.
@@ -75,6 +80,11 @@ class CallStackTrie {
     delete Node;
   }
 
+  // Recursively build up a complete list of context size information from the
+  // trie nodes reached form the given Node, for hint size reporting.
+  void collectContextSizeInfo(CallStackTrieNode *Node,
+                              std::vector<ContextTotalSize> &ContextSizeInfo);
+
   // Recursive helper to trim contexts and create metadata nodes.
   bool buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
                      std::vector<uint64_t> &MIBCallStack,
@@ -93,7 +103,8 @@ class CallStackTrie {
   /// allocation call down to the bottom of the call stack (i.e. callee to
   /// caller order).
   void addCallStack(AllocationType AllocType, ArrayRef<uint64_t> StackIds,
-                    uint64_t TotalSize = 0);
+                    std::vector<ContextTotalSize> ContextSizeInfo =
+                        std::vector<ContextTotalSize>());
 
   /// Add the call stack context along with its allocation type from the MIB
   /// metadata to the Trie.
diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
index 41a6447356c23b..130c92b28b3d5e 100644
--- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h
+++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -308,7 +308,7 @@ enum GlobalValueSummarySymtabCodes {
   FS_PERMODULE_CALLSITE_INFO = 26,
   // Summary of per-module allocation memprof metadata.
   // [nummib, nummib x (alloc type, numstackids, numstackids x stackidindex),
-  // [nummib x total size]?]
+  // [nummib x (numcontext x contextsizeindex)]?]
   FS_PERMODULE_ALLOC_INFO = 27,
   // Summary of combined index memprof callsite metadata.
   // [valueid, numstackindices, numver,
@@ -317,9 +317,15 @@ enum GlobalValueSummarySymtabCodes {
   // Summary of combined index allocation memprof metadata.
   // [nummib, numver,
   //  nummib x (alloc type, numstackids, numstackids x stackidindex),
-  //  numver x version, [nummib x total size]?]
+  //  numver x version]
   FS_COMBINED_ALLOC_INFO = 29,
+  // List of all stack ids referenced by index in the callsite and alloc infos.
+  // [n x stack id]
   FS_STACK_IDS = 30,
+  // List of all (full stack id, total size) pairs optionally referenced by
+  // index from the alloc info records.
+  // [n x (full stack id, total size)]
+  FS_CONTEXT_SIZE_INFOS = 31,
 };
 
 enum MetadataCodes {
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 1cfe7c15f97dbc..ccb6c8473f23ee 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -302,6 +302,14 @@ template <> struct DenseMapInfo<ValueInfo> {
   static unsigned getHashValue(ValueInfo I) { return (uintptr_t)I.getRef(); }
 };
 
+// For optional hinted size reporting, holds a pair of the full stack id
+// (pre-trimming, from the full context in the profile), and the associated
+// total profiled size.
+struct ContextTotalSize {
+  uint64_t FullStackId;
+  uint64_t TotalSize;
+};
+
 /// Summary of memprof callsite metadata.
 struct CallsiteInfo {
   // Actual callee function.
@@ -408,9 +416,15 @@ struct AllocInfo {
   // Vector of MIBs in this memprof metadata.
   std::vector<MIBInfo> MIBs;
 
-  // If requested, keep track of total profiled sizes for each MIB. This will be
-  // a vector of the same length and order as the MIBs vector, if non-empty.
-  std::vector<uint64_t> TotalSizes;
+  // If requested, keep track of full stack contexts and total profiled sizes
+  // for each MIB. This will be a vector of the same length and order as the
+  // MIBs vector, if non-empty. Note that each MIB in the summary can have
+  // multiple of these as we trim the contexts when possible during matching.
+  // For hinted size reporting we, however, want the original pre-trimmed full
+  // stack context id for better correlation with the profile. Note that these
+  // are indexes into the ContextSizeInfos list in the index, to enable
+  // deduplication.
+  std::vector<std::vector<unsigned>> ContextSizeInfoIndices;
 
   AllocInfo(std::vector<MIBInfo> MIBs) : MIBs(std::move(MIBs)) {
     Versions.push_back(0);
@@ -432,14 +446,21 @@ inline raw_ostream &operator<<(raw_ostream &OS, const AllocInfo &AE) {
   for (auto &M : AE.MIBs) {
     OS << "\t\t" << M << "\n";
   }
-  if (!AE.TotalSizes.empty()) {
-    OS << " TotalSizes per MIB:\n\t\t";
+  if (!AE.ContextSizeInfoIndices.empty()) {
+    OS << " ContextSizeInfo index per MIB:\n\t\t";
     First = true;
-    for (uint64_t TS : AE.TotalSizes) {
+    for (auto Indices : AE.ContextSizeInfoIndices) {
       if (!First)
         OS << ", ";
       First = false;
-      OS << TS << "\n";
+      bool FirstIndex = true;
+      for (uint64_t Index : Indices) {
+        if (!FirstIndex)
+          OS << ", ";
+        FirstIndex = false;
+        OS << Index;
+      }
+      OS << "\n";
     }
   }
   return OS;
@@ -1426,6 +1447,19 @@ class ModuleSummaryIndex {
   // built via releaseTemporaryMemory.
   DenseMap<uint64_t, unsigned> StackIdToIndex;
 
+  // List of unique ContextTotalSize structs (pair of the full stack id hash and
+  // its associated total profiled size). We use an index into this vector when
+  // referencing from the alloc summary to reduce the overall memory and size
+  // requirements, since often allocations may be duplicated due to inlining.
+  std::vector<ContextTotalSize> ContextSizeInfos;
+
+  // Temporary map while building the ContextSizeInfos list. Clear when index is
+  // completely built via releaseTemporaryMemory.
+  // Maps from full stack id to a map of total size to the assigned index.
+  // We need size in here too because due to stack truncation in the profile we
+  // can have the same full stack id and different sizes.
+  DenseMap<uint64_t, DenseMap<uint64_t, unsigned>> ContextToTotalSizeAndIndex;
+
   // YAML I/O support.
   friend yaml::MappingTraits<ModuleSummaryIndex>;
 
@@ -1470,6 +1504,9 @@ class ModuleSummaryIndex {
   size_t size() const { return GlobalValueMap.size(); }
 
   const std::vector<uint64_t> &stackIds() const { return StackIds; }
+  const std::vector<ContextTotalSize> &contextSizeInfos() const {
+    return ContextSizeInfos;
+  }
 
   unsigned addOrGetStackIdIndex(uint64_t StackId) {
     auto Inserted = StackIdToIndex.insert({StackId, StackIds.size()});
@@ -1483,15 +1520,36 @@ class ModuleSummaryIndex {
     return StackIds[Index];
   }
 
+  unsigned addOrGetContextSizeIndex(ContextTotalSize ContextSizeInfo) {
+    auto &Entry = ContextToTotalSizeAndIndex[ContextSizeInfo.FullStackId];
+    auto Inserted =
+        Entry.insert({ContextSizeInfo.TotalSize, ContextSizeInfos.size()});
+    if (Inserted.second)
+      ContextSizeInfos.push_back(
+          {ContextSizeInfo.FullStackId, ContextSizeInfo.TotalSize});
+    else
+      assert(Inserted.first->first == ContextSizeInfo.TotalSize);
+    return Inserted.first->second;
+  }
+
+  ContextTotalSize getContextSizeInfoAtIndex(unsigned Index) const {
+    assert(ContextSizeInfos.size() > Index);
+    return ContextSizeInfos[Index];
+  }
+
   // Facility to release memory from data structures only needed during index
-  // construction (including while building combined index). Currently this only
+  // construction (including while building combined index). Currently this
   // releases the temporary map used while constructing a correspondence between
-  // stack ids and their index in the StackIds vector. Mostly impactful when
+  // stack ids and their index in the StackIds vector, and a similar map used
+  // while constructing a the ContextSizeInfos vector. Mostly impactful when
   // building a large combined index.
   void releaseTemporaryMemory() {
     assert(StackIdToIndex.size() == StackIds.size());
     StackIdToIndex.clear();
     StackIds.shrink_to_fit();
+    assert(ContextToTotalSizeAndIndex.size() == ContextSizeInfos.size());
+    ContextToTotalSizeAndIndex.clear();
+    ContextSizeInfos.shrink_to_fit();
   }
 
   /// Convenience function for doing a DFS on a ValueInfo. Marks the function in
diff --git a/llvm/lib/Analysis/MemoryProfileInfo.cpp b/llvm/lib/Analysis/MemoryProfileInfo.cpp
index 2b49dce17b7931..885f2e4d040143 100644
--- a/llvm/lib/Analysis/MemoryProfileInfo.cpp
+++ b/llvm/lib/Analysis/MemoryProfileInfo.cpp
@@ -99,12 +99,6 @@ AllocationType llvm::memprof::getMIBAllocType(const MDNode *MIB) {
   return AllocationType::NotCold;
 }
 
-uint64_t llvm::memprof::getMIBTotalSize(const MDNode *MIB) {
-  if (MIB->getNumOperands() < 3)
-    return 0;
-  return mdconst::dyn_extract<ConstantInt>(MIB->getOperand(2))->getZExtValue();
-}
-
 std::string llvm::memprof::getAllocTypeAttributeString(AllocationType Type) {
   switch (Type) {
   case AllocationType::NotCold:
@@ -135,22 +129,22 @@ bool llvm::memprof::hasSingleAllocType(uint8_t AllocTypes) {
   return NumAllocTypes == 1;
 }
 
-void CallStackTrie::addCallStack(AllocationType AllocType,
-                                 ArrayRef<uint64_t> StackIds,
-                                 uint64_t TotalSize) {
+void CallStackTrie::addCallStack(
+    AllocationType AllocType, ArrayRef<uint64_t> StackIds,
+    std::vector<ContextTotalSize> ContextSizeInfo) {
   bool First = true;
   CallStackTrieNode *Curr = nullptr;
   for (auto StackId : StackIds) {
-    // If this is the first stack frame, add or update alloc node.
+    // errs() << StackId << " ";
+    //  If this is the first stack frame, add or update alloc node.
     if (First) {
       First = false;
       if (Alloc) {
         assert(AllocStackId == StackId);
         Alloc->AllocTypes |= static_cast<uint8_t>(AllocType);
-        Alloc->TotalSize += TotalSize;
       } else {
         AllocStackId = StackId;
-        Alloc = new CallStackTrieNode(AllocType, TotalSize);
+        Alloc = new CallStackTrieNode(AllocType);
       }
       Curr = Alloc;
       continue;
@@ -160,15 +154,18 @@ void CallStackTrie::addCallStack(AllocationType AllocType,
     if (Next != Curr->Callers.end()) {
       Curr = Next->second;
       Curr->AllocTypes |= static_cast<uint8_t>(AllocType);
-      Curr->TotalSize += TotalSize;
       continue;
     }
     // Otherwise add a new caller node.
-    auto *New = new CallStackTrieNode(AllocType, TotalSize);
+    auto *New = new CallStackTrieNode(AllocType);
     Curr->Callers[StackId] = New;
     Curr = New;
   }
   assert(Curr);
+  Curr->ContextSizeInfo.insert(Curr->ContextSizeInfo.end(),
+                               ContextSizeInfo.begin(), ContextSizeInfo.end());
+  std::vector<ContextTotalSize> AllContextSizeInfo;
+  collectContextSizeInfo(Curr, AllContextSizeInfo);
 }
 
 void CallStackTrie::addCallStack(MDNode *MIB) {
@@ -181,21 +178,55 @@ void CallStackTrie::addCallStack(MDNode *MIB) {
     assert(StackId);
     CallStack.push_back(StackId->getZExtValue());
   }
-  addCallStack(getMIBAllocType(MIB), CallStack, getMIBTotalSize(MIB));
+  std::vector<ContextTotalSize> ContextSizeInfo;
+  // Collect the context size information if it exists.
+  if (MIB->getNumOperands() > 2) {
+    for (unsigned I = 2; I < MIB->getNumOperands(); I++) {
+      MDNode *ContextSizePair = dyn_cast<MDNode>(MIB->getOperand(I));
+      assert(ContextSizePair->getNumOperands() == 2);
+      uint64_t FullStackId =
+          mdconst::dyn_extract<ConstantInt>(ContextSizePair->getOperand(0))
+              ->getZExtValue();
+      uint64_t TotalSize =
+          mdconst::dyn_extract<ConstantInt>(ContextSizePair->getOperand(1))
+              ->getZExtValue();
+      ContextSizeInfo.push_back({FullStackId, TotalSize});
+    }
+  }
+  addCallStack(getMIBAllocType(MIB), CallStack, std::move(ContextSizeInfo));
 }
 
 static MDNode *createMIBNode(LLVMContext &Ctx, ArrayRef<uint64_t> MIBCallStack,
-                             AllocationType AllocType, uint64_t TotalSize) {
+                             AllocationType AllocType,
+                             ArrayRef<ContextTotalSize> ContextSizeInfo) {
   SmallVector<Metadata *> MIBPayload(
       {buildCallstackMetadata(MIBCallStack, Ctx)});
   MIBPayload.push_back(
       MDString::get(Ctx, getAllocTypeAttributeString(AllocType)));
-  if (TotalSize)
-    MIBPayload.push_back(ValueAsMetadata::get(
-        ConstantInt::get(Type::getInt64Ty(Ctx), TotalSize)));
+  if (!ContextSizeInfo.empty()) {
+    for (auto Info : ContextSizeInfo) {
+      auto *FullStackIdMD = ValueAsMetadata::get(
+          ConstantInt::get(Type::getInt64Ty(Ctx), Info.FullStackId));
+      auto *TotalSizeMD = ValueAsMetadata::get(
+          ConstantInt::get(Type::getInt64Ty(Ctx), Info.TotalSize));
+      auto *ContextSizeMD = MDNode::get(Ctx, {FullStackIdMD, TotalSizeMD});
+      MIBPayload.push_back(ContextSizeMD);
+    }
+  }
   return MDNode::get(Ctx, MIBPayload);
 }
 
+void CallStackTrie::collectContextSizeInfo(
+    CallStackTrieNode *Node, std::vector<ContextTotalSize> &ContextSizeInfo) {
+  ContextSizeInfo.insert(ContextSizeInfo.end(), Node->ContextSizeInfo.begin(),
+                         Node->ContextSizeInfo.end());
+  if (Node->Callers.empty())
+    return;
+  for (auto &Caller : Node->Callers) {
+    collectContextSizeInfo(Caller.second, ContextSizeInfo);
+  }
+}
+
 // Recursive helper to trim contexts and create metadata nodes.
 // Caller should have pushed Node's loc to MIBCallStack. Doing this in the
 // caller makes it simpler to handle the many early returns in this method.
@@ -206,8 +237,10 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
   // Trim context below the first node in a prefix with a single alloc type.
   // Add an MIB record for the current call stack prefix.
   if (hasSingleAllocType(Node->AllocTypes)) {
+    std::vector<ContextTotalSize> ContextSizeInfo;
+    collectContextSizeInfo(Node, ContextSizeInfo);
     MIBNodes.push_back(createMIBNode(
-        Ctx, MIBCallStack, (AllocationType)Node->AllocTypes, Node->TotalSize));
+        Ctx, MIBCallStack, (AllocationType)Node->AllocTypes, ContextSizeInfo));
     return true;
   }
 
@@ -243,8 +276,10 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
   // non-cold allocation type.
   if (!CalleeHasAmbiguousCallerContext)
     return false;
+  std::vector<ContextTotalSize> ContextSizeInfo;
+  collectContextSizeInfo(Node, ContextSizeInfo);
   MIBNodes.push_back(createMIBNode(Ctx, MIBCallStack, AllocationType::NotCold,
-                                   Node->TotalSize));
+                                   ContextSizeInfo));
   return true;
 }
 
@@ -256,11 +291,14 @@ bool CallStackTrie::buildAndAttachMIBMetadata(CallBase *CI) {
   if (hasSingleAllocType(Alloc->AllocTypes)) {
     addAllocTypeAttribute(Ctx, CI, (AllocationType)Alloc->AllocTypes);
     if (MemProfReportHintedSizes) {
-      assert(Alloc->TotalSize);
-      errs() << "Total size for allocation with location hash " << AllocStackId
-             << " and single alloc type "
-             << getAllocTypeAttributeString((AllocationType)Alloc->AllocTypes)
-             << ": " << Alloc->TotalSize << "\n";
+      std::vector<ContextTotalSize> ContextSizeInfo;
+      collectContextSizeInfo(Alloc, ContextSizeInfo);
+      for (const auto &Info : ContextSizeInfo) {
+        errs() << "Total size for full allocation context hash "
+               << Info.FullStackId << " and single alloc type "
+               << getAllocTypeAttributeString((AllocationType)Alloc->AllocTypes)
+               << ": " << Info.TotalSize << "\n";
+      }
     }
     return false;
   }
diff --git a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
index 004e8b76a3c851..3273de51a79d9f 100644
--- a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
+++ b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
@@ -523,6 +523,7 @@ static void computeFunctionSummary(
       if (MemProfMD) {
         std::vector<MIBInfo> MIBs;
         std::vector<uint64_t> TotalSizes;
+        std::vector<std::vector<unsigned>> ContextSizeInfoIndices;
         for (auto &MDOp : MemProfMD->operands()) {
           auto *MIBMD = cast<const MDNode>(MDOp);
           MDNode *StackNode = getMIBStackNode(MIBMD);
@@ -540,18 +541,34 @@ static void computeFunctionSummary(
             if (StackIdIndices.empty() || StackIdIndices.back() != StackIdIdx)
               StackIdIndices.push_back(StackIdIdx);
           }
+          // If we have context size information, collect it for inclusion in
+          // the summary.
+          assert(MIBMD->getNumOperands() > 2 || !MemProfReportHintedSizes);
+          if (MIBMD->getNumOperands() > 2) {
+            std::vector<unsigned> ContextSizeIndices;
+            for (unsigned I = 2; I < MIBMD->getNumOperands(); I++) {
+              MDNode *ContextSizePair = dyn_cast<MDNode>(MIBMD->getOperand(I));
+              assert(ContextSizePair->getNumOperands() == 2);
+              uint64_t FullStackId = mdconst::dyn_extract<ConstantInt>(
+                                         ContextSizePair->getOperand(0))
+                                         ->getZExtValue();
+              uint64_t TS = mdconst::dyn_extract<ConstantInt>(
+                                ContextSizePair->getOperand(1))
+                                ->getZExtValue();
+              ContextSizeIndices.push_back(
+                  Index.addOrGetContextSizeIndex({FullStackId, TS}));
+            }
+            ContextSizeInfoIndices.push_back(std::move(ContextSizeIndices));
+          }
           MIBs.push_back(
               MIBInfo(getMIBAllocType(MIBMD), std::move(StackIdIndices)));
-          if (MemProfReportHintedSizes) {
-            auto TotalSize = getMIBTotalSize(MIBMD);
-            assert(TotalSize);
-            TotalSizes.push_back(TotalSize);
-          }
         }
         Allocs.push_back(AllocInfo(std::move(MIBs)));
-        if (MemProfReportHintedSizes) {
-          assert(Allocs.back().MIBs.size() == TotalSizes.size());
-          Allocs.back().TotalSizes = std::move(TotalSizes);
+        assert(!ContextSizeInfoIndices.empty() || !MemProfReportHintedSizes);
+        if (!ContextSizeInfoIndices.empty()) {
+          assert(Allocs.back().MIBs.size() == ContextSizeInfoIndices.size());
+          Allocs.back().ContextSizeInfoIndices =
+              std::move(ContextSizeInfoIndices);
         }
       } else i...
[truncated]

@github-actions

github-actions bot commented Oct 31, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.


@snehasish snehasish left a comment


Looks good overall.

Is there any noticeable impact on compile time and memory when this is enabled?
What is the size impact to the summary?

void addCallStack(AllocationType AllocType, ArrayRef<uint64_t> StackIds,
uint64_t TotalSize = 0);
std::vector<ContextTotalSize> ContextSizeInfo =
std::vector<ContextTotalSize>());


nit: Just = {}?

Contributor Author

done

// List of all (full stack id, total size) pairs optionally referenced by
// index from the alloc info records.
// [n x (full stack id, total size)]
FS_CONTEXT_SIZE_INFOS = 31,


All the other entries are "_INFO" instead of "_INFOS". Would be nice to be consistent.

Contributor Author

done

// releases the temporary map used while constructing a correspondence between
// stack ids and their index in the StackIds vector. Mostly impactful when
// stack ids and their index in the StackIds vector, and a similar map used
// while constructing a the ContextSizeInfos vector. Mostly impactful when


typo: extra 'a'

Contributor Author

fixed

CallStackTrieNode *Curr = nullptr;
for (auto StackId : StackIds) {
// If this is the first stack frame, add or update alloc node.
// errs() << StackId << " ";


Leftover commented code?

Contributor Author

fixed

Comment on lines 223 to 224
if (Node->Callers.empty())
return;


Deleting these two lines will have no impact since the loop below won't run if it's empty?

Contributor Author

removed, and removed extraneous braces below.

while (MIBsRead++ < NumMIBs) {
unsigned NumContextSizeInfoEntries = Record[I++];
assert(Record.size() - I >= NumContextSizeInfoEntries);
std::vector<unsigned> ContextSizeIndices;


ContextSizeIndices.reserve( NumContextSizeInfoEntries) since we know how many we will insert?

Contributor Author

done

assert(AI.ContextSizeInfoIndices.empty() ||
AI.ContextSizeInfoIndices.size() == AI.MIBs.size());
if (WriteContextSizeInfoIndex && !AI.ContextSizeInfoIndices.empty()) {
for (auto Indices : AI.ContextSizeInfoIndices) {


Use a reference to avoid a copy of Indices?

Contributor Author

done

ContextSizeInfoAbbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
unsigned ContextSizeInfoAbbvId =
Stream.EmitAbbrev(std::move(ContextSizeInfoAbbv));
for (const auto &Info : Index->contextSizeInfos()) {


Can use a structured binding here.

Contributor Author

done

}
Stream.EmitRecord(bitc::FS_CONTEXT_SIZE_INFOS, NameVals,
ContextSizeInfoAbbvId);
NameVals.clear();


Do we want to shrink too?

Contributor Author

Is it worth it? We reuse this vector multiple times for other records. And the lifetime is not so long that it is going to make any noticeable difference on memory (this is in the pre-lto link compile).

auto CSI = ContextIdToContextSizeInfos.find(Id);
if (CSI != ContextIdToContextSizeInfos.end()) {
for (auto &Info : CSI->second) {
OS << getAllocTypeString((uint8_t)TypeI->second)


Add a "MemProf hinting" or related prefix to make this easy to parse from the log later.

Contributor Author

done

Contributor Author

@teresajohnson teresajohnson left a comment


Addressed comments (except one, see follow up question). I will measure the overheads you asked about and get back.

void addCallStack(AllocationType AllocType, ArrayRef<uint64_t> StackIds,
uint64_t TotalSize = 0);
std::vector<ContextTotalSize> ContextSizeInfo =
std::vector<ContextTotalSize>());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// List of all (full stack id, total size) pairs optionally referenced by
// index from the alloc info records.
// [n x (full stack id, total size)]
FS_CONTEXT_SIZE_INFOS = 31,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done
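As a side note on the record layout quoted above: the `[n x (full stack id, total size)]` body is just the pairs flattened into one flat integer record. A minimal sketch of that flattening — the function name and types here are hypothetical, not the actual BitcodeWriter code (and note a later revision of the PR dropped the global table in favor of per-alloc records):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Flatten (full stack id, total size) pairs into a single flat record body,
// mirroring the "[n x (full stack id, total size)]" layout described above.
std::vector<uint64_t>
flattenSizeInfos(const std::vector<std::pair<uint64_t, uint64_t>> &Infos) {
  std::vector<uint64_t> Record;
  Record.reserve(Infos.size() * 2);
  for (const auto &[FullStackId, TotalSize] : Infos) {
    Record.push_back(FullStackId);
    Record.push_back(TotalSize);
  }
  return Record;
}
```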

// releases the temporary map used while constructing a correspondence between
// stack ids and their index in the StackIds vector. Mostly impactful when
// stack ids and their index in the StackIds vector, and a similar map used
// while constructing a the ContextSizeInfos vector. Mostly impactful when
Contributor Author

fixed

CallStackTrieNode *Curr = nullptr;
for (auto StackId : StackIds) {
// If this is the first stack frame, add or update alloc node.
// errs() << StackId << " ";
Contributor Author

fixed

MIBPayload.push_back(ValueAsMetadata::get(
ConstantInt::get(Type::getInt64Ty(Ctx), TotalSize)));
if (!ContextSizeInfo.empty()) {
for (auto Info : ContextSizeInfo) {
Contributor Author

done

while (MIBsRead++ < NumMIBs) {
unsigned NumContextSizeInfoEntries = Record[I++];
assert(Record.size() - I >= NumContextSizeInfoEntries);
std::vector<unsigned> ContextSizeIndices;
Contributor Author

done

assert(AI.ContextSizeInfoIndices.empty() ||
AI.ContextSizeInfoIndices.size() == AI.MIBs.size());
if (WriteContextSizeInfoIndex && !AI.ContextSizeInfoIndices.empty()) {
for (auto Indices : AI.ContextSizeInfoIndices) {
Contributor Author

done

ContextSizeInfoAbbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
unsigned ContextSizeInfoAbbvId =
Stream.EmitAbbrev(std::move(ContextSizeInfoAbbv));
for (const auto &Info : Index->contextSizeInfos()) {
Contributor Author

done

}
Stream.EmitRecord(bitc::FS_CONTEXT_SIZE_INFOS, NameVals,
ContextSizeInfoAbbvId);
NameVals.clear();
Contributor Author

Is it worth it? We reuse this vector multiple times for other records, and its lifetime is not so long that it is going to make any noticeable difference on memory (this is in the pre-LTO-link compile).

auto CSI = ContextIdToContextSizeInfos.find(Id);
if (CSI != ContextIdToContextSizeInfos.end()) {
for (auto &Info : CSI->second) {
OS << getAllocTypeString((uint8_t)TypeI->second)
Contributor Author

done

@snehasish

Is it worth it? We reuse this vector multiple times for other records, and its lifetime is not so long that it is going to make any noticeable difference on memory (this is in the pre-LTO-link compile).

Probably not if it is reused. Ok to leave as is.

@snehasish snehasish left a comment

lgtm.

I will keep an eye out for the overhead measurements, though I don't consider them a blocker since we will only use this for debugging. Thanks!

@teresajohnson
Contributor Author

Update on overhead: I didn't see any significant time overheads, but the thin link had about 18% memory overhead and the bitcode IR+summary objects were about 8.5% larger (with summary ~30% larger). I did some work to reduce this overhead, with the following 2 realizations:

  • We were keeping the context id / size pairs in a global table in the in-memory and bitcode formats, and referring to them by index. This was meant to enable deduplication (e.g. when allocations have been cloned due to inlining, etc.), but it turns out that this was inefficient in aggregate, as there aren't that many deduplication opportunities in practice in the large profile I examined. I got improvements in both metrics by removing the separate tables and including the context id / size pairs directly in the allocation info summaries.
  • I then realized that the context id hashes are almost always nearly 64 bits in size, so the VBR8 encoding is not the most efficient, and a fixed-width encoding is better overall. To enable encoding the context ids with a fixed width, I pulled the context ids out into separate records, one per alloc info record, with the context ids listed in the same order as the sizes in the immediately following alloc info record, enabling implicit correlation when reconstructing the pairs while reading the bitcode. This reduced the bitcode sizes a bit further. Note that the largest fixed width supported by the bitcode is 32 bits, so we record each context id as a pair of 32-bit fixed-width values (the same approach taken for value GUIDs elsewhere in the summary).

With these improvements, we now have a bit less than 10% memory overhead and about 7.5% larger bitcode IR+summary objects (with summary about 22% larger).

@snehasish snehasish left a comment

lgtm

case bitc::FS_ALLOC_CONTEXT_IDS: {
// This is an array of 32-bit fixed-width values, holding each 64-bit
// context id as a pair of adjacent (most significant first) 32-bit words.
assert(!(Record.size() % 2));


IMO Record.size() % 2 == 0 is a little easier to read.

Contributor Author

done

@teresajohnson teresajohnson merged commit 9513f2f into llvm:main Nov 15, 2024
4 of 7 checks passed
