Skip to content

Conversation

@kazutakahirata
Copy link
Contributor

This patch adds YAML read/write support to llvm-profdata. The primary
intent is to accommodate MemProf profiles in test cases, thereby
avoiding the binary format.

The read support is via llvm-profdata merge. This is useful when we
want to verify that the compiler does the right thing on a given .ll
file and a MemProf profile in a test case. In the test case, we would
convert the MemProf profile in YAML to an indexed profile and invoke
the compiler on the .ll file along with the indexed profile.

The write support is via llvm-profdata show --memory. This is useful
when we wish to convert an indexed MemProf profile to YAML while
writing tests. We would compile a test case in C++, run it for an
indexed MemProf profile, and then convert it to the text format.

This patch adds YAML read/write support to llvm-profdata.  The primary
intent is to accommodate MemProf profiles in test cases, thereby
avoiding the binary format.

The read support is via llvm-profdata merge.  This is useful when we
want to verify that the compiler does the right thing on a given .ll
file and a MemProf profile in a test case.  In the test case, we would
convert the MemProf profile in YAML to an indexed profile and invoke
the compiler on the .ll file along with the indexed profile.

The write support is via llvm-profdata show --memory.  This is useful
when we wish to convert an indexed MemProf profile to YAML while
writing tests.  We would compile a test case in C++, run it for an
indexed MemProf profile, and then convert it to the text format.
@llvmbot llvmbot added the PGO Profile Guided Optimizations label Dec 6, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 6, 2024

@llvm/pr-subscribers-pgo

Author: Kazu Hirata (kazutakahirata)

Changes

This patch adds YAML read/write support to llvm-profdata. The primary
intent is to accommodate MemProf profiles in test cases, thereby
avoiding the binary format.

The read support is via llvm-profdata merge. This is useful when we
want to verify that the compiler does the right thing on a given .ll
file and a MemProf profile in a test case. In the test case, we would
convert the MemProf profile in YAML to an indexed profile and invoke
the compiler on the .ll file along with the indexed profile.

The write support is via llvm-profdata show --memory. This is useful
when we wish to convert an indexed MemProf profile to YAML while
writing tests. We would compile a test case in C++, run it for an
indexed MemProf profile, and then convert it to the text format.


Full diff: https://github.com/llvm/llvm-project/pull/118915.diff

6 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/InstrProfReader.h (+7)
  • (modified) llvm/include/llvm/ProfileData/MemProfReader.h (+15)
  • (modified) llvm/lib/ProfileData/InstrProfReader.cpp (+8)
  • (modified) llvm/lib/ProfileData/MemProfReader.cpp (+30)
  • (added) llvm/test/tools/llvm-profdata/memprof-yaml.test (+33)
  • (modified) llvm/tools/llvm-profdata/llvm-profdata.cpp (+77-12)
diff --git a/llvm/include/llvm/ProfileData/InstrProfReader.h b/llvm/include/llvm/ProfileData/InstrProfReader.h
index 330cf540c099be..95a8171021b431 100644
--- a/llvm/include/llvm/ProfileData/InstrProfReader.h
+++ b/llvm/include/llvm/ProfileData/InstrProfReader.h
@@ -716,6 +716,9 @@ class IndexedMemProfReader {
 
   DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
   getMemProfCallerCalleePairs() const;
+
+  // Return a vector of all GUIDs that we have corresponding MemProfRecords for.
+  SmallVector<uint64_t, 0> getMemProfRecordKeys() const;
 };
 
 /// Reader for the indexed binary instrprof format.
@@ -823,6 +826,10 @@ class IndexedInstrProfReader : public InstrProfReader {
     return MemProfReader.getMemProfCallerCalleePairs();
   }
 
+  SmallVector<uint64_t, 0> getMemProfRecordKeys() {
+    return MemProfReader.getMemProfRecordKeys();
+  }
+
   /// Fill Counts with the profile data for the given function name.
   Error getFunctionCounts(StringRef FuncName, uint64_t FuncHash,
                           std::vector<uint64_t> &Counts);
diff --git a/llvm/include/llvm/ProfileData/MemProfReader.h b/llvm/include/llvm/ProfileData/MemProfReader.h
index 0529f794606465..421985b55056c2 100644
--- a/llvm/include/llvm/ProfileData/MemProfReader.h
+++ b/llvm/include/llvm/ProfileData/MemProfReader.h
@@ -213,6 +213,21 @@ class RawMemProfReader final : public MemProfReader {
 class YAMLMemProfReader final : public MemProfReader {
 public:
   YAMLMemProfReader() = default;
+
+  // Return true if the \p DataBuffer starts with magic bytes indicating it is
+  // a raw binary memprof profile.
+  static bool hasFormat(const MemoryBuffer &DataBuffer);
+  // Return true if the file at \p Path starts with magic bytes indicating it is
+  // a raw binary memprof profile.
+  static bool hasFormat(const StringRef Path);
+
+  // Create a RawMemProfReader after sanity checking the contents of the file at
+  // \p Path or the \p Buffer. The binary from which the profile has been
+  // collected is specified via a path in \p ProfiledBinary.
+  static Expected<std::unique_ptr<YAMLMemProfReader>> create(const Twine &Path);
+  static Expected<std::unique_ptr<YAMLMemProfReader>>
+  create(std::unique_ptr<MemoryBuffer> Buffer);
+
   void parse(StringRef YAMLData);
 };
 } // namespace memprof
diff --git a/llvm/lib/ProfileData/InstrProfReader.cpp b/llvm/lib/ProfileData/InstrProfReader.cpp
index dad79b2c1761e9..cd10429cd742b5 100644
--- a/llvm/lib/ProfileData/InstrProfReader.cpp
+++ b/llvm/lib/ProfileData/InstrProfReader.cpp
@@ -1664,6 +1664,14 @@ IndexedMemProfReader::getMemProfCallerCalleePairs() const {
   return Pairs;
 }
 
+SmallVector<uint64_t, 0> IndexedMemProfReader::getMemProfRecordKeys() const {
+  SmallVector<uint64_t, 0> Keys;
+  Keys.reserve(MemProfRecordTable->getNumEntries());
+  for (uint64_t Key : MemProfRecordTable->keys())
+    Keys.push_back(Key);
+  return Keys;
+}
+
 Error IndexedInstrProfReader::getFunctionCounts(StringRef FuncName,
                                                 uint64_t FuncHash,
                                                 std::vector<uint64_t> &Counts) {
diff --git a/llvm/lib/ProfileData/MemProfReader.cpp b/llvm/lib/ProfileData/MemProfReader.cpp
index 9dacf298985937..8325fc75aaa852 100644
--- a/llvm/lib/ProfileData/MemProfReader.cpp
+++ b/llvm/lib/ProfileData/MemProfReader.cpp
@@ -755,6 +755,36 @@ Error RawMemProfReader::readNextRecord(
   return MemProfReader::readNextRecord(GuidRecord, IdToFrameCallback);
 }
 
+Expected<std::unique_ptr<YAMLMemProfReader>>
+YAMLMemProfReader::create(const Twine &Path) {
+  auto BufferOr = MemoryBuffer::getFileOrSTDIN(Path);
+  if (std::error_code EC = BufferOr.getError())
+    return report(errorCodeToError(EC), Path.getSingleStringRef());
+
+  std::unique_ptr<MemoryBuffer> Buffer(BufferOr.get().release());
+  return create(std::move(Buffer));
+}
+
+Expected<std::unique_ptr<YAMLMemProfReader>>
+YAMLMemProfReader::create(std::unique_ptr<MemoryBuffer> Buffer) {
+  std::unique_ptr<YAMLMemProfReader> Reader(new YAMLMemProfReader());
+  Reader->parse(Buffer->getBuffer());
+  return std::move(Reader);
+}
+
+bool YAMLMemProfReader::hasFormat(const StringRef Path) {
+  auto BufferOr = MemoryBuffer::getFileOrSTDIN(Path);
+  if (!BufferOr)
+    return false;
+
+  std::unique_ptr<MemoryBuffer> Buffer(BufferOr.get().release());
+  return hasFormat(*Buffer);
+}
+
+bool YAMLMemProfReader::hasFormat(const MemoryBuffer &Buffer) {
+  return Buffer.getBuffer().starts_with("---");
+}
+
 void YAMLMemProfReader::parse(StringRef YAMLData) {
   memprof::AllMemProfData Doc;
   yaml::Input Yin(YAMLData);
diff --git a/llvm/test/tools/llvm-profdata/memprof-yaml.test b/llvm/test/tools/llvm-profdata/memprof-yaml.test
new file mode 100644
index 00000000000000..9875faf355582a
--- /dev/null
+++ b/llvm/test/tools/llvm-profdata/memprof-yaml.test
@@ -0,0 +1,33 @@
+; RUN: split-file %s %t
+; RUN: llvm-profdata merge %t/memprof-in.yaml -o %t/memprof-out.indexed
+; RUN: llvm-profdata show --memory %t/memprof-out.indexed > %t/memprof-out.yaml
+; RUN: cmp %t/memprof-in.yaml %t/memprof-out.yaml
+
+; Verify that the YAML output is identical to the YAML input.
+;--- memprof-in.yaml
+---
+HeapProfileRecords:
+  - GUID:            16045690981402826360
+    AllocSites:
+      - Callstack:
+          - { Function: 100, LineOffset: 11, Column: 10, IsInlineFrame: true }
+          - { Function: 200, LineOffset: 22, Column: 20, IsInlineFrame: false }
+        MemInfoBlock:
+          AllocCount:      111
+          TotalSize:       222
+          TotalLifetime:   333
+          TotalLifetimeAccessDensity: 444
+      - Callstack:
+          - { Function: 300, LineOffset: 33, Column: 30, IsInlineFrame: false }
+          - { Function: 400, LineOffset: 44, Column: 40, IsInlineFrame: true }
+        MemInfoBlock:
+          AllocCount:      555
+          TotalSize:       666
+          TotalLifetime:   777
+          TotalLifetimeAccessDensity: 888
+    CallSites:
+      - - { Function: 500, LineOffset: 55, Column: 50, IsInlineFrame: true }
+        - { Function: 600, LineOffset: 66, Column: 60, IsInlineFrame: false }
+      - - { Function: 700, LineOffset: 77, Column: 70, IsInlineFrame: true }
+        - { Function: 800, LineOffset: 88, Column: 80, IsInlineFrame: false }
+...
diff --git a/llvm/tools/llvm-profdata/llvm-profdata.cpp b/llvm/tools/llvm-profdata/llvm-profdata.cpp
index 1d9d7bcf765496..69973db5328534 100644
--- a/llvm/tools/llvm-profdata/llvm-profdata.cpp
+++ b/llvm/tools/llvm-profdata/llvm-profdata.cpp
@@ -723,6 +723,35 @@ loadInput(const WeightedFile &Input, SymbolRemapper *Remapper,
     return;
   }
 
+  using ::llvm::memprof::YAMLMemProfReader;
+  if (YAMLMemProfReader::hasFormat(Input.Filename)) {
+    auto ReaderOrErr = YAMLMemProfReader::create(Input.Filename);
+    if (!ReaderOrErr) {
+      exitWithError(ReaderOrErr.takeError(), Input.Filename);
+    }
+    std::unique_ptr<YAMLMemProfReader> Reader = std::move(ReaderOrErr.get());
+    // Check if the profile types can be merged, e.g. clang frontend profiles
+    // should not be merged with memprof profiles.
+    if (Error E = WC->Writer.mergeProfileKind(Reader->getProfileKind())) {
+      consumeError(std::move(E));
+      WC->Errors.emplace_back(
+          make_error<StringError>(
+              "Cannot merge MemProf profile with Clang generated profile.",
+              std::error_code()),
+          Filename);
+      return;
+    }
+
+    auto MemProfError = [&](Error E) {
+      auto [ErrorCode, Msg] = InstrProfError::take(std::move(E));
+      WC->Errors.emplace_back(make_error<InstrProfError>(ErrorCode, Msg),
+                              Filename);
+    };
+
+    WC->Writer.addMemProfData(Reader->takeMemProfData(), MemProfError);
+    return;
+  }
+
   auto FS = vfs::getRealFileSystem();
   // TODO: This only saves the first non-fatal error from InstrProfReader, and
   // then added to WriterContext::Errors. However, this is not extensible, if
@@ -3242,18 +3271,54 @@ static int showSampleProfile(ShowFormat SFormat, raw_fd_ostream &OS) {
 static int showMemProfProfile(ShowFormat SFormat, raw_fd_ostream &OS) {
   if (SFormat == ShowFormat::Json)
     exitWithError("JSON output is not supported for MemProf");
-  auto ReaderOr = llvm::memprof::RawMemProfReader::create(
-      Filename, ProfiledBinary, /*KeepNames=*/true);
-  if (Error E = ReaderOr.takeError())
-    // Since the error can be related to the profile or the binary we do not
-    // pass whence. Instead additional context is provided where necessary in
-    // the error message.
-    exitWithError(std::move(E), /*Whence*/ "");
-
-  std::unique_ptr<llvm::memprof::RawMemProfReader> Reader(
-      ReaderOr.get().release());
-
-  Reader->printYAML(OS);
+
+  // Load the file to check the magic bytes.
+  llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> BufferOrError =
+      llvm::MemoryBuffer::getFile(Filename);
+  if (auto EC = BufferOrError.getError())
+    exitWithError("Error opening profile file '" + Filename + "'");
+  auto Buffer = std::move(BufferOrError.get());
+
+  // Show the raw profile in YAML.
+  if (memprof::RawMemProfReader::hasFormat(*Buffer)) {
+    auto ReaderOr = llvm::memprof::RawMemProfReader::create(
+        Filename, ProfiledBinary, /*KeepNames=*/true);
+    if (Error E = ReaderOr.takeError())
+      // Since the error can be related to the profile or the binary we do not
+      // pass whence. Instead additional context is provided where necessary in
+      // the error message.
+      exitWithError(std::move(E), /*Whence*/ "");
+
+    std::unique_ptr<llvm::memprof::RawMemProfReader> Reader(
+        ReaderOr.get().release());
+
+    Reader->printYAML(OS);
+    return 0;
+  }
+
+  // Show the indexed MemProf profile in YAML.
+  auto FS = vfs::getRealFileSystem();
+  auto ReaderOrErr = IndexedInstrProfReader::create(Filename, *FS);
+  if (Error E = ReaderOrErr.takeError())
+    exitWithError(std::move(E), Filename);
+
+  auto Reader = std::move(ReaderOrErr.get());
+
+  // Build pairs of GUID and MemProfRecord.
+  memprof::AllMemProfData Data;
+  for (const uint64_t Key : Reader->getMemProfRecordKeys()) {
+    auto Record = Reader->getMemProfRecord(Key);
+    if (Record.takeError())
+      continue;
+    memprof::GUIDMemProfRecordPair Pair;
+    Pair.GUID = Key;
+    Pair.Record = std::move(*Record);
+    Data.HeapProfileRecords.push_back(std::move(Pair));
+  }
+
+  yaml::Output Yout(OS);
+  Yout << Data;
+
   return 0;
 }
 

@carlocab
Copy link
Member

carlocab commented Dec 6, 2024

Do you have a sense for approximately how many existing test cases would be simplified by adding this support? Or is this meant to support tests to be added in the future?

@teresajohnson
Copy link
Contributor

Do you have a sense for approximately how many existing test cases would be simplified by adding this support? Or is this meant to support tests to be added in the future?

All of the tests that currently have a ".memprofraw" input file, where there are companion update scripts in the test directories as they need to be manually updated when there are changes or new tests added. This is a fairly new profile type within LLVM, which @kazutakahirata is bringing up to par with other profile types that support textual input. This will greatly ease the addition of future tests as well.

@github-actions
Copy link

github-actions bot commented Dec 6, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@kazutakahirata
Copy link
Contributor Author

I think I've addressed all the comments. Please take another look. Thanks!

@kazutakahirata
Copy link
Contributor Author

I've revised the patch. Please take another look. Thanks!

Copy link

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo @teresajohnson's comment - I think a valid YAML file interpreted as an invalid profile would yield an empty indexed profile. So maybe we could add an assertion somewhere that the profile is non-empty?

@kazutakahirata kazutakahirata merged commit 684e79f into llvm:main Dec 8, 2024
8 checks passed
@kazutakahirata kazutakahirata deleted the memprof_YAML_llvm_profdata branch December 8, 2024 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PGO Profile Guided Optimizations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants