Skip to content

Conversation

@kikairoya
Copy link
Contributor

In loop of writeAndReadCoverageRegions, OutputFunctions[I].Filenames references to contents of Filenames after returning from readCoverageRegions but Filenames will be cleared in next call of readCoverageRegions, causes dangling reference.
The lifetime of the contents of Filenames must be equal or longer than OutputFunctions[I], thus it has been moved into OutputFunctions[I] (typed OutputFunctionCoverageData).

In loop of `writeAndReadCoverageRegions`, `OutputFunctions[I].Filenames` references to
contents of `Filenames` after returning from `readCoverageRegions` but `Filenames` will be
cleared in next call of `readCoverageRegions`, causes dangling reference.
The lifetime of the contents of `Filenames` must be equal or longer than `OutputFunctions[I]`,
thus it has been moved into `OutputFunctions[I]` (typed `OutputFunctionCoverageData`).
@llvmbot llvmbot added the PGO Profile Guided Optimizations label Jul 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 4, 2025

@llvm/pr-subscribers-pgo

Author: Tomohiro Kashiwada (kikairoya)

Changes

In loop of writeAndReadCoverageRegions, OutputFunctions[I].Filenames references to contents of Filenames after returning from readCoverageRegions but Filenames will be cleared in next call of readCoverageRegions, causes dangling reference.
The lifetime of the contents of Filenames must be equal or longer than OutputFunctions[I], thus it has been moved into OutputFunctions[I] (typed OutputFunctionCoverageData).


Full diff: https://github.com/llvm/llvm-project/pull/147118.diff

1 Files Affected:

  • (modified) llvm/unittests/ProfileData/CoverageMappingTest.cpp (+8-9)
diff --git a/llvm/unittests/ProfileData/CoverageMappingTest.cpp b/llvm/unittests/ProfileData/CoverageMappingTest.cpp
index ec81e5f274efa..59c381a1b4e82 100644
--- a/llvm/unittests/ProfileData/CoverageMappingTest.cpp
+++ b/llvm/unittests/ProfileData/CoverageMappingTest.cpp
@@ -64,6 +64,7 @@ namespace {
 struct OutputFunctionCoverageData {
   StringRef Name;
   uint64_t Hash;
+  std::vector<std::string> FilenamesStorage;
   std::vector<StringRef> Filenames;
   std::vector<CounterMappingRegion> Regions;
   std::vector<CounterExpression> Expressions;
@@ -71,8 +72,10 @@ struct OutputFunctionCoverageData {
   OutputFunctionCoverageData() : Hash(0) {}
 
   OutputFunctionCoverageData(OutputFunctionCoverageData &&OFCD)
-      : Name(OFCD.Name), Hash(OFCD.Hash), Filenames(std::move(OFCD.Filenames)),
-        Regions(std::move(OFCD.Regions)) {}
+      : Name(OFCD.Name), Hash(OFCD.Hash),
+        FilenamesStorage(std::move(OFCD.FilenamesStorage)),
+        Filenames(std::move(OFCD.Filenames)), Regions(std::move(OFCD.Regions)) {
+  }
 
   OutputFunctionCoverageData(const OutputFunctionCoverageData &) = delete;
   OutputFunctionCoverageData &
@@ -135,7 +138,6 @@ struct InputFunctionCoverageData {
 struct CoverageMappingTest : ::testing::TestWithParam<std::tuple<bool, bool>> {
   bool UseMultipleReaders;
   StringMap<unsigned> Files;
-  std::vector<std::string> Filenames;
   std::vector<InputFunctionCoverageData> InputFunctions;
   std::vector<OutputFunctionCoverageData> OutputFunctions;
 
@@ -233,13 +235,10 @@ struct CoverageMappingTest : ::testing::TestWithParam<std::tuple<bool, bool>> {
 
   void readCoverageRegions(const std::string &Coverage,
                            OutputFunctionCoverageData &Data) {
-    // We will re-use the StringRef in duplicate tests, clear it to avoid
-    // clobber previous ones.
-    Filenames.clear();
-    Filenames.resize(Files.size() + 1);
+    Data.FilenamesStorage.resize(Files.size() + 1);
     for (const auto &E : Files)
-      Filenames[E.getValue()] = E.getKey().str();
-    ArrayRef<std::string> FilenameRefs = llvm::ArrayRef(Filenames);
+      Data.FilenamesStorage[E.getValue()] = E.getKey().str();
+    ArrayRef<std::string> FilenameRefs = llvm::ArrayRef(Data.FilenamesStorage);
     RawCoverageMappingReader Reader(Coverage, FilenameRefs, Data.Filenames,
                                     Data.Expressions, Data.Regions);
     EXPECT_THAT_ERROR(Reader.read(), Succeeded());

@kikairoya
Copy link
Contributor Author

Gentle ping.
This causes the test to fail unexpectedly on Cygwin, which uses old CoW std::string.

@jeremyd2019
Copy link
Contributor

Oh, is this the cause of the failures of the ProfileData test failures on Cygwin?

@jeremyd2019 jeremyd2019 requested a review from mstorsjo August 11, 2025 03:32
@kikairoya
Copy link
Contributor Author

Yes. I got an expectation error, 3 vs 5.

Copy link
Contributor

@jeremyd2019 jeremyd2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'd appreciate another review from someone else because I don't have a firm handle on what's going on here

@mstorsjo
Copy link
Member

Gentle ping. This causes the test to fail unexpectedly on Cygwin, which uses old CoW std::string.

The fix looks reasonable, but I wonder why this issue hasn't been observed elsewhere. If I understand it correctly, this issue should be observable anywhere by building and running this unit test with address sanitizer, no? Or does it specifically require having the old libstdc++ CoW std::string implementation in order to observe the issue?

If the issue is having an ArrayRef() pointing to an std::vector<std::string> which has been cleared, it sounds to me like it indeed would be observeable anywhere.

@mstorsjo mstorsjo requested a review from MaskRay August 11, 2025 10:49
@kikairoya
Copy link
Contributor Author

If I understand it correctly, this issue should be observable anywhere by building and running this unit test with address sanitizer, no?

Ideally, yes.
However, in my understanding by reading implementation of libstdc++ and libc++, std::vector::clear doesn't deallocate memory, but only tells the address sanitizer that the valid area has been shrunk.
Therefore, asan can't catch use-after-free if resize (it tells asan valid area will be expanded) immediately follows clear because there is no one accessing the freed memory.

If there were a sanitizer capable of detecting access to destroyed objects via a pseudo-destructor call, this issue should be detected.

StringRef Name;
uint64_t Hash;
std::vector<std::string> FilenamesStorage;
std::vector<StringRef> Filenames;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not store the strings directly in Filenames rather than having a separate storage?

Suggested change
std::vector<StringRef> Filenames;
std::vector<std::string> Filenames;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filenames will be referenced by CoverageMappingRecord::Filenames, which is ArrayRef<StringRef>.

// clobber previous ones.
Filenames.clear();
Filenames.resize(Files.size() + 1);
Data.FilenamesStorage.resize(Files.size() + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why + 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files (filename to index map) uses 1-based index. I don't know why it does so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to say this in a comment to make it clear the +1 isn't a bug

@jeremyd2019
Copy link
Contributor

Ping? I've lost track of the status of this...

@mstorsjo
Copy link
Member

@ellishg Can you follow up here - any further objection to merging this?

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will merge soon if there's no further comments here.

@ellishg
Copy link
Contributor

ellishg commented Sep 11, 2025

@ellishg Can you follow up here - any further objection to merging this?

Sorry I missed this. LGTM

// clobber previous ones.
Filenames.clear();
Filenames.resize(Files.size() + 1);
Data.FilenamesStorage.resize(Files.size() + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to say this in a comment to make it clear the +1 isn't a bug

@mstorsjo mstorsjo merged commit ca09801 into llvm:main Sep 11, 2025
9 checks passed
@kikairoya kikairoya deleted the unittest-dangling-reference branch September 11, 2025 22:02
jeremyd2019 pushed a commit to jeremyd2019/llvm-project that referenced this pull request Sep 14, 2025
…47118)

In loop of `writeAndReadCoverageRegions`, `OutputFunctions[I].Filenames`
references to contents of `Filenames` after returning from
`readCoverageRegions` but `Filenames` will be cleared in next call of
`readCoverageRegions`, causes dangling reference.
The lifetime of the contents of `Filenames` must be equal or longer than
`OutputFunctions[I]`, thus it has been moved into `OutputFunctions[I]`
(typed `OutputFunctionCoverageData`).

(cherry picked from commit ca09801)
jeremyd2019 pushed a commit to jeremyd2019/llvm-project that referenced this pull request Sep 23, 2025
…47118)

In loop of `writeAndReadCoverageRegions`, `OutputFunctions[I].Filenames`
references to contents of `Filenames` after returning from
`readCoverageRegions` but `Filenames` will be cleared in next call of
`readCoverageRegions`, causes dangling reference.
The lifetime of the contents of `Filenames` must be equal or longer than
`OutputFunctions[I]`, thus it has been moved into `OutputFunctions[I]`
(typed `OutputFunctionCoverageData`).

(cherry picked from commit ca09801)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PGO Profile Guided Optimizations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants