Skip to content

Conversation

@qiongsiwu
Copy link
Contributor

@qiongsiwu qiongsiwu commented Jun 13, 2025

DependencyScanningFileSystemSharedCache can currently report out-of-date negatively stat cached paths. This PR enhances the reporting with two modifications.

  1. The reported path are now null terminated char arrays instead of StringRefs. This way the API's user can avoid copying StringRefs to other containers because the char arrays can be used directly.
  2. The API now reports out-of-date cache entry due to file size changes. Specifically, we check each file's cached size against the size of the same file on the underlying FS. If the sizes are different, diagnostics will be reported.

rdar://152247357

@qiongsiwu qiongsiwu marked this pull request as ready for review June 13, 2025 16:08
@qiongsiwu qiongsiwu requested a review from Bigcheese June 13, 2025 16:08
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jun 13, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 13, 2025

@llvm/pr-subscribers-clang

Author: Qiongsi Wu (qiongsiwu)

Changes

DependencyScanningFileSystemSharedCache can currently report invalid negative stat cached paths. This PR enhances the reporting with two modifications.

  1. The reported path are now null terminated char arrays instead of StringRefs. This way the API's user can avoid copying StringRefs to other containers because the char arrays can be used directly.
  2. The API now reports invalid cache entry due to file size changes. Specifically, we check each file's cached size against the size of the same file on the underlying FS. If the sizes are different, diagnostics will be reported.

Full diff: https://github.com/llvm/llvm-project/pull/144105.diff

3 Files Affected:

  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h (+35-7)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp (+18-10)
  • (modified) clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp (+35-6)
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
index a20a89a4c2b76..e0656676fefff 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
@@ -220,13 +220,41 @@ class DependencyScanningFilesystemSharedCache {
   CacheShard &getShardForFilename(StringRef Filename) const;
   CacheShard &getShardForUID(llvm::sys::fs::UniqueID UID) const;
 
-  /// Visits all cached entries and re-stat an entry using FS if
-  /// it is negatively stat cached. If re-stat succeeds on a path,
-  /// the path is added to InvalidPaths, indicating that the cache
-  /// may have erroneously negatively cached it. The caller can then
-  /// use InvalidPaths to issue diagnostics.
-  std::vector<StringRef>
-  getInvalidNegativeStatCachedPaths(llvm::vfs::FileSystem &UnderlyingFS) const;
+  struct InvalidEntryDiagInfo {
+    // A null terminated string that contains a path.
+    const char *Path = nullptr;
+
+    enum class Type : unsigned char { NegativeCaching = 1, SizeChanged = 2 };
+
+    Type T;
+
+    struct SizeChangedInfo {
+      uint64_t CachedSize = 0;
+      uint64_t ActualSize = 0;
+    };
+
+    std::optional<SizeChangedInfo> SizeInfo;
+
+    InvalidEntryDiagInfo(const char *Path) : Path(Path) {
+      T = Type::NegativeCaching;
+    }
+
+    InvalidEntryDiagInfo(const char *Path, uint64_t CachedSize,
+                         uint64_t ActualSize)
+        : Path(Path), SizeInfo({CachedSize, ActualSize}) {
+      T = Type::SizeChanged;
+    }
+  };
+
+  /// Visits all cached entries and re-stat an entry using UnderlyingFS to check
+  /// if the cache contains invalid entries. An entry can be invalid for two
+  /// reasons:
+  ///  1. The entry contains a stat error, indicating the file did not exist
+  ///     in the cache, but the file exists on the UnderlyingFS.
+  ///  2. The entry is associated with a file whose size is different from the
+  ///     actual size on the UnderlyingFS.
+  std::vector<InvalidEntryDiagInfo>
+  getInvalidEntryDiagInfo(llvm::vfs::FileSystem &UnderlyingFS) const;
 
 private:
   std::unique_ptr<CacheShard[]> CacheShards;
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
index 140833050f4e9..64b41a482f921 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
@@ -107,31 +107,39 @@ DependencyScanningFilesystemSharedCache::getShardForUID(
   return CacheShards[Hash % NumShards];
 }
 
-std::vector<StringRef>
-DependencyScanningFilesystemSharedCache::getInvalidNegativeStatCachedPaths(
+std::vector<DependencyScanningFilesystemSharedCache::InvalidEntryDiagInfo>
+DependencyScanningFilesystemSharedCache::getInvalidEntryDiagInfo(
     llvm::vfs::FileSystem &UnderlyingFS) const {
   // Iterate through all shards and look for cached stat errors.
-  std::vector<StringRef> InvalidPaths;
+  std::vector<InvalidEntryDiagInfo> InvalidDiagInfo;
   for (unsigned i = 0; i < NumShards; i++) {
     const CacheShard &Shard = CacheShards[i];
     std::lock_guard<std::mutex> LockGuard(Shard.CacheLock);
     for (const auto &[Path, CachedPair] : Shard.CacheByFilename) {
       const CachedFileSystemEntry *Entry = CachedPair.first;
 
-      if (Entry->getError()) {
-        // Only examine cached errors.
-        llvm::ErrorOr<llvm::vfs::Status> Stat = UnderlyingFS.status(Path);
-        if (Stat) {
+      llvm::ErrorOr<llvm::vfs::Status> Status = UnderlyingFS.status(Path);
+      if (Status) {
+        if (Entry->getError()) {
           // This is the case where we have cached the non-existence
-          // of the file at Path first, and a a file at the path is created
+          // of the file at Path first, and a file at the path is created
           // later. The cache entry is not invalidated (as we have no good
           // way to do it now), which may lead to missing file build errors.
-          InvalidPaths.push_back(Path);
+          InvalidDiagInfo.emplace_back(Path.data());
+        } else {
+          llvm::vfs::Status CachedStatus = Entry->getStatus();
+          uint64_t CachedSize = CachedStatus.getSize();
+          uint64_t ActualSize = Status->getSize();
+          if (CachedSize != ActualSize) {
+            // This is the case where the cached file has a different size
+            // from the actual file that comes from the underlying FS.
+            InvalidDiagInfo.emplace_back(Path.data(), CachedSize, ActualSize);
+          }
         }
       }
     }
   }
-  return InvalidPaths;
+  return InvalidDiagInfo;
 }
 
 const CachedFileSystemEntry *
diff --git a/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp b/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp
index 7420743c97a2a..068212696943a 100644
--- a/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp
+++ b/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp
@@ -187,18 +187,47 @@ TEST(DependencyScanningFilesystem, DiagnoseStaleStatFailures) {
   DependencyScanningWorkerFilesystem DepFS(SharedCache, InMemoryFS);
 
   bool Path1Exists = DepFS.exists("/path1.suffix");
-  EXPECT_EQ(Path1Exists, false);
+  ASSERT_EQ(Path1Exists, false);
 
   // Adding a file that has been stat-ed,
   InMemoryFS->addFile("/path1.suffix", 0, llvm::MemoryBuffer::getMemBuffer(""));
   Path1Exists = DepFS.exists("/path1.suffix");
   // Due to caching in SharedCache, path1 should not exist in
   // DepFS's eyes.
-  EXPECT_EQ(Path1Exists, false);
+  ASSERT_EQ(Path1Exists, false);
 
-  std::vector<llvm::StringRef> InvalidPaths =
-      SharedCache.getInvalidNegativeStatCachedPaths(*InMemoryFS);
+  auto InvalidEntries = SharedCache.getInvalidEntryDiagInfo(*InMemoryFS);
 
-  EXPECT_EQ(InvalidPaths.size(), 1u);
-  ASSERT_STREQ("/path1.suffix", InvalidPaths[0].str().c_str());
+  EXPECT_EQ(InvalidEntries.size(), 1u);
+  ASSERT_STREQ("/path1.suffix", InvalidEntries[0].Path);
+}
+
+TEST(DependencyScanningFilesystem, DiagnoseCachedFileSizeChange) {
+  auto InMemoryFS1 = llvm::makeIntrusiveRefCnt<llvm::vfs::InMemoryFileSystem>();
+  auto InMemoryFS2 = llvm::makeIntrusiveRefCnt<llvm::vfs::InMemoryFileSystem>();
+  InMemoryFS1->setCurrentWorkingDirectory("/");
+  InMemoryFS2->setCurrentWorkingDirectory("/");
+
+  DependencyScanningFilesystemSharedCache SharedCache;
+  DependencyScanningWorkerFilesystem DepFS(SharedCache, InMemoryFS1);
+
+  InMemoryFS1->addFile("/path1.suffix", 0,
+                       llvm::MemoryBuffer::getMemBuffer(""));
+  bool Path1Exists = DepFS.exists("/path1.suffix");
+  ASSERT_EQ(Path1Exists, true);
+
+  // Add a file to a new FS that has the same path but different content.
+  InMemoryFS2->addFile("/path1.suffix", 1,
+                       llvm::MemoryBuffer::getMemBuffer("        "));
+
+  // Check against the new file system. InMemoryFS2 could be the underlying
+  // physical system in the real world.
+  auto InvalidEntries = SharedCache.getInvalidEntryDiagInfo(*InMemoryFS2);
+
+  ASSERT_EQ(InvalidEntries.size(), 1u);
+  ASSERT_STREQ("/path1.suffix", InvalidEntries[0].Path);
+  auto SizeInfo = InvalidEntries[0].SizeInfo;
+  ASSERT_TRUE(SizeInfo);
+  ASSERT_EQ(SizeInfo->CachedSize, 0u);
+  ASSERT_EQ(SizeInfo->ActualSize, 8u);
 }

@qiongsiwu qiongsiwu requested a review from jansvoboda11 June 23, 2025 16:34
Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one nit suggestion. There seems to be a build failure in CI due to revert of @Bigcheese's commit, so that probably needs to be taken care of before merging. I'd also like to see the downstream PR before merging this one just so we know things integrate nicely with the C API.

@qiongsiwu
Copy link
Contributor Author

qiongsiwu commented Jun 24, 2025

LGTM with one nit suggestion. There seems to be a build failure in CI due to revert of @Bigcheese's commit, so that probably needs to be taken care of before merging. I'd also like to see the downstream PR before merging this one just so we know things integrate nicely with the C API.

Thanks!

Here is a commit that implements the C-APIs as a draft. It is intended to demonstrate how the C-API is implemented on top of this PR. Testing/documentations are missing and I will add them next.

@jansvoboda11
Copy link
Contributor

Here is a commit that implements the C-APIs as a draft. It is intended to demonstrate how the C-API is implemented on top of this PR. Testing/documentations are missing and I will add them next.

Looks good, thanks!

@qiongsiwu qiongsiwu merged commit 51d1385 into llvm:main Jun 24, 2025
7 checks passed
anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
…4105)

`DependencyScanningFileSystemSharedCache` can currently report
out-of-date negatively stat cached paths. This PR enhances the reporting
with two modifications.

1. The reported path are now null terminated char arrays instead of
`StringRef`s. This way the API's user can avoid copying `StringRef`s to
other containers because the char arrays can be used directly.
2. The API now reports out-of-date cache entry due to file size changes.
Specifically, we check each file's cached size against the size of the
same file on the underlying FS. If the sizes are different, diagnostics
will be reported.

rdar://152247357
qiongsiwu added a commit to swiftlang/llvm-project that referenced this pull request Jul 11, 2025
…try Reporting C-APIs (#10927)

This PR implements the C-APIs to report a scanning file system cache's out-of-date entries. The C-APIs contains a function to return a set of file system cache out-of-date entries, functions to facilitate looping through all the entries, and reporting the relevant information from the entries. 

The APIs are based on llvm#144105. 

rdar://152247357
qiongsiwu added a commit to qiongsiwu/llvm-project that referenced this pull request Jul 25, 2025
…4105)

`DependencyScanningFileSystemSharedCache` can currently report
out-of-date negatively stat cached paths. This PR enhances the reporting
with two modifications.

1. The reported path are now null terminated char arrays instead of
`StringRef`s. This way the API's user can avoid copying `StringRef`s to
other containers because the char arrays can be used directly.
2. The API now reports out-of-date cache entry due to file size changes.
Specifically, we check each file's cached size against the size of the
same file on the underlying FS. If the sizes are different, diagnostics
will be reported.

rdar://152247357
(cherry picked from commit 51d1385)
qiongsiwu added a commit to qiongsiwu/llvm-project that referenced this pull request Jul 25, 2025
…try Reporting C-APIs (llvm#10927)

This PR implements the C-APIs to report a scanning file system cache's out-of-date entries. The C-APIs contains a function to return a set of file system cache out-of-date entries, functions to facilitate looping through all the entries, and reporting the relevant information from the entries.

The APIs are based on llvm#144105.

rdar://152247357
(cherry picked from commit 6fbcea6)
qiongsiwu added a commit to swiftlang/llvm-project that referenced this pull request Jul 28, 2025
…4105)

`DependencyScanningFileSystemSharedCache` can currently report
out-of-date negatively stat cached paths. This PR enhances the reporting
with two modifications.

1. The reported path are now null terminated char arrays instead of
`StringRef`s. This way the API's user can avoid copying `StringRef`s to
other containers because the char arrays can be used directly.
2. The API now reports out-of-date cache entry due to file size changes.
Specifically, we check each file's cached size against the size of the
same file on the underlying FS. If the sizes are different, diagnostics
will be reported.

rdar://152247357
(cherry picked from commit 51d1385)
qiongsiwu added a commit to swiftlang/llvm-project that referenced this pull request Jul 28, 2025
…try Reporting C-APIs (#10927)

This PR implements the C-APIs to report a scanning file system cache's out-of-date entries. The C-APIs contains a function to return a set of file system cache out-of-date entries, functions to facilitate looping through all the entries, and reporting the relevant information from the entries.

The APIs are based on llvm#144105.

rdar://152247357
(cherry picked from commit 6fbcea6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants