Commit 57b95ab

[llvm][cas] New mapped file design to fix filesystem assumptions
On the advice of filesystem experts, we are changing our approach to the memory-mapped files used in the CAS. Specifically, we will no longer attempt to resize the underlying file while holding an mmap, which does not behave correctly on all filesystems (e.g. tmpfs) and is formally unspecified by POSIX.

Ideally, we could simply update the mapped regions whenever we resize, but in practice this is challenging to do portably, cross-process, without locking the reading process's access, and without requiring us to check every offset before reading it and trigger a re-map if necessary (which is hard to test/maintain).

Instead, we will truncate the file to its maximum size immediately, and resize it back to the allocated size when closing, if possible. On filesystems with sparse file support, this does not require any additional storage. See the comments in MappedFileRegionBumpPtr for details.

Because the test suite is uniquely likely to (a) crash from time to time, and (b) have many separate CAS instances for tests, we override the usual index/data/actioncache file sizes to be at most 100 MB. This should help mitigate the risks of having many large CAS instances left around if testing on a filesystem without sparse file support.

rdar://105905301

(cherry picked from commit 0046c5d)
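
The "grow once, map once, shrink on close" idea described above can be sketched with plain POSIX calls. This is only an illustration under assumptions: `FixedMapping`, `openFixedMapping`, `closeFixedMapping`, and `fileSizeOnDisk` are hypothetical names that do not exist in LLVM, the file is assumed to be new or empty, and the file locking that the commit relies on is left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical illustration; none of these names exist in LLVM.
struct FixedMapping {
  int FD = -1;
  char *Base = nullptr;
  uint64_t Capacity = 0;
};

// Grow the file to its maximum size once, then map it. The mapping is never
// resized afterwards, so readers never have to re-map.
bool openFixedMapping(const std::string &Path, uint64_t Capacity,
                      FixedMapping &Out) {
  int FD = ::open(Path.c_str(), O_RDWR | O_CREAT, 0644);
  if (FD < 0)
    return false;
  // With sparse file support, extending the file allocates no data blocks.
  if (::ftruncate(FD, static_cast<off_t>(Capacity)) != 0) {
    ::close(FD);
    return false;
  }
  void *P =
      ::mmap(nullptr, Capacity, PROT_READ | PROT_WRITE, MAP_SHARED, FD, 0);
  if (P == MAP_FAILED) {
    ::close(FD);
    return false;
  }
  Out.FD = FD;
  Out.Base = static_cast<char *>(P);
  Out.Capacity = Capacity;
  return true;
}

// Unmap first, then shrink the file to the bytes that were actually used.
bool closeFixedMapping(FixedMapping &M, uint64_t UsedSize) {
  bool OK = ::munmap(M.Base, M.Capacity) == 0;
  OK = (::ftruncate(M.FD, static_cast<off_t>(UsedSize)) == 0) && OK;
  OK = (::close(M.FD) == 0) && OK;
  M = FixedMapping();
  return OK;
}

// Logical file size as reported by stat(), or -1 on error.
int64_t fileSizeOnDisk(const std::string &Path) {
  struct stat St;
  if (::stat(Path.c_str(), &St) != 0)
    return -1;
  return static_cast<int64_t>(St.st_size);
}
```

The key property is that `ftruncate` is never called while the mapping is live: the file is sized before `mmap` and shrunk only after `munmap`.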
1 parent 2556f08 commit 57b95ab

19 files changed: +831 -843 lines changed

clang/test/lit.cfg.py

Lines changed: 6 additions & 0 deletions
@@ -297,3 +297,9 @@ def exclude_unsupported_files_for_aix(dirname):
 # possibly be present in system and user configuration files, so disable
 # default configs for the test runs.
 config.environment["CLANG_NO_DEFAULT_CONFIG"] = "1"
+
+# Restrict the size of the on-disk CAS for tests. This allows testing in
+# constrained environments (e.g. small TMPDIR). It also prevents leaving
+# behind large files on file systems that do not support sparse files if a test
+# crashes before resizing the file.
+config.environment["LLVM_CAS_MAX_MAPPING_SIZE"] = "%d" % (100 * 1024 * 1024)
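
For illustration, a consumer of this environment variable might parse it along the following lines. This is a sketch under assumptions, not the code from this commit: `parseMaxMappingSize` and `maxMappingSizeFromEnv` are hypothetical names, and how the parsed value is combined with the default index/data/actioncache sizes is not shown here.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <optional>

// Hypothetical helper: parse a decimal byte count, rejecting anything that is
// not a plain non-negative integer that fits in 64 bits.
std::optional<uint64_t> parseMaxMappingSize(const char *Text) {
  if (!Text || !std::isdigit(static_cast<unsigned char>(*Text)))
    return std::nullopt;
  errno = 0;
  char *End = nullptr;
  unsigned long long Value = std::strtoull(Text, &End, 10);
  if (errno != 0 || *End != '\0')
    return std::nullopt;
  return static_cast<uint64_t>(Value);
}

// Hypothetical helper: read the override set by the lit configuration.
// Returns std::nullopt when the variable is unset or malformed.
std::optional<uint64_t> maxMappingSizeFromEnv() {
  return parseMaxMappingSize(std::getenv("LLVM_CAS_MAX_MAPPING_SIZE"));
}
```

With the value written by the lit configuration, `100 * 1024 * 1024`, the variable holds the string `104857600`.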

llvm/include/llvm/CAS/LazyMappedFileRegion.h

Lines changed: 0 additions & 156 deletions
This file was deleted.

llvm/include/llvm/CAS/LazyMappedFileRegionBumpPtr.h

Lines changed: 0 additions & 72 deletions
This file was deleted.

llvm/include/llvm/CAS/MappedFileRegionBumpPtr.h

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
+//===- MappedFileRegionBumpPtr.h --------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_MAPPEDFILEREGIONBUMPPTR_H
+#define LLVM_CAS_MAPPEDFILEREGIONBUMPPTR_H
+
+#include "llvm/Config/llvm-config.h"
+#include "llvm/Support/Alignment.h"
+#include "llvm/Support/FileSystem.h"
+#include <atomic>
+
+namespace llvm::cas {
+
+/// Allocator for an owned mapped file region that supports thread-safe and
+/// process-safe bump pointer allocation.
+///
+/// This allocator is designed to create a sparse file when supported by the
+/// filesystem's \c ftruncate so that it can be used with a large maximum size.
+/// It will also attempt to shrink the underlying file down to its current
+/// allocation size when the last concurrent mapping is closed.
+///
+/// Process-safe. Uses file locks when resizing the file during initialization
+/// and destruction.
+///
+/// Thread-safe, assuming all threads use the same instance to talk to a given
+/// file/mapping. Unsafe to have multiple instances talking to the same file
+/// in the same process since file locks will misbehave. Clients should
+/// coordinate (somehow).
+///
+/// \note Currently we allocate the whole file without sparseness on Windows.
+///
+/// Provides 8-byte alignment for all allocations.
+class MappedFileRegionBumpPtr {
+public:
+  using RegionT = sys::fs::mapped_file_region;
+
+  /// Create a \c MappedFileRegionBumpPtr.
+  ///
+  /// \param Path the path to open the mapped region.
+  /// \param Capacity the maximum size for the mapped file region.
+  /// \param BumpPtrOffset the offset at which to store the bump pointer.
+  /// \param NewFileConstructor is for constructing new files. It has exclusive
+  /// access to the file. Must call \c initializeBumpPtr.
+  static Expected<MappedFileRegionBumpPtr>
+  create(const Twine &Path, uint64_t Capacity, int64_t BumpPtrOffset,
+         function_ref<Error(MappedFileRegionBumpPtr &)> NewFileConstructor);
+
+  /// Create a \c MappedFileRegionBumpPtr., shared across the process via a
+  /// singleton map.
+  ///
+  /// FIXME: Singleton map should be based on sys::fs::UniqueID, but currently
+  /// it is just based on \p Path.
+  ///
+  /// \param Path the path to open the mapped region.
+  /// \param Capacity the maximum size for the mapped file region.
+  /// \param BumpPtrOffset the offset at which to store the bump pointer.
+  /// \param NewFileConstructor is for constructing new files. It has exclusive
+  /// access to the file. Must call \c initializeBumpPtr.
+  static Expected<std::shared_ptr<MappedFileRegionBumpPtr>> createShared(
+      const Twine &Path, uint64_t Capacity, int64_t BumpPtrOffset,
+      function_ref<Error(MappedFileRegionBumpPtr &)> NewFileConstructor);
+
+  /// Finish initializing the bump pointer. Must be called by
+  /// \c NewFileConstructor.
+  void initializeBumpPtr(int64_t BumpPtrOffset);
+
+  /// Minimum alignment for allocations, currently hardcoded to 8B.
+  static constexpr Align getAlign() {
+    // Trick Align into giving us '8' as a constexpr.
+    struct alignas(8) T {};
+    static_assert(alignof(T) == 8, "Tautology failed?");
+    return Align::Of<T>();
+  }
+
+  /// Allocate at least \p AllocSize. Rounds up to \a getAlign().
+  char *allocate(uint64_t AllocSize) {
+    return data() + allocateOffset(AllocSize);
+  }
+  /// Allocate, returning the offset from \a data() instead of a pointer.
+  int64_t allocateOffset(uint64_t AllocSize);
+
+  char *data() const { return Region.data(); }
+  uint64_t size() const { return *BumpPtr; }
+  uint64_t capacity() const { return Region.size(); }
+
+  RegionT &getRegion() { return Region; }
+
+  ~MappedFileRegionBumpPtr() { destroyImpl(); }
+
+  MappedFileRegionBumpPtr() = default;
+  MappedFileRegionBumpPtr(MappedFileRegionBumpPtr &&RHS) { moveImpl(RHS); }
+  MappedFileRegionBumpPtr &operator=(MappedFileRegionBumpPtr &&RHS) {
+    destroyImpl();
+    moveImpl(RHS);
+    return *this;
+  }
+
+  MappedFileRegionBumpPtr(const MappedFileRegionBumpPtr &) = delete;
+  MappedFileRegionBumpPtr &operator=(const MappedFileRegionBumpPtr &) = delete;
+
+private:
+  void destroyImpl();
+  void moveImpl(MappedFileRegionBumpPtr &RHS) {
+    std::swap(Region, RHS.Region);
+    std::swap(BumpPtr, RHS.BumpPtr);
+    std::swap(Path, RHS.Path);
+    std::swap(FD, RHS.FD);
+    std::swap(SharedLockFD, RHS.SharedLockFD);
+  }
+
+private:
+  RegionT Region;
+  std::atomic<int64_t> *BumpPtr = nullptr;
+  std::string Path;
+  std::optional<int> FD;
+  std::optional<int> SharedLockFD;
+};
+
+} // namespace llvm::cas
+
+#endif // LLVM_CAS_MAPPEDFILEREGIONBUMPPTR_H
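
The header only declares `allocateOffset`; its definition lives in a `.cpp` file that is not shown here. As a rough illustration of what thread-safe bump allocation with 8-byte rounding can look like, here is a minimal sketch. It is an assumption-laden stand-in rather than the implementation from this commit: `alignTo8` and `allocateOffsetSketch` are hypothetical names, and returning -1 on exhaustion is a choice made for this example only.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Round N up to the next multiple of 8, matching the 8-byte alignment that
// the class documents for all allocations.
constexpr uint64_t alignTo8(uint64_t N) { return (N + 7) & ~uint64_t(7); }

// Hypothetical stand-in for allocateOffset(): atomically reserve AllocSize
// bytes (rounded up to 8) and return the offset where the reservation starts.
// Returns -1, leaving the bump pointer unchanged, if it would exceed Capacity.
int64_t allocateOffsetSketch(std::atomic<int64_t> &BumpPtr, uint64_t Capacity,
                             uint64_t AllocSize) {
  const uint64_t Size = alignTo8(AllocSize);
  int64_t Old = BumpPtr.load();
  do {
    if (static_cast<uint64_t>(Old) + Size > Capacity)
      return -1;
    // On failure, compare_exchange_weak reloads Old and the loop re-checks.
  } while (!BumpPtr.compare_exchange_weak(Old,
                                          Old + static_cast<int64_t>(Size)));
  return Old;
}
```

Because the bump pointer itself is a `std::atomic<int64_t>` stored inside the mapped region (at `BumpPtrOffset`, per the header's documentation), every process mapping the file observes the same counter.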

llvm/include/llvm/CAS/OnDiskHashMappedTrie.h

Lines changed: 3 additions & 2 deletions
@@ -48,8 +48,9 @@ class FileOffset {
 ///   memory mapping. Atomic access does not work on networked filesystems.
 /// - Filesystem locks are used, but only sparingly:
 ///     - during initialization, for creating / opening an existing store;
-///     - rarely on insertion, to resize the store (see \a
-///       OnDiskHashMappedTrie::MappedFileInfo::requestFileSize()).
+///     - for the lifetime of the instance, a shared/reader lock is held
+///     - during destruction, if there are no concurrent readers, to shrink the
+///       files to their minimum size.
 /// - Path is used as a directory:
 ///     - "index" stores the root trie and subtries.
 ///     - "data" stores (most of) the entries, like a bump-ptr-allocator.
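
The locking scheme in the updated comment (a shared lock for the lifetime of the instance, and a shrink at destruction only when nobody else is using the file) can be sketched with `flock(2)`. This is an assumption for illustration only: the commit uses LLVM's own file-locking support, the helper names below are hypothetical, and the sketch locks the data file directly, whereas the header's `SharedLockFD` member suggests a separate lock descriptor. Note also that converting a `flock` lock from shared to exclusive is not atomic, so a failed attempt may drop the shared lock.

```cpp
#include <cassert>
#include <string>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

// Take a shared (reader) lock, blocking until it is granted. Held for the
// lifetime of an instance in this sketch.
bool holdSharedLock(int FD) { return ::flock(FD, LOCK_SH) == 0; }

// Try to become the only lock holder without blocking. Fails if any other
// open file description still holds a shared lock.
bool tryExclusiveLock(int FD) {
  return ::flock(FD, LOCK_EX | LOCK_NB) == 0;
}

// At destruction: shrink the file only if no other user has it locked;
// otherwise leave it at its full size for the remaining users.
bool shrinkIfLastUser(int FD, off_t UsedSize) {
  if (!tryExclusiveLock(FD))
    return false;
  return ::ftruncate(FD, UsedSize) == 0;
}
```

When the exclusive lock cannot be taken, the file simply stays at its maximum size, which costs no extra storage on filesystems with sparse file support.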

llvm/lib/CAS/CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -16,10 +16,10 @@ add_llvm_component_library(LLVMCAS
   HashMappedTrie.cpp
   HierarchicalTreeBuilder.cpp
   InMemoryCAS.cpp
-  LazyMappedFileRegion.cpp
-  LazyMappedFileRegionBumpPtr.cpp
+  MappedFileRegionBumpPtr.cpp
   ObjectStore.cpp
   OnDiskCAS.cpp
+  OnDiskCommon.cpp
   OnDiskGraphDB.cpp
   OnDiskHashMappedTrie.cpp
   OnDiskKeyValueDB.cpp
