Skip to content
5 changes: 5 additions & 0 deletions lldb/source/Core/DataFileCache.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,11 @@ bool DataFileCache::SetCachedData(llvm::StringRef key,
if (file_or_err) {
llvm::CachedFileStream *cfs = file_or_err->get();
cfs->OS->write((const char *)data.data(), data.size());
if (llvm::Error err = cfs->commit()) {
Log *log = GetLog(LLDBLog::Modules);
LLDB_LOG_ERROR(log, std::move(err),
"failed to commit to the cache for key: {0}");
}
return true;
} else {
Log *log = GetLog(LLDBLog::Modules);
Expand Down
1 change: 1 addition & 0 deletions llvm/include/llvm/Support/Caching.h
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ class CachedFileStream {
CachedFileStream(std::unique_ptr<raw_pwrite_stream> OS,
std::string OSPath = "")
: OS(std::move(OS)), ObjectPathName(OSPath) {}
virtual Error commit() { return Error::success(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the expectation is that commit() should now always be called, should there be some error detection in the base class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved the error detection to the base class.

std::unique_ptr<raw_pwrite_stream> OS;
std::string ObjectPathName;
virtual ~CachedFileStream() = default;
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/CGData/CodeGenData.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,9 @@ void saveModuleForTwoRounds(const Module &TheModule, unsigned Task,

WriteBitcodeToFile(TheModule, *Stream->OS,
/*ShouldPreserveUseListOrder=*/true);

if (Error Err = Stream->commit())
report_fatal_error(std::move(Err));
}

std::unique_ptr<Module> loadModuleForTwoRounds(BitcodeModule &OrigModule,
Expand Down
9 changes: 9 additions & 0 deletions llvm/lib/Debuginfod/Debuginfod.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -188,6 +188,7 @@ class StreamedHTTPResponseHandler : public HTTPResponseHandler {
public:
StreamedHTTPResponseHandler(CreateStreamFn CreateStream, HTTPClient &Client)
: CreateStream(CreateStream), Client(Client) {}
Error commit();
virtual ~StreamedHTTPResponseHandler() = default;

Error handleBodyChunk(StringRef BodyChunk) override;
Expand All @@ -210,6 +211,12 @@ Error StreamedHTTPResponseHandler::handleBodyChunk(StringRef BodyChunk) {
return Error::success();
}

Error StreamedHTTPResponseHandler::commit() {
if (FileStream)
return FileStream->commit();
return Error::success();
}

// An over-accepting simplification of the HTTP RFC 7230 spec.
static bool isHeader(StringRef S) {
StringRef Name;
Expand Down Expand Up @@ -298,6 +305,8 @@ Expected<std::string> getCachedOrDownloadArtifact(
Error Err = Client.perform(Request, Handler);
if (Err)
return std::move(Err);
if (Err = Handler.commit())
return std::move(Err);

unsigned Code = Client.responseCode();
if (Code && Code != 200)
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/LTO/LTOBackend.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -434,6 +434,9 @@ static void codegen(const Config &Conf, TargetMachine *TM,

if (DwoOut)
DwoOut->keep();

if (Error Err = Stream->commit())
report_fatal_error(std::move(Err));
}

static void splitCodeGen(const Config &C, TargetMachine *TM,
Expand Down
40 changes: 29 additions & 11 deletions llvm/lib/Support/Caching.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,7 @@ Expected<FileCache> llvm::localCache(const Twine &CacheNameRef,
sys::fs::TempFile TempFile;
std::string ModuleName;
unsigned Task;
bool Committed = false;

CacheStream(std::unique_ptr<raw_pwrite_stream> OS, AddBufferFn AddBuffer,
sys::fs::TempFile TempFile, std::string EntryPath,
Expand All @@ -88,9 +89,10 @@ Expected<FileCache> llvm::localCache(const Twine &CacheNameRef,
AddBuffer(std::move(AddBuffer)), TempFile(std::move(TempFile)),
ModuleName(ModuleName), Task(Task) {}

~CacheStream() {
// TODO: Manually commit rather than using non-trivial destructor,
// allowing to replace report_fatal_errors with a return Error.
Error commit() override {
if (Committed)
return Error::success();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intuitively, this doesn't seem correct. Shouldn't we fail if we attempt to commit more than once, rather than just returning a success?
Additionally, I'm curious about what happens if we use this CacheStream after a commit has been called. I suspect it will fail, but do we provide a proper error message in this case? It would be ideal to have a test case to cover these scenarios.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. My thinking was that commit() would be idempotent and calling it a second time would be harmless. But I understand the potential for confusion so I'll make it fail instead. After commit() has been called, the OS member will have been reset() and I think will cause the any calling code to do a pure virtual function call if it tries to write to it. I'll have a think about how we could better there and make a testcase. Probably setting OS to a raw_fd_ostream with an fd of zero instead of a default-constructed raw_pwrite_ostream would give a better error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a unit test and also tested what happens if the stream is written after a commit. It aborts the process with an assert failure:

/usr/include/c++/11/bits/unique_ptr.h:407: typename std::add_lvalue_reference<_Tp>::type std::unique_ptr<_Tp, _Dp>::operator*() const [with _Tp = llvm::raw_pwrite_stream; _Dp = std::default_deletellvm::raw_pwrite_stream; typename std::add_lvalue_reference<_Tp>::type = llvm::raw_pwrite_stream&]: Assertion 'get() != pointer()' failed.

This is the same thing that will happen anywhere else in LLVM if an output stream is used after its reset() method has been called. This seems reasonable to me - it will be very obvious if a CacheStream is used after commit, so any such bugs will be found and fixed quickly. Providing a proper error message is of limited value since this should never happen unless there is a bug in the code that is calling the caching API - end users of the compiler won't see it. If such an error message is desirable then it should be implemented at the raw_ostream level so that so it appears for other "write after reset" bugs, so it is outside of the scope of this PR.

Committed = true;

// Make sure the stream is closed before committing it.
OS.reset();
Expand All @@ -100,10 +102,12 @@ Expected<FileCache> llvm::localCache(const Twine &CacheNameRef,
MemoryBuffer::getOpenFile(
sys::fs::convertFDToNativeFile(TempFile.FD), ObjectPathName,
/*FileSize=*/-1, /*RequiresNullTerminator=*/false);
if (!MBOrErr)
report_fatal_error(Twine("Failed to open new cache file ") +
TempFile.TmpName + ": " +
MBOrErr.getError().message() + "\n");
if (!MBOrErr) {
std::error_code EC = MBOrErr.getError();
return createStringError(EC, Twine("Failed to open new cache file ") +
TempFile.TmpName + ": " +
EC.message() + "\n");
}

// On POSIX systems, this will atomically replace the destination if
// it already exists. We try to emulate this on Windows, but this may
Expand All @@ -118,7 +122,10 @@ Expected<FileCache> llvm::localCache(const Twine &CacheNameRef,
E = handleErrors(std::move(E), [&](const ECError &E) -> Error {
std::error_code EC = E.convertToErrorCode();
if (EC != errc::permission_denied)
return errorCodeToError(EC);
return createStringError(
EC, Twine("Failed to rename temporary file ") +
TempFile.TmpName + " to " + ObjectPathName + ": " +
EC.message() + "\n");

auto MBCopy = MemoryBuffer::getMemBufferCopy((*MBOrErr)->getBuffer(),
ObjectPathName);
Expand All @@ -131,11 +138,22 @@ Expected<FileCache> llvm::localCache(const Twine &CacheNameRef,
});

if (E)
report_fatal_error(Twine("Failed to rename temporary file ") +
TempFile.TmpName + " to " + ObjectPathName + ": " +
toString(std::move(E)) + "\n");
return E;

AddBuffer(Task, ModuleName, std::move(*MBOrErr));
return Error::success();
}

~CacheStream() {
// In Debug builds, try to track down places where commit() was not
// called before destruction.
assert(Committed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the intent here, it's unusual to have different behavior in debug and release modes. The release mode operation, where commit() is essentially optional because it's handled by the destructor, seems reasonable to me. For consistency, I would prefer to remove this assert, although you can use it for testing purposes. Are you aiming to use commit() optionally to make error messages more explicit?

Alternatively, if we require commit() to be used instead of relying on the destructor, I would suggest explicitly failing or emitting a warning if it hasn't been committed beforehand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage of explicit commit is that it allows proper error handling rather than aborting the entire process if the cache write fails - usually an undesirable behavior. My aim here was to quietly fall back to the previous behavior in if commit() was not called in release mode but help developers track down such places in debug mode (which worked perfectly - I found just such a case that way). I was thinking of leaving it there to help people who have code that uses localCache in branches that have not yet landed in mainline LLVM. However, I agree that it would be annoying to switch from release to debug to track down an unrelated problem and have to fix a missing commit() at that point. I think on balance explicitly failing here is the better plan - I'll modify my patch to do this but I could probably be talked into allowing the commit() to be optional if anyone has strong feelings about this.

// In Release builds, fall back to the previous behaviour of committing
// during destruction and reporting errors with report_fatal_error.
if (Committed)
return;
if (Error Err = commit())
report_fatal_error(Twine(toString(std::move(Err))));
}
};

Expand Down
4 changes: 3 additions & 1 deletion llvm/tools/gold/gold-plugin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1103,7 +1103,9 @@ static std::vector<std::pair<SmallString<128>, bool>> runLTO() {

auto AddBuffer = [&](size_t Task, const Twine &moduleName,
std::unique_ptr<MemoryBuffer> MB) {
*AddStream(Task, moduleName)->OS << MB->getBuffer();
auto Stream = *AddStream(Task, ModuleName);
Stream->OS << MB->getBuffer();
check(Stream->commit(), "Failed to commit cache");
};

FileCache Cache;
Expand Down
4 changes: 3 additions & 1 deletion llvm/tools/llvm-lto2/llvm-lto2.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -448,7 +448,9 @@ static int run(int argc, char **argv) {

auto AddBuffer = [&](size_t Task, const Twine &ModuleName,
std::unique_ptr<MemoryBuffer> MB) {
*AddStream(Task, ModuleName)->OS << MB->getBuffer();
auto Stream = AddStream(Task, ModuleName);
*Stream->OS << MB->getBuffer();
check(Stream->commit(), "Failed to commit cache");
};

FileCache Cache;
Expand Down
Loading