Skip to content

Conversation

@jansvoboda11
Copy link
Contributor

@jansvoboda11 jansvoboda11 commented Oct 28, 2025

This PR introduces a new mechanism for enforcing a sandbox around filesystem reads coming from the compiler. A fatal error is raised whenever the llvm::sys::fs, llvm::MemoryBuffer::getFile*() APIs get used directly instead of going through the "blessed" virtual interface of llvm::vfs::FileSystem.

@jansvoboda11 jansvoboda11 changed the title Sandboxing v2 [llvm][clang] Sandbox filesystem reads Nov 4, 2025
@jansvoboda11 jansvoboda11 marked this pull request as ready for review November 4, 2025 21:21
}

std::error_code is_other(const Twine &Path, bool &Result) {
sandbox::violationIfEnabled();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't we catch these via status being a violation? Similarly for the other APIs here (e.g. openFileForRead).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I wanted to explicitly mark each API we expect to cause a sandbox violation. This is still conceptually an API that's required to touch the real FS. If the implementation ever changes from calling sys::fs::status() to calling status() directly, I still want this to remain a violation. Same goes for the other APIs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems hard to be completely safe against any possible future change like that. For example, there are a bunch of inline functions in FileSystem.h that could theoretically be refactored in the future without remembering to trigger a violation (most of them call a non-inline version that you have a sandbox call in, some call fs::status directly).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, my approach isn't bulletproof either. Would you prefer if we only triggered sandbox violations the core low-level functions and left stuff like is_other() alone? I don't have a strong preference here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my weak preference, but I won't block the PR over this if anyone disagrees. It is a (tiny) perf win to only check it once, but I doubt it matters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants