feat(build-dir): Reorganize build-dir layout #15947
base: master
}

/// Directory where incremental output for the given unit should go.
pub fn incremental_dir(&self, unit: &Unit) -> PathBuf {
Just raising awareness that this PR changed the incremental directory as well. See the relevant discussion: #15010 (comment).
We'll need to investigate the impact of this, and whether incremental compilation is still working.
#t-compiler > Cargo switching to one `-C incremental` directory per crate
Just opened a discussion on Zulip.
This is not a blocker BTW, as we are still experimenting.
We got a quick answer from Mark-Simulacrum, so there is no issue on the side of loading incremental artifacts.
simulacrum: AFAIK, rustc always loads incremental artifacts out of the directory only for the local crate - cross-crate state is always from rmeta
Weihang Lo: Ah nice. So it shouldn't be an issue, and Cargo doesn't need to add flock there because it already has one, right?
simulacrum: That sounds right to me.
Thinking about this more, I think we should probably start off conservatively and only have one `incremental/`, like today (which may run into problems with #4282). We can experiment with it later, as changing it should have little impact.
The incremental directory takes up a significant chunk of the build-dir size. If we make it unique by `-Cextra-filename`, then we will end up with multiple of them in the build, ballooning the build-dir size.
It's unclear what the performance impact would be. Having a single directory while changing inputs to `-Cextra-filename` could mean faster rebuilds if it can reuse a lot. Or it throws out a lot, thrashes the caches, and would benefit from unique `incremental/`s.
For CI, a single directory is also a benefit because it is easy to clear, which keeps caching in CI simpler.
Thanks, I think this sounds reasonable.
I was worried about a shared `incremental` being a point of lock contention when we introduce fine-grained locking.
But thinking about it a bit more, cargo only enables incremental for workspace and path crates, so generally only a small subset would need to lock on this directory.
Also, since `build-dir` internals are not a public interface, we can change it in the future if we find another approach to be optimal.
I have updated the implementation to return to a single `incremental/` directory.
See the updated PR description for the file layout.
CC @Kobzol due to your work on rust-lang/rust#145408. If reducing duplicate search paths speeds up builds, I wonder what the impact will be of having more, but pinpoint-focused, search paths.
Probably doesn't matter much because you still need to go through all of them to find what a crate needs? Edit: doesn't matter for the primary crate, but for dependencies at the very root (like syn), it would be helpful.
Currently, the search path includes each |
I would be a bit worried about perf. in large scenarios (e.g. 1000 crates, which is not that uncommon), as I suspect that rustc does a bunch of linear (hopefully not quadratic) searches through these directories and files in them. I would suggest benchmarking on https://github.com/zed-industries/zed 😆 |
Basically, the files under a search directory are preloaded and sorted, and lookups then binary search over them, so it shouldn't be too bad? It may incur more opendir/readdir syscalls though. Like epage mentioned, it also helps transitive dependencies by loading fewer files. But yeah, it's worth some benchmarking on larger projects.
Ah, I forgot that we do binary search already. In that case it will probably be fine, yeah.
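To make the "preloaded, sorted, then binary search" idea concrete, here is a minimal sketch. It is not rustc's actual implementation: `SearchDir` and `with_prefix` are hypothetical names, and the file names are made up. It only shows why a lookup in one search directory stays cheap once the directory listing has been sorted.

```rust
/// Hypothetical stand-in for one `-L` search directory whose file names
/// were listed once (one readdir pass) and then sorted.
struct SearchDir {
    files: Vec<String>,
}

impl SearchDir {
    /// Sort the listing up front so later lookups can binary search it.
    fn new(mut files: Vec<String>) -> Self {
        files.sort();
        SearchDir { files }
    }

    /// Return every file whose name starts with `prefix`.
    /// `partition_point` binary searches for the first name that is not
    /// smaller than `prefix`; names sharing a prefix are contiguous in a
    /// sorted list, so a short linear scan from there finds all of them.
    fn with_prefix(&self, prefix: &str) -> &[String] {
        let start = self.files.partition_point(|f| f.as_str() < prefix);
        let len = self.files[start..]
            .iter()
            .take_while(|f| f.starts_with(prefix))
            .count();
        &self.files[start..start + len]
    }
}
```

With per-unit search directories each listing is small, so the cost shifts from searching within a directory to opening more directories, which is the trade-off raised above.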
I tried it on Zed and didn't see any perf. difference vs master, neither for clean builds nor for incremental rebuilds.
If there is a difference, it will most likely appear if you have multiple unique versions for each package, e.g. from
|
I didn't see the rebuild time getting higher for Zed when I added multiple versions from different cargo invocations.
854b259 to 2355753
target/x86_64-unknown-linux-gnu/debug/build/proc-macro2-ee66340aaf816e44
So while this reduces the "max content per directory" (since `proc-macro2-ee66340aaf816e44` will be a dir, rather than multiple files), we also have more flexibility for handling this.
Should we change from `proc-macro2-ee66340aaf816e44` to `proc/-macro2/ee66340aaf816e44`?
Should we change from proc-macro2-ee66340aaf816e44 to proc/-macro2/ee66340aaf816e44?
Could we expand a bit more on the benefit of the proposed change?
There can be performance issues on Windows when a directory has a lot of content. We do this prefix-directory stuff for the index and for the build-dir workspace hash. This would be extending it to the build units within the package dir.
As Ross brought up, we don't have guidance on how big is big, what the growth will look like, etc
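For readers unfamiliar with the "prefix-directory" technique mentioned here, a minimal sketch follows. `sharded_path` and its `prefix_len` parameter are hypothetical and do not mirror Cargo's index or build-dir code; the sketch only shows the general idea of moving the first few characters of a name into a parent directory so that no single directory accumulates every entry.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: split `name` into `<prefix>/<rest>`, where the
/// prefix is the first `prefix_len` characters. Names too short to split
/// are kept flat.
fn sharded_path(name: &str, prefix_len: usize) -> PathBuf {
    // Find the byte offset of the split point on a char boundary.
    let split = name
        .char_indices()
        .nth(prefix_len)
        .map(|(i, _)| i)
        .unwrap_or(name.len());
    let (prefix, rest) = name.split_at(split);
    if prefix.is_empty() || rest.is_empty() {
        return PathBuf::from(name);
    }
    Path::new(prefix).join(rest)
}
```

Under these assumptions, `sharded_path("ee66340aaf816e44", 2)` yields `ee/66340aaf816e44`. How many characters to split off, and whether the split is worth the extra nesting, is exactly the open question in this thread.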
If we share this layout with the shared cache, then it will likely be more important.
I spent some time getting up to speed on Windows path issues. (Goodness, it's a deep rabbit hole.)
As far as I can tell, the main issue is that Windows has a much more restrictive path length limit than Linux (see #9770).
As for "too many files in a directory", I could not find much information on this.
Only a Stack Overflow post from 15 years ago saying "it's well known that Windows has poor performance on directories with many files".
That said, I also found an article that claims a flat structure is better on ext4 for Linux.
Granted, the number of files in that article is huge and we would probably never have a build cache that gets that big.
Given the long paths issue on Windows, perhaps it would be better to shorten names to help mitigate this? `build-script-execution`, while descriptive, is pretty long.
As far as I can tell, these are separate conversations, so I moved path lengths over to https://github.com/rust-lang/cargo/pull/15947/files#r2402207606
Sounds good.
As for the original suggestion
Should we change from `proc-macro2-ee66340aaf816e44` to `proc/-macro2/ee66340aaf816e44`?
I lean towards not splitting these up. It keeps things simple, and it's questionable whether there is even a performance issue in the first place.
As this is unspecified, we at least have the freedom to change it over time.
I think we should at minimum do `<pkg-name>/<hash>`, as that would greatly simplify `cargo clean -p <pkg-name>`.
I think that is a good idea 👍
The code for `cargo clean -p` is a bit hard to follow, and I think with the new layout we have the opportunity to make the code simpler.
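A rough sketch of why `<pkg-name>/<hash>` could simplify cleaning a single package. This is not Cargo's `cargo clean -p` implementation; every function name here is hypothetical, and the flat variant assumes the hash itself never contains a `-`.

```rust
use std::path::{Path, PathBuf};

/// Flat `name-hash` layout: an entry belongs to `pkg` only if everything
/// before the last `-` equals the package name. A plain prefix match
/// would wrongly treat `proc-macro2-<hash>` as belonging to `proc`.
fn flat_entry_belongs_to(entry: &str, pkg: &str) -> bool {
    match entry.rsplit_once('-') {
        Some((name, hash)) => name == pkg && !hash.is_empty(),
        None => false,
    }
}

/// Flat layout: every entry in the directory has to be inspected.
fn flat_dirs_to_clean<'a>(entries: &[&'a str], pkg: &str) -> Vec<&'a str> {
    entries
        .iter()
        .copied()
        .filter(|e| flat_entry_belongs_to(e, pkg))
        .collect()
}

/// Nested `name/hash` layout: the package directory is the unit to
/// remove, so no name parsing is needed.
fn nested_dir_to_clean(build_root: &Path, pkg: &str) -> PathBuf {
    build_root.join(pkg)
}
```

In the nested case the clean step reduces to removing one directory per package, which is the simplification being suggested.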
Made this change in the latest push.
As a follow up to this PR, we may want to remove |
Another reason we might want to remove |
We'll need a test for `cargo clean -p foo`. Haven't looked at how that's implemented, but it might at least be a reason for `name/hash` rather than `name-hash`.
I added a test for this in 0c143b1.
And good thing too, because I silently broke that when I changed from `name-hash` to `name/hash`.
2355753 to 47e0ec9
We get this pretty frequently these days.
Re-running jobs.
tests/testsuite/build_dir.rs
Outdated
[ROOT]/foo/build-dir/[HOST_TARGET]/CACHEDIR.TAG
[ROOT]/foo/build-dir/[HOST_TARGET]/debug/.cargo-lock
[ROOT]/foo/build-dir/[HOST_TARGET]/debug/build/foo-[HASH]/deps/foo[..][EXE]
[ROOT]/foo/build-dir/[HOST_TARGET]/debug/build/foo-[HASH]/deps/foo[..].d
[ROOT]/foo/build-dir/[HOST_TARGET]/debug/build/foo-[HASH]/fingerprint/bin-foo
[ROOT]/foo/build-dir/[HOST_TARGET]/debug/build/foo-[HASH]/fingerprint/bin-foo.json
[ROOT]/foo/build-dir/[HOST_TARGET]/debug/build/foo-[HASH]/fingerprint/dep-bin-foo
[ROOT]/foo/build-dir/[HOST_TARGET]/debug/build/foo-[HASH]/fingerprint/invoked.timestamp
From @ranger-ross at #15947 (comment)
I spent some time getting up to speed on Windows path issues. (Goodness, it's a deep rabbit hole.)
...
Given the long paths issue on Windows, perhaps it would be better to shorten names to help mitigate this? `build-script-execution`, while descriptive, is pretty long.
There might be things we can do now, but I think we should leave this partially for the stabilization process. In particular, if we can get #4282 to work out, we can likely drop `<platform>/<profile>`.
We could then also explore other things, like how we store fingerprints and whether that could be simplified in a way that reduces its contribution to path lengths.
47e0ec9 to b137a08
b137a08 to 030ddce
r? @weihanglo rustbot has assigned @weihanglo. Use |
I am going to mark this PR as ready to review now. I have addressed some of the preliminary feedback and updated the PR description with a list of follow-up tasks after this PR. I did some basic sanity testing by verifying that some popular projects build correctly with the new layout ( |
let (pkg_set, resolve) = ops::resolve_ws(ws, dry_run)?;
let prof_dir_name = profiles.get_dir_name();
let host_layout = Layout::new(ws, None, &prof_dir_name)?;
let host_target = CompileTarget::new(target_data.short_name(&CompileKind::Host))?;
I'm surprised by how much was changed here.
I would have assumed we could do:
if new_layout {
    // rm all
} else {
    // use the existing old logic
}
What is it I'm missing?
Sorry, I probably should have called this out in the PR description.
When `-Zbuild-dir-new-layout` is enabled, `cargo clean -p` cleans packages in BOTH the new and old layouts.
My thinking was that we could support both layouts for a few releases and eventually drop support for cleaning the old layout.
I can probably simplify this a bit if you think that is unnecessary.
Personally, I would err on the side of simplifying and us adding as needed.
This PR mixes the following in the same commit
- new subdirectories
- new parent directories
- special cargo clean transitional logic
which makes things harder to track and consider. Doing it incrementally also allows us to better consider "do we need to bother?", especially if we can wait for feedback.
ws: &Workspace<'_>,
target: Option<CompileTarget>,
dest: &str,
is_host_layout: bool,
`bool`s are always a precarious thing in parameter lists because their meaning is unclear at the call site. When the functions are local, it's not too bad.
Making an enum would likely be too much boilerplate.
Options
- Maybe we hold off on adjusting the host layout until later, further reducing the scope of this change. Like I said, I wonder if we can get rid of some of our directory nesting which, depending on what we do, could make this short-lived anyway.
- Always have the caller create a variable for the bool and pass that in
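The second option can be sketched as follows. This is illustrative only: `layout_root`, `legacy_host_root`, and the `omit_platform` parameter are hypothetical names and not the signature of `Layout::new`. The point is that a named local tells the reader what the flag means, where a bare `true` at the call site would not.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper that builds the root of a layout.
/// `omit_platform` is an illustrative name, not Cargo's actual parameter.
fn layout_root(build_dir: &Path, platform: &str, profile: &str, omit_platform: bool) -> PathBuf {
    if omit_platform {
        build_dir.join(profile)
    } else {
        build_dir.join(platform).join(profile)
    }
}

/// Call site: binding the flag to a named local documents its meaning,
/// compared with writing `layout_root(build_dir, platform, profile, true)`.
fn legacy_host_root(build_dir: &Path, platform: &str, profile: &str) -> PathBuf {
    let omit_platform = true;
    layout_root(build_dir, platform, profile, omit_platform)
}
```

An enum would make the meaning visible in the type as well, at the cost of the extra boilerplate mentioned above.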
This bool is only used to know whether we should omit the platform in the path (`build-dir/debug` vs `build-dir/x86_64-unknown-linux-gnu/debug`).
For the new layout, build-dir will never omit the platform.
One idea I have is to split `Layout` into `ArtifactDirLayout` and `BuildDirLayout` in a follow-up PR.
I believe that would simplify things (or at least separate the concerns), including this constructor fn.
What are your thoughts on not adding the platform to the directory unconditionally, at least at this phase? I feel like that is a source of some extra complication in this change, making it harder to understand, and it's a change for a directory that is not user visible, limiting the benefits.
030ddce to c849440
[ROOT]/foo/build-dir/debug/.fingerprint/foo-[HASH]/invoked.timestamp
[ROOT]/foo/build-dir/debug/deps/foo[..][EXE]
[ROOT]/foo/build-dir/debug/deps/foo[..].d
[ROOT]/foo/build-dir/[HOST_TARGET]/CACHEDIR.TAG
I believe this is a bug. `CACHEDIR.TAG` should not be affected by these changes.
Will look into this later this week.
src/cargo/core/compiler/layout.rs
Outdated
  }
  /// Fetch the deps path.
- pub fn deps(&self) -> &Path {
+ pub fn deps(&self, _pkg_dir: &str) -> PathBuf {
While I was interested in things like `incremental` being created, I feel that taking in additional parameters that are unused makes this change harder to understand, and would likely make the follow-up harder to understand as well.
  self.compilation
      .deps_output
-     .insert(kind, layout.deps().to_path_buf());
+     .insert(kind, layout.legacy_deps().to_path_buf());
Why is this `legacy_deps`?
In a follow-up change, this is in an `else` for `if build_dir_new_lay`, so it seems like `deps()` would also work
  assert!(self.metas.contains_key(unit));
  assert!(unit.artifact.is_true());
- let dir = self.pkg_dir(unit);
+ let dir = self.pkg_dir(unit, "-");
This seems to be the only case for the `"-"` separator
- Being in `deps`, this looks internal
- This is for artifact dependencies, which is unstable
- For myself, I'd prefer separate `pkg_dir` functions if we do need this. Being a parameter gives it the appearance of more flexibility than should be exercised, in both the value and in setting it at runtime
src/cargo/core/compiler/layout.rs
Outdated
pub fn fingerprint(&self, pkg_dir: &str) -> PathBuf {
    self.fingerprint.join(pkg_dir)
Is this correct for us to be joining `pkg_dir` to `self.fingerprint`?
pub fn fingerprint(&self, pkg_dir: &str) -> PathBuf {
    self.fingerprint.join(pkg_dir)
}
/// Fetch the fingerprint path. (old layout)
pub fn legacy_fingerprint(&self) -> &Path {
    &self.fingerprint
}
If we keep the legacy splits, I feel like this commit should have the existing functions delegate to the legacy ones. Then the follow up commit will call the legacy functions in the else branch. This would make the relationship more explicit and avoid questions like 8603e30#r2411571494
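The suggested commit split might look roughly like this. It is a sketch under assumptions, not the PR's code: this `Layout` is a cut-down stand-in with a single field, and `fingerprint_dir` with its `new_layout` flag is a hypothetical follow-up function whose new-layout branch simply mirrors the `join(pkg_dir)` shown in the diff above.

```rust
use std::path::{Path, PathBuf};

/// Cut-down stand-in for Cargo's `Layout`; only one field is modeled.
struct Layout {
    fingerprint: PathBuf,
}

impl Layout {
    /// Old layout accessor, introduced in the first commit.
    fn legacy_fingerprint(&self) -> &Path {
        &self.fingerprint
    }

    /// First commit: the existing accessor keeps its signature and just
    /// delegates, making the relationship between the two explicit.
    fn fingerprint(&self) -> &Path {
        self.legacy_fingerprint()
    }

    /// Hypothetical follow-up commit: the new layout gets its own branch
    /// and the `else` branch calls the legacy accessor directly.
    fn fingerprint_dir(&self, new_layout: bool, pkg_dir: &str) -> PathBuf {
        if new_layout {
            self.fingerprint.join(pkg_dir)
        } else {
            self.legacy_fingerprint().to_path_buf()
        }
    }
}
```

Splitting it this way means the first commit is a pure rename with no behavior change, and the second commit's diff shows only the new branch.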
if build_runner.bcx.gctx.cli_unstable().build_dir_new_layout {
    let mut map = BTreeMap::new();

    // Recursively add all dependency args to rustc process
    add_dep_arg(&mut map, build_runner, unit);

    let paths = map.into_iter().map(|(_, path)| path).sorted_unstable();

    for path in paths {
        cmd.arg("-L").arg(&{
            let mut deps = OsString::from("dependency=");
            deps.push(path);
            deps
        });
    }
} else {
    cmd.arg("-L").arg(&{
        let mut deps = OsString::from("dependency=");
        deps.push(build_runner.files().deps_dir(unit));
        deps
    });
}
Something for us to keep in mind is that this change will likely lead to complaints similar to #13941.
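As a standalone illustration of how the number of `-L dependency=` arguments grows with the new layout, here is a sketch using only the standard library. `dependency_search_args` is a hypothetical function; it assumes the per-unit output directories are already known and deduplicates them with a `BTreeSet`, which is a simplification of the `BTreeMap` plus `sorted_unstable` approach in the diff.

```rust
use std::collections::BTreeSet;
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: turn a list of dependency output directories into
/// rustc arguments, one `-L dependency=<dir>` pair per unique directory,
/// in sorted order so the command line is deterministic.
fn dependency_search_args(dep_dirs: impl IntoIterator<Item = PathBuf>) -> Vec<OsString> {
    let unique: BTreeSet<PathBuf> = dep_dirs.into_iter().collect();
    let mut args = Vec::with_capacity(unique.len() * 2);
    for dir in unique {
        let mut value = OsString::from("dependency=");
        value.push(dir);
        args.push(OsString::from("-L"));
        args.push(value);
    }
    args
}
```

With one shared `deps` directory this produces a single pair; with per-unit directories it produces one pair per dependency unit, so the rustc command line grows with the size of the dependency graph.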
What does this PR try to resolve?
This PR re-organizes the `build-dir` file layout structure to a layout organized by "build unit" when `-Zbuild-dir-new-layout` is enabled.
See #15010 for the motivations and design discussions.
Below is the file structure generated for a `foo` crate with a single dependency on `syn`.
How to test and review this PR?
This PR still needs to be more thoroughly tested. Thus far I have been testing on simple test crates.
Also see #15874 for potential test harness improvements that could be used by this PR.
Follow up actions
-Cextra-filename
in files where possible.