-
Notifications
You must be signed in to change notification settings - Fork 130
Implementation for /etc merge #1485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new crate etc-merge
for calculating differences between /etc
directories. The initial implementation provides a compute_diff
function and related helpers. My review focuses on correctness issues, such as a typo in a key data structure and an incomplete hashing function that misses file permissions. I've also pointed out an unused variable that indicates incomplete logic, and suggested improvements to test data and assertions for better maintainability.
crates/etc-merge/src/lib.rs
Outdated
assert_eq!(res.added.len(), new_files.len()); | ||
assert!(res.added.iter().all(|file| new_files | ||
.iter() | ||
.find(|x| PathBuf::from(*x) == *file) | ||
.is_some())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion logic is a bit complex. A simpler and more idiomatic way to compare two collections for equality regardless of order is to sort them and then compare the sorted collections. This pattern is repeated for modified and removed files as well.
assert_eq!(res.added.len(), new_files.len()); | |
assert!(res.added.iter().all(|file| new_files | |
.iter() | |
.find(|x| PathBuf::from(*x) == *file) | |
.is_some())); | |
let mut res_added = res.added; | |
res_added.sort(); | |
let mut expected_added: Vec<_> = new_files.iter().map(PathBuf::from).collect(); | |
expected_added.sort(); | |
assert_eq!(res_added, expected_added); |
ede5599
to
37d6b19
Compare
I've tested the output of this implementation with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question, not something specific about this PR (this looks great BTW!)
Would Bootc's ostree backend be refactored to make use of this crate in time, or is the plan for ostree to remain mostly separate? Obviously the implementation would be completely separate PR to this anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question indeed. I've been really tempted to basically vendor+rewrite some of the ostree-sysroot stuff in this project. However, it has a ton of implications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
crates/etc-merge/src/lib.rs
Outdated
let mut buf = vec![0u8; entry.metadata()?.size() as usize]; | ||
|
||
let mut file = entry.open().context(format!("Opening entry {path:?}"))?; | ||
file.read_exact(&mut buf) | ||
.context(format!("Reading {path:?}. Buf: {buf:?}"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two key things here. First I think we should incremental hashing instead of reading the whole thing into memory and then hashing the memory buffer.
But second a nice optimization to make here is handling the case where the file has fsverity enabled. Then we can efficiently just query its fsverity hash to do a diff.
It won't be the default for files in a mutable /etc
to have fsverity enabled but it makes sense to do so (dir is mutable, so the write-tmpfile-then-rename pattern continues to work) and we should make things faster for those who do use it.
37d6b19
to
3d6c8c3
Compare
7d596fe
to
d9b384d
Compare
dc635c5
to
0faabae
Compare
eafc75f
to
f1c87a7
Compare
Hmm... tests failed because we can't chown unless we're root. And some issue with |
Yeah, some of this stuff really needs privileged integration tests, not unit tests. In the short term what I think is easiest is to add an option to the core logic to ignore uid/gid + xattrs and that's what we do in unit tests. For integration tests what we have in some other places is exposing the core functionality via Line 1237 in 8ed1134
(actually that's not so good an example because the command hardcodes the test, which bloats production binaries) This one is maybe a bit better: Line 1291 in 8ed1134
There's the tests-integration crate which is also intended to hold stuff like this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh I reviewed but forgot to click submit
3e6d0f7
to
e744cec
Compare
e744cec
to
2e38947
Compare
e2b19f9
to
828d837
Compare
2e38947
to
f3f74c9
Compare
828d837
to
1a7ac13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't do a full review yet
.context(format!("Failed to create dir {dir_name:?}"))?; | ||
|
||
new_etc_fd | ||
.set_permissions(&dir_name, Permissions::from_mode(stat.st_mode)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ok as is, but since we need to use raw unix stuff we might as well just directly use rustix::fs::fchmodat
get_deletions(pristine_dir, curr_dir, current_path.clone(), diff)? | ||
} | ||
|
||
Err(ImageError::NotFound(..)) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere, there's get_directory_opt
which is intended for exactly this use case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realised I had used get_directory
as we would still want to show a diff if a file was changed to a directory and vice versa. Given, we will fail while trying to merge, but I think it makes sense to show it as a diff?
// Dir not found in original /etc, dir was added | ||
diff.added.push(current_path.clone()); | ||
|
||
// Also add every file inside that dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it'd be cleaner to have added
actually be a borrowed reference to the inode for the current directory.
Then it's implicit that leaf files are, well, leaves - and directories are recursive.
Also I think you want to rebase on main right? |
Do we want to merge this directly to main? Kind of makes sense as this is entirely disjoint |
Yes please let's keep trying to merge independent things to main; it's OK to just mark them as |
Okay, sounds good |
We had a live chat about this and we're currently thinking that we wouldn't have Note this would fix #1360 - but I have the same concerns as I had there in that I think |
f3f74c9
to
daa93ad
Compare
58fee03
to
e254498
Compare
I have rebased the branch on main |
Some tests failed due to "connectivity issues". Assuming this is temporary, I've restarted the failed ones |
Any reason to keep the intermediate commits vs squashing to one? I have a slight preference for squashing in cases like this, but if you prefer it's OK by me to separate. Things like Also needs a rebase 🏄 to fix CI |
2ca6887
to
efca3f6
Compare
Rebased onto main and squashed commits only keeping what I thought were necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nonblocking comments that can be done as followups too.
new_etc_fd: &CapStdDir, | ||
new_etc_dirtree: &Directory<CustomMetadata>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to do this, the calling code needs to make a copy of the new pristine etc
already pre-populated into new_etc_fd
.
But...while it's generally useful to have "compute diff without merging", won't the main use case be to actually just one-shot a combination of diff + merge?
IOW, how about renaming this function merge_with
and then we have
pub fn merge(pristine_etc_fd: &Dir, current_etc_fd: &Dir, new_etc_fd: &Dir)
And inside that function we do:
- Copy pristine to new
- Compute diff
- Apply diff
This is kind of what https://github.com/ostreedev/ostree/blob/edfe02d01b78039db67c4247df6948070eae0cbe/src/libostree/ostree-sysroot-deploy.c#L511 is already doing, though https://github.com/ostreedev/ostree/blob/edfe02d01b78039db67c4247df6948070eae0cbe/src/libostree/ostree-sysroot-deploy.c#L511 does the copy beforehand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the diff function separate for testing while developing, and because ostree had a command that'd just print the diff between two etcs. I think it makes sense to combine both into a single merge function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Though I'm just saying we add a new high level helper, not that we need drop the split APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. That makes sense. Thinking about this a bit more, I think the best place to copy files/dirs in the new_etc would be in the function write_composefs_state
. During a fresh install there will be no merging of etcs; only on subsequent updates will the merge happen, so we'll kinda have two places where we copy the contents of etc depending upon what operation was performed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During a fresh install there will be no merging of etcs; only on subsequent updates will the merge happen, so we'll kinda have two places where we copy the contents of etc depending upon what operation was performed.
Yeah, in ostree the copying always happens outside of the merge function. So I'm fine having it always be separate here too.
} | ||
|
||
fn recurse_dir(dir: &CapStdDir, root: &mut Directory<CustomMetadata>) -> anyhow::Result<()> { | ||
for entry in dir.entries()? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional/for followup: This could probably use https://docs.rs/cap-std-ext/latest/cap_std_ext/dirext/trait.CapStdExtDirExt.html#tymethod.walk (I maintain that crate, a lot of handy helpers there)
Signed-off-by: Johan-Liebert1 <[email protected]> etc-merge: Incremental hash computation + test verity Test for whether the file has fs-verity enabled or not, and if it does we simply check the verity. Incrementally compute hash for files rather than reading the entire file in memory. Signed-off-by: Johan-Liebert1 <[email protected]> etc-merge: Use generic-tree from composefs-rs Signed-off-by: Johan-Liebert1 <[email protected]> Get removed files by traversing Signed-off-by: Johan-Liebert1 <[email protected]>
Merge added, modified, removed files from the current etc into the new etc directory, following the rules 1. If file is removed from current_etc, it will be removed from new_etc 2. If file is modified in current_etc, it will be copied to the new_etc overwriting any existing files 3. If a file is added in current_etc, then the above modification rule applies Modification includes change in content/permissions. Changed in Xattrs and/or ownership is not handled yet. Signed-off-by: Johan-Liebert1 <[email protected]> etc-merge: Handle ownership changes Signed-off-by: Johan-Liebert1 <[email protected]> etc-merge: Handle xattrs Signed-off-by: Johan-Liebert1 <[email protected]> etc-merge: Ignore mtime while comparing stat Signed-off-by: Johan-Liebert1 <[email protected]> Remove chown test Signed-off-by: Johan-Liebert1 <[email protected]> etc-merge: Use `llistxattr` and `lgetxattr` Use the non symlink following counterparts for getting xattrs. Document public functions and structures Signed-off-by: Johan-Liebert1 <[email protected]>
efca3f6
to
796645a
Compare
Needs |
While merging, existing directory in new_etc was being recursively deleted which is not correct as any new files might also be deleted. Instead, we simply create a directory if it doesn't exists, or if it does exists, we update its metadata accordingly. Add some test cases for the above. Signed-off-by: Johan-Liebert1 <[email protected]> cli: Add internal opt for printing etc-diff Signed-off-by: Johan-Liebert1 <[email protected]> etc-merge: Add license to Cargo.toml Signed-off-by: Johan-Liebert1 <[email protected]> etc-merge: More refactoring Signed-off-by: Johan-Liebert1 <[email protected]>
796645a
to
ccc558c
Compare
Ran cargo fmt. Weird my editor is also set up to run cargo fmt but for some reason it sorts imports differently |
PR for
/etc
merge