Conversation

Johan-Liebert1
Collaborator

@Johan-Liebert1 Johan-Liebert1 commented Aug 5, 2025

PR for /etc merge

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new crate etc-merge for calculating differences between /etc directories. The initial implementation provides a compute_diff function and related helpers. My review focuses on correctness issues, such as a typo in a key data structure and an incomplete hashing function that misses file permissions. I've also pointed out an unused variable that indicates incomplete logic, and suggested improvements to test data and assertions for better maintainability.

Comment on lines 254 to 258
assert_eq!(res.added.len(), new_files.len());
assert!(res.added.iter().all(|file| new_files
    .iter()
    .find(|x| PathBuf::from(*x) == *file)
    .is_some()));
Contributor

medium

This assertion logic is a bit complex. A simpler and more idiomatic way to compare two collections for equality regardless of order is to sort them and then compare the sorted collections. This pattern is repeated for modified and removed files as well.

Suggested change
- assert_eq!(res.added.len(), new_files.len());
- assert!(res.added.iter().all(|file| new_files
-     .iter()
-     .find(|x| PathBuf::from(*x) == *file)
-     .is_some()));
+ let mut res_added = res.added;
+ res_added.sort();
+ let mut expected_added: Vec<_> = new_files.iter().map(PathBuf::from).collect();
+ expected_added.sort();
+ assert_eq!(res_added, expected_added);

@Johan-Liebert1
Collaborator Author

I've tested the output of this implementation with ostree admin config-diff and it seems correct. But again, I haven't tested this exhaustively

Contributor

@p5 p5 Aug 5, 2025

General question, not something specific about this PR (this looks great BTW!)


Would Bootc's ostree backend be refactored to make use of this crate in time, or is the plan for ostree to remain mostly separate? Obviously the implementation would be a completely separate PR to this anyway.

Collaborator

This is a good question indeed. I've been really tempted to basically vendor+rewrite some of the ostree-sysroot stuff in this project. However, it has a ton of implications.

Collaborator

@cgwalters cgwalters left a comment

Thanks for working on this!

Comment on lines 152 to 156
let mut buf = vec![0u8; entry.metadata()?.size() as usize];

let mut file = entry.open().context(format!("Opening entry {path:?}"))?;
file.read_exact(&mut buf)
    .context(format!("Reading {path:?}. Buf: {buf:?}"))?;
Collaborator

Two key things here. First, I think we should use incremental hashing instead of reading the whole thing into memory and then hashing the memory buffer.

But second a nice optimization to make here is handling the case where the file has fsverity enabled. Then we can efficiently just query its fsverity hash to do a diff.

It won't be the default for files in a mutable /etc to have fsverity enabled but it makes sense to do so (dir is mutable, so the write-tmpfile-then-rename pattern continues to work) and we should make things faster for those who do use it.
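
As a rough illustration of the incremental approach suggested above, the sketch below feeds a reader to a hasher in fixed-size chunks, so memory use stays constant regardless of file size. It uses only the standard library. The hand-rolled FNV-1a hasher is a stand-in chosen to keep the example dependency-free; it is not what the PR uses, and a real implementation would presumably use a cryptographic digest. The names `Fnv1a` and `hash_reader` are hypothetical, and the fsverity fast path is not shown.

```rust
use std::io::{self, Read};

// FNV-1a 64-bit parameters. This is a NON-cryptographic stand-in (assumption),
// used only so the sketch has no external dependencies.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical streaming hasher: state is updated one byte at a time,
/// so feeding data in chunks gives the same result as feeding it at once.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(FNV_OFFSET)
    }

    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(FNV_PRIME);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hash everything `reader` yields using a fixed 8 KiB buffer instead of
/// allocating a buffer as large as the file.
fn hash_reader<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut hasher = Fnv1a::new();
    let mut buf = [0u8; 8192];
    loop {
        match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => hasher.update(&buf[..n]),
            // Retry if the read was interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(hasher.finish())
}
```

Because `hash_reader` is generic over `Read`, it can be driven by an opened file in production code and by an in-memory byte slice in unit tests.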

@Johan-Liebert1 Johan-Liebert1 force-pushed the etc-merge branch 4 times, most recently from dc635c5 to 0faabae Compare August 11, 2025 10:22
@Johan-Liebert1 Johan-Liebert1 changed the title from "Initial implementaion for /etc merge" to "Initial implementation for /etc merge" Aug 11, 2025
@Johan-Liebert1 Johan-Liebert1 force-pushed the etc-merge branch 3 times, most recently from eafc75f to f1c87a7 Compare August 12, 2025 11:00
@Johan-Liebert1
Collaborator Author

Hmm... tests failed because we can't chown unless we're root. And some issue with getxattr arguments

@cgwalters
Collaborator

> Hmm... tests failed because we can't chown unless we're root. And some issue with getxattr arguments

Yeah, some of this stuff really needs privileged integration tests, not unit tests.

In the short term what I think is easiest is to add an option to the core logic to ignore uid/gid + xattrs and that's what we do in unit tests.

For integration tests what we have in some other places is exposing the core functionality via bootc internals and then the tests can drive it, see e.g.

InternalsOpts::TestComposefs => {

(actually that's not so good an example because the command hardcodes the test, which bloats production binaries)
This one is maybe a bit better:
InternalsOpts::Relabel { as_path, path } => {

There's the tests-integration crate which is also intended to hold stuff like this.
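
The short-term option mentioned above (letting the core logic ignore uid/gid and xattrs so that unprivileged unit tests can run) could look roughly like the following. This is a minimal sketch under stated assumptions: `FileMeta`, `CompareOpts`, and `metadata_differs` are hypothetical names invented for illustration and do not come from the PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of the metadata being compared.
#[derive(Debug, Clone, PartialEq)]
struct FileMeta {
    mode: u32,
    uid: u32,
    gid: u32,
    xattrs: BTreeMap<String, Vec<u8>>,
}

/// Hypothetical knobs for the comparison. The default compares everything;
/// unit tests that cannot chown or set xattrs would relax it.
#[derive(Debug, Clone, Copy, Default)]
struct CompareOpts {
    ignore_ownership: bool,
    ignore_xattrs: bool,
}

/// Returns true if the two entries differ in any metadata the options
/// ask us to consider. The mode is always compared.
fn metadata_differs(a: &FileMeta, b: &FileMeta, opts: CompareOpts) -> bool {
    if a.mode != b.mode {
        return true;
    }
    if !opts.ignore_ownership && (a.uid != b.uid || a.gid != b.gid) {
        return true;
    }
    if !opts.ignore_xattrs && a.xattrs != b.xattrs {
        return true;
    }
    false
}
```

With this shape, production callers would pass `CompareOpts::default()`, while unit tests would set both flags and leave ownership and xattr coverage to privileged integration tests.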

Collaborator

@cgwalters cgwalters left a comment

Ugh I reviewed but forgot to click submit

@Johan-Liebert1 Johan-Liebert1 changed the title from "Initial implementation for /etc merge" to "Implementation for /etc merge" Aug 18, 2025
@Johan-Liebert1 Johan-Liebert1 marked this pull request as ready for review August 19, 2025 05:30
Collaborator

@cgwalters cgwalters left a comment

Didn't do a full review yet

.context(format!("Failed to create dir {dir_name:?}"))?;

new_etc_fd
.set_permissions(&dir_name, Permissions::from_mode(stat.st_mode))
Collaborator

This is ok as is, but since we need to use raw unix stuff we might as well just directly use rustix::fs::fchmodat

get_deletions(pristine_dir, curr_dir, current_path.clone(), diff)?
}

Err(ImageError::NotFound(..)) => {
Collaborator

Here and elsewhere, there's get_directory_opt which is intended for exactly this use case

Collaborator Author

I realised I had used get_directory because we would still want to show a diff if a file was changed to a directory or vice versa. Granted, we will fail while trying to merge, but I think it makes sense to show it as a diff?

// Dir not found in original /etc, dir was added
diff.added.push(current_path.clone());

// Also add every file inside that dir
Collaborator

I wonder if it'd be cleaner to have added actually be a borrowed reference to the inode for the current directory.

Then it's implicit that leaf files are, well, leaves - and directories are recursive.

@cgwalters
Collaborator

Also I think you want to rebase on main right?

@Johan-Liebert1
Collaborator Author

> Also I think you want to rebase on main right?

Do we want to merge this directly to main? Kind of makes sense as this is entirely disjoint

@cgwalters
Collaborator

Yes please let's keep trying to merge independent things to main; it's OK to just mark them as #[dead_code] for now

@Johan-Liebert1
Collaborator Author

Okay, sounds good

@cgwalters
Collaborator

We had a live chat about this and we're currently thinking that we wouldn't have /usr/etc in the filesystem tree as it appears. Instead this code would mount the pristine /etc from the underlying composefs (which is what it's doing now).

Note this would fix #1360 - but I have the same concerns as I had there in that I think /usr/etc is kind of an API at this point...but I'm ok to just "break" that API for composefs-native for now. If we need to re-introduce /usr/etc I think we could make it a thing done at image build time (by making an empty /usr/etc mountpoint dir and having a systemd unit to mount it)

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 25, 2025
@Johan-Liebert1 Johan-Liebert1 changed the base branch from composefs-backend to main August 25, 2025 06:52
@Johan-Liebert1 Johan-Liebert1 force-pushed the etc-merge branch 2 times, most recently from 58fee03 to e254498 Compare August 25, 2025 07:31
@Johan-Liebert1
Collaborator Author

I have rebased the branch on main

@Johan-Liebert1
Collaborator Author

Some tests failed due to "connectivity issues". Assuming this is temporary, I've restarted the failed ones

@cgwalters
Collaborator

Any reason to keep the intermediate commits vs squashing to one? I have a slight preference for squashing in cases like this, but if you prefer it's OK by me to separate.

Things like etc-merge: Add license to Cargo.toml in particular just don't seem to stand on their own to me.

Also needs a rebase 🏄 to fix CI

@Johan-Liebert1
Collaborator Author

Rebased onto main and squashed commits, keeping only what I thought was necessary

Collaborator

@cgwalters cgwalters left a comment

Some nonblocking comments that can be done as followups too.

Comment on lines +712 to +713
new_etc_fd: &CapStdDir,
new_etc_dirtree: &Directory<CustomMetadata>,
Collaborator

In order to do this, the calling code needs to make a copy of the new pristine etc already pre-populated into new_etc_fd.

But...while it's generally useful to have "compute diff without merging", won't the main use case be to actually just one-shot a combination of diff + merge?

IOW, how about renaming this function merge_with and then we have

pub fn merge(pristine_etc_fd: &Dir, current_etc_fd: &Dir, new_etc_fd: &Dir)

And inside that function we do:

  • Copy pristine to new
  • Compute diff
  • Apply diff

This is kind of what https://github.com/ostreedev/ostree/blob/edfe02d01b78039db67c4247df6948070eae0cbe/src/libostree/ostree-sysroot-deploy.c#L511 is already doing, though https://github.com/ostreedev/ostree/blob/edfe02d01b78039db67c4247df6948070eae0cbe/src/libostree/ostree-sysroot-deploy.c#L511 does the copy beforehand.
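
To make the proposed shape concrete, here is a minimal in-memory sketch of a one-shot `merge` built from separate diff and apply steps. It models each /etc tree as a map from path to file contents, which is an assumption made purely for illustration: the real code works on directory file descriptors and also tracks metadata. The `Tree` alias, the `Diff` layout, and the function bodies are hypothetical. The sketch also assumes `new` has already been populated with the new image's pristine /etc, so the copy step happens outside `merge`.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Hypothetical in-memory stand-in for an /etc tree: path -> contents.
type Tree = BTreeMap<PathBuf, Vec<u8>>;

#[derive(Debug, Default, PartialEq)]
struct Diff {
    added: Vec<PathBuf>,
    modified: Vec<PathBuf>,
    removed: Vec<PathBuf>,
}

/// Compare the pristine /etc of the booted image against the current,
/// locally modified /etc.
fn compute_diff(pristine: &Tree, current: &Tree) -> Diff {
    let mut diff = Diff::default();
    for (path, contents) in current {
        match pristine.get(path) {
            None => diff.added.push(path.clone()),
            Some(old) if old != contents => diff.modified.push(path.clone()),
            Some(_) => {} // unchanged
        }
    }
    for path in pristine.keys() {
        if !current.contains_key(path) {
            diff.removed.push(path.clone());
        }
    }
    diff
}

/// Replay local changes on top of the new /etc: added and modified files
/// overwrite whatever is there, removed files are deleted.
fn apply_diff(diff: &Diff, current: &Tree, new: &mut Tree) {
    for path in diff.added.iter().chain(&diff.modified) {
        if let Some(contents) = current.get(path) {
            new.insert(path.clone(), contents.clone());
        }
    }
    for path in &diff.removed {
        new.remove(path);
    }
}

/// High-level helper: diff + apply in one call. The split functions stay
/// available for callers that only want to print a diff.
fn merge(pristine: &Tree, current: &Tree, new: &mut Tree) {
    let diff = compute_diff(pristine, current);
    apply_diff(&diff, current, new);
}
```

Files the user never touched keep the new image's version, while files that exist only in the new image are left alone because they never appear in the diff.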

Collaborator Author

I had the diff function separate for testing while developing, and because ostree had a command that'd just print the diff between two etcs. I think it makes sense to combine both into a single merge function.

Collaborator

Right. Though I'm just saying we add a new high-level helper, not that we need to drop the split APIs.

Collaborator Author

@Johan-Liebert1 Johan-Liebert1 Aug 26, 2025

I see. That makes sense. Thinking about this a bit more, I think the best place to copy files/dirs in the new_etc would be in the function write_composefs_state. During a fresh install there will be no merging of etcs; only on subsequent updates will the merge happen, so we'll kinda have two places where we copy the contents of etc depending upon what operation was performed.

Collaborator

> During a fresh install there will be no merging of etcs; only on subsequent updates will the merge happen, so we'll kinda have two places where we copy the contents of etc depending upon what operation was performed.

Yeah, in ostree the copying always happens outside of the merge function. So I'm fine having it always be separate here too.

}

fn recurse_dir(dir: &CapStdDir, root: &mut Directory<CustomMetadata>) -> anyhow::Result<()> {
for entry in dir.entries()? {
Collaborator

Optional/for followup: This could probably use https://docs.rs/cap-std-ext/latest/cap_std_ext/dirext/trait.CapStdExtDirExt.html#tymethod.walk (I maintain that crate, a lot of handy helpers there)

Signed-off-by: Johan-Liebert1 <[email protected]>

etc-merge: Incremental hash computation + test verity

Test for whether the file has fs-verity enabled or not, and if it does
we simply check the verity.

Incrementally compute hash for files rather than reading the entire file
in memory.

Signed-off-by: Johan-Liebert1 <[email protected]>

etc-merge: Use generic-tree from composefs-rs

Signed-off-by: Johan-Liebert1 <[email protected]>

Get removed files by traversing

Signed-off-by: Johan-Liebert1 <[email protected]>
Merge added, modified, removed files from the current etc into the new
etc directory, following the rules

1. If file is removed from current_etc, it will be removed from new_etc
2. If file is modified in current_etc, it will be copied to the new_etc
   overwriting any existing files
3. If a file is added in current_etc, then the above modification rule
   applies

Modification includes change in content/permissions. Changes in xattrs
and/or ownership are not handled yet.

Signed-off-by: Johan-Liebert1 <[email protected]>

etc-merge: Handle ownership changes

Signed-off-by: Johan-Liebert1 <[email protected]>

etc-merge: Handle xattrs

Signed-off-by: Johan-Liebert1 <[email protected]>

etc-merge: Ignore mtime while comparing stat

Signed-off-by: Johan-Liebert1 <[email protected]>

Remove chown test

Signed-off-by: Johan-Liebert1 <[email protected]>

etc-merge: Use `llistxattr` and `lgetxattr`

Use the non symlink following counterparts for getting xattrs. Document
public functions and structures

Signed-off-by: Johan-Liebert1 <[email protected]>
@cgwalters
Collaborator

Needs cargo fmt

While merging, an existing directory in new_etc was being recursively
deleted, which is not correct as any new files might also be deleted.

Instead, we simply create the directory if it doesn't exist, or if it
does exist, we update its metadata accordingly.

Add some test cases for the above.

Signed-off-by: Johan-Liebert1 <[email protected]>
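
The create-or-update behaviour described in the commit message above can be sketched with the standard library as follows. This is only an illustration under stated assumptions: `ensure_dir` is a hypothetical helper, it works on plain paths rather than the directory file descriptors the crate uses, it is Unix-only, and it handles only the permission bits, not ownership or xattrs.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Hypothetical helper: make sure `path` is a directory with the given
/// mode, without touching anything that already lives inside it.
fn ensure_dir(path: &Path, mode: u32) -> io::Result<()> {
    match fs::create_dir(path) {
        Ok(()) => {}
        // An existing directory is kept as-is; its contents are preserved.
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists && path.is_dir() => {}
        // Anything else (including a non-directory in the way) is an error.
        Err(e) => return Err(e),
    }
    // Whether freshly created or pre-existing, bring the mode in line.
    fs::set_permissions(path, fs::Permissions::from_mode(mode))
}
```

Calling `ensure_dir` twice on the same path with different modes leaves any files inside the directory untouched and only changes the directory's permission bits.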

cli: Add internal opt for printing etc-diff

Signed-off-by: Johan-Liebert1 <[email protected]>

etc-merge: Add license to Cargo.toml

Signed-off-by: Johan-Liebert1 <[email protected]>

etc-merge: More refactoring

Signed-off-by: Johan-Liebert1 <[email protected]>
@Johan-Liebert1
Collaborator Author

Ran cargo fmt. Weird, my editor is also set up to run cargo fmt, but for some reason it sorts imports differently

Labels
documentation Improvements or additions to documentation
3 participants