Description
Also See: #10
The current garbage collection (GC) logic in cfsctl assumes the presence of a refs/ directory inside images/ and/or streams/.
This directory doesn’t currently exist, which causes the GC operation to treat all files in objects/ as unreferenced and mark them for deletion.
In bootc, the bootloader entries are effectively the source of truth for what should be kept: the bootc GC operation deletes an EROFS image if the corresponding boot entry doesn't exist.
Hence, the GC operation in composefs-rs can simply list all the remaining EROFS images, collect every object they reference, and delete any object in objects/ that is not in that set.
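A minimal sketch of that mark-and-sweep step, assuming the image-to-object mapping has already been read into memory. The function name `unreferenced_objects` and the map/set representation are hypothetical and are not the composefs-rs API; reading the references out of the EROFS images is left out.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory view of a repository (not the composefs-rs API).
/// `images` maps each EROFS image name to the object IDs it references;
/// `objects` holds every object ID currently present under objects/.
/// Returns the object IDs that no image references.
pub fn unreferenced_objects(
    images: &HashMap<String, Vec<String>>,
    objects: &BTreeSet<String>,
) -> BTreeSet<String> {
    // Mark: the union of all object IDs referenced by any surviving image.
    let live: BTreeSet<&String> = images.values().flatten().collect();

    // Sweep: everything in objects/ that was not marked.
    objects
        .iter()
        .filter(|id| !live.contains(id))
        .cloned()
        .collect()
}
```

With no refs/ directory involved, an empty `images` map would still mark every object as unreferenced, so a caller would probably want to treat "no images found" as a special case rather than sweeping.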
This should work relatively well, but garbage-collecting anything in the streams/ directory remains a problem: we lose all information about the contents of streams/ as soon as we lose the container that was used to create them.
One solution that came up in discussion is to record all corresponding streams in the EROFS header, so that an image can be linked to its streams. Deleting a stream has the disadvantage that the corresponding layer has to be pulled again, so another approach would be to GC a stream only if the image referencing it hasn't been used in a while. That would require extra state in the repository to keep track of usage, though.
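The second approach could look roughly like the sketch below. Everything here is an assumption for illustration: `ImageRecord`, its `last_used` timestamp, and the per-image stream list are the extra state that does not exist in the repository today, and treating a stream that no image references as collectable is a policy choice, not something settled in this issue.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-image record. Neither the stream list nor the
/// `last_used` timestamp is stored by the repository today.
pub struct ImageRecord {
    /// Names of the streams this image was built from.
    pub streams: Vec<String>,
    /// Last time the image was used, in seconds since an arbitrary epoch.
    pub last_used: u64,
}

/// Returns the streams that may be collected: those not referenced by any
/// image that was used within `grace` seconds of `now`.
pub fn collectable_streams(
    images: &[ImageRecord],
    all_streams: &BTreeSet<String>,
    now: u64,
    grace: u64,
) -> BTreeSet<String> {
    // Streams pinned by at least one recently used image.
    let keep: BTreeSet<&str> = images
        .iter()
        .filter(|img| now.saturating_sub(img.last_used) <= grace)
        .flat_map(|img| img.streams.iter().map(String::as_str))
        .collect();

    // Everything else is a candidate for deletion (assumed policy).
    all_streams
        .iter()
        .filter(|s| !keep.contains(s.as_str()))
        .cloned()
        .collect()
}
```

A stream shared by a stale image and a recently used one is kept, since any recent reference pins it.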
Pasting a comment by @allisonkarlitskaya:
If we store splitstreams in the image, a few things to note:
- it should be done in a way that's not visible to the user. podman's security model is that non-privileged containers don't get to know where they came from
- there are lots of non-user-visible places where you could tuck this information: there's a whole composefs header in the erofs image that we use for approximately nothing
- please do consider the case that the user might not want the splitstreams deployed
- but if you do this then it could theoretically be possible to drop the erofs reader code for collecting the fs-verity xattrs: we could just collect the information directly from the splitstreams (which is orders of magnitude easier and faster)