Description
Currently, abwart uses the garbage collector of the registry binary which is shipped with the registry image to clean up unused blobs.
It turns out that the --delete-untagged flag also deletes the manifests of multi-arch images, as described in distribution/distribution#3178.
This makes the garbage collector pretty much unusable since after a cleanup most images will end up with a manifest unknown error when being pulled.
The idea is now to come up with a custom "garbage collector" which deletes the unneeded layers from the registry. This implementation should be a temporary solution until the PR from the mentioned issue is merged and the bug in the garbage collector is fixed. Below are some ideas for how this could be achieved:
Idea 1: Garbage collection over the API
I would implement a "garbage collector" over the REST API. This implementation is resource-intensive, as it would fetch every repository, all tags of every repository, and all layers of every tag.
Note
This implementation wouldn't work on its own, since it only fetches the layers which are currently used. Our goal is the opposite: find all layers which are currently not used. I still wanted to bring this idea up since it could be used as part of another implementation.
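The enumeration described in Idea 1 could look roughly like the following Python sketch. It walks the registry HTTP API V2 endpoints /v2/_catalog, /v2/<name>/tags/list and /v2/<name>/manifests/<reference>, and follows manifest lists down to their per-platform manifests. The function names, the injected get_json helper and the base URL are assumptions made for illustration and are not part of abwart. Catalog pagination and authentication are not handled, and the digest of each tag's top-level manifest (returned in the Docker-Content-Digest response header) is not collected, so the result is not yet complete enough to decide what to delete.

```python
import json
import urllib.request
from typing import Callable, Set

# Media types that denote a multi-arch manifest list / OCI image index.
INDEX_TYPES = {
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.index.v1+json",
}
IMAGE_TYPES = {
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
}
ACCEPT = ", ".join(sorted(INDEX_TYPES | IMAGE_TYPES))


def http_get_json(url: str) -> dict:
    """Hypothetical helper: GET a registry URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Accept": ACCEPT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def used_digests(base_url: str,
                 get_json: Callable[[str], dict] = http_get_json) -> Set[str]:
    """Collect digests reachable from any tag: child manifests, configs, layers."""
    used: Set[str] = set()
    catalog = get_json(f"{base_url}/v2/_catalog")
    for repo in catalog.get("repositories") or []:
        tags = get_json(f"{base_url}/v2/{repo}/tags/list").get("tags") or []
        pending = list(tags)  # references (tags or digests) still to fetch
        seen: Set[str] = set()
        while pending:
            reference = pending.pop()
            if reference in seen:
                continue
            seen.add(reference)
            manifest = get_json(f"{base_url}/v2/{repo}/manifests/{reference}")
            if manifest.get("mediaType") in INDEX_TYPES or "manifests" in manifest:
                # Multi-arch image: keep every per-platform manifest and descend.
                for child in manifest.get("manifests", []):
                    used.add(child["digest"])
                    pending.append(child["digest"])
            else:
                # Single-platform image: keep its config blob and its layers.
                if "config" in manifest:
                    used.add(manifest["config"]["digest"])
                for layer in manifest.get("layers", []):
                    used.add(layer["digest"])
    return used
```

Passing get_json as a parameter keeps the traversal testable without a running registry; a real implementation would also need to handle authentication and paginated catalog responses.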
Idea 2: Scan filesystem of registry container
All blobs are stored in the registry at /var/lib/registry/docker/registry/v2/blobs/** (or in the directory specified in /etc/docker/registry/config.yml). This implementation would index all blobs in this directory and pick out the layers, as opposed to the manifests. Once all layers are indexed, we could determine the unneeded ones (maybe by fetching all needed layers using idea 1) and then delete them either over the API or directly in the filesystem. I don't know how reliable modifications to the filesystem of a running registry are, so this would need to be tested.
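A minimal sketch of the indexing step, assuming the filesystem storage driver's layout of <blobs>/<algorithm>/<first two hex characters>/<digest>/data. The function names are hypothetical, and the set of used digests is assumed to come from an enumeration like the one in idea 1. The sketch only reports candidates and deletes nothing, since the safety of modifying a running registry's filesystem is still an open question.

```python
from pathlib import Path
from typing import Set


def stored_digests(blobs_root: Path) -> Set[str]:
    """Index every blob below blobs_root as '<algorithm>:<hex digest>'."""
    digests: Set[str] = set()
    # Assumed layout: <blobs_root>/sha256/ab/ab12...ef/data
    for data_file in blobs_root.glob("*/*/*/data"):
        algorithm = data_file.parents[2].name  # e.g. "sha256"
        hex_digest = data_file.parent.name     # directory named after the digest
        digests.add(f"{algorithm}:{hex_digest}")
    return digests


def unused_digests(stored: Set[str], used: Set[str]) -> Set[str]:
    """Blobs present on disk that no known manifest references."""
    return stored - used
```

For example, unused_digests(stored_digests(Path("/var/lib/registry/docker/registry/v2/blobs")), used) would yield the deletion candidates. The used set must contain manifest and config digests as well as layer digests, because those are stored in the same blob directory.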
Idea 3: Back up falsely marked manifests
In this implementation we would first call the garbage collector with the dry-run flag. From its output we would identify all manifests which would be deleted falsely. One way to do this is to fetch all manifests for all repositories and compare them to the manifests marked for deletion. Overlapping manifests are then backed up and recreated after the garbage collector has run.
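The comparison step could be sketched as follows. Because the exact format of the dry-run output is not specified here, this sketch makes no assumption about it beyond the output containing sha256 digests: it extracts every digest it finds and intersects them with the manifests that are still referenced (for example, the per-platform manifests listed in multi-arch manifest lists). The function names are hypothetical.

```python
import re
from typing import Iterable, Set

# Matches a sha256 digest anywhere in a line of text.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def digests_in_output(gc_output: str) -> Set[str]:
    """Every sha256 digest mentioned in the garbage collector's dry-run output."""
    return set(DIGEST_RE.findall(gc_output))


def falsely_marked(gc_output: str,
                   referenced_manifests: Iterable[str]) -> Set[str]:
    """Digests the dry run mentions that are still referenced.

    These are the manifests that would need to be backed up before the
    real run and pushed again afterwards.
    """
    return digests_in_output(gc_output) & set(referenced_manifests)
```

If the dry-run output also mentions digests that are kept rather than deleted, the extraction would have to be narrowed to the lines that report deletions. This needs to be checked against the real output.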