idea: Implement own garbage collector #2

@WhySoBad

Currently, abwart uses the garbage collector of the registry binary which is shipped with the registry image to clean up unused blobs.
It turns out the --delete-untagged flag also deletes manifests of multi-arch images, as described in distribution/distribution#3178.
This makes the garbage collector pretty much unusable since after a cleanup most images will end up with a manifest unknown error when being pulled.

The idea is now to come up with some kind of custom "garbage collector" which deletes the unneeded layers from the registry. This implementation should be a temporary solution until the PR from the mentioned issue is merged and the bug in the garbage collector is fixed. Below are some ideas for how this could be achieved:

Idea 1: Garbage collection over the API

I would implement a "garbage collector" over the REST API. This implementation is resource-intensive as it would fetch every repository, all tags of all repositories and, for every tag, all layers.

Note

This implementation wouldn't work on its own since it only fetches the layers which are currently used. Our goal is the opposite: find all layers which are currently not used. I still wanted to bring this idea up since it might be useful as part of another implementation.
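
To make the cost of this idea concrete, here is a minimal Python sketch of the walk (repositories, then tags, then manifests, then layers). It assumes the standard registry v2 endpoints /v2/_catalog, /v2/<name>/tags/list and /v2/<name>/manifests/<reference>. The fetch_json callable is a hypothetical helper, not part of abwart, which would have to handle authentication, Accept headers and pagination.

```python
from typing import Callable, Set

# Hypothetical helper type: takes a registry path, performs the GET request
# and returns the decoded JSON body. Auth, headers and paging are left out.
FetchJson = Callable[[str], dict]


def referenced_digests(fetch_json: FetchJson) -> Set[str]:
    """Collect the digests of everything reachable from a tag."""
    used: Set[str] = set()

    def walk(repo: str, reference: str) -> None:
        manifest = fetch_json(f"/v2/{repo}/manifests/{reference}")
        # Multi-arch image (manifest list / image index): follow each
        # per-platform manifest, which is usually untagged.
        for child in manifest.get("manifests", []):
            if child["digest"] not in used:
                used.add(child["digest"])
                walk(repo, child["digest"])
        # Single image manifest: config blob plus layer blobs.
        if "config" in manifest:
            used.add(manifest["config"]["digest"])
        for layer in manifest.get("layers", []):
            used.add(layer["digest"])

    for repo in fetch_json("/v2/_catalog").get("repositories", []):
        # "tags" can be null for a repository without tags.
        for tag in fetch_json(f"/v2/{repo}/tags/list").get("tags") or []:
            walk(repo, tag)
    return used
```

The sketch does not record the digest of the tagged manifest itself, since that is only available from the response headers. It yields the "used" set only, so it would have to be combined with a list of all stored blobs to find the unused ones.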

Idea 2: Scan filesystem of registry container

All blobs are stored in the registry at /var/lib/registry/docker/registry/v2/blobs/** (or in the directory specified in /etc/docker/registry/config.yml). This implementation would index all blobs from this directory and filter out the layers from the manifests. Once all layers are fetched we could filter out the unneeded layers (maybe by fetching all needed layers using idea 1) and then delete them either over the API or directly in the filesystem. I don't know how reliable modifications to the filesystem of a running registry are, so this would need to be tested.
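
A rough Python sketch of the indexing step could look like the following. It assumes the filesystem driver's layout of <blobs root>/<algorithm>/<first two hex characters>/<full hex digest>/data. The function names are made up for illustration, and the set of referenced digests is assumed to come from somewhere else (for example idea 1).

```python
from pathlib import Path
from typing import Set


def stored_digests(blobs_root: Path) -> Set[str]:
    """Return the digests of all blobs found below blobs_root.

    Assumed layout: <blobs_root>/<algorithm>/<2-char prefix>/<hex>/data
    """
    return {
        # parents[2] is the algorithm directory, parent is the hex digest.
        f"{data.parents[2].name}:{data.parent.name}"
        for data in blobs_root.glob("*/*/*/data")
    }


def unreferenced(stored: Set[str], referenced: Set[str]) -> Set[str]:
    """Blobs present on disk which nothing refers to anymore."""
    return stored - referenced
```

Manifests are stored as blobs in the same directory, so the referenced set has to contain the manifest digests as well. Otherwise this would flag the manifests of images that are still in use.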

Idea 3: Back up falsely marked manifests

In this implementation we would first call the garbage collector with the dry run flag. From its output we identify all manifests which would be deleted falsely. One way to do this is to fetch all manifests for all repositories and compare them to the manifests marked for deletion. Overlapping manifests are then backed up and recreated after the garbage collector has run.
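
The comparison step could be sketched in Python as below. The log line pattern is an assumption about what the dry run prints and would need to be checked against the registry version in use. The function names and the still_needed set (all manifest digests that are reachable from a tag) are hypothetical as well.

```python
import re
from typing import Iterable, Set

# ASSUMPTION: the dry run reports every manifest it would delete on a line
# containing "manifest eligible for deletion: <digest>". Verify this against
# the real output before relying on it.
MANIFEST_LINE = re.compile(r"manifest eligible for deletion: (sha256:[0-9a-f]+)")


def manifests_marked_for_deletion(dry_run_output: Iterable[str]) -> Set[str]:
    """Extract the manifest digests the garbage collector would delete."""
    marked: Set[str] = set()
    for line in dry_run_output:
        match = MANIFEST_LINE.search(line)
        if match:
            marked.add(match.group(1))
    return marked


def manifests_to_back_up(marked: Set[str], still_needed: Set[str]) -> Set[str]:
    """Manifests which are marked for deletion although they are still needed."""
    return marked & still_needed
```

Only the intersection needs to be backed up. Manifests which are marked and not needed can be left to the garbage collector.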

Labels: bug, enhancement, idea
