A safe, online garbage collector for Docker Distribution registries using S3 storage.
- ✅ Safe deletion - Uses time-window approach to avoid race conditions
- ✅ No downtime - Can run while registry is active
- ✅ Cost optimized - Minimizes S3 API calls
- ✅ State tracking - Remembers when blobs became unreferenced
- ✅ Dry-run mode - Test before deleting
- ✅ Detailed reporting - Shows exactly what was deleted
The tool implements a time-based safety mechanism similar to the Zot registry's:
- Mark Phase: Scans all manifests to build a set of referenced blobs
- Sweep Phase: Identifies unreferenced blobs
- Safety Check: Only deletes blobs that have been unreferenced for > N hours
- State Tracking: Maintains state file to track unreferenced duration
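The mark/sweep decision can be sketched as follows. This is a minimal illustration of the time-window technique, not the tool's actual internals; the function and variable names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

SAFETY_WINDOW = timedelta(hours=48)

def sweep(all_blobs, referenced, state, now=None):
    """Return blobs safe to delete, updating first-seen-unreferenced times in state."""
    now = now or datetime.now(timezone.utc)
    deletable = []
    for digest in all_blobs:
        if digest in referenced:
            state.pop(digest, None)  # referenced again: stop tracking it
            continue
        first_seen = state.setdefault(digest, now)  # start the clock on first sighting
        if now - first_seen > SAFETY_WINDOW:
            deletable.append(digest)  # unreferenced for longer than the window
    return deletable
```

A blob only becomes deletable after it has been observed as unreferenced across enough runs to exceed the window, which is what defuses the race in the timeline below.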
Timeline:

```text
T0: GC starts, marks blob as unreferenced
T1: Client uploads a manifest referencing that blob
T2: GC checks age - the blob only just became unreferenced (< 1 hour old)
T3: GC skips deletion (safety window = 48 hours)
T4: Next GC run - the blob is referenced again, safe!
```
The safety window must be longer than your maximum image push time.
```bash
cd tools/
pip install -r requirements.txt
```

```bash
# See what would be deleted without actually deleting
python s3-gc.py --bucket my-registry-bucket --dry-run
```

This will:
- Scan all manifests
- Identify unreferenced blobs
- Show what would be deleted
- Create `gc-state.json` to track unreferenced blobs
```bash
# Actually delete blobs unreferenced for > 48 hours
python s3-gc.py --bucket my-registry-bucket --safety-hours 48
```

Add to cron for regular garbage collection:

```bash
# Run every hour
0 * * * * cd /path/to/tools && python s3-gc.py --bucket my-registry-bucket --safety-hours 48 >> gc-cron.log 2>&1
```

```bash
# Custom S3 prefix
python s3-gc.py --bucket my-registry-bucket \
    --prefix docker/registry/v2/ \
    --safety-hours 72

# Use a specific AWS profile
AWS_PROFILE=production python s3-gc.py --bucket my-registry-bucket

# Verbose logging
python s3-gc.py --bucket my-registry-bucket --verbose

# Custom state file
python s3-gc.py --bucket my-registry-bucket \
    --state-file /var/lib/gc-state.json
```

The tool uses boto3, so configure AWS credentials via:
```bash
# Environment variables
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1

# Or an AWS CLI profile
export AWS_PROFILE=production
```

Or `~/.aws/credentials`:

```ini
[default]
aws_access_key_id = ...
aws_secret_access_key = ...
```

Minimal IAM policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-registry-bucket",
        "arn:aws:s3:::my-registry-bucket/*"
      ]
    }
  ]
}
```

| Upload Frequency | Recommended Safety Window |
|---|---|
| Continuous | 72-96 hours |
| Daily builds | 48-72 hours |
| Infrequent | 24-48 hours |
Formula: `Safety Window > (Max Upload Duration × 2) + GC Scan Interval`
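As a worked example of the formula (the numbers here are illustrative, not recommendations):

```python
def min_safety_window_hours(max_upload_hours, scan_interval_hours):
    # Safety Window > (Max Upload Duration x 2) + GC Scan Interval
    return max_upload_hours * 2 + scan_interval_hours

# With a 2-hour worst-case upload and hourly GC scans, any window
# above 5 hours satisfies the formula; the table above rounds up
# much further to leave generous margin.
floor = min_safety_window_hours(2, 1)
```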
This tool is designed to minimize S3 costs:
For a registry with:
- 100 repositories
- 500 manifests total
- 10,000 blobs
Estimated S3 API calls:
- List repositories: ~1 LIST operation
- List manifests: ~100 LIST operations
- Read manifests: ~500 GET operations
- List all blobs: ~10 LIST operations (with pagination)
- Delete blobs: ~N DELETE operations (N = deletable blobs)
Total: ~611 + N operations
At AWS S3 pricing (us-east-1):
- LIST: $0.005 per 1,000 requests
- GET: $0.0004 per 1,000 requests
- DELETE: $0.0000 per 1,000 requests
Cost per run: ~$0.001 (less than a penny!)
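The estimate above can be reproduced with a few lines of arithmetic (us-east-1 request pricing as quoted above):

```python
# Price per 1,000 requests, as listed above
LIST_PER_1K, GET_PER_1K, DELETE_PER_1K = 0.005, 0.0004, 0.0

def gc_run_cost(lists, gets, deletes):
    """Estimated USD cost of one GC run's S3 API calls."""
    return (lists * LIST_PER_1K + gets * GET_PER_1K + deletes * DELETE_PER_1K) / 1000

# 1 + 100 + 10 = 111 LIST ops, 500 manifest GETs, deletes are free
cost = gc_run_cost(lists=111, gets=500, deletes=128)
```

Even doubling every count keeps a run comfortably under a tenth of a cent.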
- Run less frequently: hourly runs are usually overkill. Consider:
  - Every 6 hours for active registries
  - Daily for low-traffic registries
- Increase the safety window: longer windows mean fewer deletions per run
- Monitor state file size: the state file grows with unreferenced blobs. Consider periodic cleanup:

```bash
# Clean up state entries older than 30 days
python s3-gc.py --cleanup-state --days 30
```
The tool maintains `gc-state.json` to track when blobs became unreferenced:

```json
{
  "sha256:abc123...": "2025-12-13T10:30:00+00:00",
  "sha256:def456...": "2025-12-14T15:45:00+00:00"
}
```

- Purpose: track how long each blob has been unreferenced
- Location: current directory (or set with `--state-file`)
- Backup: backing up this file is recommended
- Cleanup: blobs that become referenced again are removed automatically
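Loading and saving that file is straightforward, since the values are ISO-8601 timestamps. A sketch of the round-trip, assuming the format shown above (function names are illustrative, not the tool's API):

```python
import json
from datetime import datetime

def load_state(path="gc-state.json"):
    """Map of blob digest -> when it was first seen unreferenced."""
    try:
        with open(path) as f:
            return {d: datetime.fromisoformat(ts) for d, ts in json.load(f).items()}
    except FileNotFoundError:
        return {}  # first run: nothing tracked yet

def save_state(state, path="gc-state.json"):
    with open(path, "w") as f:
        json.dump({d: ts.isoformat() for d, ts in state.items()}, f, indent=2)
```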
```bash
# View real-time logs
tail -f gc.log

# Search for errors
grep ERROR gc.log

# Count deletions
grep "Deleting blob" gc.log | wc -l
```

Each run produces a summary:
```text
GC Statistics:
  Total blobs found:   10,234
  Referenced blobs:    8,456
  Unreferenced blobs:  1,778
  Skipped (too new):   1,650
  Deleted blobs:       128
  Bytes deleted:       5,234,567,890 (4.87 GB)
  Errors encountered:  0
```
- Deleted blobs per run: Should stabilize over time
- Skipped (too new): Should be > 0 (shows safety is working)
- Errors: Should be 0
- Bytes deleted: Monitor storage savings
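The counts in a healthy summary obey two invariants, which a monitoring script can assert. The dict shape below is hypothetical, just mirroring the fields in the example summary:

```python
def check_gc_summary(s):
    """Sanity-check a run summary: the counts must partition cleanly."""
    # Every blob is either referenced or unreferenced
    assert s["referenced"] + s["unreferenced"] == s["total"]
    # Every unreferenced blob is either deleted or still inside the safety window
    assert s["skipped_too_new"] + s["deleted"] == s["unreferenced"]
    return True

# The example run above satisfies both invariants
check_gc_summary({"total": 10234, "referenced": 8456, "unreferenced": 1778,
                  "skipped_too_new": 1650, "deleted": 128})
```

A violation would suggest a counting bug or a log that was truncated mid-run.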
If you see a large number of deletions on the first run:

```bash
# First run with dry-run
python s3-gc.py --bucket my-registry-bucket --dry-run

# Review what would be deleted
less gc.log

# If it looks wrong, increase the safety window
python s3-gc.py --bucket my-registry-bucket --safety-hours 96
```

Check the bucket name and AWS credentials:

```bash
aws s3 ls s3://my-registry-bucket/
```

Delete and recreate the state file:

```bash
rm gc-state.json
python s3-gc.py --bucket my-registry-bucket --dry-run
```

For very large registries (100k+ blobs), consider:

- Running on a machine with more RAM
- Processing repositories in batches
- Using `--verbose` to see progress
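Batch processing keeps peak memory proportional to the batch size rather than the repository count. A generic sketch of the chunking idea (not part of the tool):

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of at most `size` items, keeping memory bounded."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

# e.g. mark/sweep 100 repositories at a time instead of all at once
```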
- Dry-run by default: Must explicitly disable
- Confirmation prompt: Asks for confirmation before deleting
- Time-based safety: Won't delete recent blobs
- State persistence: Tracks deletion candidates across runs
- Detailed logging: Audit trail of all deletions
- Error handling: Continues on errors, reports at end
| Feature | Built-in GC | This Tool |
|---|---|---|
| Requires downtime | ✅ Yes | ❌ No |
| Read-only mode needed | ✅ Yes | ❌ No |
| Time-based safety | ❌ No | ✅ Yes |
| Direct S3 access | ❌ No | ✅ Yes |
| Cost optimized | ❌ No | ✅ Yes |
| State tracking | ❌ No | ✅ Yes |
Improvements welcome! Consider adding:
- Progress bars for long operations
- Prometheus metrics export
- Slack/email notifications
- Parallel processing for large registries
- S3 lifecycle policy integration
- Support for other storage backends
Use at your own risk. Test thoroughly in non-production first.