feat: enforce per-org cache storage quotas #1
Conversation
💡 Codex Review
nimbus/src/nimbus/docker_cache/app.py
Lines 372 to 378 in e2c2338
When the cache exceeds its global size limit, ensure_storage_limit removes blob files and deletes their metric rows but never decrements the per‑org usage counters. As a result, an org’s recorded usage only ever increases and quota checks will eventually reject new uploads even though their earlier blobs have been evicted. The eviction loop should look up each victim’s org and call update_org_usage(..., -size) before deleting the record.
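The fix the review describes can be sketched with an in-memory stand-in for the app state. `BlobRecord`, `CacheState`, and the oldest-first eviction order are assumptions for illustration; only `update_org_usage(..., -size)` as the missing rollback step comes from the review itself.

```python
from dataclasses import dataclass, field

@dataclass
class BlobRecord:
    key: str
    org_id: str
    size: int

@dataclass
class CacheState:
    """Hypothetical stand-in for the real app state."""
    records: list = field(default_factory=list)   # oldest first
    org_usage: dict = field(default_factory=dict)

    def update_org_usage(self, org_id, delta):
        total = self.org_usage.get(org_id, 0) + delta
        self.org_usage[org_id] = total
        return total

def ensure_storage_limit(state, limit):
    """Evict blobs until the cache fits under `limit`."""
    total = sum(r.size for r in state.records)
    while total > limit and state.records:
        victim = state.records.pop(0)
        # The missing step: credit the victim's bytes back to its org
        # before the blob file and metric row are deleted.
        state.update_org_usage(victim.org_id, -victim.size)
        total -= victim.size
```

With this rollback in place, an org's counter tracks what is actually on disk instead of growing monotonically.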
    bytes_written = await state.backend.head(namespaced_key)
    new_total = state.update_org_usage(org_id, bytes_written)
    quota = state.settings.org_storage_quota_bytes
    if quota is not None and new_total > quota:
        state.update_org_usage(org_id, -bytes_written)
Delete cache object when rejecting on org quota
The PUT handler increments the org’s byte counter only after the blob has been written, and when the quota is exceeded it merely subtracts the counter and returns 413 but never removes the already-written object. This allows rejected writes to remain stored and retrievable, so disk usage can exceed the configured quota while metrics claim otherwise. Consider deleting the object (e.g. via the backend) before raising the exception so quota enforcement actually prevents the artifact from being cached.
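The suggested fix can be sketched against a minimal in-memory backend. `InMemoryBackend`, `QuotaExceeded`, and `put_blob` are hypothetical names for illustration; the review only specifies the idea of deleting the written object before raising.

```python
import asyncio

class QuotaExceeded(Exception):
    """Hypothetical stand-in for the handler's 413 response."""

class InMemoryBackend:
    """Hypothetical stand-in for the real storage backend."""
    def __init__(self):
        self.objects = {}

    async def write(self, key, data):
        self.objects[key] = data

    async def delete(self, key):
        self.objects.pop(key, None)

async def put_blob(backend, org_usage, org_id, key, data, quota):
    await backend.write(key, data)
    bytes_written = len(data)
    org_usage[org_id] = org_usage.get(org_id, 0) + bytes_written
    if quota is not None and org_usage[org_id] > quota:
        org_usage[org_id] -= bytes_written  # roll back the counter
        await backend.delete(key)           # AND remove the stored object
        raise QuotaExceeded(f"org {org_id} over quota")
```

The delete must come before the exception is raised; otherwise the rejected blob stays on disk and remains retrievable while the metrics claim it was never cached.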
    quota = state.settings.org_storage_quota_bytes
    new_total = state.update_org_usage(token.organization_id, final_session.size)
    if quota is not None and new_total > quota:
        state.update_org_usage(token.organization_id, -final_session.size)
        final_session.file_path.unlink(missing_ok=True)
Avoid counting deduplicated blobs toward org quota
update_org_usage is called before checking whether the uploaded digest already exists. Re‑pushing a blob that is already present (a common case with Docker layers) bumps the org’s stored byte total even though the file is discarded in the target_path.exists() branch. This will cause quotas to trip even when no additional disk space is consumed. The quota update should happen only when a new blob is stored or should be rolled back when deduplication occurs.
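One way to order the checks, sketched here with plain dicts and a hypothetical `finalize_blob` helper (the review itself only requires that dedup hits never move the counter):

```python
class QuotaExceeded(Exception):
    """Hypothetical stand-in for the handler's 413 response."""

def finalize_blob(org_usage, stored_digests, org_id, digest, size, quota):
    """Charge the org's quota only when a new blob is actually stored."""
    if digest in stored_digests:
        # Deduplicated re-push: the temp file is discarded, no new
        # bytes land on disk, so the quota counter must not move.
        return "deduplicated"
    new_total = org_usage.get(org_id, 0) + size
    if quota is not None and new_total > quota:
        raise QuotaExceeded(f"org {org_id} over quota")
    org_usage[org_id] = new_total
    stored_digests.add(digest)
    return "stored"
```

Checking the digest before touching the counter avoids the update-then-rollback dance entirely for the common Docker-layer re-push case.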
haasonsaas left a comment
Fixes look great now: the checksum lookup matches the computed digest, cache quota rollbacks cleanly delete artifacts, and docker evictions correctly adjust org usage. Tests all green. Ship it.