fix: proxy cache serve local on remote not found #22153
AYDEV-FR wants to merge 15 commits into goharbor:main
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests. Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main   #22153      +/-   ##
==========================================
+ Coverage   45.36%   46.67%   +1.30%
==========================================
  Files         244      252       +8
  Lines       13333    14287     +954
  Branches     2719     2937     +218
==========================================
+ Hits         6049     6668     +619
- Misses       6983     7264     +281
- Partials      301      355      +54
```

Flags with carried forward coverage won't be shown.
Thanks for the PR. But IMO this is a "feat" rather than a "fix".
This is a breaking change to the original design, which returns 404 when the remote content is removed. Because we have passed the "feature freeze" and will hit "feature complete" in July, I don't think it can make it into v2.14. Are you willing to continue working on this after the branch for 2.14 is cut? We can continue the discussion in the issue. IMO at least an option should be added, so users can choose whether to serve local content when the remote artifact is removed.
reasonerjt
left a comment
Per my understanding, this is a breaking change. More discussion needed.
Hi @reasonerjt, I understand your point of view regarding this being considered a "feat" rather than a "fix". In the meantime, I found a workaround by returning a 429 response for each 404 returned by the remote registry. Since I have this workaround, I am willing to wait for your decision on whether to update the documentation or merge my change. However, I think that giving users a choice via a parameter might create confusion about the feature. In my opinion, serving the local cache when the remote registry responds with a 404 is the very concept of a proxy cache. And the Harbor documentation seems to agree with me.
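The thread doesn't show how that workaround was implemented. Purely as an illustration of the idea, a small shim placed in front of the upstream registry could rewrite 404 responses to 429 before they reach Harbor, so Harbor treats the upstream as throttled rather than as having removed the artifact. The listen address and upstream URL below are placeholders, and TLS, authentication forwarding and Host-header rewriting are omitted:

```go
// Hypothetical shim, not part of this PR: it forwards registry requests to the
// real upstream and rewrites 404 responses to 429 before they reach Harbor.
package main

import (
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	// Placeholder upstream; Harbor's registry endpoint would point at this shim.
	upstream, err := url.Parse("https://registry-1.docker.io")
	if err != nil {
		log.Fatal(err)
	}

	proxy := httputil.NewSingleHostReverseProxy(upstream)
	proxy.ModifyResponse = func(resp *http.Response) error {
		if resp.StatusCode == http.StatusNotFound {
			// Replace the 404 with a 429 and drop the original body.
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
			resp.StatusCode = http.StatusTooManyRequests
			resp.Status = "429 Too Many Requests"
			resp.Body = io.NopCloser(strings.NewReader(""))
			resp.ContentLength = 0
			resp.Header.Del("Content-Length")
		}
		return nil
	}

	log.Fatal(http.ListenAndServe(":8081", proxy))
}
```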
@AYDEV-FR Thanks for pointing this out. Let me double-check why this change in the doc was made; if it was on purpose, I agree this is a "fix".
Hi @AYDEV-FR, I am also running into the same problem and I would be interested to know how you implemented the workaround to return a 429 from the registry?
Is there any update on this? Especially with Bitnami's upcoming removal of images from docker.io, having the pull-through cache continue serving removed, but cached, images would be incredibly useful. After all, that's probably one of the reasons people configure Harbor as a pull-through cache: not just to reduce traffic, but also to keep workloads operational in case upstream images go away for whatever reason.
@AYDEV-FR great work. Per the Harbor proxy spec (https://github.com/goharbor/community/pull/144/files#diff-78dea958499f3e23826611cf839d9a96615a0b420f33520e92d564f2fb17d24fR127), the proxy cache is said to delete the local manifest if it does not exist in the remote. But the documentation says the artifact is served even when the remote artifact is deleted. Adding to that, the change to delete the local artifact was only made in 2.14, which makes it pretty clear that Harbor was previously serving the artifact. That makes this a fix. This also raises the question of data integrity and what to do with stale data. I believe we should treat this as a fix, since Harbor appears to have been serving these artifacts in the past, and the current behaviour differs from the spec mentioned above, so we should update the spec. There is also the distinction between a pull-through cache and a proxy cache: Harbor doesn't support a pull-through cache; maybe we can do that.
@bupd Harbor 2.14 made the opposite change: #22175. Maybe the best option is to make it configurable, leaving the current implementation in place as the default? It could be a checkbox setting at the proxy cache level, something like
Also, from the distribution spec it is clear that Harbor should serve artifacts it has locally even if the remote artifact is deleted. Given the use case, the demand, and the fact that Harbor might already have been doing this in the past, it makes sense to have this in 2.14.1.
Given that the proposal PR was merged with lazy consensus due to a lack of reviews, and given what the spec clearly states: a true pull-through cache should be able to serve content even when the remote resource is deleted or completely offline, which is a critical distinction from an online-only proxy.
Signed-off-by: AYDEV-FR <aymeric.deliencourt@aydev.fr>
Signed-off-by: AYDEV-FR <aymeric.deliencourt@aydev.fr>
f2e80f9 to 69adb22
Hi @bupd and @reasonerjt, thanks for your review! I've rebased my code and run multiple tests with the changes in my PR.
- When the repo is not available, the cache is correctly served from local.
- If the repo returns 503, Harbor still detects it as unhealthy.
- If the repo returns 404 for the manifest (e.g., the image has been deleted), my code behaves as expected.
The change is working well. Would it be possible to merge this and release a v2.14.1 as @bupd suggested?
Any update?
Pull Request Overview
This PR fixes the proxy cache logic to serve locally cached images when they exist locally but are no longer found in the remote repository, aligning implementation with documented behavior.
- Updates the `UseLocalManifest` function to check for local artifact existence before returning a "not found" error
- Removes the automatic deletion logic for locally cached manifests when the remote returns 404
- Updates the corresponding test case to expect success instead of an error when the local cache exists but the remote doesn't
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/controller/proxy/controller.go | Modified logic to serve from local cache when remote artifact is not found |
| src/controller/proxy/controller_test.go | Updated test case to reflect new behavior of serving from local cache |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Vadim Bauer <Bauer.vadim@gmail.com>
We are waiting for a useful pull request.
Because the current implementation deletes the local cache if the artifact is not found upstream, I suggest adding an option when creating a proxy cache project:
"Keep and serve images in proxy cache when they are removed from upstream registry"
When this option is checked, serve the local content and do not delete it locally.
When it is not checked, keep the previous behaviour and delete the local content.
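As a rough sketch of how such a per-project option could gate the behaviour described above; the metadata key name and the simplified lookup interface are invented for illustration and are not existing Harbor identifiers:

```go
package proxy

import "context"

// projectMetadata is a simplified stand-in for Harbor's project metadata lookup.
type projectMetadata interface {
	Get(ctx context.Context, projectID int64, key string) (value string, found bool, err error)
}

// Hypothetical metadata key; the real name and default would be decided in this PR.
const keyRetainOnUpstream404 = "proxy_cache_retain_on_upstream_404"

// handleUpstreamNotFound decides what to do with a locally cached artifact once
// the upstream registry has answered 404 for it: serve it, or delete it as before.
func handleUpstreamNotFound(ctx context.Context, meta projectMetadata, projectID int64, deleteLocal func() error) (serveLocal bool, err error) {
	value, found, err := meta.Get(ctx, projectID, keyRetainOnUpstream404)
	if err != nil {
		return false, err
	}
	if found && value == "true" {
		// Option checked: keep the local copy and serve it.
		return true, nil
	}
	// Option unchecked (or unset): previous behaviour, delete the local copy.
	return false, deleteLocal()
}
```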
I agree with your proposal to add this feature as a configuration option for proxy cache projects. However, it shouldn't only be available to newly created proxy cache projects. I would like to open a PR for this.
@stonezdj the problem missing here is that people already have proxied artifacts in their proxy cache projects which may already be deleted upstream and exist only in the proxy cache projects of a previous version of Harbor. If they upgrade to the latest version of Harbor (which deletes those artifacts), that would not be useful. It is better to have this on by default and offer an opt-out (i.e. the "keep proxied images even when upstream is deleted" checkbox's default state is checked), so that while migrating to a new version of Harbor they wouldn't get their images deleted. Hope this helps.
It would be great if this could be merged at last. The current implementation is totally counterintuitive. From my point of view, the behaviour implemented in this PR is the expected workflow (I at least was very surprised to find out it is not already the case during the latest Cloudflare and Docker Hub downtimes :) ). What is still needed here to move this across the "merge-line"?
I have started working on a PR for this feature, which might have been a bit premature. While working on the implementation (as project metadata set at project creation and after saving changes to an existing proxy cache project's configuration), I realized that it is important we discuss when to serve the local manifest if there is an error getting the upstream one (e.g. network errors, HTTP 5XX, authentication 401 / authorization 403 errors). We might not want to silence every type of error. How much do we favor availability over consistency in an "offline" proxy cache? My proposal is to prioritize availability if the "offline" mode for the proxy cache is enabled. This means we would serve the local manifest on any (to be discussed) upstream error and log a warning instead of failing. Otherwise we need to change the default behavior (serve the local manifest) and have a "sync" mode for the proxy cache. I'd appreciate feedback on whether this is the desired behavior or if we should maintain specific exceptions.
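To make that question concrete, one illustrative way such a policy could be expressed; the error categories, the `offlineMode` flag and the function below are assumptions for discussion, not code from this PR:

```go
package proxy

import "net/http"

// shouldServeLocal sketches the availability-versus-consistency question raised
// above: given the result of the upstream check (status 0 meaning a network
// error), should the locally cached manifest be served instead of failing?
func shouldServeLocal(offlineMode bool, upstreamStatus int) bool {
	if offlineMode {
		// Favour availability: any upstream failure falls back to the cache;
		// the failure would be logged as a warning rather than returned.
		return true
	}
	switch upstreamStatus {
	case 0, http.StatusBadGateway, http.StatusServiceUnavailable, http.StatusGatewayTimeout:
		// Transient outages: serving possibly stale content keeps pulls working.
		return true
	case http.StatusUnauthorized, http.StatusForbidden:
		// Credential or authorization problems may be worth surfacing instead.
		return false
	default:
		return false
	}
}
```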
I would agree. This, at least, is my exact use case: ignore upstream errors and deliver what is already in the cache. Only error if a resource is not found locally, and also not upstream. 👍
Hi @nasseredine - really glad to hear you're looking at a PR to address the requested changes. How are you getting along with this? Would you appreciate any help?
@AYDEV-FR I am fine with your current logic for keeping the legacy image if it is removed from the upstream registry, but I am still waiting for the UI option to enable/disable "Keep and serve images in proxy cache when they are removed from upstream registry" before merging the current PR.
I checked how other industry-standard registry solutions that I worked with in the past (Sonatype Nexus, JFrog Artifactory, Google Artifact Registry) handle this, and none of them default to deleting local artifacts when the upstream returns a 404. There isn't such a thing as a sync-delete model that immediately deletes the local artifact because the upstream one is gone. In fact, all of these solutions are resilient to upstream outages by default (even if the TTL is expired): they keep serving local content and update artifact metadata on TTL expiration (usually hiding the tag or marking it as not found for consistency), but they don't delete the blobs. The deletion itself is usually handled by a cleanup (retention) policy. This is called lazy eviction, as the data is still present on disk while artifacts are not guaranteed to be pullable. There is often also an option (auto-blocking in Nexus, offline or global offline in Artifactory) that can be activated to serve local content in that scenario; in that case, no attempts are made to fetch from the remote repository.

The main issue is that today Harbor has an assumed TTL of 0, because the implementation systematically checks the upstream manifest and doesn't offer an option to adjust that value or prevent immediate data loss. Until such a feature is implemented, the default behavior should not be an assumed TTL of 0. However, implementing all of this would introduce a lot of complexity in Harbor, while its essence lies in its simplicity.
I have something that has been tested locally in early December 2025 (I still need to write unit tests), but the issue is that even among project maintainers (@bupd, @reasonerjt and @stonezdj) there doesn't seem to be a consensus about what the default behavior should be. So I haven't opened a PR yet and I am waiting for clarification from the maintainers. I also hope the community will be more vocal about their opinion on this issue. Once everything is clarified, @AYDEV-FR, if you need help with adding the metadata handling as an option on the frontend, do let me know.
I didn't want to "+1 / me too", but I'd like to mention this is something the company I work for is eagerly awaiting, if it happens; we trialed Harbor to protect ourselves from upstream deletions, but the above caveat was a deal-breaker for us, so it will be very warmly received if merged.
I agree with the UI arrangement, but "Offline Proxy Cache" seems ambiguous, because it sounds like the proxy cache is offline and would not work anymore. There is another issue related to proxy cache: #22569; somebody may wonder "should I take the proxy cache offline to change the upstream registry's credential?", but that is not what it does. How about "Retain cache on upstream delete"?
Echoing @MikeCockrem's point, my employer is similar - we want to use the proxy cache for various images, and we don't want the deletion of a given tag on the upstream repository to cause an outage. @nasseredine first of all, thank you so much - that looks like precisely the right approach. It seems that the only sticking point is how to display that functionality in the UI? How do people feel about that, or a variation of it?
On second thought, I agree with you. I'd assume that a proxy cache that is offline would only serve local content. So do we agree that the default behaviour is a "sync delete" model for the proxy cache?
I have integrated the changes requested by the maintainers (UI checkbox configuration option). To clarify the current state of this PR:
Thanks everyone for the feedback and the discussion on the wording — it helped a lot to shape the right approach. I can also see from the comments that this is a feature highly anticipated by the community, which motivates me to push it forward. If this PR is accepted and merged, I'll follow up with corresponding PRs in the Terraform provider and Crossplane provider to expose this new configuration option.
Signed-off-by: AYDEV-FR <aymeric.deliencourt@aydev.fr>
Signed-off-by: AYDEV-FR <aymeric.deliencourt@aydev.fr>
Signed-off-by: AYDEV-FR <aymeric.deliencourt@aydev.fr>
278c23d to d32a059



Comprehensive Summary of your change
This PR updates the proxy cache logic so that if an image exists in the local cache but has been deleted from the remote repository, Harbor will serve the cached image instead of failing with a "not found" error. This brings the implementation in line with the documented behavior and improves reliability when remote repositories are unavailable or images have been removed upstream.
You can find more details about this change in issue #22106.
Implementation Notes
I hesitated between modifying the `remote.ManifestExist` function in `harbor/src/controller/proxy/controller.go` (since it does not return a 404 error directly in `harbor/src/pkg/registry/client.go`). However, to avoid unintended side effects from changing this function's behavior, I decided to keep the current "404 not found" handling (which returns no error) and instead update the `UseLocalManifest` function in `harbor/src/controller/proxy/controller.go`.
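Roughly, the intended control flow is the following. This is a simplified, self-contained sketch: the interfaces and the helper are illustrative stand-ins rather than the actual Harbor types, while the real change lives in `UseLocalManifest` in `src/controller/proxy/controller.go`:

```go
package proxy

import (
	"context"
	"errors"
)

// errNotFound stands in for Harbor's "artifact not found" error.
var errNotFound = errors.New("manifest not found")

// remoteChecker and localCache are simplified stand-ins for the remote registry
// client and the local cache manager used by the proxy controller.
type remoteChecker interface {
	ManifestExist(ctx context.Context, repo, ref string) (bool, error)
}

type localCache interface {
	ManifestExist(ctx context.Context, repo, ref string) (bool, error)
}

// useLocalManifest decides whether a pull can be answered from the local cache.
// Before this change, a 404 from upstream caused the cached manifest to be
// deleted and the pull to fail; with this change the cached copy is served.
func useLocalManifest(ctx context.Context, repo, ref string, remote remoteChecker, local localCache) (bool, error) {
	remoteExists, err := remote.ManifestExist(ctx, repo, ref)
	if err != nil {
		// Upstream unreachable: Harbor's existing health-check based fallback
		// applies here and is out of scope for this sketch.
		return false, err
	}
	if !remoteExists {
		// Upstream returned 404: serve the local copy if one exists instead of
		// deleting it and returning "not found".
		if ok, lerr := local.ManifestExist(ctx, repo, ref); lerr == nil && ok {
			return true, nil
		}
		return false, errNotFound
	}
	// Upstream still has the manifest: the caller compares digests and decides
	// whether to proxy from upstream or serve the (up-to-date) local copy.
	return false, nil
}
```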
Issue being fixed
Fixes #22106
No documentation modification is needed because this PR enforces behavior that is already described in the documentation:
Please confirm you've completed the following: