fix: proxy cache serve local on remote not found#22153

Open
AYDEV-FR wants to merge 15 commits into goharbor:main from AYDEV-FR:fix/proxy-cache-serve-local-on-remote-not-found

Conversation

@AYDEV-FR
Copy link

@AYDEV-FR AYDEV-FR commented Jul 4, 2025

Comprehensive Summary of your change

This PR updates the proxy cache logic so that if an image exists in the local cache but has been deleted from the remote repository, Harbor will serve the cached image instead of failing with a "not found" error. This brings the implementation in line with the documented behavior and improves reliability when remote repositories are unavailable or images have been removed upstream.

You can find more details about this change in issue #22106.

Implementation Notes

I hesitated between two approaches. The first was to modify the remote.ManifestExist function called from harbor/src/controller/proxy/controller.go, since its implementation in harbor/src/pkg/registry/client.go swallows the upstream 404 and does not return it as an error:

func (c *client) ManifestExist(repository, reference string) (bool, *distribution.Descriptor, error) {
	req, err := http.NewRequest(http.MethodHead, buildManifestURL(c.url, repository, reference), nil)
	if err != nil {
		return false, nil, err
	}
	for _, mediaType := range accepts {
		req.Header.Add("Accept", mediaType)
	}
	resp, err := c.do(req)
	if err != nil {
		// a 404 from upstream is swallowed here: callers only see (false, nil, nil)
		if errors.IsErr(err, errors.NotFoundCode) {
			return false, nil, nil
		}
		return false, nil, err
	}
	defer resp.Body.Close()
	dig := resp.Header.Get("Docker-Content-Digest")
	contentType := resp.Header.Get("Content-Type")
	contentLen := resp.Header.Get("Content-Length")
	length, _ := strconv.Atoi(contentLen)
	return true, &distribution.Descriptor{Digest: digest.Digest(dig), MediaType: contentType, Size: int64(length)}, nil
}

However, to avoid unintended side effects by changing this function's behavior, I decided to keep the current "404 not found" handling (which returns no error) and instead update the UseLocalManifest function in harbor/src/controller/proxy/controller.go:

remoteRepo := getRemoteRepo(art)
exist, desc, err := remote.ManifestExist(remoteRepo, getReference(art)) // HEAD
if err != nil {
	if errors.IsRateLimitError(err) && a != nil { // if rate limit, use local if it exists, otherwise return error
		return true, nil, nil
	}
	return false, nil, err
}
if !exist || desc == nil {
	if a != nil { // not found upstream: serve the local copy, since the artifact exists locally
		log.Errorf("Artifact not found in remote registry but exists in local cache, serving from local: %v:%v", art.Repository, art.Tag)
		return true, nil, nil
	}
	return false, nil, errors.NotFoundError(fmt.Errorf("repo %v, tag %v not found", art.Repository, art.Tag))
}

Issue being fixed

Fixes #22106

No documentation modification is needed because this PR enforces behavior that is already described in the documentation:

If the image has not been updated in the target registry, the cached image is served from the proxy cache project.
If the image has been updated in the target registry, the new image is pulled from the target registry, then served and cached in the proxy cache project.
If the target registry is not reachable, the proxy cache project serves the cached image.
If the image is no longer in the target registry, but is still in the proxy cache project, the cached image is served from the proxy cache project.

Please confirm you've completed the following:

  • Well Written Title and Summary of the PR
  • Labeled the PR as needed ("release-note/ignore-for-release", "release-note/new-feature", "release-note/update", "release-note/enhancement", "release-note/community", "release-note/breaking-change", "release-note/docs", "release-note/infra", "release-note/deprecation")
  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Made sure tests are passing and test coverage is added if needed.
  • Considered the docs impact and opened a new docs issue or PR with docs changes if needed in website repository.

@AYDEV-FR AYDEV-FR requested a review from a team as a code owner July 4, 2025 08:27
@Vad1mo Vad1mo added the release-note/update Update or Fix label Jul 4, 2025
@codecov
Copy link

codecov bot commented Jul 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.67%. Comparing base (c8c11b4) to head (92552bb).
⚠️ Report is 657 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main   #22153      +/-   ##
==========================================
+ Coverage   45.36%   46.67%   +1.30%     
==========================================
  Files         244      252       +8     
  Lines       13333    14287     +954     
  Branches     2719     2937     +218     
==========================================
+ Hits         6049     6668     +619     
- Misses       6983     7264     +281     
- Partials      301      355      +54     
Flag      | Coverage Δ
unittests | 46.67% <ø> (+1.30%) ⬆️

Flags with carried forward coverage won't be shown.
see 179 files with indirect coverage changes


@wy65701436 wy65701436 assigned stonezdj and unassigned OrlinVasilev Jul 7, 2025
@reasonerjt
Copy link
Contributor

@AYDEV-FR

Thanks for the PR. But IMO this is a "feat" rather than a "fix".

If the image is no longer in the target registry, but is still in the proxy cache project, the cached image is served from the proxy cache project.

This is a breaking change to the original design, which returns 404 when the remote content is removed.

Because we have passed the "feature freeze" and will hit "feature complete" in July, I don't think it can make it into v2.14. Are you willing to continue working on this after the branch for 2.14 is cut? We can continue the discussion in the issue. IMO at least an option should be added, so users can choose whether to serve local content when the remote artifact is removed.

Copy link
Contributor

@reasonerjt reasonerjt left a comment


Per my understanding, this is a breaking change. More discussion needed.

@AYDEV-FR
Copy link
Author

AYDEV-FR commented Jul 7, 2025

Hi @reasonerjt ,

I understand your point of view regarding this being considered a feature rather than a fix. From my perspective, it was more of a fix to align the behavior with what is described in the documentation: goharbor/website@2ee87ae
I don’t understand why the documentation was updated by @stonezdj, but nothing was changed in the code regarding this.

In the meantime, I found a workaround by returning a 429 response for each 404 returned by the remote registry. Since I have this workaround, I am willing to wait for your decision on whether to update the documentation or merge my change.

However, I think that giving users a choice via a parameter might create confusion about the feature. In my opinion, serving the local cache when the remote registry responds with a 404 is the very concept of a proxy cache. And the Harbor documentation seems to agree with me.

@reasonerjt
Copy link
Contributor

@AYDEV-FR Thanks for pointing this out, let me double check why this change in the doc was made, if this is on purpose, I agree this is a fix not a feat.

@AYDEV-FR AYDEV-FR requested a review from reasonerjt July 29, 2025 17:43
@khaibenz
Copy link

In the meantime, I found a workaround by returning a 429 response for each 404 returned by the remote registry. Since I have this workaround, I am willing to wait for your decision on whether to update the documentation or merge my change.

Hi @AYDEV-FR

I am also running into the same problem and I would be interested to know how you implemented the workaround to return a 429 from the registry.

@tamcore
Copy link

tamcore commented Sep 9, 2025

Is there any update on this? Especially with bitnami's upcoming removal of images from docker.io, having the pull-through cache continue serving removed, but cached, images would be incredibly useful. After all, that's probably one of the reasons people configure Harbor as a pull-through cache: not just to reduce traffic, but also to keep workloads operational in case upstream images go away for whatever reason.

@bupd
Copy link
Contributor

bupd commented Sep 19, 2025

@AYDEV-FR great work

https://github.com/goharbor/community/pull/144/files#diff-78dea958499f3e23826611cf839d9a96615a0b420f33520e92d564f2fb17d24fR127
As per the Harbor proxy spec, the proxy cache is supposed to delete the local manifest if it does not exist in the remote. But the documentation says the artifact is served even when the remote artifact is deleted. Adding to that, the change to delete the local artifact was only made in 2.14, which makes it pretty clear that Harbor was previously serving the artifact. That makes this a fix.
With that said, this is a much needed change.

This also raises the question of data integrity and what to do with stale data.

I believe we should treat this as a fix, since Harbor was serving the cached artifact in the past, and the current behaviour differs from the spec mentioned above. So we should update the spec.

There is also the disparity between a pull-through cache and a proxy cache.

Harbor doesn't support a true pull-through cache; maybe we can add that.

@marevers
Copy link

marevers commented Sep 19, 2025

@bupd Harbor 2.14 made the opposite change: #22175
As you say, artifacts are now deleted from the cache if they don't exist anymore. I agree with @tamcore though, an important reason for us and I'm sure many to use Harbor as a cache is to safeguard deployments if somehow the original artifact is deleted or the repository is not reachable.

Maybe the best option is to make it configurable, leaving the current implementation in place as default? It could be a checkbox setting on the proxy cache level, something like Serve unavailable artifacts from cache.

@bupd
Copy link
Contributor

bupd commented Sep 19, 2025

Also from the distribution spec it is clear that Harbor should serve artifacts that it locally has even if the remote artifact is deleted.

Also, given the use case and demand, and that Harbor may already have been doing this in the past, it makes sense to have it in 2.14.1.

@bupd
Copy link
Contributor

bupd commented Sep 19, 2025

goharbor/community#144

Given that the proposal PR was merged by lazy consensus with a lack of reviews, and given that the spec is clearly titled "pull-through proxy cache" rather than "online-only proxy cache", the intention is clear.

A true pull-through cache should be able to serve content even when the remote resource is deleted or completely offline, which is a critical distinction from an online-only proxy.

Copy link
Contributor

@bupd bupd left a comment


@AYDEV-FR Thanks for your contribution

Please do rebase; otherwise LGTM, a much needed change in Harbor.

Signed-off-by: AYDEV-FR <aymeric.deliencourt@aydev.fr>
Signed-off-by: AYDEV-FR <aymeric.deliencourt@aydev.fr>
@AYDEV-FR AYDEV-FR force-pushed the fix/proxy-cache-serve-local-on-remote-not-found branch from f2e80f9 to 69adb22 Compare September 19, 2025 21:47
@AYDEV-FR
Copy link
Author

AYDEV-FR commented Sep 22, 2025

Hi @bupd and @reasonerjt,

Thanks for your review !
I agree, this change seems to be expected behavior for a solution like Harbor, which is often used as a cache to avoid production outages or to overcome image name changes, licence changes, or upstream maintenance (like the recent Bitnami changes).

I’ve rebased my code and run multiple tests with the changes in my PR.
The modification works well and handles several scenarios, including:

  • 404 Not Found Manifest
  • 401 Unauthorized Manifest
  • 503 Service Unavailable
  • Remote not available

When the repo is not available, the cache is correctly served from /server/middleware/repoproxy/proxy.go:266.

If the repo returns 503, Harbor detects it as unhealthy and falls back to local:

[ERROR] [/server/middleware/repoproxy/proxy.go:142]: failed to proxy manifest, fallback to local, request uri: /v2/local/debian/debian/manifests/latest, error: http status code: 503

If the repo returns 404 for the manifest (e.g., the image has been deleted), my code behaves as expected:

[ERROR] [/controller/proxy/controller.go:176]: Artifact not found in remote registry but exists in local cache, serving from local: local/debian/debian:latest

The change is working well. Would it be possible to merge this and release a v2.14.1 as @bupd suggested?
Thanks!

@patsevanton
Copy link

any update?

@Vad1mo Vad1mo requested a review from Copilot October 2, 2025 13:31
Copy link
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes the proxy cache logic to serve locally cached images when they exist locally but are no longer found in the remote repository, aligning implementation with documented behavior.

  • Updates UseLocalManifest function to check for local artifact existence before returning "not found" error
  • Removes automatic deletion logic for locally cached manifests when remote returns 404
  • Updates corresponding test case to expect success instead of error when local cache exists but remote doesn't

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File                                    | Description
src/controller/proxy/controller.go      | Modified logic to serve from local cache when remote artifact is not found
src/controller/proxy/controller_test.go | Updated test case to reflect new behavior of serving from local cache


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Vadim Bauer <Bauer.vadim@gmail.com>
@patsevanton
Copy link

we are waiting for a useful pull request

Copy link
Contributor

@stonezdj stonezdj left a comment


Because the current implementation deletes the local cache if the artifact is not found upstream, I suggest adding an option when creating a proxy cache project:
"Keep and serve images in proxy cache when they are removed from upstream registry"
When this option is checked, serve the local content and do not delete it locally.
When it is not checked, keep the previous behaviour and delete the local content.

@nasseredine
Copy link

nasseredine commented Dec 5, 2025

Because the current implementation deletes the local cache if the artifact is not found upstream, I suggest adding an option when creating a proxy cache project: "Keep and serve images in proxy cache when they are removed from upstream registry". When this option is checked, serve the local content and do not delete it locally. When it is not checked, keep the previous behaviour and delete the local content.

I agree with your proposal to add this feature as a configuration option for proxy cache projects. However, it shouldn't only be available to newly created proxy cache projects. I would like to open a PR for this.

@bupd
Copy link
Contributor

bupd commented Dec 8, 2025

Because the current implementation deletes the local cache if the artifact is not found upstream, I suggest adding an option when creating a proxy cache project: "Keep and serve images in proxy cache when they are removed from upstream registry". When this option is checked, serve the local content and do not delete it locally. When it is not checked, keep the previous behaviour and delete the local content.

@stonezdj the problem here is that people already have proxied artifacts in their proxy cache projects which may already have been deleted upstream and exist only in the proxy cache project, from a previous version of Harbor. If they upgrade to the latest version of Harbor (which deletes those artifacts), that would not be useful.

It is better to have this on by default with an opt-out option (i.e. the "keep proxied images even when upstream is deleted" checkbox default state is checked).

That way, when migrating to a new version of Harbor, users wouldn't get their images deleted.

Hope this helps.

@kaitimmer
Copy link

It would be great if this could be merged at last. The current implementation is totally counterintuitive. From my point of view, the behaviour implemented in this PR is the expected workflow (I at least was very surprised to find out during the latest Cloudflare and Docker Hub downtimes that it is not already) :)

What is still needed here to move this across the "merge-line"?

@nasseredine
Copy link

nasseredine commented Dec 9, 2025

Because the current implementation deletes the local cache if the artifact is not found upstream, I suggest adding an option when creating a proxy cache project: "Keep and serve images in proxy cache when they are removed from upstream registry". When this option is checked, serve the local content and do not delete it locally. When it is not checked, keep the previous behaviour and delete the local content.

@stonezdj the problem missing here is people already have proxied artifacts that are in their proxy cache projects. which might possibly already deleted in upstream and is present only in the proxy cache projects in previous version of harbor. if they upgrade to latest version of harbor (which deletes those artifacts). that would not be useful.

It is better to have this on by default and have an opt out option. (i.e the keep proxied images even when upstream is deleted - checkbox default state is checked)

so basically while migrating to new version of harbor they wouldn't get their images deleted.

Hope this helps.

I have started working on a PR for this feature, which may have been a bit premature. While working on the implementation (as project metadata, set at project creation and editable on existing proxy cache projects), I realized it is important to discuss when to serve the local manifest if there is an error getting the upstream one (e.g. network, HTTP 5XX, authentication 401 / authorization 403 errors). We might not want to silence every type of error. How much do we favor availability over consistency in an "offline" proxy cache?

My proposal is to prioritize availability if the "offline" mode for proxy cache is enabled. This means we would serve the local manifest on any (to be discussed) upstream error (we log a warning instead of failing).

Otherwise we need to change the default behavior (serve local manifest) and have a "sync" mode for proxy cache.

I'd appreciate feedback on whether this is the desired behavior or if we should maintain specific exceptions.

@kaitimmer
Copy link

My proposal is to prioritize availability if the "offline" mode for proxy cache is enabled. This means we would serve the local manifest on any (to be discussed) upstream error (we log a warning instead of failing).

I would agree. This, at least, is my exact use case. Ignore Upstream errors and deliver what is already in the cache. Only error if a resource is not found locally, and also not upstream.

👍

@blueacid
Copy link

Hi @nasseredine - really glad to hear you're looking at a PR to address the requested changes. How are you getting along with this? Would you appreciate any help?
Thank you!

@stonezdj
Copy link
Contributor

@AYDEV-FR I am fine with your current logic for keeping the legacy image if it is removed in the upstream registry, but I am still waiting for the UI option to enable/disable "Keep and serve images in proxy cache when they are removed from upstream registry" before merging the current PR.

@nasseredine
Copy link

nasseredine commented Jan 31, 2026

@AYDEV-FR I am fine with your current logic for keeping the legacy image if it is removed in the upstream registry, but I am still waiting for the UI option to enable/disable "Keep and serve images in proxy cache when they are removed from upstream registry" before merging the current PR.

I checked how other industry standard registry solutions that I worked with in the past (Sonatype Nexus, JFrog Artifactory, Google Artifact Registry) handle this, and none of them default to deleting local artifacts when the upstream returns a 404.

There isn't such a thing as a sync-delete model that immediately deletes the local artifact because the upstream one is gone. In fact, all of these solutions are resilient to upstream outages by default (even if the TTL has expired). They keep serving local content and update artifact metadata on TTL expiration (usually hiding the tag or marking it as not found for consistency) but don't delete the blobs. The deletion itself is usually handled by a cleanup (retention) policy. This is called lazy eviction, as the data is still present on disk while artifacts are not guaranteed to be pullable. In fact, there is often an option (auto-blocking in Nexus, offline or global offline in Artifactory) that can be activated to serve local content in that scenario; in that case, no attempts are made to fetch from the remote repository.

The main issue is that today Harbor has an assumed TTL of 0 due to the implementation that systematically checks the upstream manifest and doesn't offer an option to adjust the value or prevent immediate data loss. Until this feature is implemented the default behavior should not be an assumed TTL of 0.

However, implementing all of this would introduce a lot of complexity in Harbor while its essence lies in its simplicity.

Hi @nasseredine - really glad to hear you're looking at a PR to address the requested changes. How are you getting along with this? Would you appreciate any help? Thank you!

I have something that was tested locally in early December 2025 (I still need to write unit tests), but the issue is that even among project maintainers (@bupd, @reasonerjt and @stonezdj) there doesn't seem to be a consensus about what the default behavior should be. So I haven't opened a PR yet and I am waiting for clarification from the maintainers. I also hope the community will be more vocal about their opinion on this issue.

Once everything is clarified, @AYDEV-FR if you need help with adding the metadata handling as an option on the frontend do let me know.

(Screenshots of the proposed proxy cache UI option attached.)

@MikeCockrem
Copy link

MikeCockrem commented Feb 1, 2026

I also hope the community will be more vocal about their opinion on this issue.

I didn't want to "+1 / me too" but I'd like to mention this is something the company I work for is eagerly awaiting, if it happens; we trialed Harbor to protect ourselves from upstream deletions, but the above caveat was a deal-breaker for us, so it will be very warmly received if merged.

@stonezdj
Copy link
Contributor

stonezdj commented Feb 2, 2026

I agree with the UI arrangement, but "Offline Proxy Cache" seems ambiguous, because it sounds like the proxy cache itself is offline and would not work anymore. There is another issue related to proxy cache: #22569; somebody might wonder whether they should take the proxy cache offline to change the upstream registry's credential, but that is not what it means. How about "Retain cache on upstream delete"?

@blueacid
Copy link

blueacid commented Feb 3, 2026

Echoing @MikeCockrem's point, my employer is similar - we want to use the proxy cache for various images, and we don't want the deletion of a given tag on the upstream repository to cause an outage.
To mitigate this, we're pulling, retagging, and pushing. But it'd be nice to have a toggle on a Harbor proxy cache which would save the need to do this.

@nasseredine first of all, thank you so much - that looks like precisely the right approach. Seems that the only sticking point is how to display that functionality in the UI?
To that, I can throw a suggestion into the ring: would it be sensible to label the setting as "Replicate upstream deletions"?
"If this is enabled, the proxy cache copy of a given tag will be deleted when the upstream repository returns a 404. If disabled, the local copy will not be automatically deleted under those circumstances."

How do people feel about that, or a variation of it?

@nasseredine
Copy link

I agree with the UI arrangement, but "Offline Proxy Cache" seems ambiguous, because it sounds like the proxy cache itself is offline and would not work anymore. There is another issue related to proxy cache: #22569; somebody might wonder whether they should take the proxy cache offline to change the upstream registry's credential, but that is not what it means. How about "Retain cache on upstream delete"?

On second thought, I agree with you. I'd assume that a proxy cache that is offline would only serve local content.

So do we agree that the default behaviour should be a "sync delete" model for the proxy cache?

@AYDEV-FR
Copy link
Author

AYDEV-FR commented Feb 8, 2026

I have integrated the changes requested by the maintainers (UI checkbox configuration option).

To clarify the current state of this PR:

  • The default behavior remains unchanged ("sync delete" model — local cache is removed when upstream returns 404)
  • A new configuration option "Retain cache on upstream delete or unreachable" allows users to opt-in to serving locally cached images when they are no longer available or reachable upstream

Thanks everyone for the feedback and the discussion on the wording — it helped a lot to shape the right approach. I can also see from the comments that this is a feature highly anticipated by the community, which motivates me to push it forward.

If this PR is accepted and merged, I'll follow up with corresponding PRs in the Terraform provider and Crossplane provider to expose this new configuration option.

Signed-off-by: AYDEV-FR <aymeric.deliencourt@aydev.fr>
Signed-off-by: AYDEV-FR <aymeric.deliencourt@aydev.fr>
Signed-off-by: AYDEV-FR <aymeric.deliencourt@aydev.fr>
@AYDEV-FR AYDEV-FR force-pushed the fix/proxy-cache-serve-local-on-remote-not-found branch from 278c23d to d32a059 Compare February 8, 2026 18:06
@AYDEV-FR AYDEV-FR requested review from bupd and stonezdj February 9, 2026 15:53


Successfully merging this pull request may close these issues.

Proxy cache should serve cached images even if remote image is deleted
