feat(registry-auth): auth for private container and artifact registries hosted by cloud providers#317
Conversation
…ains/credentials helper Signed-off-by: Christopher Haar <christopher.haar@upbound.io>
Signed-off-by: Christopher Haar <christopher.haar@upbound.io>
erhancagirici
left a comment
There was a problem hiding this comment.
@haarchri many thanks for the PR 🚀 I've left some comments for clarification
| amazonKeychain, | ||
| google.Keychain, | ||
| azureKeychain, | ||
| authn.DefaultKeychain, |
There was a problem hiding this comment.
for clarification purposes, to better understand the potential behavior change here:
-
Does the
authn.DefaultKeychainbehavior match today's behavior when there is no pullSecretRef, i.e. falls back to anonymous? I assume yes since e2e tests of this PR passed -
are those keychain resolutions "registry-aware" , e.g. does not try to auth
fooregistry with say google creds? -
one potential caveat I see is there is no opt-out of this auth chain order or influence it, that might have edge cases regarding the full intent and cause a behavior change. for example:
- today, your existing
ReleaseMR has no pullSecret, i.e. configured to use anonymous pulls, but say that provider-helm pod also has a valid AWS creds. After upgrading to this, does it (unintentionally) pick AWS creds here? If so, that might potentially break existing MRs after upgrade. - similar to above scenario, but you are aware of the new behavior. Still you have mixed usage of anonymous and AWS registry usage, so you have configured valid AWS creds (like IRSA etc). Does it properly fall back to correct auth per
Releasehere?
- today, your existing
There was a problem hiding this comment.
Does the authn.DefaultKeychain behavior match today's behavior when there is no pullSecretRef, i.e. falls back to anonymous? I assume yes since e2e tests of this PR passed
yes, in the no pull secret case, authn.DefaultKeychain ultimately resolves to Anonymous unless it finds explicit creds in the default Docker config chain.
So the intended behavior stays the same:
Before: no creds → Helm pulls anonymously
After: multi-keychain tries helpers → they return Anonymous for registries they don’t own → DefaultKeychain → still Anonymous in the “no creds configured” case
That matches your assumption and lines up with e2e passing.
are those keychain resolutions "registry-aware" , e.g. does not try to auth foo registry with say google creds?
yes, they are effectively registry-aware because each helper only returns credentials for registries it recognizes, and otherwise it returns Anonymous (or an error that is treated as Anonymous, depending on the wrapper).
So for docker.io/..., the ECR helper won’t produce creds, and for *.dkr.ecr.*.amazonaws.com, the GCR/ACR helpers won’t produce creds.
one potential caveat I see is there is no opt-out of this auth chain order or influence it, that might have edge cases regarding the full intent and cause a behavior change. for example:
lets describe this here with an example:
If the target registry is ECR and the pod has AWS credentials available (IRSA, env vars, etc.), the ECR credential helper will always attempt to mint an authorization token via: https://github.com/awslabs/amazon-ecr-credential-helper/blob/main/ecr-login/api/client.go#L220
Case 1: IAM role does not have ecr:GetAuthorizationToken
- GetAuthorizationToken fails
- The helper returns an error
- The keychain resolution falls back to Anonymous
- Behavior is unchanged compared to today
This is safe and works as expected.
Case 2: IAM role has ecr:GetAuthorizationToken but does not have repository pull permissions (ecr:BatchGetImage, ecr:GetDownloadUrlForLayer)
- GetAuthorizationToken succeeds (registry-level permission)
- Credentials are returned to Helm
- Helm attempts an authenticated pull
- ECR enforces repository-level permissions
- Pull fails with 403 Forbidden
In this scenario, an anonymous pull that previously succeeded due to an ECR repository resource policy may now fail, because the presence of AWS credentials causes the pull to be attempted using authenticated ECR access, which takes precedence over anonymous access.
This behavior is not unique to providers. Crossplane core uses the same keychain resolution logic when fetching packages (xpkg) https://github.com/crossplane/crossplane/blob/main/internal/xpkg/fetch.go#L133 , so the same rules apply there as well. The only difference is that Crossplane core is additionally allowed (via RBAC) to resolve Kubernetes ServiceAccount–based credentials, whereas providers do not have that permission and therefore rely solely on the cloud-provider keychains.
In my point of view this is acceptable as this is not a bug / regression_
GetAuthorizationTokendoes not validate repository access- Repository permissions are intentionally checked later by ECR
- Having
ecr:GetAuthorizationTokenwithout pull permissions is a misconfigured IAM policy
Proposed release note (to make the behavior explicit) - and this is the same for the other registries:
If AWS credentials are present (e.g., via IRSA), ECR images will be pulled using ECR authentication.
Ensure the principal has both ecr:GetAuthorizationToken and repository pull permissions (ecr:BatchGetImage, ecr:GetDownloadUrlForLayer).
Note: If an ECR repository allows anonymous pulls via a resource policy, but the workload has AWS credentials with only ecr:GetAuthorizationToken (and not pull permissions), the authenticated pull may fail with 403 even though an anonymous pull would have succeeded.
There was a problem hiding this comment.
got it, thanks for the detailed explanation!
Since the new helpers are registry-aware, I think we are safe regarding existing MRs with anonymous pulls in general 👍
Note: If an ECR repository allows anonymous pulls via a resource policy, but the workload has AWS credentials with only ecr:GetAuthorizationToken (and not pull permissions), the authenticated pull may fail with 403 even though an anonymous pull would have succeeded.
I think this is fair 👍 Just for the sake of completeness, this would affect consumers with the specific scenario:
- today, the provider pod had IRSA (or more generally, similar pod-level auth) config, due to some other arbitrary purpose
- using a chart from an anonymous ECR (or respective cloud provider's containter registry)
- now upgrades to this PR, they cannot opt-out of resolving AWS creds and use anonymous pulls.
Arguably this is rare today, and would be still fair to ask to define public ECR pull access to the principal in this case.
feat(auth): update dependency to get AWSWebIdentityCredentials #316 is not released yet, so I expect no existing functional IRSA (or similar specific auth) users. So, no practical breaking behavior change here 👍
With that said, it should definitely go into the docs and release notes.
Signed-off-by: Christopher Haar <christopher.haar@upbound.io>
erhancagirici
left a comment
There was a problem hiding this comment.
thanks @haarchri for tackling this! LGTM.
Description of your changes
This PR enables access to private container and artifact registries hosted by cloud providers when
provider-helmis configured with identities such asWorkloadIdentity,IRSA, orPod Identity.When a identity is available, the provider automatically discovers and uses the associated credentials to authenticate against the registry, without requiring explicit registry secrets in the release API.
This allows users to interact with private registries in a seamless and secure way, while continuing to support explicit credentials when needed.
Fixes #
I have:
make reviewableto ensure this PR is ready for review.How has this code been tested
tested with a helm release hosted in a private ECR repository and a public available helm release
Using RegistryAuth from within upbound saas environment
first we need to configure proidc, that we get the right token injected in the provider-helm:
on the AWS side you need an identity-trust and an assigned policy:
trust:
policy:
simply apply the following release: