Description
Problem Summary
Creating the vpc-cni addon after creating a new EKS cluster fails due to PolicyEndpoint CRD conflict.
Problem Details
If you create a new EKS cluster after November 13th 2025 without the vpc-cni addon, attempting to add it after the fact results in the following error:
```
code: ConfigurationConflict
message: Conflicts found when trying to apply. Will not continue due to resolve conflicts mode. Conflicts: CustomResourceDefinition.apiextensions.k8s.io policyendpoints.networking.k8s.aws - .spec.versions
```
Steps to reproduce
- Create an EKS cluster without the vpc-cni addon. The EKS version does not seem to matter; I have only tried 1.34 and 1.33. This can be done in a number of ways with the same result:
  - Using eksctl or the aws cli.
  - Using the AWS console with EKS Auto Mode (this will not install the `vpc-cni` addon).
  - Using the AWS console WITHOUT EKS Auto Mode, making sure to de-select the `vpc-cni` addon on the addons page.
  - Using the Terraform eks resource, without adding any `cluster_addon` items.
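Of the options above, the eksctl route can be sketched with a config file. This is a sketch, not a verified reproduction: the `addonsConfig.disableDefaultAddons` setting requires a reasonably recent eksctl version, and the cluster name and region are placeholders.

```yaml
# cluster.yaml -- placeholder names; check that your eksctl version
# supports addonsConfig.disableDefaultAddons before relying on this
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
addonsConfig:
  disableDefaultAddons: true   # skip the default vpc-cni/coredns/kube-proxy addons
addons:
  - name: coredns              # re-add the ones we do want, leaving out vpc-cni
  - name: kube-proxy
```

Then `eksctl create cluster -f cluster.yaml` creates the cluster without the `vpc-cni` addon.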
- Create the `vpc-cni` addon. As above, this can be done in a number of ways with the same result. I will only demonstrate the aws cli method that follows the official docs here:
  - Create the `vpc-cni` addon:
    ```
    aws eks create-addon --cluster-name my-cluster --addon-name vpc-cni --addon-version v1.20.3-eksbuild.1
    ```
  - Either observe the addon status in the AWS Console or check using the aws cli:
    ```
    aws eks describe-addon --cluster-name my-cluster --addon-name vpc-cni
    ```
  - Observe that the addon has the `CREATE_FAILED` status with the following:
    ```json
    {
      "addon": {
        "addonName": "vpc-cni",
        "clusterName": "my-cluster",
        "status": "CREATE_FAILED",
        "addonVersion": "v1.20.4-eksbuild.1",
        "health": {
          "issues": [
            {
              "code": "ConfigurationConflict",
              "message": "Conflicts found when trying to apply. Will not continue due to resolve conflicts mode. Conflicts:\nCustomResourceDefinition.apiextensions.k8s.io policyendpoints.networking.k8s.aws - .spec.versions"
            }
          ]
        },
        ...
      }
    }
    ```
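To confirm the mismatch reported in the conflict, the live CRD can be inspected directly. A sketch assuming kubectl access to the affected cluster; the CRD name comes from the error message above:

```
# List the schema versions declared by the live PolicyEndpoint CRD
kubectl get crd policyendpoints.networking.k8s.aws \
  -o jsonpath='{.spec.versions[*].name}'

# Check whether the new domainName field appears in the live schema
kubectl get crd policyendpoints.networking.k8s.aws -o yaml | grep -c domainName
```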
Affected Versions
- In my existing workflow I was using the terraform method mentioned above.
- I had deployed successfully at some time on November 12th/13th without issue (i.e. create the cluster without the addon, then create the addon separately).
- Attempting this deployment on November 14th failed as described above.
- I discovered that a similar issue (#199, "PolicyEndpoint CRD deletion on addon upgrade can lead to indefinite network interruption in rare circumstances") was raised on Nov 13th; I wonder if the two are related.
- I see that the `domainName` field was added to the CRD in the as-yet-unreleased `release-v1.1` branch early in the morning on November 14th 2025 (7a70271).
Workaround
It is easy to work around this issue if you set the conflict resolution method to Override, but this is not an intuitive thing to do when adding a NEW addon, and is not included in the official docs for doing so. It also skirts around the core problem of the mismatched PolicyEndpoint CRDs in use.
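For reference, the workaround described above maps to the aws cli's `--resolve-conflicts` flag; the cluster name and addon version below are the placeholder values from the reproduction steps:

```
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --addon-version v1.20.3-eksbuild.1 \
  --resolve-conflicts OVERWRITE
```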
Suspected Cause of Problem
Upon further investigation, it appears that when creating a new EKS cluster, the control plane installs the version of the CRD included in the upcoming 1.1 release of the network policy controller. This new version of the CRD contains a new `domainName` field inside the egress and ingress rule schemas, but the API version remains `v1alpha1`.
I don't know the mechanisms used by the AWS control plane when creating an EKS cluster, but based on the evidence it seems likely that it uses the Helm chart in this repository to deploy the policy controller.
If this is the case, then it is worth noting that the v1.1 Helm chart release installs the v1.1 CRD but also installs the existing v1.0.x CRD watcher component. As a result, if you manually remove the PolicyEndpoint CRD, it is reconciled back to the v1.0.x CRD and NOT the v1.1 CRD that is installed with a new EKS cluster.
It seems to me that changes to a CRD's schema should be accompanied by a bump in the schema version to avoid such conflicts. I do understand that this is an alpha schema version, which means that breaking changes may occur at any time. That would be fine were it not for the fact that this potentially breaking schema is included in all production EKS clusters, which may rely on stable schemas.
Summary
- Release 1.1 of the network policy controller should not be released until the CRD created at deploy time matches the CRD created by the watcher.
- EKS clusters should not be created using this unreleased version of the network policy controller.
- Production EKS clusters should not be created with core components that rely on CRDs at version `v1alpha1`, as this can lead to unpredictable behaviour, as we see here.
  - If they do, then there need to be guarantees that the `v1alpha1` CRD never changes.
- Structural changes to any CRD should result in a new version (e.g. `v1alpha2`).
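As an illustration of the last two points, a schema change like the new `domainName` field could be shipped as an additional served version rather than mutated into `v1alpha1` in place. A hypothetical, heavily abbreviated CRD manifest:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: policyendpoints.networking.k8s.aws
spec:
  group: networking.k8s.aws
  names:
    kind: PolicyEndpoint
    plural: policyendpoints
  scope: Namespaced
  versions:
    - name: v1alpha1        # existing schema, left untouched
      served: true
      storage: false
      schema:
        openAPIV3Schema:
          type: object      # ... egress/ingress rules WITHOUT domainName
    - name: v1alpha2        # new version carrying the schema change
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object      # ... egress/ingress rules WITH domainName
```

In practice a conversion strategy between the two versions would also be needed, but the point is that existing `v1alpha1` clients would keep a stable schema.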