Description
/kind bug
What steps did you take and what happened:
Creating an EKS cluster with a custom CNI (such as Cilium) is currently problematic because CAPA does not correctly remove the automatically preinstalled VPC CNI. This can be seen by the aws-node pods still running on the cluster.
CAPA has code to delete the AWS VPC CNI resources if AWSManagedControlPlane.spec.vpcCni.disable=true, but it only deletes them if they are not managed by Helm. I presume that is intentional, so that CAPA users can deploy the VPC CNI with Helm in their own way (an example manifest with this flag is shown after the quoted code below).
cluster-api-provider-aws/pkg/cloud/services/awsnode/cni.go
Lines 269 to 293 in 3618d1c
func (s *Service) deleteResource(ctx context.Context, remoteClient client.Client, key client.ObjectKey, obj client.Object) error {
    if err := remoteClient.Get(ctx, key, obj); err != nil {
        if !apierrors.IsNotFound(err) {
            return fmt.Errorf("deleting resource %s: %w", key, err)
        }
        s.scope.Debug(fmt.Sprintf("resource %s was not found, no action", key))
    } else {
        // resource found, delete if no label or not managed by helm
        if val, ok := obj.GetLabels()[konfig.ManagedbyLabelKey]; !ok || val != "Helm" {
            if err := remoteClient.Delete(ctx, obj, &client.DeleteOptions{}); err != nil {
                if !apierrors.IsNotFound(err) {
                    return fmt.Errorf("deleting %s: %w", key, err)
                }
                s.scope.Debug(fmt.Sprintf(
                    "resource %s was not found, not deleted", key))
            } else {
                s.scope.Debug(fmt.Sprintf("resource %s was deleted", key))
            }
        } else {
            s.scope.Debug(fmt.Sprintf("resource %s is managed by helm, not deleted", key))
        }
    }
    return nil
}
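For reference, the flag mentioned above is set on the control plane object. A minimal sketch of the relevant part of an AWSManagedControlPlane follows; only spec.vpcCni.disable is taken from the actual API, while the apiVersion and metadata shown here are assumptions for illustration (CAPA ~v2.x, v1beta2 API):

apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
  name: my-cluster-control-plane   # hypothetical name
spec:
  # ... other fields omitted ...
  vpcCni:
    disable: true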
Unfortunately, it seems that AWS introduced a breaking change by switching its own automagic deployment method to Helm-templated manifests, including the relevant labels. This is what a newly created EKS cluster looks like (VPC CNI not disabled, cluster created by CAPA ~v2.3.0):
$ kubectl get ds -n kube-system aws-node -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2024-01-18T15:40:42Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: aws-vpc-cni
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: aws-node
    app.kubernetes.io/version: v1.15.1
    helm.sh/chart: aws-vpc-cni-1.15.1
    k8s-app: aws-node
  name: aws-node
  namespace: kube-system
  # [...]
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: aws-node
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: aws-vpc-cni
        app.kubernetes.io/name: aws-node
        k8s-app: aws-node
The deletion code must be fixed. Sadly, AWS does not add any extra label to denote that the deployment is AWS-managed, and this breaking change even applies to older Kubernetes versions such as 1.24.
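One possible direction for a fix (a sketch under assumptions, not a tested implementation): judging from the output above, the preinstalled DaemonSet carries the chart labels but does not appear to have Helm's ownership annotations (meta.helm.sh/release-name / meta.helm.sh/release-namespace), which Helm 3 adds to resources it actually installs. The check could therefore require the ownership annotation in addition to the managed-by label. The helper below is hypothetical and not part of the current CAPA code base; the constant names are made up for illustration:

package awsnode

import (
    "sigs.k8s.io/controller-runtime/pkg/client"
)

const (
    // Label and annotation keys as used by Helm 3; the constant names
    // here are illustrative, not taken from the CAPA code base.
    managedByLabelKey         = "app.kubernetes.io/managed-by"
    helmReleaseNameAnnotation = "meta.helm.sh/release-name"
)

// isHelmOwned reports whether obj was actually installed by a Helm
// release: it must carry both the managed-by=Helm label and Helm's
// release-name ownership annotation. Resources merely rendered from a
// Helm chart by AWS (like the preinstalled aws-node DaemonSet above)
// would not match and would therefore still be deleted.
func isHelmOwned(obj client.Object) bool {
    if val, ok := obj.GetLabels()[managedByLabelKey]; !ok || val != "Helm" {
        return false
    }
    _, hasRelease := obj.GetAnnotations()[helmReleaseNameAnnotation]
    return hasRelease
}

deleteResource could then call isHelmOwned(obj) instead of checking the label alone. Whether that heuristic is acceptable for all user-managed VPC CNI installations (e.g. charts applied via helm template | kubectl apply, which also lack the ownership annotations) is exactly what needs to be decided in this issue.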
Related: an E2E test is wanted to cover this feature (issue)
Environment:
- Cluster-api-provider-aws version: ~v2.3.0 (fork with some backports)
- Kubernetes version (use kubectl version): 1.24