Skip to content

EKS e2e tests permanently failing #5237

@nrb

Description

@nrb

/kind bug

What steps did you take and what happened:
[A clear and concise description of what the bug is.]

CI is failing with errors like this: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-aws/5211/pull-cluster-api-provider-aws-e2e-eks/1856371404925046784

It appears that the EKS control plane with addons test is never succeeding.

Further investigation showed that the control plane was constantly blocked on CoreDNS updating.

What did you expect to happen:

Tests pass

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

I have tried running the test locally and have found that the CoreDNS pods never get scheduled.

A sample kubectl describe output:

Name:                 coredns-787cb67946-g9qjz
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 <none>
Labels:               eks.amazonaws.com/component=coredns
                      k8s-app=kube-dns
                      pod-template-hash=787cb67946
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-787cb67946
Containers:
  coredns:
    Image:       602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kf9wg (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-kf9wg:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    Burstable
Node-Selectors:               <none>
Tolerations:                  CriticalAddonsOnly op=Exists
                              node-role.kubernetes.io/control-plane:NoSchedule
                              node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  2m38s (x64 over 12m)  default-scheduler  no nodes available to schedule pods

I'm trying to get access to the EKS nodes to validate the taints defined on them, but so far haven't been able to do so.

Also, changing the version of Kubernetes to 1.29 also results in this behavior.

Environment:

  • Cluster-api-provider-aws version: main
  • Kubernetes version: (use kubectl version): v1.30
  • OS (e.g. from /etc/os-release):

Metadata

Metadata

Labels

kind/bugCategorizes issue or PR as related to a bug.priority/critical-urgentHighest priority. Must be actively worked on as someone's top priority right now.triage/acceptedIndicates an issue or PR is ready to be actively worked on.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions