/kind bug
What steps did you take and what happened:
CI is failing with errors like this: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-aws/5211/pull-cluster-api-provider-aws-e2e-eks/1856371404925046784
It appears that the "EKS control plane with addons" e2e test never succeeds.
Further investigation showed that the control plane was permanently blocked waiting for the CoreDNS addon to finish updating.
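For anyone reproducing this, the addon state that EKS itself reports can be checked with the AWS CLI; the cluster name below is a placeholder for the workload cluster created by the test:

    # Ask EKS for the CoreDNS addon status and any health issues it reports.
    # <cluster-name> is a placeholder, not the actual test cluster name.
    aws eks describe-addon --cluster-name <cluster-name> --addon-name coredns

If the addon is stuck mid-update, the status and health fields in that output should point at the underlying problem (here, unschedulable pods).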
What did you expect to happen:
Tests pass
Anything else you would like to add:
I tried running the test locally and found that the CoreDNS pods never get scheduled.
A sample kubectl describe output:
Name:                 coredns-787cb67946-g9qjz
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 <none>
Labels:               eks.amazonaws.com/component=coredns
                      k8s-app=kube-dns
                      pod-template-hash=787cb67946
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-787cb67946
Containers:
  coredns:
    Image:       602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kf9wg (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-kf9wg:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  2m38s (x64 over 12m)  default-scheduler  no nodes available to schedule pods
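(For reference, the output above was captured with something along these lines; selecting by the k8s-app label from the pod's own labels avoids depending on the generated name suffix:)

    # Describe the CoreDNS pods in the workload cluster by label.
    kubectl -n kube-system describe pod -l k8s-app=kube-dns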
I'm trying to get access to the EKS nodes to validate the taints defined on them, but so far I haven't been able to do so; the checks sketched below are what can be done from the API server alone.
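A minimal sketch of that check, assuming only kubectl access to the workload cluster:

    # List registered nodes; the FailedScheduling event suggests this comes back empty.
    kubectl get nodes
    # If any nodes are registered, print their taints alongside the node name.
    kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

Given the scheduler reports "no nodes available to schedule pods" (rather than an untolerated-taint message), this looks more like nodes never registering than a taint mismatch.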
Changing the Kubernetes version to 1.29 results in the same behavior.
Environment:
- Cluster-api-provider-aws version: main
- Kubernetes version: (use kubectl version): v1.30
- OS (e.g. from /etc/os-release):