Description
What steps did you take and what happened?
Hi,
We are seeing constant errors in capi-controller-manager even though everything seems to work fine: we can create node groups and update the existing ones.
We run EKS 1.32 with AWSManagedControlPlanes and AWSManagedMachinePools, using custom AL2 AMIs.
The upgrade was done in two stages, first to the "latest 1beta1" and then to the latest 1beta2, as recommended here.
The errors started appearing with 1.10. After updating the CAPI resources to the 1beta2 version and upgrading to 1.11, they are still being logged constantly.
The error looks like this for all the clusters:
E0924 08:03:54.795656 1 controller.go:353] "Reconciler error" err="[failed to reconcile bootstrap config: Object prod/services.sa-east-1.prod.alienvault.cloud is already owned by another MachinePool controller services-prod-pool-prometheus-sa-east-1, failed to reconcile infrastructure: error getting client: connection to the workload cluster is down, failed to reconcile nodeRefs: error creating watch machinepool-watchNodes for *v1.Node: connection to the workload cluster is down]" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="prod/services-prod-pool-sa-east-1a" namespace="prod" name="services-prod-pool-sa-east-1a" reconcileID="c207ca98-25c3-4ec7-9bd6-26f5b89de1f6"
This may be related (the logs below are from a different cluster, but the same thing happens for all of them):
I0924 08:17:55.288364 1 cluster_accessor.go:324] "Disconnecting" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="devops/playground.REDACTED" namespace="devops" name="playground.us-east-1.REDACTED" reconcileID="12d6872b-72ac-47a4-8c96-4f084aee1ed2"
I0924 08:17:55.288403 1 cluster_accessor.go:331] "Disconnected" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="devops/playground.REDACTED" namespace="devops" name="playground.REDACTED" reconcileID="12d6872b-72ac-47a4-8c96-4f084aee1ed2"
I0924 08:17:55.289657 1 cluster_accessor.go:252] "Connecting" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="devops/playground.REDACTED" namespace="devops" name="playground.REDACTED" reconcileID="dea206e5-b7f2-4e3c-b78b-9798baa32f83"
E0924 08:17:55.290016 1 controller.go:353] "Reconciler error" err="[failed to reconcile bootstrap config: Object devops/playground.REDACTED is already owned by another MachinePool controller playground-devops-pool-prometheus-us-east-1, failed to reconcile infrastructure: error getting client: connection to the workload cluster is down, failed to reconcile nodeRefs: error creating watch machinepool-watchNodes for *v1.Node: connection to the workload cluster is down]" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="devops/playground-devops-pool-us-east-1a" namespace="devops" name="playground-devops-pool-us-east-1a" reconcileID="b35344ce-f2f7-4c71-98ee-8f668fda953b"
I double-checked the ownerReferences and the infrastructureRef: they have not changed and everything looks correct.
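For reference, the ownerReferences check can be repeated with a small script. It is a minimal sketch that lists which owner is flagged `controller: true` on an object, given the JSON from `kubectl get <kind> <name> -o json`. The function name `controller_owners` and the sample object are assumptions for illustration, not output from our clusters.

```python
def controller_owners(obj):
    """Return (kind, name) for every ownerReference marked as the controller.

    Kubernetes allows at most one such reference per object, so this
    list is expected to have zero or one entry.
    """
    refs = obj.get("metadata", {}).get("ownerReferences", [])
    return [(r.get("kind"), r.get("name")) for r in refs if r.get("controller") is True]


# Illustrative object shaped like the one named in the error message.
sample_object = {
    "metadata": {
        "name": "services.sa-east-1.prod.alienvault.cloud",
        "namespace": "prod",
        "ownerReferences": [
            {
                "kind": "MachinePool",
                "name": "services-prod-pool-prometheus-sa-east-1",
                "controller": True,
            },
            {"kind": "Cluster", "name": "example-cluster"},  # hypothetical non-controller owner
        ],
    }
}

print(controller_owners(sample_object))
```

If the controller owner is a different MachinePool than the one named in the `MachinePool=` field of the error line, that would match the message in the log.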
What did you expect to happen?
No errors in CAPI controller
Cluster API version
1.11.1
CAPA 2.9.1
Kubernetes version
1.32
Anything else you would like to add?
No response
Label(s) to be applied
/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.