Reconciler error after 1.10 upgrade #12798

@josemrs

What steps did you take and what happened?

Hi,

We are seeing constant errors in capi-controller-manager even though everything seems to work fine: we can create node groups and update the existing ones.

We run EKS 1.32 with AWSManagedControlPlanes and AWSManagedMachinePools, using custom AL2 AMIs.

The upgrade was done in two stages, first to the "latest 1beta1" and then to the latest 1beta2, as recommended here.

The errors started showing up with 1.10. After updating the CAPI resources to the 1beta2 version and upgrading to 1.11, the error still shows up constantly.

The error looks like this for all the clusters:

E0924 08:03:54.795656       1 controller.go:353] "Reconciler error" err="[failed to reconcile bootstrap config: Object prod/services.sa-east-1.prod.alienvault.cloud is already owned by another MachinePool controller services-prod-pool-prometheus-sa-east-1, failed to reconcile infrastructure: error getting client: connection to the workload cluster is down, failed to reconcile nodeRefs: error creating watch machinepool-watchNodes for *v1.Node: connection to the workload cluster is down]" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="prod/services-prod-pool-sa-east-1a" namespace="prod" name="services-prod-pool-sa-east-1a" reconcileID="c207ca98-25c3-4ec7-9bd6-26f5b89de1f6"

This may be related (it is from a different cluster, but it happens for all of them):

I0924 08:17:55.288364       1 cluster_accessor.go:324] "Disconnecting" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="devops/playground.REDACTED" namespace="devops" name="playground.us-east-1.REDACTED" reconcileID="12d6872b-72ac-47a4-8c96-4f084aee1ed2"
I0924 08:17:55.288403       1 cluster_accessor.go:331] "Disconnected" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="devops/playground.REDACTED" namespace="devops" name="playground.REDACTED" reconcileID="12d6872b-72ac-47a4-8c96-4f084aee1ed2"
I0924 08:17:55.289657       1 cluster_accessor.go:252] "Connecting" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="devops/playground.REDACTED" namespace="devops" name="playground.REDACTED" reconcileID="dea206e5-b7f2-4e3c-b78b-9798baa32f83"
E0924 08:17:55.290016       1 controller.go:353] "Reconciler error" err="[failed to reconcile bootstrap config: Object devops/playground.REDACTED is already owned by another MachinePool controller playground-devops-pool-prometheus-us-east-1, failed to reconcile infrastructure: error getting client: connection to the workload cluster is down, failed to reconcile nodeRefs: error creating watch machinepool-watchNodes for *v1.Node: connection to the workload cluster is down]" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="devops/playground-devops-pool-us-east-1a" namespace="devops" name="playground-devops-pool-us-east-1a" reconcileID="b35344ce-f2f7-4c71-98ee-8f668fda953b"
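
Each "Reconciler error" line above bundles several failures into one `err="[...]"` field, which makes it easy to miss that the ownership conflict and the "connection to the workload cluster is down" messages are separate errors. The following is a minimal sketch for splitting them apart when triaging. The function name is hypothetical, and splitting on `, failed to ` is only a heuristic that matches the lines in this report, not a general klog parser.

```python
import re


def split_reconciler_error(line: str) -> list[str]:
    """Return the individual errors from an aggregated err="[a, b, c]" field.

    Heuristic: assumes each aggregated error starts with "failed to ",
    as in the log lines quoted in this issue.
    """
    match = re.search(r'err="\[(.*?)\]"', line)
    if not match:
        return []
    # Split only at ", " separators that are followed by "failed to ",
    # so commas inside an individual error message are left alone.
    return [part.strip() for part in re.split(r",\s+(?=failed to )", match.group(1))]
```

Applied to either "Reconciler error" line quoted here, this yields three entries: one for the bootstrap config, one for the infrastructure, and one for the nodeRefs.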

I double-checked the ownerReferences and the infrastructureRef; they have not changed and everything seems correct.
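
To make that check repeatable, here is a rough sketch that looks for one possible cause of the "already owned by another MachinePool controller" message: two MachinePools pointing at the same bootstrap config object. This is an assumption about the cause, not a confirmed diagnosis. The function name is hypothetical, and the field path `spec.template.spec.bootstrap.configRef` is assumed from the MachinePool spec as I understand it.

```python
from collections import defaultdict


def shared_bootstrap_refs(machinepool_list: dict) -> dict:
    """Map each bootstrap configRef used by more than one MachinePool
    to the sorted names of the MachinePools that reference it.

    Expects the parsed JSON of a MachinePool list (an object with "items").
    """
    users = defaultdict(list)
    for mp in machinepool_list.get("items", []):
        meta = mp.get("metadata", {})
        ref = (
            mp.get("spec", {})
            .get("template", {})
            .get("spec", {})
            .get("bootstrap", {})
            .get("configRef")
        )
        if not ref:
            continue
        # Assumption: the ref may omit a namespace (newer API versions),
        # in which case the MachinePool's own namespace is used.
        namespace = ref.get("namespace") or meta.get("namespace")
        users[(namespace, ref.get("kind"), ref.get("name"))].append(meta.get("name"))
    return {key: sorted(names) for key, names in users.items() if len(names) > 1}
```

It can be fed the parsed output of `kubectl get machinepools -A -o json`; an empty result means no bootstrap config reference is shared between MachinePools.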

What did you expect to happen?

No errors in the CAPI controller.

Cluster API version

1.11.1
CAPA 2.9.1

Kubernetes version

1.32

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

    Labels

    area/machinepool, kind/bug, needs-priority, needs-triage
