/kind bug
What steps did you take and what happened:
- Create a kind cluster as the management cluster for this bug repro. Run `clusterctl init --infrastructure v1.2.1` to initialize the providers. Make sure that `~/.cluster-api/clusterctl.yaml` contains the relevant values needed for cluster initialization before running this command. (A command-level sketch of these steps is included after the log excerpts below.)
- Use clusterctl to generate a cluster YAML for the CAPX provider type. For the sake of this bug report, let the CAPI cluster referred to in the label `cluster.x-k8s.io/cluster-name` be called `capx-cluster`. Keep the worker MachineDeployment (MD) and the KubeadmControlPlane (KCP) pointing to the same NutanixMachineTemplate, say `capx-cluster-mt-0`, and keep both the worker node count and the control-plane node count at 1.
- Change the image name of the NutanixMachineTemplate, under `.spec.template.spec.image.name`, to a random string so that the image does not exist in Prism Central (PC).
- Deploy this cluster YAML on the kind management cluster. Note the status of `capx-cluster` after a few minutes: it will say the cluster is in the `Provisioned` state. This is incorrect! It must be in either the `Provisioning` state or a `Failed` state.
- Next, delete this cluster object using `kubectl delete cl capx-cluster`. The command will hang because finalizers are set on the `capx-cluster` object. Now open another terminal with the same cluster context as the kind management cluster used earlier and check the logs of the CAPI controller manager and the CAPX controller manager.
- The CAPI controller manager will report:
I0511 23:54:36.485905 1 machine_controller.go:318] "Deleting Kubernetes Node associated with Machine is not allowed" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/capx-cluster-kcp-kj4cp" namespace="default" name="capx-cluster-kcp-kj4cp" reconcileID=861994a7-ea7e-421d-8ed5-cbb0b5f95fb1 KubeadmControlPlane="default/capx-cluster-kcp" Cluster="default/capx-cluster" Node="" cause="cluster is being deleted"
E0511 23:54:36.510267 1 controller.go:329] "Reconciler error" err="machines.cluster.x-k8s.io \"capx-cluster-kcp-kj4cp\" not found" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/capx-cluster-kcp-kj4cp" namespace="default" name="capx-cluster-kcp-kj4cp" reconcileID=861994a7-ea7e-421d-8ed5-cbb0b5f95fb1
I0511 23:54:36.548737 1 cluster_controller.go:329] "Cluster still has descendants - need to requeue" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="default/capx-cluster" namespace="default" name="capx-cluster" reconcileID=5bf051b7-537e-422c-9750-a4d148a73867 infrastructureRef="capx-cluster"
- The CAPX controller manager will report:
I0512 00:00:06.558316 1 nutanixcluster_controller.go:122] NutanixCluster[namespace: default, name: capx-cluster] Reconciling the NutanixCluster.
I0512 00:00:06.558407 1 nutanixcluster_controller.go:157] NutanixCluster[namespace: default, name: capx-cluster] Fetched the owner Cluster: capx-cluster
I0512 00:00:06.558616 1 nutanixcluster_controller.go:333] Credential ref is kind Secret for cluster capx-cluster
E0512 00:00:06.558636 1 nutanixcluster_controller.go:342] error occurred while fetching cluster capx-cluster secret for credential ref: Secret "capx-cluster" not found
E0512 00:00:06.558650 1 nutanixcluster_controller.go:178] NutanixCluster[namespace: default, name: capx-cluster] error occurred while reconciling credential ref for cluster capx-cluster: error occurred while fetching cluster capx-cluster secret for credential ref: Secret "capx-cluster" not found
I0512 00:00:06.559019 1 nutanixcluster_controller.go:172] NutanixCluster[namespace: default, name: capx-cluster] Patched NutanixCluster. Status: {Ready:true FailureDomains:map[] Conditions:[{Type:ClusterCategoryCreated Status:False Severity:Info LastTransitionTime:2023-05-11 23:54:38 +0000 UTC Reason:Deleting Message:} {Type:CredentialRefSecretOwnerSet Status:False Severity:Error LastTransitionTime:2023-05-11 23:54:38 +0000 UTC Reason:CredentialRefSecretOwnerSetFailed Message:error occurred while fetching cluster capx-cluster secret for credential ref: Secret "capx-cluster" not found} {Type:PrismClientInit Status:True Severity: LastTransitionTime:2023-05-11 23:47:53 +0000 UTC Reason: Message:}] FailureReason:<nil> FailureMessage:<nil>}
1.6838496065590758e+09 ERROR Reconciler error {"controller": "nutanixcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "NutanixCluster", "NutanixCluster": {"name":"capx-cluster","namespace":"default"}, "namespace": "default", "name": "capx-cluster", "reconcileID": "93f1b33d-9269-4d15-b94a-64f2e08bcc72", "error": "error occurred while fetching cluster capx-cluster secret for credential ref: Secret \"capx-cluster\" not found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
Essentially, the reconciler is looking for the Secret object, but the object does not exist because it was already deleted.
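For reference, the repro steps above roughly correspond to the following commands. This is a minimal sketch: the kind cluster name, the generated manifest file name, the `--kubernetes-version` value, the `nutanix:` provider prefix on `clusterctl init`, and the CAPX controller namespace/deployment names are assumptions for illustration, not values taken from this report.

```sh
# Management cluster and provider initialization (provider version per the report).
kind create cluster --name capx-mgmt                 # cluster name is an assumption
clusterctl init --infrastructure nutanix:v1.2.1      # ~/.cluster-api/clusterctl.yaml must already hold the CAPX variables

# Generate the workload cluster manifest: 1 control-plane node, 1 worker node.
clusterctl generate cluster capx-cluster \
  --kubernetes-version v1.25.3 \
  --control-plane-machine-count 1 \
  --worker-machine-count 1 > capx-cluster.yaml       # file name is an assumption

# Edit capx-cluster.yaml so that the NutanixMachineTemplate (capx-cluster-mt-0)
# has .spec.template.spec.image.name set to a name that does not exist in PC,
# then deploy it to the management cluster.
kubectl apply -f capx-cluster.yaml

# After a few minutes the cluster incorrectly reports Provisioned.
kubectl get cluster capx-cluster

# The delete hangs on finalizers; inspect the controller logs from another terminal.
kubectl delete cl capx-cluster
kubectl logs -n capi-system deployment/capi-controller-manager
kubectl logs -n capx-system deployment/capx-controller-manager   # namespace/name assumed
```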
What did you expect to happen:
- The cluster should not change to `Provisioned` status. It must be in either the `Provisioning` state or a `Failure` state.
- The cluster secret must not be deleted first. If it has already been deleted, the reconciler must skip looking it up in the delete logic. (See the verification sketch below.)
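As a quick check of the ordering problem described in the second point, the following can be run while `kubectl delete cl capx-cluster` is still hanging. This is a sketch assuming the credential ref points at a Secret named `capx-cluster` in the `default` namespace, as the CAPX log output above suggests.

```sh
# The credential Secret is already gone while the NutanixCluster is still being deleted.
kubectl get secret capx-cluster -n default            # returns "NotFound"

# The NutanixCluster still carries a deletionTimestamp and its finalizer, so the
# reconciler keeps retrying the Secret lookup on every reconcile and keeps failing.
kubectl get nutanixcluster capx-cluster -n default \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
```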
Anything else you would like to add:
None
Environment:
- Cluster-api-provider-nutanix version: v1.2.1
- Kubernetes version (use `kubectl version`): v1.25.3
- OS (e.g. from `/etc/os-release`): "CentOS Linux 7 (Core)"