Investigate rke2 control plane unhealthy node remediation stuck in machine deleting phase #61

@aleixrm

Description

When deploying an RKE2 cluster with 3 control plane nodes, if we abruptly remove a control plane virtual machine (e.g. onevm terminate --hard <node_vm>), the MachineHealthCheck marks the node as Unhealthy, as expected. However, the RKE2 control plane controller gets stuck waiting for the etcd member to be safely removed, which prevents a new node from being deployed, even though the etcd cluster has already automatically removed the terminated node from its membership. If we remove the pre-terminate.delete.hook.machine.cluster.x-k8s.io/rke2-cleanup annotation from the Machine object, the node is properly deleted from the cluster and a new control plane node is provisioned.
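
For reference, the manual workaround described above can be sketched as a small helper that builds the kubectl command line. This is only an illustration: the function name remove_hook_command is hypothetical, and it assumes the Machine lives in the default namespace as in the outputs below. It relies on the standard kubectl annotate behaviour where a trailing "-" after the key removes the annotation.

```python
import shlex
from typing import List

# Annotation key taken from the Machine object in this report.
HOOK = "pre-terminate.delete.hook.machine.cluster.x-k8s.io/rke2-cleanup"


def remove_hook_command(machine: str, namespace: str = "default") -> List[str]:
    """Build the kubectl argv that removes the rke2-cleanup hook annotation.

    A trailing '-' after the annotation key tells `kubectl annotate`
    to remove that annotation instead of setting it.
    """
    return ["kubectl", "annotate", "machine", machine, "-n", namespace, HOOK + "-"]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(remove_hook_command("one-p9tdm")))
```

Printing the command rather than executing it keeps the sketch side-effect free; removing the hook by hand skips whatever cleanup the provider intended, so it should be seen as a workaround and not a fix.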

Probably related:

Context

Cluster initial status

Virtual machines:

  ID USER     GROUP    NAME                             STAT  CPU     MEM HOST                        TIME
1574 oneadmin oneadmin one-zpxgf                        runn    1      3G localhost               0d 00h04
1573 oneadmin oneadmin one-p9tdm                        runn    1      3G localhost               0d 00h07
1572 oneadmin oneadmin one-md-0-wdtvv-l9qc8             runn    1      3G localhost               0d 00h07
1571 oneadmin oneadmin one-md-0-wdtvv-g42w8             runn    1      3G localhost               0d 00h07
1570 oneadmin oneadmin one-lnxc9                        runn    1      3G localhost               0d 00h09
1569 oneadmin oneadmin vr-one-cp-0                      runn    1    512M localhost               0d 00h10

Nodes:

❯ k get nodes
NAME                   STATUS   ROLES                       AGE     VERSION
one-lnxc9              Ready    control-plane,etcd,master   7m10s   v1.31.4+rke2r1
one-md-0-wdtvv-g42w8   Ready    <none>                      5m12s   v1.31.4+rke2r1
one-md-0-wdtvv-l9qc8   Ready    <none>                      4m57s   v1.31.4+rke2r1
one-p9tdm              Ready    control-plane,etcd,master   3m47s   v1.31.4+rke2r1
one-zpxgf              Ready    control-plane,etcd,master   89s     v1.31.4+rke2r1

Machines:

❯ k get machines
NAME                   CLUSTER   NODENAME               PROVIDERID   PHASE     AGE     VERSION
one-lnxc9              one       one-lnxc9              one://1570   Running   9m27s   v1.31.4+rke2r1
one-md-0-wdtvv-g42w8   one       one-md-0-wdtvv-g42w8   one://1571   Running   6m58s   v1.31.4+rke2r1
one-md-0-wdtvv-l9qc8   one       one-md-0-wdtvv-l9qc8   one://1572   Running   6m58s   v1.31.4+rke2r1
one-p9tdm              one       one-p9tdm              one://1573   Running   6m54s   v1.31.4+rke2r1
one-zpxgf              one       one-zpxgf              one://1574   Running   3m40s   v1.31.4+rke2r1

Machine healthchecks:

❯ k get machinehealthcheck
NAME           CLUSTER   EXPECTEDMACHINES   MAXUNHEALTHY   CURRENTHEALTHY   AGE
one-cp-mhc     one       3                  1              3                11m
one-md-0-mhc   one       2                  100%           2                11m

Control plane nodes machinehealthcheck:

❯ k describe machinehealthcheck one-cp-mhc
Name:         one-cp-mhc
Namespace:    default
Labels:       cluster.x-k8s.io/cluster-name=one
Annotations:  <none>
API Version:  cluster.x-k8s.io/v1beta1
Kind:         MachineHealthCheck
Metadata:
[...]
Spec:
  Cluster Name:          one
  Max Unhealthy:         1
  Node Startup Timeout:  15m0s
  Selector:
    Match Labels:
      cluster.x-k8s.io/control-plane:
  Unhealthy Conditions:
    Status:   False
    Timeout:  5m0s
    Type:     Ready
    Status:   Unknown
    Timeout:  5m0s
    Type:     Ready
Status:
  Conditions:
    Last Transition Time:  2025-11-10T12:22:09Z
    Status:                True
    Type:                  RemediationAllowed
  Current Healthy:         3
  Expected Machines:       3
  Observed Generation:     1
  Remediations Allowed:    1
  Targets:
    one-lnxc9
    one-p9tdm
    one-zpxgf
  v1beta2:
    Conditions:
      Last Transition Time:  2025-11-10T12:22:09Z
      Message:
      Observed Generation:   1
      Reason:                RemediationAllowed
      Status:                True
      Type:                  RemediationAllowed
      Last Transition Time:  2025-11-10T12:22:09Z
      Message:
      Observed Generation:   1
      Reason:                NotPaused
      Status:                False
      Type:                  Paused
Events:                      <none>

Specific machine description:

❯ k describe machine one-p9tdm
Name:         one-p9tdm
Namespace:    default
Labels:       cluster.x-k8s.io/cluster-name=one
              cluster.x-k8s.io/control-plane=
              cluster.x-k8s.io/control-plane-name=one
Annotations:  controlplane.cluster.x-k8s.io/rke2-server-configuration:
                {"disableComponents":{"kubernetesComponents":["cloudController"]},"cni":"canal","etcd":{"backupConfig":{}},"cloudProviderName":"external"}
              pre-terminate.delete.hook.machine.cluster.x-k8s.io/rke2-cleanup:
API Version:  cluster.x-k8s.io/v1beta1
Kind:         Machine
Metadata:
  Creation Timestamp:  2025-11-10T12:24:43Z
  Finalizers:
    machine.cluster.x-k8s.io
[...]
Spec:
  Bootstrap:
    Config Ref:
      API Version:     bootstrap.cluster.x-k8s.io/v1beta1
      Kind:            RKE2Config
      Name:            one-wxg26
      Namespace:       default
      UID:             bf6ae836-8765-4de2-81a9-d325f06fda81
    Data Secret Name:  one-wxg26
  Cluster Name:        one
  Infrastructure Ref:
    API Version:               infrastructure.cluster.x-k8s.io/v1beta1
    Kind:                      ONEMachine
    Name:                      one-cp-k76hr
    Namespace:                 default
    UID:                       b23e1d82-b459-4b8a-ac79-f4f0b2b064f7
  Node Deletion Timeout:       10s
  Node Drain Timeout:          2m0s
  Node Volume Detach Timeout:  5m0s
  Provider ID:                 one://1573
  Version:                     v1.31.4+rke2r1
Status:
  Addresses:
    Address:        172.20.0.8
    Type:           ExternalIP
    Address:        172.20.0.8
    Type:           InternalIP
  Bootstrap Ready:  true
  Conditions:
    Last Transition Time:  2025-11-10T12:24:44Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2025-11-10T12:27:57Z
    Status:                True
    Type:                  AgentHealthy
    Last Transition Time:  2025-11-10T12:24:43Z
    Status:                True
    Type:                  BootstrapReady
    Last Transition Time:  2025-11-10T12:27:38Z
    Status:                True
    Type:                  EtcdMemberHealthy
    Last Transition Time:  2025-11-10T12:27:57Z
    Status:                True
    Type:                  HealthCheckSucceeded
    Last Transition Time:  2025-11-10T12:24:44Z
    Status:                True
    Type:                  InfrastructureReady
    Last Transition Time:  2025-11-10T12:27:57Z
    Status:                True
    Type:                  NodeHealthy
    Last Transition Time:  2025-11-10T12:27:38Z
    Status:                True
    Type:                  NodeMetadataUpToDate
  Infrastructure Ready:    true
  Last Updated:            2025-11-10T12:27:28Z
  Node Info:
    Architecture:               amd64
    Boot ID:                    52423c7c-a045-4dc8-ada8-7f8b77a98547
    Container Runtime Version:  containerd://1.7.23-k3s2
    Kernel Version:             5.15.0-140-generic
    Kube Proxy Version:         v1.31.4+rke2r1
    Kubelet Version:            v1.31.4+rke2r1
    Machine ID:                 277874935f6a4dbaae8e2b9446151789
    Operating System:           linux
    Os Image:                   Ubuntu 22.04.5 LTS
    System UUID:                27787493-5f6a-4dba-ae8e-2b9446151789
  Node Ref:
    API Version:        v1
    Kind:               Node
    Name:               one-p9tdm
    UID:                3006100e-e5e8-4335-841b-92fb970ab64f
  Observed Generation:  3
  Phase:                Running
  v1beta2:
    Conditions:
      Last Transition Time:  2025-11-10T12:27:57Z
      Message:
      Observed Generation:   3
      Reason:                Available
      Status:                True
      Type:                  Available
      Last Transition Time:  2025-11-10T12:27:57Z
      Message:
      Observed Generation:   3
      Reason:                Ready
      Status:                True
      Type:                  Ready
      Last Transition Time:  2025-11-10T12:24:43Z
      Message:
      Observed Generation:   3
      Reason:                Ready
      Status:                True
      Type:                  BootstrapConfigReady
      Last Transition Time:  2025-11-10T12:24:44Z
      Message:
      Observed Generation:   3
      Reason:                Ready
      Status:                True
      Type:                  InfrastructureReady
      Last Transition Time:  2025-11-10T12:27:57Z
      Message:
      Observed Generation:   3
      Reason:                NodeHealthy
      Status:                True
      Type:                  NodeHealthy
      Last Transition Time:  2025-11-10T12:27:57Z
      Message:
      Observed Generation:   3
      Reason:                NodeReady
      Status:                True
      Type:                  NodeReady
      Last Transition Time:  2025-11-10T12:27:57Z
      Message:
      Observed Generation:   3
      Reason:                HealthCheckSucceeded
      Status:                True
      Type:                  HealthCheckSucceeded
      Last Transition Time:  2025-11-10T12:24:43Z
      Message:
      Observed Generation:   3
      Reason:                NotPaused
      Status:                False
      Type:                  Paused
      Last Transition Time:  2025-11-10T12:24:43Z
      Message:
      Observed Generation:   3
      Reason:                NotDeleting
      Status:                False
      Type:                  Deleting
Events:
  Type    Reason                Age                     From                           Message
  ----    ------                ----                    ----                           -------
  Normal  DetectedUnhealthy     6m36s (x25 over 8m21s)  machinehealthcheck-controller  Machine default/one-p9tdm has unhealthy Node
  Normal  SuccessfulSetNodeRef  5m36s (x2 over 5m36s)   machine-controller             one-p9tdm

Abrupt node termination status

Abrupt termination of the node VM via OpenNebula:

onevm terminate --hard one-p9tdm

RKE2 controller logs:

I1110 12:48:14.986660      15 rke2controlplane_controller.go:577] "Reconcile RKE2 Control Plane" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="2137d291-fae0-4a25-b103-de08d73a4147"
I1110 12:48:15.370925      15 remediation.go:596] "etcd cluster before remediation" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="2137d291-fae0-4a25-b103-de08d73a4147" currentTotalMembers=2 currentMembers=["one-lnxc9","one-zpxgf"]
I1110 12:48:15.370989      15 remediation.go:657] "etcd cluster projected after remediation of one-p9tdm" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="2137d291-fae0-4a25-b103-de08d73a4147" healthyMembers=["one-lnxc9 (one-lnxc9)","one-zpxgf (one-zpxgf)"] unhealthyMembers=[] targetTotalMembers=2 targetQuorum=2 targetUnhealthyMembers=0 canSafelyRemediate=true
I1110 12:48:15.607849      15 remediation.go:325] "Remediating unhealthy machine" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="2137d291-fae0-4a25-b103-de08d73a4147" Machine="default/one-p9tdm" initialized=true
I1110 12:48:15.667783      15 rke2controlplane_controller.go:566] "Successfully updated RKE2ControlPlane status" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" reconcileID="2137d291-fae0-4a25-b103-de08d73a4147" namespace="default" name="one"
I1110 12:48:15.699606      15 rke2controlplane_webhook.go:81] "defaulting" logger="RKE2ControlPlane" RKE2ControlPlane="default/one"
I1110 12:48:15.709543      15 rke2controlplane_webhook.go:171] "validate update" logger="RKE2ControlPlane" RKE2ControlPlane="default/one"
I1110 12:48:15.726279      15 rke2controlplane_controller.go:577] "Reconcile RKE2 Control Plane" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="976191e1-03dd-4ab5-b2af-bee81c3224c5"
I1110 12:48:16.035957      15 rke2controlplane_controller.go:566] "Successfully updated RKE2ControlPlane status" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" reconcileID="976191e1-03dd-4ab5-b2af-bee81c3224c5" namespace="default" name="one"
I1110 12:48:16.049675      15 rke2controlplane_controller.go:577] "Reconcile RKE2 Control Plane" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="ca27255b-8a39-44d0-a79a-d819b6ff5936"
I1110 12:48:16.507536      15 workload_cluster_etcd.go:75] "Removed member: one-p9tdm" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="ca27255b-8a39-44d0-a79a-d819b6ff5936" Machine="default/one-p9tdm"
I1110 12:48:16.518129      15 lifecycle_hook.go:231] "Waiting for etcd member for Machine to be safely removed" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="ca27255b-8a39-44d0-a79a-d819b6ff5936" Machine="default/one-p9tdm" machine="default/one-p9tdm"
I1110 12:48:16.551317      15 rke2controlplane_controller.go:566] "Successfully updated RKE2ControlPlane status" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" reconcileID="ca27255b-8a39-44d0-a79a-d819b6ff5936" namespace="default" name="one"

Machine healthcheck status:

❯ k get machinehealthcheck
NAME           CLUSTER   EXPECTEDMACHINES   MAXUNHEALTHY   CURRENTHEALTHY   AGE
one-cp-mhc     one       3                  1              2                26m

Machine stuck in Deleting phase:

❯ k get machines
NAME                   CLUSTER   NODENAME               PROVIDERID   PHASE      AGE   VERSION
one-p9tdm              one       one-p9tdm              one://1573   Deleting   24m   v1.31.4+rke2r1
[...]

Machine status after abrupt VM termination:

❯ k describe machine one-p9tdm
Name:         one-p9tdm
Namespace:    default
Labels:       cluster.x-k8s.io/cluster-name=one
              cluster.x-k8s.io/control-plane=
              cluster.x-k8s.io/control-plane-name=one
Annotations:  controlplane.cluster.x-k8s.io/rke2-server-configuration:
                {"disableComponents":{"kubernetesComponents":["cloudController"]},"cni":"canal","etcd":{"backupConfig":{}},"cloudProviderName":"external"}
              pre-terminate.delete.hook.machine.cluster.x-k8s.io/rke2-cleanup:
API Version:  cluster.x-k8s.io/v1beta1
Kind:         Machine
Metadata:
  Creation Timestamp:             2025-11-10T12:24:43Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2025-11-10T12:48:15Z
  Finalizers:
    machine.cluster.x-k8s.io
  Generation:  4
  Owner References:
    API Version:           controlplane.cluster.x-k8s.io/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  RKE2ControlPlane
    Name:                  one
    UID:                   14a219dd-83cf-4560-afbd-e5304eb1e0b5
  Resource Version:        48003
  UID:                     25d9c0be-65d7-4cde-b3c9-812a39cacb0d
Spec:
  Bootstrap:
    Config Ref:
      API Version:     bootstrap.cluster.x-k8s.io/v1beta1
      Kind:            RKE2Config
      Name:            one-wxg26
      Namespace:       default
      UID:             bf6ae836-8765-4de2-81a9-d325f06fda81
    Data Secret Name:  one-wxg26
  Cluster Name:        one
  Infrastructure Ref:
    API Version:               infrastructure.cluster.x-k8s.io/v1beta1
    Kind:                      ONEMachine
    Name:                      one-cp-k76hr
    Namespace:                 default
    UID:                       b23e1d82-b459-4b8a-ac79-f4f0b2b064f7
  Node Deletion Timeout:       10s
  Node Drain Timeout:          2m0s
  Node Volume Detach Timeout:  5m0s
  Provider ID:                 one://1573
  Version:                     v1.31.4+rke2r1
Status:
  Addresses:
    Address:        172.20.0.8
    Type:           ExternalIP
    Address:        172.20.0.8
    Type:           InternalIP
  Bootstrap Ready:  true
  Conditions:
    Last Transition Time:  2025-11-10T12:48:14Z
    Reason:                NodeNotFound
    Severity:              Warning
    Status:                False
    Type:                  Ready
    Last Transition Time:  2025-11-10T12:48:14Z
    Message:               Missing node
    Reason:                PodFailed
    Severity:              Error
    Status:                False
    Type:                  AgentHealthy
    Last Transition Time:  2025-11-10T12:24:43Z
    Status:                True
    Type:                  BootstrapReady
    Last Transition Time:  2025-11-10T12:48:15Z
    Status:                True
    Type:                  DrainingSucceeded
    Last Transition Time:  2025-11-10T12:27:38Z
    Status:                True
    Type:                  EtcdMemberHealthy
    Last Transition Time:  2025-11-10T12:48:14Z
    Reason:                NodeNotFound
    Severity:              Warning
    Status:                False
    Type:                  HealthCheckSucceeded
    Last Transition Time:  2025-11-10T12:24:44Z
    Status:                True
    Type:                  InfrastructureReady
    Last Transition Time:  2025-11-10T12:48:14Z
    Reason:                NodeNotFound
    Severity:              Error
    Status:                False
    Type:                  NodeHealthy
    Last Transition Time:  2025-11-10T12:48:14Z
    Message:               associated node not found
    Reason:                NodePatchFailed
    Status:                Unknown
    Type:                  NodeMetadataUpToDate
    Last Transition Time:  2025-11-10T12:48:15Z
    Reason:                RemediationInProgress
    Severity:              Warning
    Status:                False
    Type:                  OwnerRemediated
    Last Transition Time:  2025-11-10T12:48:15Z
    Status:                True
    Type:                  PreDrainDeleteHookSucceeded
    Last Transition Time:  2025-11-10T12:48:15Z
    Reason:                WaitingExternalHook
    Severity:              Info
    Status:                False
    Type:                  PreTerminateDeleteHookSucceeded
    Last Transition Time:  2025-11-10T12:48:15Z
    Status:                True
    Type:                  VolumeDetachSucceeded
  Deletion:
    Node Drain Start Time:                   2025-11-10T12:48:15Z
    Wait For Node Volume Detach Start Time:  2025-11-10T12:48:15Z
  Infrastructure Ready:                      true
  Last Updated:                              2025-11-10T12:48:15Z
  Node Info:
    Architecture:               amd64
    Boot ID:                    52423c7c-a045-4dc8-ada8-7f8b77a98547
    Container Runtime Version:  containerd://1.7.23-k3s2
    Kernel Version:             5.15.0-140-generic
    Kube Proxy Version:         v1.31.4+rke2r1
    Kubelet Version:            v1.31.4+rke2r1
    Machine ID:                 277874935f6a4dbaae8e2b9446151789
    Operating System:           linux
    Os Image:                   Ubuntu 22.04.5 LTS
    System UUID:                27787493-5f6a-4dba-ae8e-2b9446151789
  Node Ref:
    API Version:        v1
    Kind:               Node
    Name:               one-p9tdm
    UID:                3006100e-e5e8-4335-841b-92fb970ab64f
  Observed Generation:  4
  Phase:                Deleting
  v1beta2:
    Conditions:
      Last Transition Time:  2025-11-10T12:48:14Z
      Message:
      Observed Generation:   4
      Reason:                NotReady
      Status:                False
      Type:                  Available
      Last Transition Time:  2025-11-10T12:48:14Z
      Message:               * Deleting: Machine deletion in progress, stage: WaitingForPreTerminateHook
* HealthCheckSucceeded: Health check failed: Node one-p9tdm has been deleted
      Observed Generation:   4
      Reason:                NotReady
      Status:                False
      Type:                  Ready
      Last Transition Time:  2025-11-10T12:24:43Z
      Message:
      Observed Generation:   4
      Reason:                Ready
      Status:                True
      Type:                  BootstrapConfigReady
      Last Transition Time:  2025-11-10T12:24:44Z
      Message:
      Observed Generation:   4
      Reason:                Ready
      Status:                True
      Type:                  InfrastructureReady
      Last Transition Time:  2025-11-10T12:48:14Z
      Message:               Node one-p9tdm has been deleted
      Observed Generation:   4
      Reason:                NodeDeleted
      Status:                False
      Type:                  NodeHealthy
      Last Transition Time:  2025-11-10T12:48:14Z
      Message:               Node one-p9tdm has been deleted
      Observed Generation:   4
      Reason:                NodeDeleted
      Status:                False
      Type:                  NodeReady
      Last Transition Time:  2025-11-10T12:48:14Z
      Message:               Health check failed: Node one-p9tdm has been deleted
      Observed Generation:   4
      Reason:                NodeDeleted
      Status:                False
      Type:                  HealthCheckSucceeded
      Last Transition Time:  2025-11-10T12:48:14Z
      Message:               Waiting for remediation
      Observed Generation:   3
      Reason:                WaitingForRemediation
      Status:                False
      Type:                  OwnerRemediated
      Last Transition Time:  2025-11-10T12:24:43Z
      Message:
      Observed Generation:   4
      Reason:                NotPaused
      Status:                False
      Type:                  Paused
      Last Transition Time:  2025-11-10T12:48:15Z
      Message:               Waiting for pre-terminate hooks to succeed (hooks: pre-terminate.delete.hook.machine.cluster.x-k8s.io/rke2-cleanup)
      Observed Generation:   4
      Reason:                WaitingForPreTerminateHook
      Status:                True
      Type:                  Deleting
Events:
  Type    Reason                  Age                     From                           Message
  ----    ------                  ----                    ----                           -------
  Normal  DetectedUnhealthy       26m (x25 over 27m)      machinehealthcheck-controller  Machine default/one-p9tdm has unhealthy Node
  Normal  SuccessfulSetNodeRef    25m (x2 over 25m)       machine-controller             one-p9tdm
  Normal  DetectedUnhealthy       4m16s (x15 over 25m)    machinehealthcheck-controller  Machine default/one-p9tdm has unhealthy Node one-p9tdm
  Normal  SuccessfulDrainNode     4m14s (x2 over 4m15s)   machine-controller             success draining Machine's node "one-p9tdm"
  Normal  MachineMarkedUnhealthy  2m25s (x11 over 4m16s)  machinehealthcheck-controller  Machine default/one-p9tdm has been marked as unhealthy by default/one-cp-mhc
  Normal  NodeVolumesDetached     114s (x3 over 4m15s)    machine-controller             success waiting for node volumes detaching Machine's node "one-p9tdm"
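
To spot this state without reading the full describe output, one option is to inspect the Machine's JSON (for example from kubectl get machine one-p9tdm -o json) and list the pre-terminate hook annotations that are still present while a deletion timestamp is set. The sketch below assumes that JSON shape; the helper name pending_pre_terminate_hooks is hypothetical, and the sample dictionary only reproduces the fields shown above.

```python
from typing import Any, Dict, List

# Annotation prefix used by Cluster API pre-terminate deletion hooks,
# as seen on the Machine in this report.
PRE_TERMINATE_PREFIX = "pre-terminate.delete.hook.machine.cluster.x-k8s.io/"


def pending_pre_terminate_hooks(machine: Dict[str, Any]) -> List[str]:
    """Return hook annotations still blocking deletion of a Machine.

    A Machine that is not being deleted (no deletionTimestamp) has
    no pending hooks by definition.
    """
    meta = machine.get("metadata", {})
    if "deletionTimestamp" not in meta:
        return []
    annotations = meta.get("annotations") or {}
    return sorted(k for k in annotations if k.startswith(PRE_TERMINATE_PREFIX))


# Sample built from the fields shown in the describe output above.
stuck_machine = {
    "metadata": {
        "name": "one-p9tdm",
        "deletionTimestamp": "2025-11-10T12:48:15Z",
        "annotations": {
            "pre-terminate.delete.hook.machine.cluster.x-k8s.io/rke2-cleanup": "",
        },
    }
}
print(pending_pre_terminate_hooks(stuck_machine))
```

For the Machine in this report the only entry returned is the rke2-cleanup hook, which matches the WaitingForPreTerminateHook reason in the Deleting condition.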

Final status after deleting the pre-terminate hook annotation

rke2 control plane provider logs:

E1110 12:56:04.777318      15 controller.go:347] "Reconciler error" err="failed to patch Machine default/one-p9tdm: [Machine.cluster.x-k8s.io \"one-p9tdm\" not found, machines.cluster.x-k8s.io \"one-p9tdm\" not found]" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="0c0f89e9-b5e6-45f0-a899-5d95b124f80f"
I1110 12:56:04.777405      15 rke2controlplane_controller.go:577] "Reconcile RKE2 Control Plane" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="50556562-368e-4e02-b26c-26eb0381f6ca"
I1110 12:56:05.200690      15 rke2controlplane_controller.go:725] "Scaling up control plane" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="50556562-368e-4e02-b26c-26eb0381f6ca" Desired=3 Existing=2
I1110 12:56:05.304675      15 rke2controlplane_controller.go:566] "Successfully updated RKE2ControlPlane status" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" reconcileID="50556562-368e-4e02-b26c-26eb0381f6ca" namespace="default" name="one"
I1110 12:56:05.345251      15 rke2controlplane_controller.go:577] "Reconcile RKE2 Control Plane" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="c21197cd-8525-486d-b1c8-7236b822d8ad"
I1110 12:56:05.871298      15 lifecycle_hook.go:257] "Applying hook on machine" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" namespace="default" name="one" reconcileID="c21197cd-8525-486d-b1c8-7236b822d8ad" hook="pre-terminate.delete.hook.machine.cluster.x-k8s.io/rke2-cleanup" machine="one-vj75m"
I1110 12:56:05.953612      15 rke2controlplane_controller.go:566] "Successfully updated RKE2ControlPlane status" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/one" reconcileID="c21197cd-8525-486d-b1c8-7236b822d8ad" namespace="default" name="one"

OpenNebula VMs:

  ID USER     GROUP    NAME                             STAT  CPU     MEM HOST                        TIME
1575 oneadmin oneadmin one-vj75m                        runn    1      3G localhost               0d 00h05
1574 oneadmin oneadmin one-zpxgf                        runn    1      3G localhost               0d 00h33
1572 oneadmin oneadmin one-md-0-wdtvv-l9qc8             runn    1      3G localhost               0d 00h37
1571 oneadmin oneadmin one-md-0-wdtvv-g42w8             runn    1      3G localhost               0d 00h37
1570 oneadmin oneadmin one-lnxc9                        runn    1      3G localhost               0d 00h39
1569 oneadmin oneadmin vr-one-cp-0                      runn    1    512M localhost               0d 00h39

Machines:

❯ k get machine
NAME                   CLUSTER   NODENAME               PROVIDERID   PHASE         AGE   VERSION
one-lnxc9              one       one-lnxc9              one://1570   Running       34m   v1.31.4+rke2r1
one-md-0-wdtvv-g42w8   one       one-md-0-wdtvv-g42w8   one://1571   Running       31m   v1.31.4+rke2r1
one-md-0-wdtvv-l9qc8   one       one-md-0-wdtvv-l9qc8   one://1572   Running       31m   v1.31.4+rke2r1
one-vj75m              one                              one://1575   Provisioned   27s   v1.31.4+rke2r1
one-zpxgf              one       one-zpxgf              one://1574   Running       28m   v1.31.4+rke2r1

MachineHealthChecks:

❯ k get machinehealthchecks.cluster.x-k8s.io
NAME           CLUSTER   EXPECTEDMACHINES   MAXUNHEALTHY   CURRENTHEALTHY   AGE
one-cp-mhc     one       3                  1              3                39m
one-md-0-mhc   one       2                  100%           2                39m

Kubernetes nodes:

❯ k get nodes
NAME                   STATUS   ROLES                       AGE     VERSION
one-lnxc9              Ready    control-plane,etcd,master   37m     v1.31.4+rke2r1
one-md-0-wdtvv-g42w8   Ready    <none>                      35m     v1.31.4+rke2r1
one-md-0-wdtvv-l9qc8   Ready    <none>                      35m     v1.31.4+rke2r1
one-vj75m              Ready    control-plane,etcd,master   3m31s   v1.31.4+rke2r1
one-zpxgf              Ready    control-plane,etcd,master   31m     v1.31.4+rke2r1
