| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: 'kubeadm: Use etcd Learner to Join a Control Plane Node Safely' |
| 4 | +date: 2023-09-25 |
| 5 | +slug: kubeadm-use-etcd-learner-mode |
| 6 | +--- |
| 7 | + |
| 8 | +**Author:** Paco Xu (DaoCloud) |
| 9 | + |
| 10 | +The [`kubeadm`](/docs/reference/setup-tools/kubeadm/) tool now supports etcd learner mode, which |
| 11 | +allows you to enhance the resilience and stability |
| 12 | +of your Kubernetes clusters by leveraging the [learner mode](https://etcd.io/docs/v3.4/learning/design-learner/#appendix-learner-implementation-in-v34) |
| 13 | +feature introduced in etcd version 3.4. |
| 14 | +This guide will walk you through using etcd learner mode with kubeadm. By default, kubeadm runs |
| 15 | +a local etcd instance on each control plane node. |
| 16 | + |
| 17 | +In v1.27, kubeadm introduced a new feature gate `EtcdLearnerMode`. With this feature gate enabled, |
| 18 | +when joining a new control plane node, a new etcd member will be created as a learner and |
| 19 | +promoted to a voting member only after the etcd data are fully aligned. |
| 20 | + |
| 21 | +## What are the advantages of using etcd learner mode? |
| 22 | + |
| 23 | +etcd learner mode offers several compelling reasons to consider its adoption |
| 24 | +in Kubernetes clusters: |
| 25 | + |
| 26 | + 1. **Enhanced Resilience**: etcd learner nodes are non-voting members that catch up with |
| 27 | + the leader's logs before becoming fully operational. This prevents new cluster members |
| 28 | + from disrupting the quorum or causing leader elections, making the cluster more resilient |
| 29 | + during membership changes. |
| 30 | + 2. **Reduced Cluster Unavailability**: Traditional approaches to adding new members often
| 31 | + result in periods of cluster unavailability, especially on slow infrastructure or when a member is misconfigured.
| 32 | + etcd learner mode minimizes such disruptions.
| 33 | + 3. **Simplified Maintenance**: Learner nodes provide a safer and reversible way to add or replace |
| 34 | + cluster members. This reduces the risk of accidental cluster outages due to misconfigurations or |
| 35 | + missteps during member additions. |
| 36 | + 4. **Improved Network Tolerance**: In scenarios involving network partitions, learner mode allows
| 37 | + for more graceful handling. Depending on the partition a new member lands in, it can seamlessly
| 38 | + integrate with the existing cluster without causing disruptions.
| 39 | + |
| 40 | +In summary, the etcd learner mode improves the reliability and manageability of Kubernetes clusters |
| 41 | +during member additions and changes, making it a valuable feature for cluster operators. |
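The quorum argument behind these points can be checked with a little arithmetic. The sketch below is illustrative only: the helper names `quorum` and `fault_tolerance` are made up for this example and are not part of etcd or kubeadm. It assumes the usual majority rule, where quorum is more than half of the voting members, and relies on the fact that a learner is not counted as a voting member.

```shell
# Hypothetical helpers for illustration; not part of etcd or kubeadm.

# Quorum (majority) for a given number of voting members.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of voting members that can fail while the cluster keeps quorum.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for voters in 1 2 3 4 5; do
  echo "voters=$voters quorum=$(quorum "$voters") tolerated_failures=$(fault_tolerance "$voters")"
done
```

Going from one voting member to two raises the quorum from 1 to 2, so a second member that is slow or misconfigured can leave the cluster without quorum. If that member joins as a learner instead, the number of voters (and therefore the quorum) stays the same until it is promoted.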
| 42 | + |
| 43 | +## How nodes join a cluster that's using the new mode |
| 44 | + |
| 45 | +### Create a Kubernetes cluster backed by etcd in learner mode {#create-K8s-cluster-etcd-learner-mode} |
| 46 | + |
| 47 | +For a general explanation about creating highly available clusters with kubeadm, you can refer to |
| 48 | +[Creating Highly Available Clusters with kubeadm](/docs/setup/production-environment/tools/kubeadm/high-availability/). |
| 49 | + |
| 50 | +To use kubeadm to create a Kubernetes cluster backed by etcd in learner mode, run:
| 51 | + |
| 52 | +```shell |
| 53 | +# kubeadm init --feature-gates=EtcdLearnerMode=true ... |
| 54 | +kubeadm init --config=kubeadm-config.yaml |
| 55 | +``` |
| 56 | + |
| 57 | +The kubeadm configuration file looks like this:
| 58 | + |
| 59 | +```yaml |
| 60 | +apiVersion: kubeadm.k8s.io/v1beta3 |
| 61 | +kind: ClusterConfiguration |
| 62 | +featureGates: |
| 63 | + EtcdLearnerMode: true |
| 64 | +``` |
| 65 | +
| 66 | +The kubeadm tool deploys a single-node Kubernetes cluster with etcd set to use learner mode. |
| 67 | +
| 68 | +### Join nodes to the Kubernetes cluster |
| 69 | +
| 70 | +Before joining a control-plane node to the new Kubernetes cluster, ensure that the existing control plane nodes |
| 71 | +and all etcd members are healthy. |
| 72 | +
| 73 | +Check the cluster health with `etcdctl`. If `etcdctl` isn't available, you can run this tool inside a container image.
| 74 | +You would do that directly with your container runtime, using a tool such as `crictl run`, and not through Kubernetes.
| 75 | + |
| 76 | +Here is an example of a client command that uses secure communication to check the health of the etcd cluster:
| 77 | + |
| 78 | +```shell |
| 79 | +ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 \ |
| 80 | + --cert=/etc/kubernetes/pki/etcd/server.crt \ |
| 81 | + --key=/etc/kubernetes/pki/etcd/server.key \ |
| 82 | + --cacert=/etc/kubernetes/pki/etcd/ca.crt \ |
| 83 | + member list |
| 84 | +... |
| 85 | +dc543c4d307fadb9, started, node1, https://10.6.177.40:2380, https://10.6.177.40:2379, false |
| 86 | +``` |
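In the output above, the last column of each row reports whether the member is a learner (`false` for `node1`). As a rough sketch, you could filter that output to list any members that are still learners. The `list_learners` helper and the second sample row (`node2` and its member ID and address) are hypothetical, added only to show what a learner row looks like.

```shell
# Sample `etcdctl member list` output in the default (simple) format.
# The first row comes from the example above; the second row is hypothetical.
member_list='dc543c4d307fadb9, started, node1, https://10.6.177.40:2380, https://10.6.177.40:2379, false
a1b2c3d4e5f60718, started, node2, https://10.6.177.41:2380, https://10.6.177.41:2379, true'

# Hypothetical helper: print the names (third column) of members whose
# last column ("is learner") is "true".
list_learners() {
  awk -F', ' '$NF == "true" { print $3 }'
}

printf '%s\n' "$member_list" | list_learners
```

On a real control plane node, you would pipe the `member list` command shown earlier into `list_learners` instead of using the sample variable. An empty result means every listed member is a voting member.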
| 87 | + |
| 88 | +To check if the Kubernetes control plane is healthy, run `kubectl get node -l node-role.kubernetes.io/control-plane=` |
| 89 | +and check if the nodes are ready. |
| 90 | + |
| 91 | +Note: It is recommended to have an odd number of members in an etcd cluster.
| 92 | + |
| 93 | +Before joining a worker node to the new Kubernetes cluster, ensure that the control plane nodes are healthy. |
| 94 | + |
| 95 | +## What's next |
| 96 | + |
| 97 | +- The feature gate `EtcdLearnerMode` is alpha in v1.27 and we expect it to graduate to beta in the next |
| 98 | + minor release of Kubernetes (v1.29). |
| 99 | +- etcd has an open issue that may make the process more automatic: |
| 100 | + [Support auto-promoting a learner member to a voting member](https://github.com/etcd-io/etcd/issues/15107). |
| 101 | +- Learn more about the kubeadm [configuration format](/docs/reference/config-api/kubeadm-config.v1beta3/).
| 102 | + |
| 103 | +## Feedback |
| 104 | + |
| 105 | +Was this guide helpful? If you have any feedback or encounter any issues, please let us know. |
| 106 | +Your feedback is always welcome! Join the bi-weekly [SIG Cluster Lifecycle meeting](https://docs.google.com/document/d/1Gmc7LyCIL_148a9Tft7pdhdee0NBHdOfHS1SAF0duI4/edit) |
| 107 | +or weekly [kubeadm office hours](https://docs.google.com/document/d/130_kiXjG7graFNSnIAgtMS1G8zPDwpkshgfRYS0nggo/edit). Or reach us via [Slack](https://slack.k8s.io/) (channel **#kubeadm**), or the [SIG's mailing list](https://groups.google.com/g/kubernetes-sig-cluster-lifecycle). |