Commit 263fc03

jpbetz, Han Kang, and jihoon-seo authored
Include how to route away from broken etcd in etcd maintenance docs (#35882)
* Include how to route away from broken etcd in etcd maintenance docs

* Apply suggestions from code review

Apply suggestions and use 1. for all numbering (markdown will set the numbering automatically this way)

Co-authored-by: Han Kang <[email protected]>
Co-authored-by: Jihoon Seo <[email protected]>

* Update content/en/docs/tasks/administer-cluster/configure-upgrade-etcd.md

Co-authored-by: Jihoon Seo <[email protected]>

Co-authored-by: Han Kang <[email protected]>
Co-authored-by: Jihoon Seo <[email protected]>
1 parent 59cd910 commit 263fc03

1 file changed: +27 -9 lines changed

content/en/docs/tasks/administer-cluster/configure-upgrade-etcd.md

Lines changed: 27 additions & 9 deletions
@@ -2,6 +2,7 @@
 reviewers:
 - mml
 - wojtek-t
+- jpbetz
 title: Operating etcd clusters for Kubernetes
 content_type: task
 ---
@@ -187,7 +188,21 @@ replace it with `member4=http://10.0.0.4`.
    fd422379fda50e48, started, member3, http://10.0.0.3:2380, http://10.0.0.3:2379
    ```
 
-2. Remove the failed member:
+1. Do either of the following:
+
+   1. If each Kubernetes API server is configured to communicate with all etcd
+      members, remove the failed member from the `--etcd-servers` flag, then
+      restart each Kubernetes API server.
+   1. If each Kubernetes API server communicates with a single etcd member,
+      then stop the Kubernetes API server that communicates with the failed
+      etcd.
+
+1. Stop the etcd server on the broken node. It is possible that other
+   clients besides the Kubernetes API server are causing traffic to etcd
+   and it is desirable to stop all traffic to prevent writes to the data
+   dir.
+
+1. Remove the failed member:
 
    ```shell
    etcdctl member remove 8211f1d0f64f3269
@@ -199,7 +214,7 @@ replace it with `member4=http://10.0.0.4`.
    Removed member 8211f1d0f64f3269 from cluster
    ```
 
-3. Add the new member:
+1. Add the new member:
 
    ```shell
    etcdctl member add member4 --peer-urls=http://10.0.0.4:2380
@@ -211,7 +226,7 @@ replace it with `member4=http://10.0.0.4`.
    Member 2be1eb8f84b7f63e added to cluster ef37ad9dc622a7c4
    ```
 
-4. Start the newly added member on a machine with the IP `10.0.0.4`:
+1. Start the newly added member on a machine with the IP `10.0.0.4`:
 
    ```shell
    export ETCD_NAME="member4"
@@ -220,13 +235,16 @@ replace it with `member4=http://10.0.0.4`.
    etcd [flags]
    ```
 
-5. Do either of the following:
+1. Do either of the following:
 
-   1. Update the `--etcd-servers` flag for the Kubernetes API servers to make
-      Kubernetes aware of the configuration changes, then restart the
-      Kubernetes API servers.
-   2. Update the load balancer configuration if a load balancer is used in the
-      deployment.
+   1. If each Kubernetes API server is configured to communicate with all etcd
+      members, add the newly added member to the `--etcd-servers` flag, then
+      restart each Kubernetes API server.
+   1. If each Kubernetes API server communicates with a single etcd member,
+      start the Kubernetes API server that was stopped in step 2. Then
+      configure Kubernetes API server clients to again route requests to the
+      Kubernetes API server that was stopped. This can often be done by
+      configuring a load balancer.
 
 For more information on cluster reconfiguration, see
 [etcd reconfiguration documentation](https://etcd.io/docs/current/op-guide/runtime-configuration/#remove-a-member).
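
For the case where each Kubernetes API server lists every etcd member, the `--etcd-servers` edits this change describes come down to dropping one endpoint from a comma-separated list and later appending another. The following is a minimal sketch of that string manipulation only, not part of the commit. The function names `remove_etcd_server` and `add_etcd_server` are hypothetical, and the client URLs on port 2379 are assumed to follow the pattern shown for member3 in the member list output. The sketch does not edit any manifest or restart an API server.

```shell
#!/bin/sh
# Hypothetical helpers: compute a new value for the kube-apiserver
# --etcd-servers flag (a comma-separated list of client URLs).

# Print the list in $1 with the endpoint in $2 removed.
# The body runs in a subshell so the IFS and set -f changes stay local.
remove_etcd_server() (
  servers=$1 failed=$2 result=""
  IFS=','
  set -f   # disable pathname expansion while splitting on commas
  for entry in $servers; do
    [ "$entry" = "$failed" ] && continue
    result="${result:+$result,}$entry"
  done
  printf '%s\n' "$result"
)

# Print the list in $1 with the endpoint in $2 appended, unless already present.
add_etcd_server() {
  case ",$1," in
    *",$2,"*) printf '%s\n' "$1" ;;
    *) printf '%s\n' "${1:+$1,}$2" ;;
  esac
}

# Assumed example values: member1 at 10.0.0.1 failed, member4 at 10.0.0.4 replaces it.
current="http://10.0.0.1:2379,http://10.0.0.2:2379,http://10.0.0.3:2379"

# Before removing the failed member: stop routing to it.
without_failed=$(remove_etcd_server "$current" "http://10.0.0.1:2379")
echo "$without_failed"

# After the new member has started: route to it as well.
add_etcd_server "$without_failed" "http://10.0.0.4:2379"
```

With the assumed values, the first line printed is the list without `http://10.0.0.1:2379`, and the second is that list with `http://10.0.0.4:2379` appended. How the resulting value reaches the API server (static Pod manifest, systemd unit, or other tooling) depends on how the cluster was deployed.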
