Commit 558fc36

Merge pull request #32579 from bergerhoffer/BZ-1880759
BZ-1880759: Adding verification step for more than 3 etcd members
2 parents 4add08e + 3979e2e commit 558fc36

2 files changed: 42 additions, 2 deletions

modules/restore-replace-crashlooping-etcd-member.adoc

Lines changed: 3 additions & 1 deletion
@@ -198,7 +198,9 @@ $ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "single-master-
 +
 When the etcd cluster Operator performs a redeployment, it ensures that all master nodes have a functioning etcd pod.
 
-. Verify that the new member is available and healthy.
+.Verification
+
+* Verify that the new member is available and healthy.
 
 .. Connect to the running etcd container again.
 +

modules/restore-replace-stopped-etcd-member.adoc

Lines changed: 39 additions & 1 deletion
@@ -326,7 +326,9 @@ clustername-8qw5l-worker-us-east-1c-pkg26 Running m4.large us-east-1
 +
 It might take a few minutes for the new machine to be created. The etcd cluster Operator will automatically sync when the machine or node returns to a healthy state.
 
-. Verify that all etcd pods are running properly:
+.Verification
+
+. Verify that all etcd pods are running properly.
 +
 In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
 +
@@ -350,4 +352,40 @@ If the output from the previous command only lists two pods, you can manually fo
 $ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge <1>
 ----
 <1> The `forceRedeploymentReason` value must be unique, which is why a timestamp is appended.
+
+. Verify that there are exactly three etcd members.
+
+.. Connect to the running etcd container, passing in the name of a pod that was not on the affected node:
++
+In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
++
+[source,terminal]
+----
+$ oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
+----
+
+.. View the member list:
 +
+[source,terminal]
+----
+sh-4.2# etcdctl member list -w table
+----
++
+.Example output
+[source,terminal]
+----
++------------------+---------+------------------------------+---------------------------+---------------------------+
+|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
++------------------+---------+------------------------------+---------------------------+---------------------------+
+| 5eb0d6b8ca24730c | started |  ip-10-0-133-53.ec2.internal |  https://10.0.133.53:2380 |  https://10.0.133.53:2379 |
+| 757b6793e2408b6c | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
+| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
++------------------+---------+------------------------------+---------------------------+---------------------------+
+----
++
+If the output from the previous command lists more than three etcd members, you must carefully remove the unwanted member.
++
+[WARNING]
+====
+Be sure to remove the correct etcd member; removing a good etcd member might lead to quorum loss.
+====
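
The "exactly three etcd members" check added in this commit could also be scripted. The following is a minimal sketch, not part of the commit: the function names `count_etcd_members` and `check_etcd_member_count` are hypothetical, and it assumes the input has the same table layout as the example output above (one row per member, with a lowercase hexadecimal ID in the first column) and that `awk` is available wherever the check runs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the commit) that read the output of
# `etcdctl member list -w table` from stdin.

# Print the number of member rows found in the table.
count_etcd_members() {
  # Data rows look like "| <hex id> | started | ...". Split on "|" and
  # count lines whose first cell is a lowercase hex ID. Border rows
  # ("+----+") and the header row ("ID") do not match.
  awk -F'|' 'NF > 2 && $2 ~ /^ *[0-9a-f]+ *$/ { n++ } END { print n + 0 }'
}

# Return success only when exactly three members are listed.
check_etcd_member_count() {
  count=$(count_etcd_members)
  if [ "$count" -eq 3 ]; then
    echo "OK: 3 etcd members"
  else
    echo "WARNING: expected 3 etcd members, found $count"
    return 1
  fi
}
```

One way to use it is to save the table printed by `etcdctl member list -w table` to a file and run `check_etcd_member_count < members.txt` (the file name is only an example). The script reports the count only; deciding which member to remove still requires the manual care described in the warning.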
