Merge pull request #32579 from bergerhoffer/BZ-1880759

bergerhoffer · web-flow · commit 558fc36ef6ff · 2021-05-20T11:48:27.000-04:00
BZ-1880759: Adding verification step for more than 3 etcd members
diff --git a/modules/restore-replace-crashlooping-etcd-member.adoc b/modules/restore-replace-crashlooping-etcd-member.adoc
@@ -198,7 +198,9 @@ $ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "single-master-
 +
 When the etcd cluster Operator performs a redeployment, it ensures that all master nodes have a functioning etcd pod.
 
-. Verify that the new member is available and healthy.
+.Verification
+
+* Verify that the new member is available and healthy.
 
 .. Connect to the running etcd container again.
 +
diff --git a/modules/restore-replace-stopped-etcd-member.adoc b/modules/restore-replace-stopped-etcd-member.adoc
@@ -326,7 +326,9 @@ clustername-8qw5l-worker-us-east-1c-pkg26   Running        m4.large    us-east-1
 +
 It might take a few minutes for the new machine to be created. The etcd cluster Operator will automatically sync when the machine or node returns to a healthy state.
 
-. Verify that all etcd pods are running properly:
+.Verification
+
+. Verify that all etcd pods are running properly.
 +
 In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
 +
@@ -350,4 +352,40 @@ If the output from the previous command only lists two pods, you can manually fo
 $ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge <1>
 ----
 <1> The `forceRedeploymentReason` value must be unique, which is why a timestamp is appended.
+
+. Verify that there are exactly three etcd members.
+
+.. Connect to the running etcd container, passing in the name of a pod that was not on the affected node:
++
+In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
++
+[source,terminal]
+----
+$ oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
+----
+
+.. View the member list:
 +
+[source,terminal]
+----
+sh-4.2# etcdctl member list -w table
+----
++
+.Example output
+[source,terminal]
+----
++------------------+---------+------------------------------+---------------------------+---------------------------+
+|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
++------------------+---------+------------------------------+---------------------------+---------------------------+
+| 5eb0d6b8ca24730c | started |  ip-10-0-133-53.ec2.internal |  https://10.0.133.53:2380 |  https://10.0.133.53:2379 |
+| 757b6793e2408b6c | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
+| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
++------------------+---------+------------------------------+---------------------------+---------------------------+
+----
++
+If the output from the previous command lists more than three etcd members, you must carefully remove the unwanted member.
++
+[WARNING]
+====
+Be sure to remove the correct etcd member; removing a good etcd member might lead to quorum loss.
+====
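
Note on the new "exactly three etcd members" check: the count can also be taken mechanically instead of being read off the table by eye. The following is a minimal sketch, not part of this change. It assumes the table layout shown in the example output above (one row per member, starting with `|` and a hexadecimal member ID). The function names `count_etcd_members` and `check_member_count` are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `etcdctl member list -w table` output on stdin
# and prints the number of member rows. A member row is assumed to start
# with "|" followed by a lowercase hexadecimal ID; the header row ("ID")
# and the "+----+" separator rows do not match this pattern.
count_etcd_members() {
  grep -cE '^\| *[0-9a-f]+ *\|'
}

# Hypothetical helper: succeeds only when exactly three members are listed.
check_member_count() {
  count=$(count_etcd_members)
  if [ "$count" -eq 3 ]; then
    echo "OK: 3 etcd members"
  else
    echo "WARNING: expected 3 etcd members, found $count"
    return 1
  fi
}
```

Inside the etcd container, the member list from the step above could be piped into the check, for example `etcdctl member list -w table | check_member_count`. The script only reports the count; deciding which member to remove still requires the manual care described in the warning.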
