Commit eda9110

Author: JoeAldinger
OCPBUGS-19403: updates etcd procedure with OVN-K i/c
1 parent 0b42f2b

1 file changed: 23 additions, 28 deletions

modules/dr-restoring-cluster-state.adoc
@@ -281,26 +281,29 @@ etcd-ip-10-0-143-125.ec2.internal 1/1 Running 1
 +
 If the status is `Pending`, or the output lists more than one running etcd pod, wait a few minutes and check again.
 
-. If you are using the `OVNKubernetes` network plugin, delete the node objects that are associated with control plane hosts that are not the recovery control plane host.
+. If you are using the `OVNKubernetes` network plugin, you must restart the `ovnkube-control-plane` pods.
+.. Delete all of the `ovnkube-control-plane` pods by running the following command:
 +
 [source,terminal]
 ----
-$ oc delete node <non-recovery-controlplane-host-1> <non-recovery-controlplane-host-2>
+$ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-control-plane
 ----
-
-. Verify that the Cluster Network Operator (CNO) redeploys the OVN-Kubernetes control plane and that it no longer references the non-recovery controller IP addresses. To verify this result, regularly check the output of the following command. Wait until it returns an empty result before you proceed to restart the Open Virtual Network (OVN) Kubernetes pods on all of the hosts in the next step.
+.. Verify that all of the `ovnkube-control-plane` pods were redeployed by running the following command:
 +
 [source,terminal]
 ----
-$ oc -n openshift-ovn-kubernetes get ds/ovnkube-master -o yaml | grep -E '<non-recovery_controller_ip_1>|<non-recovery_controller_ip_2>'
+$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-control-plane
 ----
+
+. If you are using the OVN-Kubernetes network plugin, restart the Open Virtual Network (OVN) Kubernetes pods on all of the nodes, one by one. Use the following steps to restart the OVN-Kubernetes pods on each node:
 +
-[NOTE]
+[IMPORTANT]
 ====
-It can take at least 5-10 minutes for the OVN-Kubernetes control plane to be redeployed and the previous command to return empty output.
+.Restart the OVN-Kubernetes pods in the following order:
+. The recovery control plane host
+. The other control plane hosts (if available)
+. The other nodes
 ====
-
-. Restart the Open Virtual Network (OVN) Kubernetes pods on all the hosts.
 +
 [NOTE]
 ====
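For illustration only, and not part of this commit: after the `ovnkube-control-plane` pods are redeployed, the verification command added above would typically show all of the pods in the `Running` state. The pod names, replica count, ready counts, and ages below are hypothetical and vary by release and platform:

[source,terminal]
----
$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-control-plane
NAME                                     READY   STATUS    RESTARTS   AGE
ovnkube-control-plane-6f44c8bcd8-9btnf   2/2     Running   0          2m4s
ovnkube-control-plane-6f44c8bcd8-kfhl6   2/2     Running   0          2m4s
ovnkube-control-plane-6f44c8bcd8-xkvwp   2/2     Running   0          2m4s
----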
@@ -313,43 +316,35 @@ Alternatively, you can temporarily set the `failurePolicy` to `Ignore` while res
 +
 [source,terminal]
 ----
-$ sudo rm -f /var/lib/ovn/etc/*.db
+$ sudo rm -f /var/lib/ovn-ic/etc/*.db
 ----
 
-.. Delete all OVN-Kubernetes control plane pods by running the following command:
+.. Restart the Open vSwitch services. Access the node by using Secure Shell (SSH) and run the following command:
 +
 [source,terminal]
 ----
-$ oc delete pods -l app=ovnkube-master -n openshift-ovn-kubernetes
+$ sudo systemctl restart ovs-vswitchd ovsdb-server
 ----
 
-.. Ensure that any OVN-Kubernetes control plane pods are deployed again and are in a `Running` state by running the following command:
+.. Delete the `ovnkube-node` pod on the node by running the following command, replacing `<node>` with the name of the node that you are restarting:
 +
 [source,terminal]
 ----
-$ oc get pods -l app=ovnkube-master -n openshift-ovn-kubernetes
+$ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node --field-selector=spec.nodeName==<node>
 ----
 +
-.Example output
-[source,terminal]
-----
-NAME                   READY   STATUS    RESTARTS   AGE
-ovnkube-master-nb24h   4/4     Running   0          48s
-----
 
-.. Delete all `ovnkube-node` pods by running the following command:
+.. Verify that the `ovnkube-node` pod is running again by running the following command:
 +
 [source,terminal]
 ----
-$ oc get pods -n openshift-ovn-kubernetes -o name | grep ovnkube-node | while read p ; do oc delete $p -n openshift-ovn-kubernetes ; done
+$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector=spec.nodeName==<node>
 ----
-
-.. Ensure that all the `ovnkube-node` pods are deployed again and are in a `Running` state by running the following command:
 +
-[source,terminal]
-----
-$ oc get pods -n openshift-ovn-kubernetes | grep ovnkube-node
-----
+[NOTE]
+====
+It might take several minutes for the pods to restart.
+====
 
 . Delete and re-create other non-recovery, control plane machines, one by one. After the machines are re-created, a new revision is forced and etcd automatically scales up.
 +
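The per-node restart that the updated procedure prescribes can also be sketched as a script. This is a minimal illustration only, not part of this commit: the node names are hypothetical, the list must follow the documented order (the recovery control plane host, then the other control plane hosts, then the other nodes), the per-node SSH steps (removing the OVN databases and restarting the Open vSwitch services) are assumed to have already been completed on each node, and `oc wait` with `--field-selector` requires a reasonably recent client:

[source,terminal]
----
# Hypothetical node names, listed in the documented restart order.
for node in master-0 master-1 master-2 worker-0 worker-1; do
  # Delete the ovnkube-node pod that is scheduled on this node.
  oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node \
      --field-selector=spec.nodeName=="$node"
  # Give the DaemonSet controller a moment to create the replacement pod.
  sleep 10
  # Wait for the replacement pod to become Ready before moving on.
  oc -n openshift-ovn-kubernetes wait pod -l app=ovnkube-node \
      --field-selector=spec.nodeName=="$node" \
      --for=condition=Ready --timeout=600s
done
----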
