Merge pull request #63936 from aireilly/OCPBUGS-18111

stevsmit · web-flow · commit d41e075b7132 · 2023-08-25T11:07:57.000-04:00
OCPBUGS-18111 - Move misplaced SNO reboot topic
diff --git a/modules/sno-clusters-reboot-without-drain.adoc b/modules/sno-clusters-reboot-without-drain.adoc
@@ -0,0 +1,23 @@
+// Module included in the following assemblies:
+//
+// * nodes/nodes/nodes-nodes-working.adoc
+
+:_content-type: CONCEPT
+[id="sno-clusters-reboot-without-drain_{context}"]
+= Handling errors in {sno} clusters when the node reboots without draining application pods
+
+In {sno} clusters, and in {product-title} clusters in general, a node can reboot without first being drained. When this happens, an application pod that requests devices can fail with the `UnexpectedAdmissionError` error. `Deployment`, `ReplicaSet`, or `DaemonSet` errors are reported because the application pods that require those devices start before the pod that serves those devices. You cannot control the order of pod restarts.
+
+Although this behavior is expected, it can leave a pod on the cluster even though the pod failed to deploy. The pod continues to report `UnexpectedAdmissionError`. The impact is limited because application pods are typically part of a `Deployment`, `ReplicaSet`, or `DaemonSet`. A pod in this error state is of little concern because another instance should be running. The owning `Deployment`, `ReplicaSet`, or `DaemonSet` creates and runs subsequent pods, so the application can still deploy successfully.
+
+Upstream work is in progress to ensure that such pods are gracefully terminated. Until that work is complete, run the following command on a {sno} cluster to remove the failed pods:
+
+[source,terminal,subs="+quotes"]
+----
+$ oc delete pods --field-selector status.phase=Failed -n _<POD_NAMESPACE>_
+----
+
+[NOTE]
+====
+The option to drain the node is unavailable for {sno} clusters.
+====
diff --git a/modules/ztp-sno-node-reboot-scenarios.adoc b/modules/ztp-sno-node-reboot-scenarios.adoc
diff --git a/nodes/nodes/nodes-nodes-working.adoc b/nodes/nodes/nodes-nodes-working.adoc
@@ -6,20 +6,26 @@ include::_attributes/common-attributes.adoc[]
 
 toc::[]
 
-As an administrator, you can perform a number of tasks to make your clusters more efficient.
+As an administrator, you can perform several tasks to make your clusters more efficient.
 
 // The following include statements pull in the module files that comprise
 // the assembly. Include any combination of concept, procedure, or reference
 // modules required to cover the user story. You can also include other
 // assemblies.
 
-
 include::modules/nodes-nodes-working-evacuating.adoc[leveloffset=+1]
 
 include::modules/nodes-nodes-working-updating.adoc[leveloffset=+1]
 
 include::modules/nodes-nodes-working-marking.adoc[leveloffset=+1]
 
+include::modules/sno-clusters-reboot-without-drain.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
+.Additional resources
+
+* xref:../../nodes/nodes/nodes-nodes-working.adoc#nodes-nodes-working-evacuating_nodes-nodes-working[Understanding how to evacuate pods on nodes]
+
 == Deleting nodes
 
 include::modules/nodes-nodes-working-deleting.adoc[leveloffset=+2]
@@ -31,4 +37,3 @@ include::modules/nodes-nodes-working-deleting.adoc[leveloffset=+2]
 see xref:../../machine_management/manually-scaling-machineset.adoc#machineset-manually-scaling-manually-scaling-machineset[Manually scaling a MachineSet].
 
 include::modules/nodes-nodes-working-deleting-bare-metal.adoc[leveloffset=+2]
-
diff --git a/scalability_and_performance/ztp_far_edge/ztp-reference-cluster-configuration-for-vdu.adoc b/scalability_and_performance/ztp_far_edge/ztp-reference-cluster-configuration-for-vdu.adoc
@@ -86,12 +86,7 @@ include::modules/ztp-sno-du-configuring-lvms.adoc[leveloffset=+2]
 
 include::modules/ztp-sno-du-disabling-network-diagnostics.adoc[leveloffset=+2]
 
-include::modules/ztp-sno-node-reboot-scenarios.adoc[leveloffset=+2]
-
 [role="_additional-resources"]
 .Additional resources
 
-* xref:../../nodes/nodes/nodes-nodes-working.adoc#nodes-nodes-working-evacuating_nodes-nodes-working[Understanding how to evacuate pods on nodes
-]
-
 * xref:../../scalability_and_performance/ztp_far_edge/ztp-deploying-far-edge-sites.adoc#ztp-deploying-far-edge-sites[Deploying far edge sites using ZTP]