Merge pull request #31755 from jcfrye77/KNIDEPLOY-4016

kalexand-rh · web-flow · commit 4ad9f80b3ce4 · 2021-05-13T12:03:20.000-04:00
KNIDEPLOY-4016
diff --git a/modules/machine-health-checks-about.adoc b/modules/machine-health-checks-about.adoc
@@ -6,32 +6,28 @@
 [id="machine-health-checks-about_{context}"]
 = About machine health checks
 
-You can define conditions under which machines in a cluster are considered unhealthy by using a `MachineHealthCheck` resource.
-Machines matching the conditions are automatically remediated.
+Machine health checks automatically repair unhealthy machines in a particular machine pool.
 
-To monitor machine health, create a `MachineHealthCheck` custom resource (CR) that includes a label for the set of machines to monitor and a condition to check, such as staying in the `NotReady` status for 15 minutes or displaying a permanent condition in the node-problem-detector.
+To monitor machine health, create a resource to define the configuration for a controller. Set a condition to check, such as staying in the `NotReady` status for five minutes or displaying a permanent condition in the node-problem-detector, and a label for the set of machines to monitor.
 
-The controller that observes a `MachineHealthCheck` CR checks for the condition that you defined. If a machine fails the health check, the machine is automatically deleted and a new one is created to take its place. When a machine is deleted, you see a `machine deleted` event.
+[NOTE]
+====
+You cannot apply a machine health check to a machine with the master role.
+====
+
+The controller that observes a `MachineHealthCheck` resource checks for the defined condition. If a machine fails the health check, the machine is automatically deleted and a new one is created to take its place. When a machine is deleted, you see a `machine deleted` event.
+
+To limit the disruptive impact of machine deletion, the controller drains and deletes only one node at a time. If there are more unhealthy machines than the `maxUnhealthy` threshold allows for in the targeted pool of machines, remediation stops so that you can intervene manually.
 
 [NOTE]
 ====
-For machines with the master role, the machine health check reports the number of unhealthy nodes, but the machine is not deleted. For example:
-
-.Example output
-[source,terminal]
-----
-$ oc get machinehealthcheck example -n openshift-machine-api
-----
-[source,terminal]
-----
-NAME      MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
-example   40%            3                  1
-----
-
-To limit the disruptive impact of machine deletions, the controller drains and deletes only one node at a time. If there are more unhealthy machines than the `maxUnhealthy` threshold allows for in the targeted pool of machines, the controller stops deleting machines and you must manually intervene.
+Consider the timeouts carefully, accounting for workloads and requirements.
+
+* Long timeouts can result in long periods of downtime for the workload on the unhealthy machine.
+* Timeouts that are too short can result in a remediation loop. For example, the timeout for checking the `NotReady` status must be long enough to allow the machine to complete the startup process.
 ====
 
-To stop the check, remove the custom resource.
+To stop the check, remove the resource.
 
 [id="machine-health-checks-limitations_{context}"]
 == Limitations when deploying machine health checks