Commit 68ca07d

Bob Furu authored
Merge pull request #36102 from abhatt-rh/telcodocs-291-mhc
2 parents 134bce0 + 87d06b5 commit 68ca07d

9 files changed: +79 -8 lines changed

modules/machine-health-checks-about.adoc

Lines changed: 0 additions & 3 deletions
@@ -29,9 +29,6 @@ Consider the timeouts carefully, accounting for workloads and requirements.

 To stop the check, remove the resource.

-For example, you should stop the check during the upgrade process because the nodes in the cluster might become temporarily unavailable. The `MachineHealthCheck` might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, remove any `MachineHealthCheck` resource that you have deployed before updating the cluster.
-However, a `MachineHealthCheck` resource that is deployed by default (such as `machine-api-termination-handler`) cannot be removed and will be recreated.
-
 [id="machine-health-checks-limitations_{context}"]
 == Limitations when deploying machine health checks

modules/machine-health-checks-pausing.adoc

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+// Module included in the following assemblies:
+
+// * updating/updating-cluster-cli.adoc
+// * updating/updating-cluster-between-minor.adoc
+// * updating/updating-restricted-network-cluster.adoc
+
+[id="machine-health-checks-pausing_{context}"]
+= Pausing a MachineHealthCheck resource
+
+During the upgrade process, nodes in the cluster might become temporarily unavailable. In the case of worker nodes, the machine health check might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, pause all the `MachineHealthCheck` resources before updating the cluster.
+
+.Prerequisites
+
+* Install the OpenShift CLI (`oc`).
+
+.Procedure
+
+. To list all the available `MachineHealthCheck` resources that you want to pause, run the following command:
++
+[source,terminal]
+----
+$ oc get machinehealthcheck -n openshift-machine-api
+----
+
+. To pause the machine health checks, add the `cluster.x-k8s.io/paused=""` annotation to the `MachineHealthCheck` resource. Run the following command:
++
+[source,terminal]
+----
+$ oc -n openshift-machine-api annotate mhc <mhc-name> cluster.x-k8s.io/paused=""
+----
++
+The annotated `MachineHealthCheck` resource resembles the following YAML file:
++
+[source,yaml]
+----
+apiVersion: machine.openshift.io/v1beta1
+kind: MachineHealthCheck
+metadata:
+  name: example
+  namespace: openshift-machine-api
+  annotations:
+    cluster.x-k8s.io/paused: ""
+spec:
+  selector:
+    matchLabels:
+      role: worker
+  unhealthyConditions:
+  - type: "Ready"
+    status: "Unknown"
+    timeout: "300s"
+  - type: "Ready"
+    status: "False"
+    timeout: "300s"
+  maxUnhealthy: "40%"
+status:
+  currentHealthy: 5
+  expectedMachines: 5
+----
++
+[IMPORTANT]
+====
+Resume the machine health checks after updating the cluster. To resume the check, remove the pause annotation from the `MachineHealthCheck` resource by running the following command:
+
+[source,terminal]
+----
+$ oc -n openshift-machine-api annotate mhc <mhc-name> cluster.x-k8s.io/paused-
+----
+====
+

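The new module pauses one `MachineHealthCheck` resource per command, while the prerequisites added elsewhere in this commit ask for all of them to be paused. A minimal sketch of how the same two `oc` commands might be looped over every resource in the namespace follows. The function names `list_mhc`, `pause_all_mhc`, and `resume_all_mhc` are hypothetical helpers and are not part of this commit; the sketch assumes `oc` is already logged in with permission to annotate resources in `openshift-machine-api`, and it adds `--overwrite` so that re-running the pause step on an already annotated resource does not fail.

```shell
#!/usr/bin/env bash
# Sketch only: pause or resume every MachineHealthCheck in one namespace.
# Assumption: `oc` is installed and logged in with sufficient privileges.

NAMESPACE="openshift-machine-api"
PAUSE_KEY="cluster.x-k8s.io/paused"

# Hypothetical helper: print each resource as "<resource-type>/<name>".
list_mhc() {
  oc get machinehealthcheck -n "$NAMESPACE" -o name
}

# Hypothetical helper: add the pause annotation (empty value) to each resource.
pause_all_mhc() {
  local mhc
  for mhc in $(list_mhc); do
    oc -n "$NAMESPACE" annotate --overwrite "$mhc" "${PAUSE_KEY}="
  done
}

# Hypothetical helper: remove the pause annotation from each resource.
resume_all_mhc() {
  local mhc
  for mhc in $(list_mhc); do
    oc -n "$NAMESPACE" annotate "$mhc" "${PAUSE_KEY}-"
  done
}
```

After sourcing the file, `pause_all_mhc` would be run before starting the update and `resume_all_mhc` once the update has finished. Listing the resources first with `list_mhc` is a reasonable way to confirm what the loop will touch.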
modules/update-restricted.adoc

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ If you have a local OpenShift Update Service, you can update by using the connec
 * You applied the release image signature ConfigMap for the new release to your cluster.
 * You obtained the sha256 sum value for the release from the image signature ConfigMap.
 * Install the OpenShift CLI (`oc`), version 4.4.8 or later.
+* Pause all `MachineHealthCheck` resources.

 .Procedure

modules/update-service-overview.adoc

Lines changed: 0 additions & 5 deletions
@@ -45,8 +45,3 @@ With the specification for the new version applied to the old kubelet, the {op-s

 The OpenShift Update Service is composed of an Operator and one or more application instances.

-[NOTE]
-====
-During the upgrade process, nodes in the cluster might become temporarily unavailable. The `MachineHealthCheck` might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, remove any `MachineHealthCheck` resource that you have deployed before updating the cluster.
-However, a MachineHealthCheck resource that is deployed by default (such as `machine-api-termination-handler`) cannot be removed and will be recreated.
-====

modules/update-upgrading-cli.adoc

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ of the Customer Portal.
 * Install the OpenShift CLI (`oc`) that matches the version for your updated version.
 * Log in to the cluster as user with `cluster-admin` privileges.
 * Install the `jq` package.
+* Pause all `MachineHealthCheck` resources.

 .Procedure

modules/update-upgrading-web.adoc

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ link:https://access.redhat.com/downloads/content/290[in the errata section] of t
 .Prerequisites

 * Have access to the web console as a user with `admin` privileges.
+* Pause all `MachineHealthCheck` resources.

 .Procedure

updating/updating-cluster-between-minor.adoc

Lines changed: 2 additions & 0 deletions
@@ -54,6 +54,8 @@ include::modules/update-using-custom-machine-config-pools-canary.adoc[leveloffse

 If you want to use the canary rollout update process, see xref:../updating/update-using-custom-machine-config-pools.adoc#update-using-custom-machine-config-pools[Performing a canary rollout update].

+include::modules/machine-health-checks-pausing.adoc[leveloffset=+1]
+
 include::modules/update-upgrading-web.adoc[leveloffset=+1]

 include::modules/update-changing-update-server-web.adoc[leveloffset=+1]

updating/updating-cluster-cli.adoc

Lines changed: 3 additions & 0 deletions
@@ -18,6 +18,7 @@ See xref:../authentication/using-rbac.adoc[Using RBAC to define and apply permis
 * If your cluster uses manually maintained credentials with the AWS Secure Token Service (STS), obtain a copy of the `ccoctl` utility from the release image being upgraded to and use it to process any updated credentials. For more information, see xref:../authentication/managing_cloud_provider_credentials/cco-mode-sts.adoc#sts-mode-upgrading[_Upgrading an OpenShift Container Platform cluster configured for manual mode with STS_].
 * Ensure that you address all `Upgradeable=False` conditions so the cluster allows an upgrade to the next minor version. You can run the `oc adm upgrade` command for an output of all `Upgradeable=False` conditions and the condition reasoning to help you prepare for a minor version upgrade.

+
 [IMPORTANT]
 ====
 Using the `unsupportedConfigOverrides` section to modify the configuration of an Operator is unsupported and might block cluster upgrades. You must remove this setting before you can upgrade your cluster.
@@ -35,6 +36,8 @@ include::modules/update-service-overview.adoc[leveloffset=+1]

 include::modules/understanding-upgrade-channels.adoc[leveloffset=+1]

+include::modules/machine-health-checks-pausing.adoc[leveloffset=+1]
+
 include::modules/update-upgrading-cli.adoc[leveloffset=+1]

 include::modules/update-changing-update-server-cli.adoc[leveloffset=+1]

updating/updating-restricted-network-cluster.adoc

Lines changed: 2 additions & 0 deletions
@@ -54,6 +54,8 @@ include::modules/update-oc-configmap-signature-verification.adoc[leveloffset=+2]

 include::modules/update-configuring-image-signature.adoc[leveloffset=+2]

+include::modules/machine-health-checks-pausing.adoc[leveloffset=+1]
+
 include::modules/update-restricted.adoc[leveloffset=+1]

 include::modules/images-configuration-registry-mirror.adoc[leveloffset=+1]
