Commit 9634963

Document causes of automated reboots and MCO pause behavior in Console
1 parent 245ae9e commit 9634963

4 files changed

+238
-98
lines changed

modules/troubleshooting-disabling-autoreboot-mco-cli.adoc

Lines changed: 141 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,141 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/troubleshooting/troubleshooting-operator-issues.adoc
4+
5+
[id="troubleshooting-disabling-autoreboot-mco-cli_{context}"]
6+
= Disabling the Machine Config Operator from automatically rebooting by using the CLI
7+
8+
To avoid unwanted disruptions from changes made by the Machine Config Operator (MCO), you can modify the machine config pool (MCP) using the OpenShift CLI (oc) to prevent the MCO from making any changes to nodes in that pool. This prevents any reboots that would normally be part of the MCO update process.
9+
10+
[NOTE]
11+
====
12+
Pausing an MCP stops all updates to your {op-system} nodes, including operating system, security, and certificate updates, as well as any other updates related to the machine config. Pause an MCP for short periods of time only.
13+
====
14+
15+
.Prerequisites
16+
17+
* You have access to the cluster as a user with the `cluster-admin` role.
18+
* You have installed the OpenShift CLI (`oc`).
19+
20+
.Procedure
21+
22+
To pause or unpause automatic MCO update rebooting:
23+
24+
* Pause the autoreboot process:
25+
26+
. Update the `MachineConfigPool` custom resource to set the `spec.paused` field to `true`.
27+
+
28+
.Control plane (master) nodes
29+
[source,terminal]
30+
----
31+
$ oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/master
32+
----
33+
+
34+
.Worker nodes
35+
[source,terminal]
36+
----
37+
$ oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/worker
38+
----
39+
40+
. Verify that the MCP is paused:
41+
+
42+
.Control plane (master) nodes
43+
[source,terminal]
44+
----
45+
$ oc get machineconfigpool/master --template='{{.spec.paused}}'
46+
----
47+
+
48+
.Worker nodes
49+
[source,terminal]
50+
----
51+
$ oc get machineconfigpool/worker --template='{{.spec.paused}}'
52+
----
53+
+
54+
.Example output
55+
[source,terminal]
56+
----
57+
true
58+
----
59+
+
60+
The `spec.paused` field is `true` and the MCP is paused.
61+
62+
. Determine if the MCP has pending changes:
63+
+
64+
[source,terminal]
65+
----
66+
$ oc get machineconfigpool
67+
----
68+
+
69+
.Example output
70+
----
71+
NAME CONFIG UPDATED UPDATING
72+
master rendered-master-33cf0a1254318755d7b48002c597bf91 True False
73+
worker rendered-worker-e405a5bdb0db1295acea08bcca33fa60 False False
74+
----
75+
+
76+
If the *UPDATED* column is *False* and *UPDATING* is *False*, there are pending changes. When *UPDATED* is *True* and *UPDATING* is *False*, there are no pending changes. In the previous example, the worker pool has pending changes. The master pool does not have any pending changes.
77+
+
78+
[IMPORTANT]
79+
====
80+
If there are pending changes (where both the *UPDATED* and *UPDATING* columns are *False*), schedule a maintenance window for a reboot as early as possible. Use the following steps to unpause the autoreboot process and apply the changes that were queued since the last reboot.
81+
====
82+
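The interpretation of the *UPDATED* and *UPDATING* columns described above can be sketched as a small shell helper. This is a minimal sketch, not part of the documented procedure: the function names `mcp_state` and `mcp_report` are hypothetical, and the parsing assumes the column order shown in the example output (NAME, CONFIG, UPDATED, UPDATING).

```shell
#!/bin/sh
# Hypothetical helper: map one pool's UPDATED/UPDATING values to a status.
mcp_state() {
  case "$1/$2" in
    True/False)  echo "no pending changes" ;;
    False/False) echo "pending changes" ;;
    False/True)  echo "applying changes" ;;
    *)           echo "unknown" ;;
  esac
}

# Hypothetical helper: read `oc get machineconfigpool` output on stdin.
# Assumes columns 1, 3, and 4 are NAME, UPDATED, and UPDATING.
mcp_report() {
  awk 'NR > 1 { print $1, $3, $4 }' |
  while read -r name updated updating; do
    printf '%s: %s\n' "$name" "$(mcp_state "$updated" "$updating")"
  done
}
```

For example, `oc get machineconfigpool | mcp_report` would print one status line per pool.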
83+
* Unpause the autoreboot process:
84+
85+
. Update the `MachineConfigPool` custom resource to set the `spec.paused` field to `false`.
86+
+
87+
.Control plane (master) nodes
88+
[source,terminal]
89+
----
90+
$ oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/master
91+
----
92+
+
93+
.Worker nodes
94+
[source,terminal]
95+
----
96+
$ oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/worker
97+
----
98+
+
99+
[NOTE]
100+
====
101+
By unpausing an MCP, the MCO applies all paused changes and reboots {op-system-first} as needed.
102+
====
103+
+
104+
. Verify that the MCP is unpaused:
105+
+
106+
.Control plane (master) nodes
107+
[source,terminal]
108+
----
109+
$ oc get machineconfigpool/master --template='{{.spec.paused}}'
110+
----
111+
+
112+
.Worker nodes
113+
[source,terminal]
114+
----
115+
$ oc get machineconfigpool/worker --template='{{.spec.paused}}'
116+
----
117+
+
118+
.Example output
119+
[source,terminal]
120+
----
121+
false
122+
----
123+
+
124+
The `spec.paused` field is `false` and the MCP is unpaused.
125+
126+
. Determine if the MCP has pending changes:
127+
+
128+
[source,terminal]
129+
----
130+
$ oc get machineconfigpool
131+
----
132+
+
133+
.Example output
134+
----
135+
NAME CONFIG UPDATED UPDATING
136+
master rendered-master-546383f80705bd5aeaba93 True False
137+
worker rendered-worker-b4c51bb33ccaae6fc4a6a5 False True
138+
----
139+
+
140+
If the MCP is applying any pending changes, the *UPDATED* column is *False* and the *UPDATING* column is *True*. When *UPDATED* is *True* and *UPDATING* is *False*, there are no further changes being made. In the previous example, the MCO is updating the worker pool.
141+
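The pause and unpause steps in this module differ only in the value of `spec.paused` and the pool name, so they can be wrapped in one function. The following is a minimal sketch under stated assumptions: `set_mcp_paused` and the `OC` override variable are hypothetical names introduced here for illustration; the `oc patch` invocation itself is the one used in the procedure.

```shell
#!/bin/sh
# Assumption: OC can be overridden for a dry run, for example OC=echo.
OC="${OC:-oc}"

# Hypothetical helper: set spec.paused on one or more machine config pools.
# Usage: set_mcp_paused true|false POOL...
set_mcp_paused() {
  paused="$1"
  shift
  case "$paused" in
    true|false) ;;
    *) echo "usage: set_mcp_paused true|false POOL..." >&2; return 2 ;;
  esac
  for pool in "$@"; do
    # Same merge patch as in the procedure, with the value parameterized.
    "$OC" patch --type=merge \
      --patch="{\"spec\":{\"paused\":$paused}}" \
      "machineconfigpool/$pool" || return 1
  done
}
```

For example, `set_mcp_paused true master worker` pauses both pools, and `set_mcp_paused false worker` unpauses only the worker pool. Rejecting any value other than `true` or `false` keeps a typo from producing an invalid patch.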
modules/troubleshooting-disabling-autoreboot-mco-console.adoc

Lines changed: 92 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,92 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/troubleshooting/troubleshooting-operator-issues.adoc
4+
5+
[id="troubleshooting-disabling-autoreboot-mco-console_{context}"]
6+
= Disabling the Machine Config Operator from automatically rebooting by using the console
7+
8+
To avoid unwanted disruptions from changes made by the Machine Config Operator (MCO), you can use the {product-title} web console to modify the machine config pool (MCP) to prevent the MCO from making any changes to nodes in that pool. This prevents any reboots that would normally be part of the MCO update process.
9+
10+
[NOTE]
11+
====
12+
Pausing an MCP stops all updates to your {op-system} nodes, including operating system, security, and certificate updates, as well as any other updates related to the machine config. Pause an MCP for short periods of time only.
13+
====
14+
15+
.Prerequisites
16+
17+
* You have access to the cluster as a user with the `cluster-admin` role.
18+
19+
.Procedure
20+
21+
To pause or unpause automatic MCO update rebooting:
22+
23+
* Pause the autoreboot process:
24+
25+
. Log in to the {product-title} web console as a user with the `cluster-admin` role.
26+
27+
. Click *Compute* -> *Machine Config Pools*.
28+
29+
. On the *Machine Config Pools* page, click either *master* or *worker*, depending upon which nodes you want to pause rebooting for.
30+
31+
. On the *master* or *worker* page, click *YAML*.
32+
33+
. In the YAML, update the `spec.paused` field to `true`.
34+
+
35+
.Sample MachineConfigPool object
36+
[source,yaml]
37+
----
38+
apiVersion: machineconfiguration.openshift.io/v1
39+
kind: MachineConfigPool
40+
...
41+
spec:
42+
...
43+
paused: true <1>
44+
----
45+
<1> Update the `spec.paused` field to `true` to pause rebooting.
46+
47+
. To verify that the MCP is paused, return to the *Machine Config Pools* page.
48+
+
49+
On the *Machine Config Pools* page, the *Paused* column reports *True* for the MCP you modified.
50+
+
51+
If the MCP has pending changes while paused, the *Updated* column is *False* and *Updating* is *False*. When *Updated* is *True* and *Updating* is *False*, there are no pending changes.
52+
+
53+
[IMPORTANT]
54+
====
55+
If there are pending changes (where both the *Updated* and *Updating* columns are *False*), it is recommended to schedule a maintenance window for a reboot as early as possible. Use the following steps for unpausing the autoreboot process to apply the changes that were queued since the last reboot.
56+
====
57+
58+
* Unpause the autoreboot process:
59+
60+
. Log in to the {product-title} web console as a user with the `cluster-admin` role.
61+
62+
. Click *Compute* -> *Machine Config Pools*.
63+
64+
. On the *Machine Config Pools* page, click either *master* or *worker*, depending upon which nodes you want to unpause rebooting for.
65+
66+
. On the *master* or *worker* page, click *YAML*.
67+
68+
. In the YAML, update the `spec.paused` field to `false`.
69+
+
70+
.Sample MachineConfigPool object
71+
[source,yaml]
72+
----
73+
apiVersion: machineconfiguration.openshift.io/v1
74+
kind: MachineConfigPool
75+
...
76+
spec:
77+
...
78+
paused: false <1>
79+
----
80+
<1> Update the `spec.paused` field to `false` to allow rebooting.
81+
+
82+
[NOTE]
83+
====
84+
By unpausing an MCP, the MCO applies all paused changes and reboots {op-system-first} as needed.
85+
====
86+
87+
. To verify that the MCP is unpaused, return to the *Machine Config Pools* page.
88+
+
89+
On the *Machine Config Pools* page, the *Paused* column reports *False* for the MCP you modified.
90+
+
91+
If the MCP is applying any pending changes, the *Updated* column is *False* and the *Updating* column is *True*. When *Updated* is *True* and *Updating* is *False*, there are no further changes being made.
92+
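If the OpenShift CLI (`oc`) is also available, the *Paused* value shown in the console can be cross-checked from a terminal. The following is a minimal sketch, not part of the console procedure: `mcp_is_paused` and the `OC` override variable are hypothetical names, and the template is the one used in the CLI version of this procedure.

```shell
#!/bin/sh
# Assumption: OC can be overridden, for example with a stub for testing.
OC="${OC:-oc}"

# Hypothetical helper: succeed (exit 0) if the named pool has spec.paused=true.
# Returns 1 if the pool is not paused and 2 if the query itself fails.
mcp_is_paused() {
  value="$("$OC" get "machineconfigpool/$1" --template='{{.spec.paused}}')" || return 2
  [ "$value" = "true" ]
}
```

For example, `mcp_is_paused worker && echo "worker pool is paused"` prints the message only when the worker pool is paused.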

modules/troubleshooting-disabling-autoreboot-mco.adoc

Lines changed: 2 additions & 98 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
// * support/troubleshooting/troubleshooting-operator-issues.adoc
44

55
[id="troubleshooting-disabling-autoreboot-mco_{context}"]
6-
= Disabling Machine Config Operator from automatically rebooting
6+
= Disabling the Machine Config Operator from automatically rebooting
77

88
When configuration changes are made by the Machine Config Operator (MCO), {op-system-first} must reboot for the changes to take effect. Whether the configuration change is automatic, such as when a `kube-apiserver-to-kubelet-signer` certificate authority (CA) is rotated, or manual, an {op-system} node reboots automatically unless it is paused.
99

@@ -18,106 +18,10 @@ The following modifications do not trigger a node reboot:
1818
When the MCO detects any of these changes, it drains the corresponding nodes, applies the changes, and uncordons the nodes.
1919
====
2020

21-
To avoid unwanted disruptions, you can modify the machine config pool to prevent automatic rebooting after the Operator makes changes to the machine config.
21+
To avoid unwanted disruptions, you can modify the machine config pool (MCP) to prevent automatic rebooting after the Operator makes changes to the machine config.
2222

2323
[NOTE]
2424
====
2525
Pausing a machine config pool stops all system reboot processes and all configuration changes from being applied.
2626
====
2727

28-
.Prerequisites
29-
30-
* You have access to the cluster as a user with the `cluster-admin` role.
31-
* You have installed the OpenShift CLI (`oc`).
32-
* You have root access in {product-title}.
33-
34-
.Procedure
35-
. To pause the autoreboot process after machine config changes are applied:
36-
37-
* As root, update the `spec.paused` field to `true` in the `MachineConfigPool` custom resource.
38-
+
39-
.Control plane (master) nodes
40-
[source,terminal]
41-
----
42-
# oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/master
43-
----
44-
+
45-
.Worker nodes
46-
[source,terminal]
47-
----
48-
# oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/worker
49-
----
50-
51-
. To verify that the machine config pool is paused:
52-
+
53-
.Control plane (master) nodes
54-
[source,terminal]
55-
----
56-
# oc get machineconfigpool/master --template='{{.spec.paused}}'
57-
----
58-
+
59-
.Worker nodes
60-
[source,terminal]
61-
----
62-
# oc get machineconfigpool/worker --template='{{.spec.paused}}'
63-
----
64-
+
65-
The `spec.paused` field is `true` and the machine config pool is paused.
66-
67-
. Alternatively, to unpause the autoreboot process:
68-
69-
* As root, update the `spec.paused` field to `false` in the MachineConfigPool CustomResourceDefinition (CRD).
70-
+
71-
.Control plane (master) nodes
72-
[source,terminal]
73-
----
74-
# oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/master
75-
----
76-
+
77-
.Worker nodes
78-
[source,terminal]
79-
----
80-
# oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/worker
81-
----
82-
+
83-
[NOTE]
84-
====
85-
By unpausing a machine config pool, all paused changes are applied at reboot.
86-
====
87-
+
88-
. To verify that the machine config pool is unpaused:
89-
+
90-
.Control plane (master) nodes
91-
[source,terminal]
92-
----
93-
# oc get machineconfigpool/master --template='{{.spec.paused}}'
94-
----
95-
+
96-
.Worker nodes
97-
[source,terminal]
98-
----
99-
# oc get machineconfigpool/worker --template='{{.spec.paused}}'
100-
----
101-
+
102-
The `spec.paused` field is `false` and the machine config pool is unpaused.
103-
104-
. To see if the machine config pool has pending changes:
105-
+
106-
[source,terminal]
107-
----
108-
# oc get machineconfigpool
109-
----
110-
+
111-
.Example output
112-
----
113-
NAME CONFIG UPDATED UPDATING
114-
master rendered-master-546383f80705bd5aeaba93 True False
115-
worker rendered-worker-b4c51bb33ccaae6fc4a6a5 True False
116-
----
117-
+
118-
When `UPDATED` is `True` and `UPDATING` is `False`, there are no pending changes, and vice versa.
119-
120-
[IMPORTANT]
121-
====
122-
It is recommended to schedule a maintenance window for a reboot as early as possible by setting `spec.paused` to `false` so that the queued changes since last reboot will take effect.
123-
====

support/troubleshooting/troubleshooting-operator-issues.adoc

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,9 @@ include::modules/gathering-operator-logs.adoc[leveloffset=+1]
2727

2828
// Disabling Machine Config Operator from autorebooting
2929
include::modules/troubleshooting-disabling-autoreboot-mco.adoc[leveloffset=+1]
30+
include::modules/troubleshooting-disabling-autoreboot-mco-console.adoc[leveloffset=+2]
31+
include::modules/troubleshooting-disabling-autoreboot-mco-cli.adoc[leveloffset=+2]
3032

3133
// Refreshing failing subscriptions
3234
include::modules/olm-refresh-subs.adoc[leveloffset=+1]
35+
