modules/eco-node-health-check-operator-about.adoc
17 additions & 9 deletions
@@ -22,29 +22,37 @@ metadata:
   namespace: openshift-operators
 spec:
   minHealthy: 51% <1>
-  remediationTemplate: <2>
+  pauseRequests: <2>
+    - <pause-test-cluster>
+  remediationTemplate: <3>
     apiVersion: poison-pill.medik8s.io/v1alpha1
     name: group-x
     namespace: openshift-operators
     kind: PoisonPillRemediationTemplate
-  selector: <3>
+  selector: <4>
     matchExpressions:
       - key: node-role.kubernetes.io/worker
         operator: Exists
-  unhealthyConditions: <4>
+  unhealthyConditions: <5>
     - type: Ready
       status: "False"
-      duration: 300s <5>
+      duration: 300s <6>
     - type: Ready
       status: Unknown
-      duration: 300s <5>
+      duration: 300s <6>
 ----

 <1> Specifies the amount (in percentage) of nodes allowed to be concurrently remediated in the targeted pool. If the number of healthy nodes equals to or exceeds the limit set by `minHealthy`, remediation occurs. The default value is 51%.
-<2> Specifies a remediation template from the remediation provider. For example, from the Poison Pill Operator.
-<3> Specifies a `selector` that matches labels or expressions that you want to check. The default value is empty, which selects all nodes.
-<4> Specifies a list of the conditions that determine whether a node is considered unhealthy.
-<5> Specifies the timeout duration for a node condition. If a condition is met for the duration of the timeout, the node will be remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy node.
+<2> Prevents any new remediation from starting, while allowing any ongoing remediations to persist. The default value is empty. However, you can enter an array of strings that identify the cause of pausing the remediation. For example, `pause-test-cluster`.
++
+[NOTE]
+====
+During the upgrade process, nodes in the cluster might become temporarily unavailable and get identified as unhealthy. In the case of worker nodes, when the Operator detects that the cluster is upgrading, it stops remediating new unhealthy nodes to prevent such nodes from rebooting.
+====
+<3> Specifies a remediation template from the remediation provider. For example, from the Poison Pill Operator.
+<4> Specifies a `selector` that matches labels or expressions that you want to check. The default value is empty, which selects all nodes.
+<5> Specifies a list of the conditions that determine whether a node is considered unhealthy.
+<6> Specifies the timeout duration for a node condition. If a condition is met for the duration of the timeout, the node will be remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy node.
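
For reviewers, the rules in callouts <1>, <2>, <5>, and <6> can be read as a small decision procedure. The Python sketch below is only an illustration of that reading, not the Operator's implementation. The names `min_healthy_count`, `is_unhealthy`, and `may_start_remediation` are hypothetical, and rounding the percentage up to a whole node is an assumption that the text above does not confirm.

```python
# Illustrative model of the callout descriptions; not the Operator's code.

# (condition type, status, duration in seconds), mirroring unhealthyConditions.
UNHEALTHY_RULES = [("Ready", "False", 300), ("Ready", "Unknown", 300)]


def min_healthy_count(total_nodes: int, min_healthy_percent: int) -> int:
    """Nodes that must be healthy; rounding up is an assumption."""
    return (total_nodes * min_healthy_percent + 99) // 100


def is_unhealthy(cond_type: str, status: str, seconds_in_state: float) -> bool:
    """A node is unhealthy once a listed condition has held for its duration."""
    return any(
        cond_type == rule_type and status == rule_status and seconds_in_state >= duration
        for rule_type, rule_status, duration in UNHEALTHY_RULES
    )


def may_start_remediation(healthy: int, total: int, min_healthy_percent: int,
                          pause_requests: list[str]) -> bool:
    """New remediation starts only if nothing is paused and enough nodes are healthy."""
    if pause_requests:  # any entry, such as "pause-test-cluster", blocks new remediation
        return False
    return healthy >= min_healthy_count(total, min_healthy_percent)
```

Under these assumptions, a 10-node pool with `minHealthy: 51%` needs at least 6 healthy nodes before a new remediation can start, and any non-empty `pauseRequests` list blocks it regardless of node health.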