Commit b42e1ca

Merge pull request #37531 from abhatt-rh/telcodocs-296-updates
TELCODOCS296 - Updated latest specs for the NHC resource
2 parents 4b457d8 + 81704fd commit b42e1ca

1 file changed: 17 additions, 9 deletions

modules/eco-node-health-check-operator-about.adoc

@@ -22,29 +22,37 @@ metadata:
   namespace: openshift-operators
 spec:
   minHealthy: 51% <1>
-  remediationTemplate: <2>
+  pauseRequests: <2>
+    - <pause-test-cluster>
+  remediationTemplate: <3>
     apiVersion: poison-pill.medik8s.io/v1alpha1
     name: group-x
     namespace: openshift-operators
     kind: PoisonPillRemediationTemplate
-  selector: <3>
+  selector: <4>
     matchExpressions:
       - key: node-role.kubernetes.io/worker
         operator: Exists
-  unhealthyConditions: <4>
+  unhealthyConditions: <5>
     - type: Ready
       status: "False"
-      duration: 300s <5>
+      duration: 300s <6>
     - type: Ready
       status: Unknown
-      duration: 300s <5>
+      duration: 300s <6>
 ----
 
 <1> Specifies the amount (in percentage) of nodes allowed to be concurrently remediated in the targeted pool. If the number of healthy nodes equals to or exceeds the limit set by `minHealthy`, remediation occurs. The default value is 51%.
-<2> Specifies a remediation template from the remediation provider. For example, from the Poison Pill Operator.
-<3> Specifies a `selector` that matches labels or expressions that you want to check. The default value is empty, which selects all nodes.
-<4> Specifies a list of the conditions that determine whether a node is considered unhealthy.
-<5> Specifies the timeout duration for a node condition. If a condition is met for the duration of the timeout, the node will be remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy node.
+<2> Prevents any new remediation from starting, while allowing any ongoing remediations to persist. The default value is empty. However, you can enter an array of strings that identify the cause of pausing the remediation. For example, `pause-test-cluster`.
++
+[NOTE]
+====
+During the upgrade process, nodes in the cluster might become temporarily unavailable and get identified as unhealthy. In the case of worker nodes, when the Operator detects that the cluster is upgrading, it stops remediating new unhealthy nodes to prevent such nodes from rebooting.
+====
+<3> Specifies a remediation template from the remediation provider. For example, from the Poison Pill Operator.
+<4> Specifies a `selector` that matches labels or expressions that you want to check. The default value is empty, which selects all nodes.
+<5> Specifies a list of the conditions that determine whether a node is considered unhealthy.
+<6> Specifies the timeout duration for a node condition. If a condition is met for the duration of the timeout, the node will be remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy node.
 
 [id="understanding-nhc-operator-workflow_{context}"]
 == Understanding the Node Health Check Operator workflow
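
The callouts in this change describe three checks: whether a node counts as unhealthy (`unhealthyConditions` with `duration`), whether enough nodes are healthy (`minHealthy`), and whether new remediation is paused (`pauseRequests`). The Python sketch below is one possible reading of those descriptions, written for illustration only; it is not the Operator's implementation. The names `Condition`, `is_unhealthy`, and `may_start_remediation` are hypothetical, and rounding the percentage threshold up to a whole node is an assumption, since the text does not say how fractional node counts are handled.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """A node condition plus how long it has held (hypothetical helper type)."""
    type: str
    status: str
    seconds_in_status: int


# Mirrors the two unhealthyConditions entries in the example (duration: 300s).
UNHEALTHY_CONDITIONS = [
    ("Ready", "False", 300),
    ("Ready", "Unknown", 300),
]


def is_unhealthy(conditions):
    """True if any listed condition has been met for its full duration."""
    return any(
        c.type == ctype and c.status == status and c.seconds_in_status >= duration
        for c in conditions
        for (ctype, status, duration) in UNHEALTHY_CONDITIONS
    )


def may_start_remediation(healthy, total, min_healthy_percent=51, pause_requests=()):
    """True if a new remediation may start under the described rules."""
    # Any entry in pauseRequests blocks new remediations; ongoing ones continue.
    if pause_requests:
        return False
    # Assumption: round the percentage threshold up to a whole number of nodes.
    required = (total * min_healthy_percent + 99) // 100
    return healthy >= required
```

For a 10-node pool with the default 51% threshold, this reading requires at least 6 healthy nodes before a new remediation starts, and any `pauseRequests` entry such as `pause-test-cluster` blocks it regardless of node health.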
