modules/eco-self-node-remediation-operator-about.adoc
24 additions & 8 deletions
@@ -8,6 +8,22 @@
The Self Node Remediation Operator runs on the cluster nodes and reboots nodes that are identified as unhealthy. The Operator uses the `MachineHealthCheck` or `NodeHealthCheck` controller to detect the health of a node in the cluster. When a node is identified as unhealthy, the `MachineHealthCheck` or the `NodeHealthCheck` resource creates the `SelfNodeRemediation` custom resource (CR), which triggers the Self Node Remediation Operator.
+The `SelfNodeRemediation` CR resembles the following YAML file:
<1> Displays the last error that occurred during remediation. When remediation succeeds or if no errors occur, the field is left empty.
+
The Self Node Remediation Operator minimizes downtime for stateful applications and restores compute capacity if transient failures occur. You can use this Operator regardless of the management interface, such as IPMI or an API to provision a node, and regardless of the cluster installation type, such as installer-provisioned infrastructure or user-provisioned infrastructure.
If a watchdog device is unavailable, the `SelfNodeRemediationConfig` CR uses a software reboot.
<3> Specify if you want to enable software reboot of the unhealthy nodes. By default, the value of `isSoftwareRebootEnabled` is set to `true`. To disable the software reboot, set the parameter value to `false`.
-<4> Specify the timeout duration to check connectivity with each API server. When this duration elapses, the Operator starts remediation.
-<5> Specify the frequency to check connectivity with each API server.
-<6> Specify a threshold value. After reaching this threshold, the node starts contacting its peers.
-<7> Specify the timeout duration for the peer to connect the API server.
-<8> Specify the timeout duration for establishing connection with the peer.
-<9> Specify the timeout duration to get a response from the peer.
-<10> Specify the frequency to update peer information, such as IP address.
+<4> Specify the timeout duration to check connectivity with each API server. When this duration elapses, the Operator starts remediation. The timeout duration must be more than or equal to 10 milliseconds.
+<5> Specify the frequency to check connectivity with each API server. The timeout duration must be more than or equal to 1 second.
+<6> Specify a threshold value. After reaching this threshold, the node starts contacting its peers. The threshold value must be more than or equal to 1 second.
+<7> Specify the duration of the timeout for the peer to connect the API server. The timeout duration must be more than or equal to 10 milliseconds.
+<8> Specify the duration of the timeout for establishing connection with the peer. The timeout duration must be more than or equal to 10 milliseconds.
+<9> Specify the duration of the timeout to get a response from the peer. The timeout duration must be more than or equal to 10 milliseconds.
+<10> Specify the frequency to update peer information, such as IP address. The timeout duration must be more than or equal to 10 seconds.
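The minimum values that the added callouts <4>, <5>, and <7> to <10> state can be checked mechanically. The following is a minimal sketch, not the Operator's own validation: the field names are assumptions (the diff shows only callout text, not the YAML keys), and `find_violations` is a hypothetical helper. Callout <6> is left out because the diff states its minimum in seconds although it describes a threshold.

```python
from datetime import timedelta

# Minimums as stated in callouts <4>, <5>, <7>-<10> of the added lines.
# ASSUMPTION: the field names below do not appear in this diff; they are
# illustrative keys for the values the callouts describe.
MINIMUMS = {
    "apiServerTimeout": timedelta(milliseconds=10),      # <4>
    "apiCheckInterval": timedelta(seconds=1),            # <5>
    "peerApiServerTimeout": timedelta(milliseconds=10),  # <7>
    "peerDialTimeout": timedelta(milliseconds=10),       # <8>
    "peerRequestTimeout": timedelta(milliseconds=10),    # <9>
    "peerUpdateInterval": timedelta(seconds=10),         # <10>
}


def find_violations(config):
    """Return the sorted names of fields set below their stated minimum.

    Fields missing from `config` are skipped, so defaults are not checked.
    """
    return sorted(
        name
        for name, minimum in MINIMUMS.items()
        if name in config and config[name] < minimum
    )
```

For example, passing `{"apiServerTimeout": timedelta(milliseconds=5)}` reports that field, while a value exactly equal to the minimum passes, which matches the "more than or equal to" wording in the callouts.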