modules/nodes-edge-remote-workers-strategies.adoc: 11 additions & 8 deletions
@@ -112,15 +112,13 @@ spec:
<2> Specify the frequency that the kubelet checks the status of a node associated with this `MachineConfig` object. The default value is `10s`. If you change this default, the `node-status-report-frequency` value is changed to the same value.
<3> Specify the frequency that the kubelet reports the status of a node associated with this `MachineConfig` object. The default value is `1m`.
-The `node-status-update-frequency` parameter works with the `node-monitor-grace-period` and `pod-eviction-timeout` parameters.
+The `node-status-update-frequency` parameter works with the `node-monitor-grace-period` parameter.
* The `node-monitor-grace-period` parameter specifies how long {product-title} waits before marking a node associated with a `MachineConfig` object as `Unhealthy` if the controller manager does not receive the node heartbeat. Workloads on the node continue to run after this time. If the remote worker node rejoins the cluster after `node-monitor-grace-period` expires, pods continue to run. New pods can be scheduled to that node. The `node-monitor-grace-period` interval is `40s`. The `node-status-update-frequency` value must be lower than the `node-monitor-grace-period` value.
-* The `pod-eviction-timeout` parameter specifies the amount of time {product-title} waits after marking a node that is associated with a `MachineConfig` object as `Unreachable` to start marking pods for eviction. Evicted pods are rescheduled on other nodes. If the remote worker node rejoins the cluster after `pod-eviction-timeout` expires, the pods running on the remote worker node are terminated because the node controller has evicted the pods on-premise. Pods can then be rescheduled to that node. The `pod-eviction-timeout` interval is `5m0s`.
-
[NOTE]
====
-Modifying the `node-monitor-grace-period` and `pod-eviction-timeout` parameters is not supported.
+Modifying the `node-monitor-grace-period` parameter is not supported.
====
--
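For orientation, the following is a minimal sketch of the kind of `KubeletConfig` object that the callouts above describe. The object name and the machine config pool label are placeholders, and the sketch assumes the hyphenated parameter names from the callouts are set directly under `spec.kubeletConfig`; the full example at the top of this module (outside this hunk) is authoritative for the exact form.

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: example-remote-worker-kubelet    # placeholder name
spec:
  machineConfigPoolSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker    # placeholder pool label
  kubeletConfig:
    node-status-update-frequency: "10s"    # default 10s; how often the kubelet checks node status
    node-status-report-frequency: "1m"     # default 1m; how often the kubelet reports node status
----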
@@ -133,7 +131,12 @@ A taint with the `NoExecute` effect affects pods that are running on the node in
* Pods that do not tolerate the taint are queued for eviction.
* Pods that tolerate the taint without specifying a `tolerationSeconds` value in their toleration specification remain bound forever.
-* Pods that tolerate the taint with a specified `tolerationSeconds` value remain bound for the specified amount of time. After the time elapses, the pods are queued for eviction.
+* Pods that tolerate the taint with a specified `tolerationSeconds` value remain bound for the specified amount of time. After the time elapses, the pods are queued for eviction.
+
+[NOTE]
+====
+Unless tolerations are explicitly set, Kubernetes automatically adds a toleration for `node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable` with `tolerationSeconds=300`, meaning that pods remain bound for 5 minutes if either of these taints is detected.
+====
You can delay or avoid pod eviction by configuring pod tolerations with the `NoExecute` effect for the `node.kubernetes.io/unreachable` and `node.kubernetes.io/not-ready` taints.
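For reference, the tolerations that Kubernetes injects by default (as described in the note above) look roughly like the following when you inspect a running pod; this is a sketch, and field order can vary:

[source,yaml]
----
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300    # pods stay bound for 5 minutes after the node becomes not ready
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300    # pods stay bound for 5 minutes after the node becomes unreachable
----

The snippet that follows shows how to override these defaults with an explicit `tolerationSeconds` value.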
@@ -148,14 +151,14 @@ tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute" <2>
-  tolerationSeconds: 600
+  tolerationSeconds: 600 <3>
...
----
<1> The `NoExecute` effect without `tolerationSeconds` lets pods remain forever if the control plane cannot reach the node.
<2> The `NoExecute` effect with `tolerationSeconds`: 600 lets pods remain for 10 minutes if the control plane marks the node as `Unhealthy`.
+<3> You can specify your own `tolerationSeconds` value.
-{product-title} uses the `tolerationSeconds` value after the `pod-eviction-timeout` value elapses.
You can use replica sets, deployments, and replication controllers. The scheduler can reschedule these pods onto other nodes after the node is disconnected for five minutes. Rescheduling onto other nodes can be beneficial for some workloads, such as REST APIs, where an administrator can guarantee a specific number of pods are running and accessible.
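To make this concrete, the following is a minimal sketch of a Deployment whose pod template carries the `NoExecute` tolerations from the snippet above; the name, labels, image, and replica count are placeholders.

[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rest-api    # placeholder name
spec:
  replicas: 3       # the number of pods the scheduler keeps running on reachable nodes
  selector:
    matchLabels:
      app: rest-api
  template:
    metadata:
      labels:
        app: rest-api
    spec:
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 600    # evict and reschedule 10 minutes after the node becomes unreachable
      containers:
      - name: rest-api
        image: registry.example.com/rest-api:1.0    # placeholder image
----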
operators/operator_sdk/osdk-leader-election.adoc: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ During the lifecycle of an Operator, it is possible that there may be more than
There are two different leader election implementations to choose from, each with its own trade-off:
-Leader-for-life:: The leader pod only gives up leadership, using garbage collection, when it is deleted. This implementation precludes the possibility of two instances mistakenly running as leaders, a state also known as split brain. However, this method can be subject to a delay in electing a new leader. For example, when the leader pod is on an unresponsive or partitioned node, the link:https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/#options[`pod-eviction-timeout`] dictates how long it takes for the leader pod to be deleted from the node and step down, with a default of `5m`. See the link:https://godoc.org/github.com/operator-framework/operator-sdk/pkg/leader[Leader-for-life] Go documentation for more.
+Leader-for-life:: The leader pod only gives up leadership, using garbage collection, when it is deleted. This implementation precludes the possibility of two instances mistakenly running as leaders, a state also known as split brain. However, this method can be subject to a delay in electing a new leader. For example, when the leader pod is on an unresponsive or partitioned node, you can specify `node.kubernetes.io/unreachable` and `node.kubernetes.io/not-ready` tolerations on the leader pod and use the `tolerationSeconds` value to dictate how long it takes for the leader pod to be deleted from the node and step down. These tolerations are added to the pod by default on admission with a `tolerationSeconds` value of 5 minutes. See the link:https://godoc.org/github.com/operator-framework/operator-sdk/pkg/leader[Leader-for-life] Go documentation for more.
Leader-with-lease:: The leader pod periodically renews the leader lease and gives up leadership when it cannot renew the lease. This implementation allows for a faster transition to a new leader when the existing leader is isolated, but there is a possibility of split brain in link:https://github.com/kubernetes/client-go/blob/30b06a83d67458700a5378239df6b96948cb9160/tools/leaderelection/leaderelection.go#L21-L24[certain situations]. See the link:https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/leaderelection[Leader-with-lease] Go documentation for more.
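For the Leader-for-life case described above, the step-down delay is controlled by the tolerations on the leader pod itself. The following is a minimal sketch of the relevant fragment of an Operator deployment's pod template; the 60-second value is only an illustration of trading faster leader failover against tolerance for brief node outages.

[source,yaml]
----
# Fragment of the Operator (leader) pod template spec; the values are illustrative.
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 60    # delete the leader pod about 1 minute after the node becomes unreachable
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 60
----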