If you use remote worker nodes, consider which objects to use to run your applications.

It is recommended to use daemon sets or static pods based on the behavior you want in the event of network issues or power loss. In addition, you can use Kubernetes zones and tolerations to control or avoid pod evictions if the control plane cannot reach remote worker nodes.

Daemon sets::
Daemon sets are the best approach to managing pods on remote worker nodes for the following reasons:
+
--
* Daemon sets do not typically need rescheduling behavior. If a node disconnects from the cluster, pods on the node can continue to run. {product-title} does not change the state of daemon set pods, and leaves the pods in the state they last reported. For example, if a daemon set pod is in the `Running` state, when a node stops communicating, the pod keeps running and is assumed to be running by {product-title}.

* Daemon set pods, by default, are created with `NoExecute` tolerations for the `node.kubernetes.io/unreachable` and `node.kubernetes.io/not-ready` taints with no `tolerationSeconds` value. These default values ensure that daemon set pods are never evicted if the control plane cannot reach a node. For example:
+
[source,yaml]
----
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
- key: node.kubernetes.io/disk-pressure
  operator: Exists
  effect: NoSchedule
- key: node.kubernetes.io/memory-pressure
  operator: Exists
  effect: NoSchedule
- key: node.kubernetes.io/pid-pressure
  operator: Exists
  effect: NoSchedule
- key: node.kubernetes.io/unschedulable
  operator: Exists
  effect: NoSchedule
----

* Daemon sets can use labels to ensure that a workload runs on a matching worker node, as in the sketch after this list.

* You can use an {product-title} service endpoint to load balance daemon set pods; the sketch after this list includes an example service.

[NOTE]
====
Daemon sets do not schedule pods after a reboot of the node if {product-title} cannot reach the node.
====
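
For illustration, a minimal sketch of a daemon set that uses a hypothetical `node-role.kubernetes.io/remote-worker` label to place a workload on remote worker nodes; the names and image are placeholders, not values from this documentation:

[source,yaml]
----
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: remote-agent
  namespace: remote-demo
spec:
  selector:
    matchLabels:
      app: remote-agent
  template:
    metadata:
      labels:
        app: remote-agent
    spec:
      nodeSelector:
        node-role.kubernetes.io/remote-worker: "" <1>
      containers:
      - name: agent
        image: quay.io/example/agent:latest
        ports:
        - containerPort: 8080
----
<1> The node selector restricts the daemon set pods to nodes that carry the hypothetical label.

A service can then load balance across the daemon set pods by selecting the same pod label:

[source,yaml]
----
apiVersion: v1
kind: Service
metadata:
  name: remote-agent
  namespace: remote-demo
spec:
  selector:
    app: remote-agent
  ports:
  - port: 80
    targetPort: 8080
----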
--

Static pods::
If you want pods to restart if a node reboots, after a power loss for example, consider static pods.
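+
A minimal sketch of a static pod manifest, assuming the kubelet default of reading manifests from the `staticPodPath` directory on the node (commonly `/etc/kubernetes/manifests`); the kubelet restarts the pod with the node even when the control plane is unreachable. The name and image are placeholders:
+
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: quay.io/example/web:latest
    ports:
    - containerPort: 8080
----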

The `node-status-update-frequency` parameter works with the `node-monitor-grace-period` and `pod-eviction-timeout` parameters:

* The `node-monitor-grace-period` parameter specifies how long {product-title} waits after the controller manager stops receiving the node heartbeat before it marks the node associated with a `MachineConfig` object as `Unhealthy`. Workloads on the node continue to run after this time. If the remote worker node rejoins the cluster after `node-monitor-grace-period` expires, pods continue to run. New pods can be scheduled to that node. The default `node-monitor-grace-period` interval is `40s`. The `node-status-update-frequency` value must be lower than the `node-monitor-grace-period` value.

* The `pod-eviction-timeout` parameter specifies the amount of time {product-title} waits after marking a node that is associated with a `MachineConfig` object as `Unreachable` before it starts marking pods on that node for eviction. Evicted pods are rescheduled on other nodes. If the remote worker node rejoins the cluster after `pod-eviction-timeout` expires, the pods that were running on the remote worker node are terminated, because the on-premise node controller has already evicted them. Pods can then be rescheduled to that node. The default `pod-eviction-timeout` interval is `5m0s`.

[NOTE]
====
Modifying the `node-monitor-grace-period` and `pod-eviction-timeout` parameters is not supported.
====
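
For illustration only, a sketch of a `KubeletConfig` object that tunes how often the kubelet reports node status. The object name is a placeholder, the pool label assumes the default worker machine config pool, and the field name assumes the upstream kubelet `nodeStatusUpdateFrequency` configuration setting:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: remote-worker-status <1>
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: "" <2>
  kubeletConfig:
    nodeStatusUpdateFrequency: 10s <3>
----
<1> A placeholder name.
<2> Assumes the label on the default worker machine config pool.
<3> The upstream kubelet field that controls how often the kubelet reports node status.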

You can use pod tolerations to mitigate the effects if the on-premise node controller adds a `node.kubernetes.io/unreachable` taint with a `NoExecute` effect to a node it cannot reach.

A taint with the `NoExecute` effect affects pods that are running on the node in the following ways:

* Pods that do not tolerate the taint are queued for eviction.
* Pods that tolerate the taint with a `tolerationSeconds` value remain bound for the specified amount of time.
* Pods that tolerate the taint with no `tolerationSeconds` value remain bound forever.

You can delay or avoid pod eviction by adding tolerations to the pod specification. For example:

[source,yaml]
----
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute" <1>
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute" <2>
  tolerationSeconds: 600
...
----
<1> The `NoExecute` effect without `tolerationSeconds` lets pods remain forever if the control plane cannot reach the node.
<2> The `NoExecute` effect with `tolerationSeconds: 600` lets pods remain for 10 minutes if the control plane marks the node as `Unhealthy`.

{product-title} uses the `tolerationSeconds` value after the `pod-eviction-timeout` value elapses.

Other types of {product-title} objects::
You can use replica sets, deployments, and replication controllers. The scheduler can reschedule these pods onto other nodes after the node is disconnected for five minutes. Rescheduling onto other nodes can be beneficial for some workloads, such as REST APIs, where an administrator can guarantee a specific number of pods are running and accessible.
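+
For example, a minimal deployment sketch where the scheduler maintains three replicas and can reschedule them onto reachable nodes; the names and image are placeholders:
+
[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rest-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rest-api
  template:
    metadata:
      labels:
        app: rest-api
    spec:
      containers:
      - name: api
        image: quay.io/example/rest-api:latest
----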
+
When working with remote worker nodes, rescheduling pods on different nodes might be unacceptable if remote worker nodes are intended to be reserved for specific functions.
+
In addition, https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/[stateful sets] do not get restarted when there is an outage. The pods remain in the `terminating` state until the control plane can acknowledge that the pods are terminated.
+
In the case of network separation, {product-title} does not migrate pods that require persistent volumes to other zones, to avoid scheduling a pod to a node that does not have access to the same type of persistent storage.