Commit d0b3ba5

Author: Grigoris Thanasoulas

Update DaemonSet guide

Rewrite "How Daemon Pods are scheduled" section of the DaemonSet guide to align with the current state and be more clear.

Signed-off-by: Grigoris Thanasoulas <[email protected]>
1 parent 237fdab commit d0b3ba5

1 file changed: +52 -43 lines changed
content/en/docs/concepts/workloads/controllers/daemonset.md

Lines changed: 52 additions & 43 deletions
@@ -105,30 +105,24 @@ If you do not specify either, then the DaemonSet controller will create Pods on

 ## How Daemon Pods are scheduled

-### Scheduled by default scheduler
-
-{{< feature-state for_k8s_version="1.17" state="stable" >}}
-
-A DaemonSet ensures that all eligible nodes run a copy of a Pod. Normally, the
-node that a Pod runs on is selected by the Kubernetes scheduler. However,
-DaemonSet pods are created and scheduled by the DaemonSet controller instead.
-That introduces the following issues:
-
-* Inconsistent Pod behavior: Normal Pods waiting to be scheduled are created
-and in `Pending` state, but DaemonSet pods are not created in `Pending`
-state. This is confusing to the user.
-* [Pod preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/)
-is handled by default scheduler. When preemption is enabled, the DaemonSet controller
-will make scheduling decisions without considering pod priority and preemption.
-
-`ScheduleDaemonSetPods` allows you to schedule DaemonSets using the default
-scheduler instead of the DaemonSet controller, by adding the `NodeAffinity` term
-to the DaemonSet pods, instead of the `.spec.nodeName` term. The default
-scheduler is then used to bind the pod to the target host. If node affinity of
-the DaemonSet pod already exists, it is replaced (the original node affinity was
-taken into account before selecting the target host). The DaemonSet controller only
-performs these operations when creating or modifying DaemonSet pods, and no
-changes are made to the `spec.template` of the DaemonSet.
+A DaemonSet ensures that all eligible nodes run a copy of a Pod. The DaemonSet
+controller creates a Pod for each eligible node and adds the
+`spec.affinity.nodeAffinity` field of the Pod to match the target host. After
+the Pod is created, the default scheduler typically takes over and then binds
+the Pod to the target host by setting the `.spec.nodeName` field. If the new
+Pod cannot fit on the node, the default scheduler may preempt (evict) some of
+the existing Pods based on the
+[priority](/docs/concepts/scheduling-eviction/pod-priority-preemption/#pod-priority)
+of the new Pod.
+
+The user can specify a different scheduler for the Pods of the DaemonSet, by
+setting the `.spec.template.spec.schedulerName` field of the DaemonSet.
+
+The original node affinity specified at the
+`.spec.template.spec.affinity.nodeAffinity` field (if specified) is taken into
+consideration by the DaemonSet controller when evaluating the eligible nodes,
+but is replaced on the created Pod with the node affinity that matches the name
+of the eligible node.

 ```yaml
 nodeAffinity:
@@ -141,25 +135,40 @@ nodeAffinity:
 - target-host-name
 ```

-In addition, `node.kubernetes.io/unschedulable:NoSchedule` toleration is added
-automatically to DaemonSet Pods. The default scheduler ignores
-`unschedulable` Nodes when scheduling DaemonSet Pods.
-
-### Taints and Tolerations
-
-Although Daemon Pods respect
-[taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/),
-the following tolerations are added to DaemonSet Pods automatically according to
-the related features.
-
-| Toleration Key | Effect | Version | Description |
-| ---------------------------------------- | ---------- | ------- | ----------- |
-| `node.kubernetes.io/not-ready` | NoExecute | 1.13+ | DaemonSet pods will not be evicted when there are node problems such as a network partition. |
-| `node.kubernetes.io/unreachable` | NoExecute | 1.13+ | DaemonSet pods will not be evicted when there are node problems such as a network partition. |
-| `node.kubernetes.io/disk-pressure` | NoSchedule | 1.8+ | DaemonSet pods tolerate disk-pressure attributes by default scheduler. |
-| `node.kubernetes.io/memory-pressure` | NoSchedule | 1.8+ | DaemonSet pods tolerate memory-pressure attributes by default scheduler. |
-| `node.kubernetes.io/unschedulable` | NoSchedule | 1.12+ | DaemonSet pods tolerate unschedulable attributes by default scheduler. |
-| `node.kubernetes.io/network-unavailable` | NoSchedule | 1.12+ | DaemonSet pods, who uses host network, tolerate network-unavailable attributes by default scheduler. |
+
+### Taints and tolerations
+
+The DaemonSet controller automatically adds a set of {{< glossary_tooltip
+text="tolerations" term_id="toleration" >}} to DaemonSet Pods:
+
+{{< table caption="Tolerations for DaemonSet pods" >}}
+
+| Toleration key | Effect | Details |
+| --------------------------------------------------------------------------------------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------- |
+| [`node.kubernetes.io/not-ready`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-not-ready) | `NoExecute` | DaemonSet Pods can be scheduled onto nodes that are not healthy or ready to accept Pods. Any DaemonSet Pods running on such nodes will not be evicted. |
+| [`node.kubernetes.io/unreachable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-unreachable) | `NoExecute` | DaemonSet Pods can be scheduled onto nodes that are unreachable from the node controller. Any DaemonSet Pods running on such nodes will not be evicted. |
+| [`node.kubernetes.io/disk-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-disk-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with disk pressure issues. |
+| [`node.kubernetes.io/memory-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-memory-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with memory pressure issues. |
+| [`node.kubernetes.io/pid-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-pid-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with process pressure issues. |
+| [`node.kubernetes.io/unschedulable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-unschedulable) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes that are unschedulable. |
+| [`node.kubernetes.io/network-unavailable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-network-unavailable) | `NoSchedule` | **Only added for DaemonSet Pods that request host networking**, i.e., Pods having `spec.hostNetwork: true`. Such DaemonSet Pods can be scheduled onto nodes with unavailable network.|
+
+{{< /table >}}
+
+You can add your own tolerations to the Pods of a DaemonSet as well, by
+defining these in the Pod template of the DaemonSet.
+
+Because the DaemonSet controller sets the
+`node.kubernetes.io/unschedulable:NoSchedule` toleration automatically,
+Kubernetes can run DaemonSet Pods on nodes that are marked as _unschedulable_.
+
+If you use a DaemonSet to provide an important node-level function, such as
+[cluster networking](/docs/concepts/cluster-administration/networking/), it is
+helpful that Kubernetes places DaemonSet Pods on nodes before they are ready.
+For example, without that special toleration, you could end up in a deadlock
+situation where the node is not marked as ready because the network plugin is
+not running there, and at the same time the network plugin is not running on
+that node because the node is not yet ready.

 ## Communicating with Daemon Pods
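
Note (not part of the commit): the rewritten section mentions three Pod template fields that a DaemonSet author can set: `.spec.template.spec.schedulerName`, `.spec.template.spec.affinity.nodeAffinity`, and custom tolerations. The manifest below is a minimal sketch of how they might appear together. The DaemonSet name, labels, scheduler name, image, and the `example.com/dedicated` taint key are all hypothetical placeholders; only the field names and the `kubernetes.io/os` label are standard Kubernetes.

```yaml
# Illustrative DaemonSet. Names, image, scheduler, and taint key are hypothetical.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-node-agent
spec:
  selector:
    matchLabels:
      app: example-node-agent
  template:
    metadata:
      labels:
        app: example-node-agent
    spec:
      # Optional: hand these Pods to a scheduler other than the default one.
      schedulerName: example-scheduler
      # Optional: restrict which nodes are eligible. Per the text above, the
      # controller considers this when evaluating nodes, then replaces it on
      # each created Pod with an affinity that matches one node by name.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      # User-defined toleration, in addition to the automatically added ones.
      tolerations:
      - key: example.com/dedicated
        operator: Exists
        effect: NoSchedule
      containers:
      - name: agent
        image: registry.example/node-agent:1.0
```

After applying such a manifest, inspecting one of the created Pods with `kubectl get pod <pod-name> -o yaml` should show the automatically added tolerations alongside the custom one, and a `nodeAffinity` that targets a single node rather than the selector from the template.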
