@@ -17,33 +17,27 @@ itself. Unless resources are set aside for these system daemons, pods and system
1717daemons compete for resources and lead to resource starvation issues on the
1818node.
1919
20- The ` kubelet ` exposes a feature named ` Node Allocatable ` that helps to reserve
20+ The ` kubelet ` exposes a feature named ' Node Allocatable' that helps to reserve
2121compute resources for system daemons. Kubernetes recommends cluster
22- administrators to configure ` Node Allocatable ` based on their workload density
22+ administrators to configure ' Node Allocatable' based on their workload density
2323on each node.
2424
25-
26-
27-
2825## {{% heading "prerequisites" %}}
2926
30-
3127{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
3228Your Kubernetes server must be at or later than version 1.17 to use
3329the kubelet command line option ` --reserved-cpus ` to set an
3430[ explicitly reserved CPU list] ( #explicitly-reserved-cpu-list ) .
3531
36-
37-
3832<!-- steps -->
3933
4034## Node Allocatable
4135
4236![ node capacity] ( /images/docs/node-capacity.svg )
4337
44- ` Allocatable ` on a Kubernetes node is defined as the amount of compute resources
38+ ' Allocatable' on a Kubernetes node is defined as the amount of compute resources
4539that are available for pods. The scheduler does not over-subscribe
46- ` Allocatable ` . ` CPU ` , ` memory ` and ` ephemeral-storage ` are supported as of now.
40+ ' Allocatable'. ' CPU', ' memory' and ' ephemeral-storage' are supported as of now.
4741
4842Node Allocatable is exposed as part of ` v1.Node ` object in the API and as part
4943of ` kubectl describe node ` in the CLI.
9791It is recommended that the kubernetes system daemons are placed under a top
9892level control group (` runtime.slice ` on systemd machines for example). Each
9993system daemon should ideally run within its own child control group. Refer to
100- [this
101- doc](https://git.k8s.io/community/contributors/design-proposals/node/node-allocatable.md#recommended-cgroups-setup)
94+ [the design proposal](https://git.k8s.io/community/contributors/design-proposals/node/node-allocatable.md#recommended-cgroups-setup)
10295for more details on recommended control group hierarchy.
10396
10497Note that Kubelet ** does not** create ` --kube-reserved-cgroup ` if it doesn't
@@ -109,7 +102,6 @@ exist. Kubelet will fail if an invalid cgroup is specified.
109102- ** Kubelet Flag** : ` --system-reserved=[cpu=100m][,][memory=100Mi][,][ephemeral-storage=1Gi][,][pid=1000] `
110103- ** Kubelet Flag** : ` --system-reserved-cgroup= `
111104
112-
113105` system-reserved ` is meant to capture resource reservation for OS system daemons
114106like ` sshd ` , ` udev ` , etc. ` system-reserved ` should reserve ` memory ` for the
115107` kernel ` too since ` kernel ` memory is not accounted to pods in Kubernetes at this time.
@@ -127,13 +119,14 @@ kubelet flag.
127119It is recommended that the OS system daemons are placed under a top level
128120control group (` system.slice ` on systemd machines for example).
129121
130- Note that Kubelet ** does not** create ` --system-reserved-cgroup ` if it doesn't
131- exist. Kubelet will fail if an invalid cgroup is specified.
122+ Note that ` kubelet ` ** does not** create ` --system-reserved-cgroup ` if it doesn't
123+ exist. ` kubelet ` will fail if an invalid cgroup is specified.
132124
133125### Explicitly Reserved CPU List
126+
134127{{< feature-state for_k8s_version="v1.17" state="stable" >}}
135128
136- - ** Kubelet Flag** : ` --reserved-cpus=0-3 `
129+ ** Kubelet Flag** : ` --reserved-cpus=0-3 `
137130
138131` reserved-cpus ` is meant to define an explicit CPU set for OS system daemons and
139132kubernetes system daemons. ` reserved-cpus ` is for systems that do not intend to
@@ -154,32 +147,33 @@ For example: in Centos, you can do this using the tuned toolset.
154147
155148### Eviction Thresholds
156149
157- - ** Kubelet Flag** : ` --eviction-hard=[memory.available<500Mi] `
150+ ** Kubelet Flag** : ` --eviction-hard=[memory.available<500Mi] `
158151
159152Memory pressure at the node level leads to System OOMs which affects the entire
160153node and all pods running on it. Nodes can go offline temporarily until memory
161154has been reclaimed. To avoid (or reduce the probability of) system OOMs kubelet
162- provides [ ` Out of Resource ` ] ( /docs/tasks/administer-cluster/out-of-resource/ ) management. Evictions are
155+ provides [ out of resource] ( /docs/concepts/scheduling-eviction/node-pressure-eviction/ )
156+ management. Evictions are
163157supported for ` memory ` and ` ephemeral-storage ` only. By reserving some memory via
164- ` --eviction-hard ` flag, the ` kubelet ` attempts to ` evict ` pods whenever memory
158+ ` --eviction-hard ` flag, the ` kubelet ` attempts to evict pods whenever memory
165159availability on the node drops below the reserved value. Hypothetically, if
166160system daemons did not exist on a node, pods cannot use more than `capacity -
167161eviction-hard`. For this reason, resources reserved for evictions are not
168162available for pods.
169163
170164### Enforcing Node Allocatable
171165
172- - ** Kubelet Flag** : ` --enforce-node-allocatable=pods[,][system-reserved][,][kube-reserved] `
166+ ** Kubelet Flag** : ` --enforce-node-allocatable=pods[,][system-reserved][,][kube-reserved] `
173167
174- The scheduler treats ` Allocatable ` as the available ` capacity ` for pods.
168+ The scheduler treats ' Allocatable' as the available ` capacity ` for pods.
175169
176- ` kubelet ` enforce ` Allocatable ` across pods by default. Enforcement is performed
170+ ` kubelet ` enforce ' Allocatable' across pods by default. Enforcement is performed
177171by evicting pods whenever the overall usage across all pods exceeds
178- ` Allocatable ` . More details on eviction policy can be found
179- [ here] ( /docs/tasks/administer-cluster/out-of-resource/#eviction-policy ) . This enforcement is controlled by
172+ 'Allocatable'. More details on eviction policy can be found
173+ on the [ node pressure eviction] ( /docs/concepts/scheduling-eviction/node-pressure-eviction/ )
174+ page. This enforcement is controlled by
180175specifying ` pods ` value to the kubelet flag ` --enforce-node-allocatable ` .
181176
182-
183177Optionally, ` kubelet ` can be made to enforce ` kube-reserved ` and
184178` system-reserved ` by specifying ` kube-reserved ` & ` system-reserved ` values in
185179the same flag. Note that to enforce ` kube-reserved ` or ` system-reserved ` ,
@@ -188,10 +182,10 @@ respectively.
188182
189183## General Guidelines
190184
191- System daemons are expected to be treated similar to ` Guaranteed ` pods. System
185+ System daemons are expected to be treated similar to ' Guaranteed' pods. System
192186daemons can burst within their bounding control groups and this behavior needs
193187to be managed as part of kubernetes deployments. For example, ` kubelet ` should
194- have its own control group and share `Kube-reserved` resources with the
188+ have its own control group and share `kube-reserved` resources with the
195189container runtime. However, Kubelet cannot burst and use up all available Node
196190resources if ` kube-reserved ` is enforced.
197191
@@ -200,9 +194,9 @@ to critical system services being CPU starved, OOM killed, or unable
200194to fork on the node. The
201195recommendation is to enforce ` system-reserved ` only if a user has profiled their
202196nodes exhaustively to come up with precise estimates and is confident in their
203- ability to recover if any process in that group is oom_killed.
197+ ability to recover if any process in that group is oom-killed.
204198
205- * To begin with enforce ` Allocatable ` on ` pods ` .
199+ * To begin with enforce ' Allocatable' on ` pods ` .
206200* Once adequate monitoring and alerting is in place to track kube system
207201 daemons, attempt to enforce ` kube-reserved ` based on usage heuristics.
208202* If absolutely necessary, enforce ` system-reserved ` over time.
@@ -212,8 +206,6 @@ more features are added. Over time, kubernetes project will attempt to bring
212206down utilization of node system daemons, but that is not a priority as of now.
213207So expect a drop in ` Allocatable ` capacity in future releases.
214208
215-
216-
217209<!-- discussion -->
218210
219211## Example Scenario
@@ -225,15 +217,15 @@ Here is an example to illustrate Node Allocatable computation:
225217* ` --system-reserved ` is set to ` cpu=500m,memory=1Gi,ephemeral-storage=1Gi `
226218* ` --eviction-hard ` is set to ` memory.available<500Mi,nodefs.available<10% `
227219
228- Under this scenario, ` Allocatable ` will be ` 14.5 CPUs ` , ` 28.5Gi ` of memory and
220+ Under this scenario, ' Allocatable' will be 14.5 CPUs, 28.5Gi of memory and
229221` 88Gi ` of local storage.
230222Scheduler ensures that the total memory ` requests ` across all pods on this node does
231- not exceed ` 28.5Gi ` and storage doesn't exceed ` 88Gi ` .
232- Kubelet evicts pods whenever the overall memory usage across pods exceeds ` 28.5Gi ` ,
233- or if overall disk usage exceeds ` 88Gi ` If all processes on the node consume as
234- much CPU as they can, pods together cannot consume more than ` 14.5 CPUs ` .
223+ not exceed 28.5Gi and storage doesn't exceed 88Gi.
224+ Kubelet evicts pods whenever the overall memory usage across pods exceeds 28.5Gi,
225+ or if overall disk usage exceeds 88Gi If all processes on the node consume as
226+ much CPU as they can, pods together cannot consume more than 14.5 CPUs.
235227
236228If ` kube-reserved ` and/or ` system-reserved ` is not enforced and system daemons
237229exceed their reservation, ` kubelet ` evicts pods whenever the overall node memory
238- usage is higher than ` 31.5Gi ` or ` storage ` is greater than ` 90Gi `
230+ usage is higher than 31.5Gi or ` storage ` is greater than 90Gi.
239231
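The arithmetic behind the example scenario in the last hunk can be checked with a short sketch. The node capacity (16 CPUs, 32Gi of memory, 100Gi of storage) and the `--kube-reserved` values (`cpu=1,memory=2Gi,ephemeral-storage=1Gi`) are assumptions: they sit in context lines this diff does not show, and were chosen because they reproduce the figures quoted in the changed lines. The dictionary keys and the `allocatable` helper are hypothetical names used only for illustration.

```python
MI_PER_GI = 1024  # work in Mi so that the 500Mi threshold stays exact


def allocatable(capacity, kube_reserved, system_reserved, eviction_hard=0):
    """Capacity minus both reservations and the hard eviction threshold."""
    return capacity - kube_reserved - system_reserved - eviction_hard


# ASSUMED values, not visible in this diff.
capacity = {"cpu_m": 16_000, "memory_mi": 32 * MI_PER_GI, "storage_mi": 100 * MI_PER_GI}
kube_reserved = {"cpu_m": 1_000, "memory_mi": 2 * MI_PER_GI, "storage_mi": 1 * MI_PER_GI}

# From the diff: --system-reserved=cpu=500m,memory=1Gi,ephemeral-storage=1Gi
system_reserved = {"cpu_m": 500, "memory_mi": 1 * MI_PER_GI, "storage_mi": 1 * MI_PER_GI}

# From the diff: --eviction-hard=memory.available<500Mi,nodefs.available<10%
eviction_hard = {
    "cpu_m": 0,  # no eviction threshold applies to CPU
    "memory_mi": 500,
    "storage_mi": capacity["storage_mi"] // 10,  # 10% of node storage
}

node_allocatable = {
    key: allocatable(capacity[key], kube_reserved[key], system_reserved[key], eviction_hard[key])
    for key in capacity
}

# Node-level usage at which eviction starts when the reservations are not enforced.
unenforced_limit = {
    key: capacity[key] - eviction_hard[key] for key in ("memory_mi", "storage_mi")
}

for name, value in node_allocatable.items():
    print(name, value)
```

Under these assumptions the memory result is 29196Mi, which the text rounds to 28.5Gi, and the unenforced memory limit is 32268Mi, which the text rounds to 31.5Gi. CPU and storage come out at exactly 14.5 CPUs and 88Gi.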