
Commit 1e154d2

Merge pull request #32328 from tengqm/podoverhead-ga
The PodOverhead feature is GA
2 parents f80cf4d + 0bc8468 commit 1e154d2

File tree: 4 files changed, +89 -80 lines

content/en/docs/concepts/containers/runtime-class.md

Lines changed: 20 additions & 18 deletions
@@ -59,27 +59,30 @@ The RuntimeClass resource currently only has 2 significant fields: the RuntimeCl
 (`metadata.name`) and the handler (`handler`). The object definition looks like this:
 
 ```yaml
-apiVersion: node.k8s.io/v1  # RuntimeClass is defined in the node.k8s.io API group
+# RuntimeClass is defined in the node.k8s.io API group
+apiVersion: node.k8s.io/v1
 kind: RuntimeClass
 metadata:
-  name: myclass  # The name the RuntimeClass will be referenced by
-  # RuntimeClass is a non-namespaced resource
-handler: myconfiguration  # The name of the corresponding CRI configuration
+  # The name the RuntimeClass will be referenced by.
+  # RuntimeClass is a non-namespaced resource.
+  name: myclass
+# The name of the corresponding CRI configuration
+handler: myconfiguration
 ```
 
 The name of a RuntimeClass object must be a valid
 [DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
 
 {{< note >}}
 It is recommended that RuntimeClass write operations (create/update/patch/delete) be
-restricted to the cluster administrator. This is typically the default. See [Authorization
-Overview](/docs/reference/access-authn-authz/authorization/) for more details.
+restricted to the cluster administrator. This is typically the default. See
+[Authorization Overview](/docs/reference/access-authn-authz/authorization/) for more details.
 {{< /note >}}
 
 ## Usage
 
-Once RuntimeClasses are configured for the cluster, using them is very simple. Specify a
-`runtimeClassName` in the Pod spec. For example:
+Once RuntimeClasses are configured for the cluster, you can specify a
+`runtimeClassName` in the Pod spec to use it. For example:
 
 ```yaml
 apiVersion: v1
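
The hunk above is truncated just as the Pod example begins. For context, a minimal Pod
that references this RuntimeClass would look like the sketch below; the Pod name and
container are illustrative, not part of this commit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  runtimeClassName: myclass  # must match the RuntimeClass metadata.name
  containers:
  - name: app
    image: nginx
```
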
@@ -113,14 +116,14 @@ Runtime handlers are configured through containerd's configuration at
 [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.${HANDLER_NAME}]
 ```
 
-See containerd's config documentation for more details:
-https://github.com/containerd/cri/blob/master/docs/config.md
+See containerd's [config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
+for more details.
 
 #### {{< glossary_tooltip term_id="cri-o" >}}
 
 Runtime handlers are configured through CRI-O's configuration at `/etc/crio/crio.conf`. Valid
-handlers are configured under the [crio.runtime
-table](https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md#crioruntime-table):
+handlers are configured under the
+[crio.runtime table](https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md#crioruntime-table):
 
 ```
 [crio.runtime.runtimes.${HANDLER_NAME}]
@@ -148,19 +151,17 @@ can add `tolerations` to the RuntimeClass. As with the `nodeSelector`, the toler
 with the pod's tolerations in admission, effectively taking the union of the set of nodes tolerated
 by each.
 
-To learn more about configuring the node selector and tolerations, see [Assigning Pods to
-Nodes](/docs/concepts/scheduling-eviction/assign-pod-node/).
+To learn more about configuring the node selector and tolerations, see
+[Assigning Pods to Nodes](/docs/concepts/scheduling-eviction/assign-pod-node/).
 
 ### Pod Overhead
 
-{{< feature-state for_k8s_version="v1.18" state="beta" >}}
+{{< feature-state for_k8s_version="v1.24" state="stable" >}}
 
 You can specify _overhead_ resources that are associated with running a Pod. Declaring overhead allows
 the cluster (including the scheduler) to account for it when making decisions about Pods and resources.
-To use Pod overhead, you must have the PodOverhead [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-enabled (it is on by default).
 
-Pod overhead is defined in RuntimeClass through the `overhead` fields. Through the use of these fields,
+Pod overhead is defined in RuntimeClass through the `overhead` field. Through the use of this field,
 you can specify the overhead of running pods utilizing this RuntimeClass and ensure these overheads
 are accounted for in Kubernetes.
 
@@ -170,3 +171,4 @@ are accounted for in Kubernetes.
 - [RuntimeClass Scheduling Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/585-runtime-class/README.md#runtimeclass-scheduling)
 - Read about the [Pod Overhead](/docs/concepts/scheduling-eviction/pod-overhead/) concept
 - [PodOverhead Feature Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)
+

content/en/docs/concepts/scheduling-eviction/pod-overhead.md

Lines changed: 48 additions & 46 deletions
@@ -10,17 +10,12 @@ weight: 30
 
 <!-- overview -->
 
-{{< feature-state for_k8s_version="v1.18" state="beta" >}}
-
+{{< feature-state for_k8s_version="v1.24" state="stable" >}}
 
 When you run a Pod on a Node, the Pod itself takes an amount of system resources. These
 resources are additional to the resources needed to run the container(s) inside the Pod.
-_Pod Overhead_ is a feature for accounting for the resources consumed by the Pod infrastructure
-on top of the container requests & limits.
-
-
-
-
+In Kubernetes, _Pod Overhead_ is a way to account for the resources consumed by the Pod
+infrastructure on top of the container requests & limits.
 
 <!-- body -->

@@ -29,33 +24,30 @@ In Kubernetes, the Pod's overhead is set at
 time according to the overhead associated with the Pod's
 [RuntimeClass](/docs/concepts/containers/runtime-class/).
 
-When Pod Overhead is enabled, the overhead is considered in addition to the sum of container
-resource requests when scheduling a Pod. Similarly, the kubelet will include the Pod overhead when sizing
-the Pod cgroup, and when carrying out Pod eviction ranking.
+A pod's overhead is considered in addition to the sum of container resource requests when
+scheduling a Pod. Similarly, the kubelet will include the Pod overhead when sizing the Pod cgroup,
+and when carrying out Pod eviction ranking.
 
-## Enabling Pod Overhead {#set-up}
+## Configuring Pod overhead {#set-up}
 
-You need to make sure that the `PodOverhead`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled (it is on by default as of 1.18)
-across your cluster, and a `RuntimeClass` is utilized which defines the `overhead` field.
+You need to make sure a `RuntimeClass` is utilized which defines the `overhead` field.
 
 ## Usage example
 
-To use the PodOverhead feature, you need a RuntimeClass that defines the `overhead` field. As
-an example, you could use the following RuntimeClass definition with a virtualizing container runtime
-that uses around 120MiB per Pod for the virtual machine and the guest OS:
+To work with Pod overhead, you need a RuntimeClass that defines the `overhead` field. As
+an example, you could use the following RuntimeClass definition with a virtualization container
+runtime that uses around 120MiB per Pod for the virtual machine and the guest OS:
 
 ```yaml
----
-kind: RuntimeClass
 apiVersion: node.k8s.io/v1
+kind: RuntimeClass
 metadata:
-    name: kata-fc
+  name: kata-fc
 handler: kata-fc
 overhead:
-    podFixed:
-        memory: "120Mi"
-        cpu: "250m"
+  podFixed:
+    memory: "120Mi"
+    cpu: "250m"
 ```
 
 Workloads which are created which specify the `kata-fc` RuntimeClass handler will take the memory and
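
The hunk is truncated at the sentence above. As a sketch of such a workload (consistent with
the `test-pod` name and the 500m/100Mi and 1500m/100Mi container limits shown in the kubectl
output later in this diff; the container names and images are assumptions), the Pod could
look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  runtimeClassName: kata-fc  # picks up the 120Mi / 250m overhead at admission
  containers:
  - name: busybox-ctr
    image: busybox:1.28
    command: ["sleep", "3600"]  # keep the container running
    resources:
      limits:
        cpu: 500m
        memory: 100Mi
  - name: nginx-ctr
    image: nginx
    resources:
      limits:
        cpu: 1500m
        memory: 100Mi
```
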
@@ -92,13 +84,15 @@ updates the workload's PodSpec to include the `overhead` as described in the Run
 the Pod will be rejected. In the given example, since only the RuntimeClass name is specified, the admission controller mutates the Pod
 to include an `overhead`.
 
-After the RuntimeClass admission controller, you can check the updated PodSpec:
+After the RuntimeClass admission controller has made modifications, you can check the updated
+Pod overhead value:
 
 ```bash
 kubectl get pod test-pod -o jsonpath='{.spec.overhead}'
 ```
 
 The output is:
+
 ```
 map[cpu:250m memory:120Mi]
 ```
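
The hunk that follows walks through how the scheduler combines these numbers. Spelled out
as a sketch (values taken from the example above), the arithmetic is:

```yaml
# Scheduling footprint of test-pod in this example:
#   sum of container requests:  2000m CPU, 200Mi memory  (500m + 1500m, 100Mi + 100Mi)
#   + RuntimeClass overhead:     250m CPU, 120Mi memory  (podFixed)
#   = scheduler's target:       2250m CPU, 320Mi memory  (2.25 CPU, 320 MiB)
```
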
@@ -110,44 +104,50 @@ When the kube-scheduler is deciding which node should run a new Pod, the schedul
 `overhead` as well as the sum of container requests for that Pod. For this example, the scheduler adds the
 requests and the overhead, then looks for a node that has 2.25 CPU and 320 MiB of memory available.
 
-Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip text="cgroup" term_id="cgroup" >}}
-for the Pod. It is within this pod that the underlying container runtime will create containers.
+Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip
+text="cgroup" term_id="cgroup" >}} for the Pod. It is within this pod that the underlying
+container runtime will create containers.
 
 If the resource has a limit defined for each container (Guaranteed QoS or Burstable QoS with limits defined),
 the kubelet will set an upper limit for the pod cgroup associated with that resource (cpu.cfs_quota_us for CPU
 and memory.limit_in_bytes memory). This upper limit is based on the sum of the container limits plus the `overhead`
 defined in the PodSpec.
 
-For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the sum of container
-requests plus the `overhead` defined in the PodSpec.
+For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the
+sum of container requests plus the `overhead` defined in the PodSpec.
 
 Looking at our example, verify the container requests for the workload:
+
 ```bash
 kubectl get pod test-pod -o jsonpath='{.spec.containers[*].resources.limits}'
 ```
 
 The total container requests are 2000m CPU and 200MiB of memory:
+
 ```
 map[cpu: 500m memory:100Mi] map[cpu:1500m memory:100Mi]
 ```
 
 Check this against what is observed by the node:
+
 ```bash
 kubectl describe node | grep test-pod -B2
 ```
 
-The output shows 2250m CPU and 320MiB of memory are requested, which includes PodOverhead:
+The output shows requests for 2250m CPU, and for 320MiB of memory. The requests include Pod overhead:
+
 ```
-Namespace    Name       CPU Requests  CPU Limits   Memory Requests  Memory Limits  AGE
----------    ----       ------------  ----------   ---------------  -------------  ---
-default      test-pod   2250m (56%)   2250m (56%)  320Mi (1%)       320Mi (1%)     36m
+Namespace                 Name          CPU Requests  CPU Limits   Memory Requests  Memory Limits  AGE
+---------                 ----          ------------  ----------   ---------------  -------------  ---
+default                   test-pod      2250m (56%)   2250m (56%)  320Mi (1%)       320Mi (1%)     36m
 ```
 
 ## Verify Pod cgroup limits
 
-Check the Pod's memory cgroups on the node where the workload is running. In the following example, [`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)
+Check the Pod's memory cgroups on the node where the workload is running. In the following example,
+[`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)
 is used on the node, which provides a CLI for CRI-compatible container runtimes. This is an
-advanced example to show PodOverhead behavior, and it is not expected that users should need to check
+advanced example to show Pod overhead behavior, and it is not expected that users should need to check
 cgroups directly on the node.
 
 First, on the particular node, determine the Pod identifier:
@@ -158,40 +158,42 @@ POD_ID="$(sudo crictl pods --name test-pod -q)"
 ```
 
 From this, you can determine the cgroup path for the Pod:
+
 ```bash
 # Run this on the node where the Pod is scheduled
 sudo crictl inspectp -o=json $POD_ID | grep cgroupsPath
 ```
 
 The resulting cgroup path includes the Pod's `pause` container. The Pod level cgroup is one directory above.
+
 ```
-        "cgroupsPath": "/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/7ccf55aee35dd16aca4189c952d83487297f3cd760f1bbf09620e206e7d0c27a"
+    "cgroupsPath": "/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/7ccf55aee35dd16aca4189c952d83487297f3cd760f1bbf09620e206e7d0c27a"
 ```
 
-In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`. Verify the Pod level cgroup setting for memory:
+In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`.
+Verify the Pod level cgroup setting for memory:
+
 ```bash
 # Run this on the node where the Pod is scheduled.
 # Also, change the name of the cgroup to match the cgroup allocated for your pod.
 cat /sys/fs/cgroup/memory/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/memory.limit_in_bytes
 ```
 
 This is 320 MiB, as expected:
+
 ```
 335544320
 ```
 
 ### Observability
 
-A `kube_pod_overhead` metric is available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
-to help identify when PodOverhead is being utilized and to help observe stability of workloads
-running with a defined Overhead. This functionality is not available in the 1.9 release of
-kube-state-metrics, but is expected in a following release. Users will need to build kube-state-metrics
-from source in the meantime.
-
-
+Some `kube_pod_overhead_*` metrics are available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
+to help identify when Pod overhead is being utilized and to help observe stability of workloads
+running with a defined overhead.
 
 ## {{% heading "whatsnext" %}}
 
+* Learn more about [RuntimeClass](/docs/concepts/containers/runtime-class/)
+* Read the [PodOverhead Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)
+  enhancement proposal for extra context
 
-* [RuntimeClass](/docs/concepts/containers/runtime-class/)
-* [PodOverhead Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)

content/en/docs/reference/access-authn-authz/admission-controllers.md

Lines changed: 18 additions & 14 deletions
@@ -666,6 +666,7 @@ plugins:
 {{< /tabs >}}
 
 #### Configuration Annotation Format
+
 `PodNodeSelector` uses the annotation key `scheduler.alpha.kubernetes.io/node-selector` to assign node selectors to namespaces.
 
 ```yaml
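
The hunk cuts off at the start of this YAML example. For context, the annotation example in
the file looks roughly like the sketch below; the selector value and namespace name are
illustrative:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: namespace3
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: env=test
```
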
@@ -678,6 +679,7 @@ metadata:
 ```
 
 #### Internal Behavior
+
 This admission controller has the following behavior:
 
 1. If the `Namespace` has an annotation with a key `scheduler.alpha.kubernetes.io/node-selector`, use its value as the
@@ -746,27 +748,29 @@ metadata:
 
 ### Priority {#priority}
 
-The priority admission controller uses the `priorityClassName` field and populates the integer value of the priority. If the priority class is not found, the Pod is rejected.
+The priority admission controller uses the `priorityClassName` field and populates the integer value of the priority.
+If the priority class is not found, the Pod is rejected.
 
 ### ResourceQuota {#resourcequota}
 
 This admission controller will observe the incoming request and ensure that it does not violate any of the constraints
 enumerated in the `ResourceQuota` object in a `Namespace`. If you are using `ResourceQuota`
 objects in your Kubernetes deployment, you MUST use this admission controller to enforce quota constraints.
 
-See the [resourceQuota design doc](https://git.k8s.io/community/contributors/design-proposals/resource-management/admission_control_resource_quota.md) and the [example of Resource Quota](/docs/concepts/policy/resource-quotas/) for more details.
+See the [resourceQuota design doc](https://git.k8s.io/community/contributors/design-proposals/resource-management/admission_control_resource_quota.md)
+and the [example of Resource Quota](/docs/concepts/policy/resource-quotas/) for more details.
 
 ### RuntimeClass {#runtimeclass}
 
 {{< feature-state for_k8s_version="v1.20" state="stable" >}}
 
-If you enable the `PodOverhead` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/), and define a RuntimeClass with [Pod overhead](/docs/concepts/scheduling-eviction/pod-overhead/) configured, this admission controller checks incoming
-Pods. When enabled, this admission controller rejects any Pod create requests that have the overhead already set.
-For Pods that have a RuntimeClass is configured and selected in their `.spec`, this admission controller sets `.spec.overhead` in the Pod based on the value defined in the corresponding RuntimeClass.
-
-{{< note >}}
-The `.spec.overhead` field for Pod and the `.overhead` field for RuntimeClass are both in beta. If you do not enable the `PodOverhead` feature gate, all Pods are treated as if `.spec.overhead` is unset.
-{{< /note >}}
+If you define a RuntimeClass with [Pod overhead](/docs/concepts/scheduling-eviction/pod-overhead/)
+configured, this admission controller checks incoming Pods.
+When enabled, this admission controller rejects any Pod create requests
+that have the overhead already set.
+For Pods that have a RuntimeClass configured and selected in their `.spec`,
+this admission controller sets `.spec.overhead` in the Pod based on the value
+defined in the corresponding RuntimeClass.
 
 See also [Pod Overhead](/docs/concepts/scheduling-eviction/pod-overhead/)
 for more information.
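
As a sketch of the mutation this controller performs (assuming the `kata-fc` RuntimeClass
defined in the pod-overhead.md changes above; the fragments below are illustrative, not part
of this commit):

```yaml
# Pod spec fragment as submitted: only the RuntimeClass name is set
spec:
  runtimeClassName: kata-fc
---
# Pod spec fragment after admission: the controller copied the overhead
# from the RuntimeClass into the Pod
spec:
  runtimeClassName: kata-fc
  overhead:
    podFixed:
      cpu: 250m
      memory: 120Mi
```
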
@@ -823,11 +827,11 @@ If you disable the ValidatingAdmissionWebhook, you must also disable the
 group/version via the `--runtime-config` flag (both are on by default in
 versions 1.9 and later).
 
-
 ## Is there a recommended set of admission controllers to use?
 
-Yes. The recommended admission controllers are enabled by default (shown [here](/docs/reference/command-line-tools-reference/kube-apiserver/#options)), so you do not need to explicitly specify them. You can enable additional admission controllers beyond the default set using the `--enable-admission-plugins` flag (**order doesn't matter**).
+Yes. The recommended admission controllers are enabled by default
+(shown [here](/docs/reference/command-line-tools-reference/kube-apiserver/#options)),
+so you do not need to explicitly specify them.
+You can enable additional admission controllers beyond the default set using the
+`--enable-admission-plugins` flag (**order doesn't matter**).
 
-{{< note >}}
-`--admission-control` was deprecated in 1.10 and replaced with `--enable-admission-plugins`.
-{{< /note >}}

content/en/docs/reference/command-line-tools-reference/feature-gates.md

Lines changed: 3 additions & 2 deletions
@@ -162,8 +162,6 @@ different Kubernetes components.
 | `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
 | `PodDeletionCost` | `false` | Alpha | 1.21 | 1.21 |
 | `PodDeletionCost` | `true` | Beta | 1.22 | |
-| `PodOverhead` | `false` | Alpha | 1.16 | 1.17 |
-| `PodOverhead` | `true` | Beta | 1.18 | |
 | `PodSecurity` | `false` | Alpha | 1.22 | 1.22 |
 | `PodSecurity` | `true` | Beta | 1.23 | |
 | `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
@@ -410,6 +408,9 @@ different Kubernetes components.
 | `PodDisruptionBudget` | `false` | Alpha | 1.3 | 1.4 |
 | `PodDisruptionBudget` | `true` | Beta | 1.5 | 1.20 |
 | `PodDisruptionBudget` | `true` | GA | 1.21 | - |
+| `PodOverhead` | `false` | Alpha | 1.16 | 1.17 |
+| `PodOverhead` | `true` | Beta | 1.18 | 1.23 |
+| `PodOverhead` | `true` | GA | 1.24 | - |
 | `PodPriority` | `false` | Alpha | 1.8 | 1.10 |
 | `PodPriority` | `true` | Beta | 1.11 | 1.13 |
 | `PodPriority` | `true` | GA | 1.14 | - |
