---
layout: blog
title: "Kubernetes 1.27: In-place Resource Resize for Kubernetes Pods (alpha)"
date: 2023-05-12
slug: in-place-pod-resize-alpha
---

**Author:** Vinay Kulkarni (Kubescaler Labs)

If you have deployed Kubernetes pods with CPU and/or memory resources
specified, you may have noticed that changing the resource values involves
restarting the pod. This has been a disruptive operation for running
workloads... until now.

In Kubernetes v1.27, we have added a new alpha feature that allows users
to resize CPU/memory resources allocated to pods without restarting the
containers. To facilitate this, the `resources` field in a pod's containers
now allows mutation for `cpu` and `memory` resources. These values can be
changed simply by patching the running pod's spec.

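For example, a resize can be requested with `kubectl patch`. The snippet below
is a minimal sketch: the pod name `resize-demo` and container name `app` are
hypothetical, and it assumes the cluster has the `InPlacePodVerticalScaling`
feature gate enabled.

```
# Request new CPU values for a running pod's container (hypothetical names).
kubectl patch pod resize-demo --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'
```
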
This also means that the `resources` field in the pod spec can no longer be
relied upon as an indicator of the pod's actual resources. Monitoring tools
and other such applications must now look at new fields in the pod's status.
Kubernetes queries the actual CPU and memory requests and limits enforced on
the running containers via a CRI (Container Runtime Interface) API call to the
runtime, such as containerd, which is responsible for running the containers.
The response from the container runtime is reflected in the pod's status.

In addition, a new `resizePolicy` field has been added to the pod's containers.
It gives users control over how their containers are handled when resources are
resized: for each resource, a `restartPolicy` specifies whether applying the
resize requires a container restart.

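As a sketch of what this looks like (pod name, container name, image, and
resource values below are illustrative), a container could allow in-place CPU
resizes while requiring a restart for memory resizes:

```
# Illustrative pod spec; assumes the InPlacePodVerticalScaling feature gate is enabled.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired      # resize CPU in place, without a restart
    - resourceName: memory
      restartPolicy: RestartContainer # restart the container to apply new memory limits
    resources:
      requests:
        cpu: 500m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 128Mi
EOF
```
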

## What's new in v1.27?

Besides the addition of resize policy in the pod's spec, a new field named
`allocatedResources` has been added to `containerStatuses` in the pod's status.
This field reflects the node resources allocated to the pod's containers.

In addition, a new field called `resources` has been added to the container's
status. This field reflects the actual resource requests and limits configured
on the running containers as reported by the container runtime.

Lastly, a new field named `resize` has been added to the pod's status to show
the status of the last requested resize:
- `Proposed` is an acknowledgement of the requested resize and indicates that
  the request was validated and recorded.
- `InProgress` indicates that the node has accepted the resize request and is
  in the process of applying it to the pod's containers.
- `Deferred` means that the requested resize cannot be granted at this time;
  the node will keep retrying, and the resize may be granted when other pods
  leave and free up node resources.
- `Infeasible` signals that the node cannot accommodate the requested resize.
  This can happen if the requested resize exceeds the maximum resources the
  node can ever allocate for a pod.

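These status fields can be inspected with `kubectl`; the pod name below is the
hypothetical one from the earlier sketch:

```
# Status of the last requested resize: Proposed, InProgress, Deferred, or Infeasible.
kubectl get pod resize-demo -o jsonpath='{.status.resize}'

# Node resources allocated to the first container.
kubectl get pod resize-demo -o jsonpath='{.status.containerStatuses[0].allocatedResources}'

# Actual resources configured on the running container, as reported by the runtime.
kubectl get pod resize-demo -o jsonpath='{.status.containerStatuses[0].resources}'
```
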
## When to use this feature

Here are a few examples where this feature may be useful:
- A pod is running on a node with either too many or too few resources.
- Pods are not being scheduled due to a lack of sufficient CPU or memory in a
  cluster that is underutilized because its running pods were overprovisioned.
- Evicting stateful pods that need more resources so they can be scheduled on
  bigger nodes is expensive or disruptive when other, lower-priority pods on
  the node could instead be resized down or moved.

## How to use this feature

In order to use this feature in v1.27, the `InPlacePodVerticalScaling`
feature gate must be enabled. A local cluster with this feature enabled
can be started as shown below:

```
root@vbuild:~/go/src/k8s.io/kubernetes# FEATURE_GATES=InPlacePodVerticalScaling=true ./hack/local-up-cluster.sh
go version go1.20.2 linux/arm64
+++ [0320 13:52:02] Building go targets for linux/arm64
k8s.io/kubernetes/cmd/kubectl (static)
k8s.io/kubernetes/cmd/kube-apiserver (static)
k8s.io/kubernetes/cmd/kube-controller-manager (static)
k8s.io/kubernetes/cmd/cloud-controller-manager (non-static)
k8s.io/kubernetes/cmd/kubelet (non-static)
...
...
Logs:
/tmp/etcd.log
/tmp/kube-apiserver.log
/tmp/kube-controller-manager.log

/tmp/kube-proxy.log
/tmp/kube-scheduler.log
/tmp/kubelet.log

To start using your cluster, you can open up another terminal/tab and run:

export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
cluster/kubectl.sh

Alternatively, you can write to the default kubeconfig:

export KUBERNETES_PROVIDER=local

cluster/kubectl.sh config set-cluster local --server=https://localhost:6443 --certificate-authority=/var/run/kubernetes/server-ca.crt
cluster/kubectl.sh config set-credentials myself --client-key=/var/run/kubernetes/client-admin.key --client-certificate=/var/run/kubernetes/client-admin.crt
cluster/kubectl.sh config set-context local --cluster=local --user=myself
cluster/kubectl.sh config use-context local
cluster/kubectl.sh

```

Once the local cluster is up and running, Kubernetes users can schedule pods
with resources, and resize the pods via kubectl. An example of how to use this
feature is illustrated in the following demo video.

{{< youtube id="1m2FOuB6Bh0" title="In-place resize of pod CPU and memory resources">}}

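In text form, the demo boils down to the patch-and-inspect flow sketched
earlier. Reusing the hypothetical `resize-demo` pod, a minimal version looks
like this:

```
# Resize the running container's CPU, then watch the resize status; the
# `resize` field is expected to clear from the status once the resize completes.
kubectl patch pod resize-demo --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"1"},"limits":{"cpu":"1"}}}]}}'
kubectl get pod resize-demo -o jsonpath='{.status.resize}'
kubectl get pod resize-demo -o jsonpath='{.status.containerStatuses[0].resources}'
```
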
## Example Use Cases

### Cloud-based Development Environment

In this scenario, developers or development teams write their code locally
but build and test their code in Kubernetes pods with consistent configs
that reflect production use. Such pods need minimal resources when the
developers are writing code, but need significantly more CPU and memory
when they build their code or run a battery of tests. This use case can
leverage the in-place pod resize feature (with a little help from eBPF) to
quickly resize the pod's resources and prevent the kernel OOM (out of memory)
killer from terminating their processes.

This [KubeCon North America 2022 conference talk](https://www.youtube.com/watch?v=jjfa1cVJLwc)
illustrates the use case.

### Java process initialization CPU requirements

Some Java applications may need significantly more CPU during initialization
than they need during normal operation. If such applications specify CPU
requests and limits suited for normal operation, they may suffer from very
long startup times. Such pods can request higher CPU values at the time of
pod creation, and can be resized down to normal running needs once the
application has finished initializing.

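For instance, once the JVM finishes initializing, the pod could be resized down
to its steady-state needs (pod name, container name, and values below are
hypothetical):

```
# Scale CPU back down after startup; with a NotRequired resize policy for
# CPU, this takes effect in place, without restarting the container.
kubectl patch pod java-app --patch \
  '{"spec":{"containers":[{"name":"jvm","resources":{"requests":{"cpu":"1"},"limits":{"cpu":"1"}}}]}}'
```
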
## Known Issues

This feature enters v1.27 at [alpha stage](/docs/reference/command-line-tools-reference/feature-gates/#feature-stages).
Below are a few known issues users may encounter:
- containerd versions below v1.6.9 do not have the CRI support needed for full
  end-to-end operation of this feature. Attempts to resize pods will appear
  to be _stuck_ in the `InProgress` state, and the `resources` field in the
  pod's status is never updated even though the new resources may have been
  enacted on the running containers.
- Pod resize may encounter a race condition with other pod updates, causing
  delayed enactment of the resize.
- Reflecting the resized container resources in the pod's status may take a while.
- The static CPU management policy is not supported with this feature.

## Credits

This feature is a result of the efforts of a very collaborative Kubernetes community.
Here's a little shoutout to just a few of the many, many people who contributed
countless hours of their time and helped make this happen.
- [@thockin](https://github.com/thockin) for detail-oriented API design and air-tight code reviews.
- [@derekwaynecarr](https://github.com/derekwaynecarr) for simplifying the design and thorough API and node reviews.
- [@dchen1107](https://github.com/dchen1107) for bringing vast knowledge from Borg and helping us avoid pitfalls.
- [@ruiwen-zhao](https://github.com/ruiwen-zhao) for adding containerd support that enabled full E2E implementation.
- [@wangchen615](https://github.com/wangchen615) for implementing comprehensive E2E tests and driving scheduler fixes.
- [@bobbypage](https://github.com/bobbypage) for invaluable help getting CI ready and quickly investigating issues, covering for me on my vacation.
- [@Random-Liu](https://github.com/Random-Liu) for thorough kubelet reviews and identifying problematic race conditions.
- [@Huang-Wei](https://github.com/Huang-Wei), [@ahg-g](https://github.com/ahg-g), and [@alculquicondor](https://github.com/alculquicondor) for helping get scheduler changes done.
- [@mikebrow](https://github.com/mikebrow) and [@marosset](https://github.com/marosset) for reviews on short notice that helped CRI changes make it into v1.25.
- [@endocrimes](https://github.com/endocrimes) and [@ehashman](https://github.com/ehashman) for helping ensure that the oft-overlooked tests are in good shape.
- [@mrunalp](https://github.com/mrunalp) for reviewing cgroupv2 changes and ensuring clean handling of v1 vs v2.
- [@liggitt](https://github.com/liggitt) and [@gjkim42](https://github.com/gjkim42) for tracking down and root-causing important issues missed post-merge.
- [@SergeyKanzhelev](https://github.com/SergeyKanzhelev) for supporting and shepherding various issues during the home stretch.
- [@pdgetrf](https://github.com/pdgetrf) for making the first prototype a reality.
- [@dashpole](https://github.com/dashpole) for bringing me up to speed on 'the Kubernetes way' of doing things.
- [@bsalamat](https://github.com/bsalamat) and [@kgolab](https://github.com/kgolab) for very thoughtful insights and suggestions in the early stages.
- [@sftim](https://github.com/sftim) and [@tengqm](https://github.com/tengqm) for ensuring docs are easy to follow.
- [@dims](https://github.com/dims) for being omnipresent and helping make merges happen at critical hours.
- The release teams for ensuring that the project stayed healthy.

And a big thanks to my very supportive management, [Dr. Xiaoning Ding](https://www.linkedin.com/in/xiaoningding/)
and [Dr. Ying Xiong](https://www.linkedin.com/in/ying-xiong-59a2482/), for their patience and encouragement.

## References

### For app developers

* [Resize CPU and Memory Resources assigned to Containers](/docs/tasks/configure-pod-container/resize-container-resources/)
* [Assign Memory Resources to Containers and Pods](/docs/tasks/configure-pod-container/assign-memory-resource/)
* [Assign CPU Resources to Containers and Pods](/docs/tasks/configure-pod-container/assign-cpu-resource/)

### For cluster administrators

* [Configure Default Memory Requests and Limits for a Namespace](/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/)
* [Configure Default CPU Requests and Limits for a Namespace](/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/)
