---
layout: blog
title: "Kubernetes 1.27: In-place Resource Resize for Kubernetes Pods (alpha)"
date: 2023-05-12
slug: in-place-pod-resize-alpha
---

**Author:** Vinay Kulkarni (Kubescaler Labs)

If you have deployed Kubernetes pods with CPU and/or memory resources
specified, you may have noticed that changing the resource values involves
restarting the pod. This has been a disruptive operation for running
workloads... until now.

In Kubernetes v1.27, we have added a new alpha feature that allows users
to resize CPU and memory resources allocated to pods without restarting the
containers. To facilitate this, the `resources` field in a pod's containers
is now mutable for `cpu` and `memory` resources. It can be changed
simply by patching the running pod spec.

This also means that the `resources` field in the pod spec can no longer be
relied upon as an indicator of the pod's actual resources. Monitoring tools
and other such applications must now look at new fields in the pod's status.
Kubernetes queries the actual CPU and memory requests and limits enforced on
the running containers via a CRI (Container Runtime Interface) API call to the
runtime, such as containerd, which is responsible for running the containers.
The response from the container runtime is reflected in the pod's status.

In addition, a new `resizePolicy` field has been added to the pod's containers.
It gives users control over how their containers are handled when resources
are resized: for each resource, a container can declare whether it can be
resized in place (`NotRequired`) or must be restarted (`RestartContainer`)
for the new value to take effect.

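For example, a container could allow CPU to be resized in place while requiring a restart for memory changes. A sketch of what that might look like (pod and container names are illustrative, field names per the v1.27 alpha API):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo   # hypothetical example name
spec:
  containers:
  - name: app
    image: nginx
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired      # resize CPU in place, without a restart
    - resourceName: memory
      restartPolicy: RestartContainer # restart the container to apply new memory limits
    resources:
      requests:
        cpu: "500m"
        memory: "128Mi"
      limits:
        cpu: "1"
        memory: "256Mi"
```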
## What's new in v1.27?

Besides the addition of resize policy in the pod's spec, a new field named
`allocatedResources` has been added to `containerStatuses` in the pod's status.
This field reflects the node resources allocated to the pod's containers.

In addition, a new field called `resources` has been added to the container's
status. This field reflects the actual resource requests and limits configured
on the running containers as reported by the container runtime.

Lastly, a new field named `resize` has been added to the pod's status to show
the status of the last requested resize:
- `Proposed` is an acknowledgement of the requested resize and indicates that
  the request was validated and recorded.
- `InProgress` indicates that the node has accepted the resize request and is
  in the process of applying it to the pod's containers.
- `Deferred` means that the requested resize cannot be granted at this time;
  the node will keep retrying, and the resize may be granted when other pods
  leave and free up node resources.
- `Infeasible` is a signal that the node cannot accommodate the requested
  resize, for example when it exceeds the maximum resources the node can ever
  allocate for a pod.

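Putting these fields together, the status of a pod in the middle of a resize might look something like this (an illustrative sketch with made-up values, not verbatim kubelet output):

```yaml
status:
  resize: InProgress          # status of the last requested resize
  containerStatuses:
  - name: app
    allocatedResources:       # node resources allocated to the container
      cpu: 800m
      memory: 256Mi
    resources:                # actual requests/limits on the running
      requests:               # container, as reported by the runtime
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 256Mi
```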

## When to use this feature

Here are a few examples where this feature may be useful:
- A pod is running on a node with either too many or too few resources.
- Pods are not being scheduled due to lack of sufficient CPU or memory in a
  cluster that is underutilized because its running pods were overprovisioned.
- Evicting stateful pods that need more resources in order to schedule them
  on bigger nodes is an expensive or disruptive operation, when other,
  lower-priority pods on the node could instead be resized down or moved.


## How to use this feature

In order to use this feature in v1.27, the `InPlacePodVerticalScaling`
feature gate must be enabled. A local cluster with this feature enabled
can be started as shown below:

```
root@vbuild:~/go/src/k8s.io/kubernetes# FEATURE_GATES=InPlacePodVerticalScaling=true ./hack/local-up-cluster.sh
go version go1.20.2 linux/arm64
+++ [0320 13:52:02] Building go targets for linux/arm64
    k8s.io/kubernetes/cmd/kubectl (static)
    k8s.io/kubernetes/cmd/kube-apiserver (static)
    k8s.io/kubernetes/cmd/kube-controller-manager (static)
    k8s.io/kubernetes/cmd/cloud-controller-manager (non-static)
    k8s.io/kubernetes/cmd/kubelet (non-static)
...
...
Logs:
  /tmp/etcd.log
  /tmp/kube-apiserver.log
  /tmp/kube-controller-manager.log
  /tmp/kube-proxy.log
  /tmp/kube-scheduler.log
  /tmp/kubelet.log

To start using your cluster, you can open up another terminal/tab and run:

  export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
  cluster/kubectl.sh

Alternatively, you can write to the default kubeconfig:

  export KUBERNETES_PROVIDER=local

  cluster/kubectl.sh config set-cluster local --server=https://localhost:6443 --certificate-authority=/var/run/kubernetes/server-ca.crt
  cluster/kubectl.sh config set-credentials myself --client-key=/var/run/kubernetes/client-admin.key --client-certificate=/var/run/kubernetes/client-admin.crt
  cluster/kubectl.sh config set-context local --cluster=local --user=myself
  cluster/kubectl.sh config use-context local
  cluster/kubectl.sh
```

Once the local cluster is up and running, Kubernetes users can schedule pods
with resources, and resize the pods via kubectl. An example of how to use this
feature is illustrated in the following demo video.

{{< youtube id="1m2FOuB6Bh0" title="In-place resize of pod CPU and memory resources">}}

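As a minimal sketch of the workflow, a running pod's CPU can be resized by patching its spec, and the outcome observed in its status (the pod name `resize-demo` and container name `app` are hypothetical):

```shell
# Resize the CPU request/limit of a running pod in place
kubectl patch pod resize-demo --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'

# Check the status of the resize, and the actual resources
# configured on the running container as reported by the runtime
kubectl get pod resize-demo -o jsonpath='{.status.resize}'
kubectl get pod resize-demo -o jsonpath='{.status.containerStatuses[0].resources}'
```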
## Example Use Cases

### Cloud-based Development Environment

In this scenario, developers or development teams write their code locally
but build and test their code in Kubernetes pods with consistent configs
that reflect production use. Such pods need minimal resources when the
developers are writing code, but need significantly more CPU and memory
when they build their code or run a battery of tests. This use case can
leverage the in-place pod resize feature (with a little help from eBPF) to
quickly resize the pod's resources and keep the kernel OOM (out-of-memory)
killer from terminating their processes.

This [KubeCon North America 2022 conference talk](https://www.youtube.com/watch?v=jjfa1cVJLwc)
illustrates the use case.

### Java process initialization CPU requirements

Some Java applications may need significantly more CPU during initialization
than they need during normal operation. If such applications specify CPU
requests and limits suited for normal operation, they may suffer from very
long startup times. Such pods can request higher CPU values at the time of
pod creation, and can be resized down to normal running needs once the
application has finished initializing.


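For instance, such a pod could be created with a generous CPU allocation for JVM warm-up, then patched down in place once the application is ready (a sketch; the names `java-app` and `jvm` are hypothetical):

```shell
# Wait until the application has finished initializing, then
# resize its CPU down to normal running needs, in place
kubectl wait pod java-app --for=condition=Ready
kubectl patch pod java-app --patch \
  '{"spec":{"containers":[{"name":"jvm","resources":{"requests":{"cpu":"500m"},"limits":{"cpu":"1"}}}]}}'
```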
## Known Issues

This feature enters v1.27 at [alpha stage](/docs/reference/command-line-tools-reference/feature-gates/#feature-stages).
Below are a few known issues users may encounter:
- containerd versions below v1.6.9 do not have the CRI support needed for full
  end-to-end operation of this feature. Attempts to resize pods will appear
  to be _stuck_ in the `InProgress` state, and the `resources` field in the
  pod's status is never updated even though the new resources may have been
  enacted on the running containers.
- Pod resize may encounter a race condition with other pod updates, causing
  delayed enactment of the resize.
- Reflecting the resized container resources in the pod's status may take a while.
- The static CPU management policy is not supported with this feature.


## Credits

This feature is a result of the efforts of a very collaborative Kubernetes community.
Here's a little shoutout to just a few of the many, many people who contributed
countless hours of their time and helped make this happen.
- [@thockin](https://github.com/thockin) for detail-oriented API design and air-tight code reviews.
- [@derekwaynecarr](https://github.com/derekwaynecarr) for simplifying the design and thorough API and node reviews.
- [@dchen1107](https://github.com/dchen1107) for bringing vast knowledge from Borg and helping us avoid pitfalls.
- [@ruiwen-zhao](https://github.com/ruiwen-zhao) for adding containerd support that enabled full E2E implementation.
- [@wangchen615](https://github.com/wangchen615) for implementing comprehensive E2E tests and driving scheduler fixes.
- [@bobbypage](https://github.com/bobbypage) for invaluable help getting CI ready and quickly investigating issues, covering for me on my vacation.
- [@Random-Liu](https://github.com/Random-Liu) for thorough kubelet reviews and identifying problematic race conditions.
- [@Huang-Wei](https://github.com/Huang-Wei), [@ahg-g](https://github.com/ahg-g), and [@alculquicondor](https://github.com/alculquicondor) for helping get scheduler changes done.
- [@mikebrow](https://github.com/mikebrow) and [@marosset](https://github.com/marosset) for reviews on short notice that helped CRI changes make it into v1.25.
- [@endocrimes](https://github.com/endocrimes) and [@ehashman](https://github.com/ehashman) for helping ensure that the oft-overlooked tests are in good shape.
- [@mrunalp](https://github.com/mrunalp) for reviewing cgroupv2 changes and ensuring clean handling of v1 vs v2.
- [@liggitt](https://github.com/liggitt) and [@gjkim42](https://github.com/gjkim42) for tracking down and root-causing important missed issues post-merge.
- [@SergeyKanzhelev](https://github.com/SergeyKanzhelev) for supporting and shepherding various issues during the home stretch.
- [@pdgetrf](https://github.com/pdgetrf) for making the first prototype a reality.
- [@dashpole](https://github.com/dashpole) for bringing me up to speed on 'the Kubernetes way' of doing things.
- [@bsalamat](https://github.com/bsalamat) and [@kgolab](https://github.com/kgolab) for very thoughtful insights and suggestions in the early stages.
- [@sftim](https://github.com/sftim) and [@tengqm](https://github.com/tengqm) for ensuring docs are easy to follow.
- [@dims](https://github.com/dims) for being omnipresent and helping make merges happen at critical hours.
- Release teams for ensuring that the project stayed healthy.

And a big thanks to my very supportive management [Dr. Xiaoning Ding](https://www.linkedin.com/in/xiaoningding/)
and [Dr. Ying Xiong](https://www.linkedin.com/in/ying-xiong-59a2482/) for their patience and encouragement.


## References

### For app developers

* [Resize CPU and Memory Resources assigned to Containers](/docs/tasks/configure-pod-container/resize-container-resources/)

* [Assign Memory Resources to Containers and Pods](/docs/tasks/configure-pod-container/assign-memory-resource/)

* [Assign CPU Resources to Containers and Pods](/docs/tasks/configure-pod-container/assign-cpu-resource/)

### For cluster administrators

* [Configure Default Memory Requests and Limits for a Namespace](/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/)

* [Configure Default CPU Requests and Limits for a Namespace](/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/)