Commit ebdc6bd

Update In-place Pod Vertical Scaling target to 1.25. Merge CRI KEP 2273 with 1287
1 parent c85afff commit ebdc6bd

4 files changed

+186
-566
lines changed

keps/sig-node/1287-in-place-update-pod-resources/README.md

Lines changed: 178 additions & 3 deletions
@@ -22,12 +22,14 @@
2222
- [Flow Control](#flow-control)
2323
- [Container resource limit update ordering](#container-resource-limit-update-ordering)
2424
- [Container resource limit update failure handling](#container-resource-limit-update-failure-handling)
25+
- [CRI Changes Flow](#cri-changes-flow)
2526
- [Notes](#notes)
2627
- [Affected Components](#affected-components)
2728
- [Future Enhancements](#future-enhancements)
2829
- [Test Plan](#test-plan)
2930
- [Unit Tests](#unit-tests)
3031
- [Pod Resize E2E Tests](#pod-resize-e2e-tests)
32+
- [CRI E2E Tests](#cri-e2e-tests)
3133
- [Resource Quota and Limit Ranges](#resource-quota-and-limit-ranges)
3234
- [Resize Policy Tests](#resize-policy-tests)
3335
- [Backward Compatibility and Negative Tests](#backward-compatibility-and-negative-tests)
@@ -109,6 +111,13 @@ https://github.com/kubernetes/community/pull/1719
109111
[Vertical Resources Scaling in Kubernetes]:
110112
https://docs.google.com/document/d/18K-bl1EVsmJ04xeRq9o_vfY2GDgek6B6wmLjXw-kos4
111113

114+
This proposal also aims to improve the Container Runtime Interface (CRI) APIs for
115+
managing a Container's CPU and memory resource configurations on the runtime.
116+
It seeks to extend the UpdateContainerResources CRI API so that it works for
117+
Windows, and other future runtimes besides Linux. It also seeks to extend
118+
ContainerStatus CRI API to allow Kubelet to discover the current resources
119+
configured on a Container.
120+
112121
## Motivation
113122

114123
Resources allocated to a Pod's Container(s) can require a change for various
@@ -130,6 +139,18 @@ resulting in lower availability or higher cost of running.
130139
Allowing Resources to be changed without recreating the Pod or restarting the
131140
Containers addresses this issue directly.
132141

142+
Additionally, the In-Place Pod Vertical Scaling feature relies on the Container Runtime
143+
Interface (CRI) to update CPU and/or memory requests/limits for a Pod's Container(s).
144+
145+
The current CRI API set has a few drawbacks that need to be addressed:
146+
1. UpdateContainerResources CRI API takes a parameter that describes Container
147+
resources to update for Linux Containers, and this may not work for Windows
148+
Containers or other potential non-Linux runtimes in the future.
149+
1. There is no CRI mechanism that lets Kubelet query and discover the CPU and
150+
memory limits configured on a Container from the Container runtime.
151+
1. The expected behavior from a runtime that handles UpdateContainerResources
152+
CRI API is not very well defined or documented.
153+
133154
### Goals
134155

135156
* Primary: allow to change container resource requests & limits without
@@ -139,6 +160,15 @@ Containers addresses this issue directly.
139160
* Secondary: allow users to specify which Containers can be resized without a
140161
restart.
141162

163+
Additionally, this proposal has two goals for CRI:
164+
- Modify UpdateContainerResources to allow it to work for Windows Containers,
165+
as well as Containers managed by other runtimes besides Linux,
166+
- Provide CRI API mechanism to query the Container runtime for CPU and memory
167+
resource configurations that are currently applied to a Container.
168+
169+
An additional goal of this proposal is to better define and document the
170+
expected behavior of a Container runtime when handling resource updates.
171+
142172
### Non-Goals
143173

144174
The explicit non-goal of this KEP is to avoid controlling full lifecycle of a
@@ -149,7 +179,14 @@ Other identified non-goals are:
149179
* allow to change Pod QoS class without a restart,
150180
* to change resources of Init Containers without a restart,
151181
* eviction of lower priority Pods to facilitate Pod resize,
152-
* updating extended resources or any other resource types besides CPU, memory.
182+
* updating extended resources or any other resource types besides CPU, memory,
183+
* support for CPU/memory manager policies besides the default 'None' policy.
184+
185+
Definition of expected behavior of a Container runtime when it handles CRI APIs
186+
related to a Container's resources is intended to be a high level guide. It is
187+
a non-goal of this proposal to define a detailed or specific way to implement
188+
these functions. Implementation specifics are left to the runtime, within the
189+
bounds of expected behavior.
153190

154191
## Proposal
155192

@@ -245,15 +282,88 @@ means the same as "Deferred".
245282
Kubelet calls UpdateContainerResources CRI API which currently takes
246283
*runtimeapi.LinuxContainerResources* parameter that works for Docker and Kata,
247284
but not for Windows. This parameter changes to *runtimeapi.ContainerResources*,
248-
that is runtime agnostic, and will contain platform-specific information.
285+
that is runtime agnostic, and will contain platform-specific information. This
286+
would make the UpdateContainerResources API work for Windows and any other future
287+
runtimes besides Linux, by making the resources parameter passed in the API
288+
specific to the target runtime.
249289

250290
Additionally, ContainerStatus CRI API is extended to hold
251291
*runtimeapi.ContainerResources* so that it allows Kubelet to query Container's
252-
CPU and memory limit configurations from runtime.
292+
CPU and memory limit configurations from the runtime. This expects the runtime to respond
293+
with CPU and memory resource values currently applied to the Container.
253294

254295
These CRI changes are a separate effort that does not affect the design
255296
proposed in this KEP.
256297

298+
To accomplish the aforementioned CRI changes:
299+
300+
* A new protobuf message object named *ContainerResources* that encapsulates
301+
LinuxContainerResources and WindowsContainerResources is introduced as below.
302+
- This message can easily be extended for future runtimes by simply adding a
303+
new runtime-specific resources struct to the ContainerResources message.
304+
```
// ContainerResources holds resource configuration for a container.
message ContainerResources {
    // Resource configuration specific to Linux container.
    LinuxContainerResources linux = 1;
    // Resource configuration specific to Windows container.
    WindowsContainerResources windows = 2;
}
```
313+
314+
* UpdateContainerResourcesRequest message is extended to carry
315+
ContainerResources field as below.
316+
- For Linux runtimes, Kubelet fills UpdateContainerResourcesRequest.Linux in
317+
addition to UpdateContainerResourcesRequest.Resources.Linux fields.
318+
- This keeps backward compatibility by letting runtimes that rely on the
319+
current LinuxContainerResources continue to work, while enabling newer
320+
runtime versions to use UpdateContainerResourcesRequest.Resources.Linux,
321+
- It enables deprecation of UpdateContainerResourcesRequest.Linux field.
322+
```
message UpdateContainerResourcesRequest {
    // ID of the container to update.
    string container_id = 1;
    // Resource configuration specific to Linux container.
    LinuxContainerResources linux = 2;
    // Resource configuration for the container.
    ContainerResources resources = 3;
}
```
332+
333+
* ContainerStatus message is extended to return ContainerResources as below.
334+
- This enables Kubelet to query the runtime and discover resources currently
335+
applied to a Container using ContainerStatus CRI API.
336+
```
@@ -914,6 +912,8 @@ message ContainerStatus {
     repeated Mount mounts = 14;
     // Log path of container.
     string log_path = 15;
+    // Resource configuration of the container.
+    ContainerResources resources = 16;
 }
```
345+
346+
* ContainerManager CRI API service interface is modified as below.
347+
- UpdateContainerResources takes ContainerResources parameter instead of
348+
LinuxContainerResources.
349+
```
--- a/staging/src/k8s.io/cri-api/pkg/apis/services.go
+++ b/staging/src/k8s.io/cri-api/pkg/apis/services.go
@@ -43,8 +43,10 @@ type ContainerManager interface {
        ListContainers(filter *runtimeapi.ContainerFilter) ([]*runtimeapi.Container, error)
        // ContainerStatus returns the status of the container.
        ContainerStatus(containerID string) (*runtimeapi.ContainerStatus, error)
-       // UpdateContainerResources updates the cgroup resources for the container.
-       UpdateContainerResources(containerID string, resources *runtimeapi.LinuxContainerResources) error
+       // UpdateContainerResources updates resource configuration for the container.
+       UpdateContainerResources(containerID string, resources *runtimeapi.ContainerResources) error
        // ExecSync executes a command in the container, and returns the stdout output.
        // If command exits with a non-zero exit code, an error is returned.
        ExecSync(containerID string, cmd []string, timeout time.Duration) (stdout []byte, stderr []byte, err error)
```
364+
365+
* Kubelet code is modified to leverage these changes.
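
The backward-compatibility rule described above (fill both the legacy `linux` field and the new `resources.linux` field) can be sketched in Go as follows. This is only an illustration under stated assumptions: the structs are simplified stand-ins for the protobuf messages, not the generated `k8s.io/cri-api` types, and `newLinuxUpdateRequest` is a hypothetical helper name that does not appear in the KEP.

```go
package main

import "fmt"

// Simplified stand-ins for the CRI protobuf messages described above.
// They are NOT the generated k8s.io/cri-api types; only a few fields are kept.
type LinuxContainerResources struct {
	CpuPeriod          int64
	CpuQuota           int64
	MemoryLimitInBytes int64
}

type WindowsContainerResources struct {
	CpuMaximum         int64
	MemoryLimitInBytes int64
}

// ContainerResources wraps the platform-specific resource structs.
type ContainerResources struct {
	Linux   *LinuxContainerResources
	Windows *WindowsContainerResources
}

type UpdateContainerResourcesRequest struct {
	ContainerId string
	Linux       *LinuxContainerResources // legacy field, read by older runtimes
	Resources   *ContainerResources      // new runtime-agnostic field
}

// newLinuxUpdateRequest is a hypothetical helper: it sets the legacy Linux
// field and the new Resources.Linux field to the same values, so both older
// and newer runtime versions see the requested configuration.
func newLinuxUpdateRequest(id string, res *LinuxContainerResources) *UpdateContainerResourcesRequest {
	return &UpdateContainerResourcesRequest{
		ContainerId: id,
		Linux:       res,
		Resources:   &ContainerResources{Linux: res},
	}
}

func main() {
	req := newLinuxUpdateRequest("c1", &LinuxContainerResources{
		CpuPeriod:          100000,
		CpuQuota:           50000,
		MemoryLimitInBytes: 256 << 20,
	})
	fmt.Println(req.Linux.CpuQuota, req.Resources.Linux.CpuQuota)
}
```

A Windows request would presumably populate only `Resources.Windows`, since the legacy field is Linux-specific.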
366+
257367
### Risks and Mitigations
258368

259369
1. Backward compatibility: When Pod.Spec.Containers[i].Resources becomes
@@ -484,6 +594,54 @@ container limits does not exceed Pod-level cgroup limit at any point. Once all
484594
the container limits have been successfully updated, Kubelet updates the Pod's
485595
Status.ContainerStatuses[i].Resources to match the desired limit values.
486596

597+
#### CRI Changes Flow
598+
599+
The diagram below is an overview of Kubelet using UpdateContainerResources and
600+
ContainerStatus CRI APIs to set new container resource limits, and update the
601+
Pod Status in response to user changing the desired resources in Pod Spec.
602+
603+
```
   +-----------+                   +-----------+                  +-----------+
   |           |                   |           |                  |           |
   | apiserver |                   |  kubelet  |                  |  runtime  |
   |           |                   |           |                  |           |
   +-----+-----+                   +-----+-----+                  +-----+-----+
         |                               |                              |
         |      watch (pod update)       |                              |
         |------------------------------>|                              |
         |    [Containers.Resources]     |                              |
         |                               |                              |
         |                            (admit)                           |
         |                               |                              |
         |                               |  UpdateContainerResources()  |
         |                               |----------------------------->|
         |                               |                        (set limits)
         |                               |<- - - - - - - - - - - - - - -|
         |                               |                              |
         |                               |      ContainerStatus()       |
         |                               |----------------------------->|
         |                               |                              |
         |                               |     [ContainerResources]     |
         |                               |<- - - - - - - - - - - - - - -|
         |                               |                              |
         |      update (pod status)      |                              |
         |<------------------------------|                              |
         | [ContainerStatuses.Resources] |                              |
         |                               |                              |
```
633+
634+
* Kubelet invokes UpdateContainerResources() CRI API in ContainerManager
635+
interface to configure new CPU and memory limits for a Container by
636+
specifying those values in ContainerResources parameter to the API. Kubelet
637+
sets ContainerResources parameter specific to the target runtime platform
638+
when calling this CRI API.
639+
640+
* Kubelet calls ContainerStatus() CRI API in ContainerManager interface to get
641+
the CPU and memory limits applied to a Container. It uses the values returned
642+
in ContainerStatus.Resources to update ContainerStatuses[i].Resources.Limits
643+
for that Container in the Pod's Status.
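
The two calls described above can be sketched as a single Kubelet-side step: apply the new limits, then read back what the runtime reports. This is a minimal sketch under stated assumptions, not Kubelet's implementation. The types are simplified stand-ins for the CRI messages, and `containerManager`, `fakeRuntime`, and `resizeContainer` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the CRI messages; not the generated cri-api types.
type LinuxContainerResources struct {
	CpuQuota           int64
	MemoryLimitInBytes int64
}

type ContainerResources struct {
	Linux *LinuxContainerResources
}

type ContainerStatus struct {
	Id        string
	Resources *ContainerResources
}

// containerManager is a hypothetical subset of the ContainerManager interface
// that covers only the two calls used in this flow.
type containerManager interface {
	UpdateContainerResources(containerID string, resources *ContainerResources) error
	ContainerStatus(containerID string) (*ContainerStatus, error)
}

// fakeRuntime is an in-memory stand-in for a container runtime.
type fakeRuntime struct {
	applied map[string]*ContainerResources
}

func (f *fakeRuntime) UpdateContainerResources(id string, r *ContainerResources) error {
	if _, ok := f.applied[id]; !ok {
		return errors.New("container not found: " + id)
	}
	f.applied[id] = r
	return nil
}

func (f *fakeRuntime) ContainerStatus(id string) (*ContainerStatus, error) {
	r, ok := f.applied[id]
	if !ok {
		return nil, errors.New("container not found: " + id)
	}
	return &ContainerStatus{Id: id, Resources: r}, nil
}

// resizeContainer applies the desired resources, then queries the runtime and
// returns the resources it reports as applied. The caller would copy these
// into the Pod's status.
func resizeContainer(cm containerManager, id string, desired *ContainerResources) (*ContainerResources, error) {
	if err := cm.UpdateContainerResources(id, desired); err != nil {
		return nil, fmt.Errorf("update resources: %w", err)
	}
	status, err := cm.ContainerStatus(id)
	if err != nil {
		return nil, fmt.Errorf("query status: %w", err)
	}
	return status.Resources, nil
}

func main() {
	rt := &fakeRuntime{applied: map[string]*ContainerResources{
		"c1": {Linux: &LinuxContainerResources{MemoryLimitInBytes: 128 << 20}},
	}}
	got, err := resizeContainer(rt, "c1", &ContainerResources{
		Linux: &LinuxContainerResources{CpuQuota: 50000, MemoryLimitInBytes: 256 << 20},
	})
	if err != nil {
		fmt.Println("resize failed:", err)
		return
	}
	fmt.Println("memory limit reported by runtime:", got.Linux.MemoryLimitInBytes)
}
```

Reading the values back from the runtime, rather than echoing the requested values, keeps the reported status tied to what was actually applied.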
644+
487645
#### Notes
488646

489647
* If CPU Manager policy for a Node is set to 'static', then only integral
@@ -560,6 +718,9 @@ Other components:
560718
Unit tests will cover the sanity of code changes that implements the feature,
561719
and the policy controls that are introduced as part of this feature.
562720

721+
CRI unit tests are updated to reflect use of ContainerResources object in
722+
UpdateContainerResources and ContainerStatus APIs.
723+
563724
#### Pod Resize E2E Tests
564725

565726
End-to-End tests resize a Pod via PATCH to Pod's Spec.Containers[i].Resources.
@@ -618,6 +779,12 @@ E2E tests for Guaranteed class Pod with three containers (c1, c2, c3):
618779
1. Increase CPU for c1 & c3, decrease c2 - net CPU increase for Pod.
619780
1. Increase memory for c1 & c3, decrease c2 - net memory increase for Pod.
620781

782+
#### CRI E2E Tests
783+
784+
1. E2E test is added to verify UpdateContainerResources API with containerd runtime.
785+
1. E2E test is added to verify ContainerStatus API using containerd runtime.
786+
1. E2E test is added to verify backward compatibility using containerd runtime.
787+
621788
#### Resource Quota and Limit Ranges
622789

623790
Setup a namespace with ResourceQuota and a single, valid Pod.
@@ -672,13 +839,18 @@ TODO: Identify more cases
672839
- Resize Policies functionality is implemented,
673840
- Unit tests and E2E tests covering basic functionality are added,
674841
- E2E tests covering multiple containers are added.
842+
- UpdateContainerResources API changes are done and tested with containerd
843+
runtime, backward compatibility is maintained.
844+
- ContainerStatus API changes are done. Tests are ready but not enforced.
675845

676846
#### Beta
677847
- VPA alpha integration of feature completed and any bugs addressed,
678848
- E2E tests covering Resize Policy, LimitRanger, and ResourceQuota are added,
679849
- Negative tests are identified and added.
680850
- A "/resize" subresource is defined and implemented.
681851
- Pod-scoped resources are handled if that KEP is past alpha
852+
- ContainerStatus API change tests are enforced and containerd runtime must comply.
853+
- ContainerStatus API change tests are enforced and Windows runtime should comply.
682854

683855
#### Stable
684856
- VPA integration of feature moved to beta,
@@ -922,13 +1094,16 @@ _This section must be completed when targeting beta graduation to a release._
9221094
- 2019-01-18 - implementation proposal extended
9231095
- 2019-03-07 - changes to flow control, updates per review feedback
9241096
- 2019-08-29 - updated design proposal
1097+
- 2019-10-25 - Initial CRI changes KEP draft created
9251098
- 2019-10-25 - update key open items and move KEP to implementable
9261099
- 2020-01-06 - API review suggested changes incorporated
9271100
- 2020-01-13 - Test plan and graduation criteria added
1101+
- 2020-01-14 - CRI changes test plan and graduation criteria added
9281102
- 2020-01-21 - Graduation criteria updated per review feedback
9291103
- 2020-11-06 - Updated with feedback from reviews
9301104
- 2020-12-09 - Add "Deferred"
9311105
- 2021-02-05 - Final consensus on resourcesAllocated[] and resize[]
1106+
- 2022-05-01 - KEP 2273-kubelet-container-resources-cri-api-changes merged with this KEP
9321107

9331108
## Drawbacks
9341109

keps/sig-node/1287-in-place-update-pod-resources/kep.yaml

Lines changed: 8 additions & 5 deletions
@@ -18,25 +18,28 @@ reviewers:
1818
- "@dchen1107"
1919
- "@ahg-g"
2020
- "@k82cn"
21+
- "@Random-Liu"
22+
- "@yujuhong"
23+
- "@PatrickLang"
2124
approvers:
2225
- "@dchen1107"
2326
- "@derekwaynecarr"
2427
- "@ahg-g"
2528
- "@mwielgus"
29+
- "@thockin"
2630
prr-approvers:
2731
- "@ehashman"
2832
see-also:
29-
- "/keps/sig-node/2273-kubelet-container-resources-cri-api-changes"
3033
replaces:
3134

3235
stage: "alpha"
3336

34-
latest-milestone: "v1.24"
37+
latest-milestone: "v1.25"
3538

3639
milestone:
37-
alpha: "v1.24"
38-
beta: "v1.25"
39-
stable: "v1.27"
40+
alpha: "v1.25"
41+
beta: "v1.26"
42+
stable: "v1.28"
4043

4144
feature-gates:
4245
- name: InPlacePodVerticalScaling
