- [Flow Control](#flow-control)
- [Container resource limit update ordering](#container-resource-limit-update-ordering)
- [Container resource limit update failure handling](#container-resource-limit-update-failure-handling)
+ - [CRI Changes Flow](#cri-changes-flow)
- [Notes](#notes)
- [Affected Components](#affected-components)
- [Future Enhancements](#future-enhancements)
- [Test Plan](#test-plan)
- [Unit Tests](#unit-tests)
- [Pod Resize E2E Tests](#pod-resize-e2e-tests)
+ - [CRI E2E Tests](#cri-e2e-tests)
- [Resource Quota and Limit Ranges](#resource-quota-and-limit-ranges)
- [Resize Policy Tests](#resize-policy-tests)
- [Backward Compatibility and Negative Tests](#backward-compatibility-and-negative-tests)
@@ -109,6 +111,13 @@ https://github.com/kubernetes/community/pull/1719
109
111
[Vertical Resources Scaling in Kubernetes]:
110
112
https://docs.google.com/document/d/18K-bl1EVsmJ04xeRq9o_vfY2GDgek6B6wmLjXw-kos4
111
113
114
+ This proposal also aims to improve the Container Runtime Interface (CRI) APIs for
115
+ managing a Container's CPU and memory resource configurations on the runtime.
116
+ It seeks to extend UpdateContainerResources CRI API such that it works for
117
+ Windows, and other future runtimes besides Linux. It also seeks to extend
118
+ ContainerStatus CRI API to allow Kubelet to discover the current resources
119
+ configured on a Container.
120
+
112
121
## Motivation
113
122
114
123
Resources allocated to a Pod's Container(s) can require a change for various
@@ -130,6 +139,18 @@ resulting in lower availability or higher cost of running.
130
139
Allowing Resources to be changed without recreating the Pod or restarting the
131
140
Containers addresses this issue directly.
132
141
142
+ Additionally, the In-Place Pod Vertical Scaling feature relies on the Container Runtime
143
+ Interface (CRI) to update CPU and/or memory requests/limits for a Pod's Container(s).
144
+
145
+ The current CRI API set has a few drawbacks that need to be addressed:
146
+ 1. UpdateContainerResources CRI API takes a parameter that describes Container
147
+ resources to update for Linux Containers, and this may not work for Windows
148
+ Containers or other potential non-Linux runtimes in the future.
149
+ 1. There is no CRI mechanism that lets Kubelet query and discover the CPU and
150
+ memory limits configured on a Container from the Container runtime.
151
+ 1. The expected behavior from a runtime that handles UpdateContainerResources
152
+ CRI API is not very well defined or documented.
153
+
133
154
### Goals
134
155
135
156
* Primary: allow to change container resource requests & limits without
@@ -139,6 +160,15 @@ Containers addresses this issue directly.
139
160
* Secondary: allow users to specify which Containers can be resized without a
140
161
restart.
141
162
163
+ Additionally, this proposal has two goals for CRI:
164
+ - Modify UpdateContainerResources to allow it to work for Windows Containers,
165
+ as well as Containers managed by other runtimes besides Linux,
166
+ - Provide CRI API mechanism to query the Container runtime for CPU and memory
167
+ resource configurations that are currently applied to a Container.
168
+
169
+ An additional goal of this proposal is to better define and document the
170
+ expected behavior of a Container runtime when handling resource updates.
171
+
142
172
### Non-Goals
143
173
144
174
The explicit non-goal of this KEP is to avoid controlling full lifecycle of a
@@ -149,7 +179,14 @@ Other identified non-goals are:
149
179
* allow to change Pod QoS class without a restart,
150
180
* to change resources of Init Containers without a restart,
151
181
* eviction of lower priority Pods to facilitate Pod resize,
152
- * updating extended resources or any other resource types besides CPU, memory.
182
+ * updating extended resources or any other resource types besides CPU, memory,
183
+ * support for CPU/memory manager policies besides the default 'None' policy.
184
+
185
+ Definition of expected behavior of a Container runtime when it handles CRI APIs
186
+ related to a Container's resources is intended to be a high level guide. It is
187
+ a non-goal of this proposal to define a detailed or specific way to implement
188
+ these functions. Implementation specifics are left to the runtime, within the
189
+ bounds of expected behavior.
153
190
154
191
## Proposal
155
192
@@ -245,15 +282,88 @@ means the same as "Deferred".
245
282
Kubelet calls UpdateContainerResources CRI API which currently takes
246
283
*runtimeapi.LinuxContainerResources* parameter that works for Docker and Kata,
247
284
but not for Windows. This parameter changes to *runtimeapi.ContainerResources*,
248
- that is runtime agnostic, and will contain platform-specific information.
285
+ that is runtime agnostic, and will contain platform-specific information. This
286
+ would make UpdateContainerResources API work for Windows, and any other future
287
+ runtimes besides Linux, by making the resources parameter passed in the API
288
+ specific to the target runtime.
249
289
250
290
Additionally, ContainerStatus CRI API is extended to hold
251
291
*runtimeapi.ContainerResources* so that it allows Kubelet to query Container's
252
- CPU and memory limit configurations from runtime.
292
+ CPU and memory limit configurations from runtime. This expects runtime to respond
293
+ with CPU and memory resource values currently applied to the Container.
253
294
254
295
These CRI changes are a separate effort that does not affect the design
255
296
proposed in this KEP.
256
297
298
+ To accomplish the aforementioned CRI changes:
299
+
300
+ * A new protobuf message object named *ContainerResources* that encapsulates
301
+ LinuxContainerResources and WindowsContainerResources is introduced as below.
302
+ - This message can easily be extended for future runtimes by simply adding a
303
+ new runtime-specific resources struct to the ContainerResources message.
+ ```
+ // ContainerResources holds resource configuration for a container.
+ message ContainerResources {
+     // Resource configuration specific to Linux container.
+     LinuxContainerResources linux = 1;
+     // Resource configuration specific to Windows container.
+     WindowsContainerResources windows = 2;
+ }
+ ```
313
+
314
+ * UpdateContainerResourcesRequest message is extended to carry
315
+ ContainerResources field as below.
316
+ - For Linux runtimes, Kubelet fills UpdateContainerResourcesRequest.Linux in
317
+ addition to UpdateContainerResourcesRequest.Resources.Linux fields.
318
+ - This keeps backward compatibility by letting runtimes that rely on the
319
+ current LinuxContainerResources continue to work, while enabling newer
320
+ runtime versions to use UpdateContainerResourcesRequest.Resources.Linux,
321
+ - It enables deprecation of UpdateContainerResourcesRequest.Linux field.
+ ```
+ message UpdateContainerResourcesRequest {
+     // ID of the container to update.
+     string container_id = 1;
+     // Resource configuration specific to Linux container.
+     LinuxContainerResources linux = 2;
+     // Resource configuration for the container.
+     ContainerResources resources = 3;
+ }
+ ```
332
+
333
+ * ContainerStatus message is extended to return ContainerResources as below.
334
+ - This enables Kubelet to query the runtime and discover resources currently
335
+ applied to a Container using ContainerStatus CRI API.
+ ```
+ @@ -914,6 +912,8 @@ message ContainerStatus {
+      repeated Mount mounts = 14;
+      // Log path of container.
+      string log_path = 15;
+ +    // Resource configuration of the container.
+ +    ContainerResources resources = 16;
+  }
+ ```
345
+
346
+ * ContainerManager CRI API service interface is modified as below.
347
+ - UpdateContainerResources takes ContainerResources parameter instead of
348
+ LinuxContainerResources.
+ ```
+ --- a/staging/src/k8s.io/cri-api/pkg/apis/services.go
+ +++ b/staging/src/k8s.io/cri-api/pkg/apis/services.go
+ @@ -43,8 +43,10 @@ type ContainerManager interface {
+      ListContainers(filter *runtimeapi.ContainerFilter) ([]*runtimeapi.Container, error)
+      // ContainerStatus returns the status of the container.
+      ContainerStatus(containerID string) (*runtimeapi.ContainerStatus, error)
+ -    // UpdateContainerResources updates the cgroup resources for the container.
+ -    UpdateContainerResources(containerID string, resources *runtimeapi.LinuxContainerResources) error
+ +    // UpdateContainerResources updates resource configuration for the container.
+ +    UpdateContainerResources(containerID string, resources *runtimeapi.ContainerResources) error
+      // ExecSync executes a command in the container, and returns the stdout output.
+      // If command exits with a non-zero exit code, an error is returned.
+      ExecSync(containerID string, cmd []string, timeout time.Duration) (stdout []byte, stderr []byte, err error)
+ ```
364
+
365
+ * Kubelet code is modified to leverage these changes.
366
+
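The backward-compatibility behavior described above can be sketched in Go as follows. This is a minimal illustration, not the Kubelet or any runtime's actual code: the struct types are hand-written stand-ins for the generated CRI messages and carry only a few fields, and the helpers `newLinuxUpdateRequest` and `linuxResourcesFrom` are hypothetical names introduced for this example.

```go
package main

import "fmt"

// Simplified, hand-written stand-ins for the generated CRI messages.
// Only a few fields are included; the real messages carry more.
type LinuxContainerResources struct {
	CpuPeriod          int64
	CpuQuota           int64
	MemoryLimitInBytes int64
}

type WindowsContainerResources struct {
	MemoryLimitInBytes int64
}

type ContainerResources struct {
	Linux   *LinuxContainerResources
	Windows *WindowsContainerResources
}

type UpdateContainerResourcesRequest struct {
	ContainerId string
	Linux       *LinuxContainerResources // legacy field (tag 2)
	Resources   *ContainerResources      // new field (tag 3)
}

// newLinuxUpdateRequest (hypothetical helper) mirrors what the text says the
// Kubelet does for Linux: it fills the legacy Linux field in addition to
// Resources.Linux, so both older and newer runtimes find their data.
func newLinuxUpdateRequest(id string, lcr *LinuxContainerResources) *UpdateContainerResourcesRequest {
	return &UpdateContainerResourcesRequest{
		ContainerId: id,
		Linux:       lcr,
		Resources:   &ContainerResources{Linux: lcr},
	}
}

// linuxResourcesFrom (hypothetical helper) shows one way a newer runtime
// could read the request: prefer Resources.Linux, fall back to the legacy
// field when the caller has not populated the new one.
func linuxResourcesFrom(req *UpdateContainerResourcesRequest) *LinuxContainerResources {
	if req.Resources != nil && req.Resources.Linux != nil {
		return req.Resources.Linux
	}
	return req.Linux
}

func main() {
	// Request built the new way: both fields are populated.
	req := newLinuxUpdateRequest("c1", &LinuxContainerResources{
		CpuPeriod:          100000,
		CpuQuota:           50000,
		MemoryLimitInBytes: 256 << 20,
	})
	fmt.Println(linuxResourcesFrom(req).MemoryLimitInBytes)

	// Request from an older caller: only the legacy field is set.
	legacy := &UpdateContainerResourcesRequest{
		ContainerId: "c2",
		Linux:       &LinuxContainerResources{MemoryLimitInBytes: 128 << 20},
	}
	fmt.Println(linuxResourcesFrom(legacy).MemoryLimitInBytes)
}
```

Populating both fields costs little on the wire and lets the legacy field be deprecated later without a flag day; whether a runtime prefers the new field or the old one during the transition is left to the runtime.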
257
367
### Risks and Mitigations
258
368
259
369
1. Backward compatibility: When Pod.Spec.Containers[i].Resources becomes
@@ -484,6 +594,54 @@ container limits does not exceed Pod-level cgroup limit at any point. Once all
484
594
the container limits have been successfully updated, Kubelet updates the Pod's
485
595
Status.ContainerStatuses[i].Resources to match the desired limit values.
486
596
597
+ #### CRI Changes Flow
598
+
599
+ The diagram below is an overview of Kubelet using UpdateContainerResources and
600
+ ContainerStatus CRI APIs to set new container resource limits, and update the
601
+ Pod Status in response to user changing the desired resources in Pod Spec.
602
+
+ ```
+    +-----------+                   +-----------+                  +-----------+
+    |           |                   |           |                  |           |
+    | apiserver |                   |  kubelet  |                  |  runtime  |
+    |           |                   |           |                  |           |
+    +-----+-----+                   +-----+-----+                  +-----+-----+
+          |                               |                              |
+          |       watch (pod update)      |                              |
+          |------------------------------>|                              |
+          |     [Containers.Resources]    |                              |
+          |                               |                              |
+          |                            (admit)                           |
+          |                               |                              |
+          |                               |  UpdateContainerResources()  |
+          |                               |----------------------------->|
+          |                               |                        (set limits)
+          |                               |<- - - - - - - - - - - - - - -|
+          |                               |                              |
+          |                               |      ContainerStatus()       |
+          |                               |----------------------------->|
+          |                               |                              |
+          |                               |     [ContainerResources]     |
+          |                               |<- - - - - - - - - - - - - - -|
+          |                               |                              |
+          |      update (pod status)      |                              |
+          |<------------------------------|                              |
+          | [ContainerStatuses.Resources] |                              |
+          |                               |                              |
+
+ ```
633
+
634
+ * Kubelet invokes UpdateContainerResources() CRI API in ContainerManager
635
+ interface to configure new CPU and memory limits for a Container by
636
+ specifying those values in ContainerResources parameter to the API. Kubelet
637
+ sets ContainerResources parameter specific to the target runtime platform
638
+ when calling this CRI API.
639
+
640
+ * Kubelet calls ContainerStatus() CRI API in ContainerManager interface to get
641
+ the CPU and memory limits applied to a Container. It uses the values returned
642
+ in ContainerStatus.Resources to update ContainerStatuses[i].Resources.Limits
643
+ for that Container in the Pod's Status.
644
+
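The two-step flow above (update, then read back) can be sketched in Go with an in-memory fake runtime. This is an illustrative sketch only: the types are simplified stand-ins for the CRI messages, `containerManager` is a trimmed-down subset of the ContainerManager interface, and `fakeRuntime` and `resizeAndReport` are hypothetical names that do not appear in the Kubelet.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the CRI messages used in the flow.
type LinuxContainerResources struct {
	CpuPeriod          int64
	CpuQuota           int64
	MemoryLimitInBytes int64
}

type ContainerResources struct {
	Linux *LinuxContainerResources
}

type ContainerStatus struct {
	Id        string
	Resources *ContainerResources
}

// containerManager is a subset of the ContainerManager interface,
// restricted to the two calls shown in the diagram.
type containerManager interface {
	ContainerStatus(containerID string) (*ContainerStatus, error)
	UpdateContainerResources(containerID string, resources *ContainerResources) error
}

// fakeRuntime (hypothetical) records the resources applied per container.
type fakeRuntime struct {
	applied map[string]*ContainerResources
}

func (f *fakeRuntime) UpdateContainerResources(id string, r *ContainerResources) error {
	if _, ok := f.applied[id]; !ok {
		return errors.New("container not found: " + id)
	}
	if r == nil || r.Linux == nil {
		return errors.New("no Linux resources supplied")
	}
	applied := *r.Linux // copy, so later caller mutations are not observed
	f.applied[id] = &ContainerResources{Linux: &applied}
	return nil
}

func (f *fakeRuntime) ContainerStatus(id string) (*ContainerStatus, error) {
	r, ok := f.applied[id]
	if !ok {
		return nil, errors.New("container not found: " + id)
	}
	return &ContainerStatus{Id: id, Resources: r}, nil
}

// resizeAndReport (hypothetical) performs the two calls from the diagram and
// returns what the runtime reports, which is the value that would be copied
// into the Pod's ContainerStatuses[i].Resources.
func resizeAndReport(cm containerManager, id string, desired *ContainerResources) (*ContainerResources, error) {
	if err := cm.UpdateContainerResources(id, desired); err != nil {
		return nil, fmt.Errorf("update failed: %w", err)
	}
	status, err := cm.ContainerStatus(id)
	if err != nil {
		return nil, fmt.Errorf("status query failed: %w", err)
	}
	return status.Resources, nil
}

func main() {
	rt := &fakeRuntime{applied: map[string]*ContainerResources{
		"c1": &ContainerResources{Linux: &LinuxContainerResources{MemoryLimitInBytes: 128 << 20}},
	}}
	desired := &ContainerResources{Linux: &LinuxContainerResources{
		CpuPeriod: 100000, CpuQuota: 200000, MemoryLimitInBytes: 256 << 20,
	}}
	got, err := resizeAndReport(rt, "c1", desired)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got.Linux.CpuQuota, got.Linux.MemoryLimitInBytes)
}
```

Reporting the values read back from the runtime, rather than echoing the desired values, keeps the Pod Status tied to what is actually applied to the Container.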
487
645
#### Notes
488
646
489
647
* If CPU Manager policy for a Node is set to 'static', then only integral
@@ -560,6 +718,9 @@ Other components:
560
718
Unit tests will cover the sanity of code changes that implements the feature,
561
719
and the policy controls that are introduced as part of this feature.
562
720
721
+ CRI unit tests are updated to reflect use of ContainerResources object in
722
+ UpdateContainerResources and ContainerStatus APIs.
723
+
563
724
#### Pod Resize E2E Tests
564
725
565
726
End-to-End tests resize a Pod via PATCH to Pod's Spec.Containers[i].Resources.
@@ -618,6 +779,12 @@ E2E tests for Guaranteed class Pod with three containers (c1, c2, c3):
618
779
1. Increase CPU for c1 & c3, decrease c2 - net CPU increase for Pod.
619
780
1. Increase memory for c1 & c3, decrease c2 - net memory increase for Pod.
620
781
782
+ #### CRI E2E Tests
783
+
784
+ 1. E2E test is added to verify UpdateContainerResources API with containerd runtime.
785
+ 1. E2E test is added to verify ContainerStatus API using containerd runtime.
786
+ 1. E2E test is added to verify backward compatibility using containerd runtime.
787
+
621
788
#### Resource Quota and Limit Ranges
622
789
623
790
Setup a namespace with ResourceQuota and a single, valid Pod.
@@ -672,13 +839,18 @@ TODO: Identify more cases
672
839
- Resize Policies functionality is implemented,
673
840
- Unit tests and E2E tests covering basic functionality are added,
674
841
- E2E tests covering multiple containers are added.
842
+ - UpdateContainerResources API changes are done and tested with containerd
843
+ runtime, backward compatibility is maintained.
844
+ - ContainerStatus API changes are done. Tests are ready but not enforced.
675
845
676
846
#### Beta
677
847
- VPA alpha integration of feature completed and any bugs addressed,
678
848
- E2E tests covering Resize Policy, LimitRanger, and ResourceQuota are added,
679
849
- Negative tests are identified and added.
680
850
- A "/resize" subresource is defined and implemented.
681
851
- Pod-scoped resources are handled if that KEP is past alpha
852
+ - ContainerStatus API change tests are enforced and containerd runtime must comply.
853
+ - ContainerStatus API change tests are enforced and Windows runtime should comply.
682
854
683
855
#### Stable
684
856
- VPA integration of feature moved to beta,
@@ -922,13 +1094,16 @@ _This section must be completed when targeting beta graduation to a release._
922
1094
- 2019-01-18 - implementation proposal extended
923
1095
- 2019-03-07 - changes to flow control, updates per review feedback
924
1096
- 2019-08-29 - updated design proposal
1097
+ - 2019-10-25 - Initial CRI changes KEP draft created
925
1098
- 2019-10-25 - update key open items and move KEP to implementable
926
1099
- 2020-01-06 - API review suggested changes incorporated
927
1100
- 2020-01-13 - Test plan and graduation criteria added
1101
+ - 2020-01-14 - CRI changes test plan and graduation criteria added
928
1102
- 2020-01-21 - Graduation criteria updated per review feedback
929
1103
- 2020-11-06 - Updated with feedback from reviews
930
1104
- 2020-12-09 - Add "Deferred"
931
1105
- 2021-02-05 - Final consensus on resourcesAllocated[] and resize[]
1106
+ - 2022-05-01 - KEP 2273-kubelet-container-resources-cri-api-changes merged with this KEP
932
1107
933
1108
## Drawbacks
934
1109