|
3 | 3 | ## Table of Contents
|
4 | 4 |
|
5 | 5 | <!-- toc -->
|
| 6 | +- [Release Signoff Checklist](#release-signoff-checklist) |
6 | 7 | - [Summary](#summary)
|
7 | 8 | - [Motivation](#motivation)
|
8 | 9 | - [Goals](#goals)
|
|
13 | 14 | - [Container Resize Policy](#container-resize-policy)
|
14 | 15 | - [Resize Status](#resize-status)
|
15 | 16 | - [CRI Changes](#cri-changes)
|
| 17 | + - [Risks and Mitigations](#risks-and-mitigations) |
| 18 | +- [Design Details](#design-details) |
16 | 19 | - [Kubelet and API Server Interaction](#kubelet-and-api-server-interaction)
|
17 | 20 | - [Kubelet Restart Tolerance](#kubelet-restart-tolerance)
|
18 | 21 | - [Scheduler and API Server Interaction](#scheduler-and-api-server-interaction)
|
|
22 | 25 | - [Notes](#notes)
|
23 | 26 | - [Affected Components](#affected-components)
|
24 | 27 | - [Future Enhancements](#future-enhancements)
|
25 | | - - [Risks and Mitigations](#risks-and-mitigations) |
26 | 28 | - [Test Plan](#test-plan)
|
27 | 29 | - [Unit Tests](#unit-tests)
|
28 | 30 | - [Pod Resize E2E Tests](#pod-resize-e2e-tests)
|
|
43 | 45 | - [Scalability](#scalability)
|
44 | 46 | - [Troubleshooting](#troubleshooting)
|
45 | 47 | - [Implementation History](#implementation-history)
|
| 48 | +- [Drawbacks](#drawbacks) |
| 49 | +- [Alternatives](#alternatives) |
46 | 50 | <!-- /toc -->
|
47 | 51 |
|
| 52 | +## Release Signoff Checklist |
| 53 | + |
| 54 | +<!-- |
| 55 | +**ACTION REQUIRED:** In order to merge code into a release, there must be an |
| 56 | +issue in [kubernetes/enhancements] referencing this KEP and targeting a release |
| 57 | +milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases) |
| 58 | +of the targeted release**. |
| 59 | +
|
| 60 | +For enhancements that make changes to code or processes/procedures in core |
| 61 | +Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release |
| 62 | +Signoff checklist to be completed. |
| 63 | +
|
| 64 | +Check these off as they are completed for the Release Team to track. These |
| 65 | +checklist items _must_ be updated for the enhancement to be released. |
| 66 | +--> |
| 67 | + |
| 68 | +Items marked with (R) are required *prior to targeting to a milestone / release*. |
| 69 | + |
| 70 | +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) |
| 71 | +- [ ] (R) KEP approvers have approved the KEP status as `implementable` |
| 72 | +- [ ] (R) Design details are appropriately documented |
| 73 | +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) |
| 74 | + - [ ] e2e Tests for all Beta API Operations (endpoints) |
| 75 | + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) |
| 76 | + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free |
| 77 | +- [ ] (R) Graduation criteria are in place |
| 78 | + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) |
| 79 | +- [ ] (R) Production readiness review completed |
| 80 | +- [ ] (R) Production readiness review approved |
| 81 | +- [ ] "Implementation History" section is up-to-date for milestone |
| 82 | +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] |
| 83 | +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes |
| 84 | + |
| 85 | +<!-- |
| 86 | +**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone. |
| 87 | +--> |
| 88 | + |
| 89 | +[kubernetes.io]: https://kubernetes.io/ |
| 90 | +[kubernetes/enhancements]: https://git.k8s.io/enhancements |
| 91 | +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes |
| 92 | +[kubernetes/website]: https://git.k8s.io/website |
48 | 93 |
|
49 | 94 | ## Summary
|
50 | 95 |
|
@@ -209,6 +254,25 @@ CPU and memory limit configurations from runtime.
|
209 | 254 | These CRI changes are a separate effort that does not affect the design
|
210 | 255 | proposed in this KEP.
|
211 | 256 |
|
| 257 | +### Risks and Mitigations |
| 258 | + |
| 259 | +1. Backward compatibility: When Pod.Spec.Containers[i].Resources becomes |
| 260 | + representative of desired state, and Pod's true resource allocations are |
| 261 | + tracked in Pod.Status.ContainerStatuses[i].ResourcesAllocated, applications |
| 262 | + that query PodSpec and rely on Resources in PodSpec to determine resource |
| 263 | + allocations will see values that may not represent actual allocations. As a |
| 264 | + mitigation, this change needs to be documented and highlighted in the |
| 265 | + release notes, and in top-level Kubernetes documents. |
| 266 | +1. Resizing memory lower: Lowering cgroup memory limits may not work as pages |
| 267 | + could be in use, and approaches such as setting limit near current usage may |
| 268 | + be required. This issue needs further investigation. |
| 269 | +1. Older client versions: Previous versions of clients that are unaware of the |
| 270 | + new ResourcesAllocated and ResizePolicy fields would set them to nil. To |
| 271 | + keep compatibility, the PodResourceAllocation admission controller mutates |
| 272 | + such an update by copying non-nil values from the old Pod to the current Pod. |
| 273 | + |
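The "older client versions" mitigation above can be illustrated with a minimal Go sketch. It is not the PodResourceAllocation admission controller itself: the `Pod`, `Container`, and `ResourceList` types are simplified, hypothetical stand-ins for the real API types, both fields are flattened onto the container for brevity, and `preserveResizeFields` is an illustrative name. It only shows the idea of copying non-nil values from the old Pod when an update leaves the new fields nil.

```go
package main

// ResourceList is a simplified stand-in (assumption) for a map of
// resource name to quantity, e.g. "cpu" -> "500m".
type ResourceList map[string]string

// Container is a hypothetical, flattened view of the per-container fields
// discussed above. The real API types are structured differently.
type Container struct {
	Name               string
	ResourcesAllocated ResourceList
	ResizePolicy       []string
}

// Pod is a hypothetical, minimal Pod holding only its containers.
type Pod struct {
	Containers []Container
}

// preserveResizeFields mutates newPod in place: for every container that
// also exists in oldPod, a nil ResourcesAllocated or ResizePolicy in the
// update is replaced by a copy of the old non-nil value. Values that the
// client did set are left untouched.
func preserveResizeFields(oldPod, newPod *Pod) {
	oldByName := make(map[string]Container, len(oldPod.Containers))
	for _, c := range oldPod.Containers {
		oldByName[c.Name] = c
	}
	for i := range newPod.Containers {
		c := &newPod.Containers[i]
		old, ok := oldByName[c.Name]
		if !ok {
			continue // container not present in the old Pod: nothing to copy
		}
		if c.ResourcesAllocated == nil && old.ResourcesAllocated != nil {
			c.ResourcesAllocated = make(ResourceList, len(old.ResourcesAllocated))
			for k, v := range old.ResourcesAllocated {
				c.ResourcesAllocated[k] = v
			}
		}
		if c.ResizePolicy == nil && old.ResizePolicy != nil {
			c.ResizePolicy = make([]string, len(old.ResizePolicy))
			copy(c.ResizePolicy, old.ResizePolicy)
		}
	}
}
```

The sketch has no `main`; call `preserveResizeFields(oldPod, newPod)` from a test or a small driver. Copying the values rather than aliasing them keeps later mutations of the updated Pod from leaking back into the old object.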
| 274 | +## Design Details |
| 275 | + |
212 | 276 | ### Kubelet and API Server Interaction
|
213 | 277 |
|
214 | 278 | When a new Pod is created, Scheduler is responsible for selecting a suitable
|
@@ -489,23 +553,6 @@ Other components:
|
489 | 553 | 1. Allow resource limits to be updated (VPA feature).
|
490 | 554 | 1. Handle pod-scoped resources (https://github.com/kubernetes/enhancements/pull/1592)
|
491 | 555 |
|
492 | | -### Risks and Mitigations |
493 | | - |
494 | | -1. Backward compatibility: When Pod.Spec.Containers[i].Resources becomes |
495 | | - representative of desired state, and Pod's true resource allocations are |
496 | | - tracked in Pod.Status.ContainerStatuses[i].ResourcesAllocated, applications |
497 | | - that query PodSpec and rely on Resources in PodSpec to determine resource |
498 | | - allocations will see values that may not represent actual allocations. As a |
499 | | - mitigation, this change needs to be documented and highlighted in the |
500 | | - release notes, and in top-level Kubernetes documents. |
501 | | -1. Resizing memory lower: Lowering cgroup memory limits may not work as pages |
502 | | - could be in use, and approaches such as setting limit near current usage may |
503 | | - be required. This issue needs further investigation. |
504 | | -1. Older client versions: Previous versions of clients that are unaware of the |
505 | | - new ResourcesAllocated and ResizePolicy fields would set them to nil. To |
506 | | - keep compatibility, PodResourceAllocation admission controller mutates such |
507 | | - an update by copying non-nil values from the old Pod to current Pod. |
508 | | - |
509 | 556 | ### Test Plan
|
510 | 557 |
|
511 | 558 | #### Unit Tests
|
@@ -875,3 +922,22 @@ _This section must be completed when targeting beta graduation to a release._
|
875 | 922 | - 2020-11-06 - Updated with feedback from reviews
|
876 | 923 | - 2020-12-09 - Add "Deferred"
|
877 | 924 | - 2021-02-05 - Final consensus on resourcesAllocated[] and resize[]
|
| 925 | + |
| 926 | +## Drawbacks |
| 927 | + |
| 928 | +<!-- |
| 929 | +Why should this KEP _not_ be implemented? |
| 930 | +--> |
| 931 | + |
| 932 | +There are no drawbacks that we are aware of. |
| 933 | + |
| 934 | +## Alternatives |
| 935 | + |
| 936 | +<!-- |
| 937 | +What other approaches did you consider, and why did you rule them out? These do |
| 938 | +not need to be as detailed as the proposal, but should include enough |
| 939 | +information to express the idea and why it was not acceptable. |
| 940 | +--> |
| 941 | + |
| 942 | +We considered having the scheduler approve the resize. We also considered PodSpec |
| 943 | +as the location to checkpoint allocated resources. |