Commit 29fe6ba

Create 20200529-pod-cost-annotations (kubernetes#1828)
* Create pod-cost annotation proposal
* ./hack/update-toc.sh
1 parent 2c10835 commit 29fe6ba

2 files changed

+314
-0
lines changed

keps/sig-apps/2255-pod-cost/README.md

Lines changed: 267 additions & 0 deletions
@@ -0,0 +1,267 @@
1+
# KEP-2255: Add pod-cost annotation for ReplicaSet
2+
3+
4+
<!-- toc -->
5+
- [Release Signoff Checklist](#release-signoff-checklist)
6+
- [Summary](#summary)
7+
- [Motivation](#motivation)
8+
- [Goals](#goals)
9+
- [Non-Goals](#non-goals)
10+
- [Proposal](#proposal)
11+
- [User Stories (optional)](#user-stories-optional)
12+
- [Story 1](#story-1)
13+
- [Story 2](#story-2)
14+
- [Risks and Mitigations](#risks-and-mitigations)
15+
- [Design Details](#design-details)
16+
- [Test Plan](#test-plan)
17+
- [Graduation Criteria](#graduation-criteria)
18+
- [Alpha -&gt; Beta Graduation](#alpha---beta-graduation)
19+
- [Beta -&gt; GA Graduation](#beta---ga-graduation)
20+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
21+
- [Version Skew Strategy](#version-skew-strategy)
22+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
23+
- [Feature enablement and rollback](#feature-enablement-and-rollback)
24+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
25+
- [Monitoring requirements](#monitoring-requirements)
26+
- [Dependencies](#dependencies)
27+
- [Scalability](#scalability)
28+
- [Troubleshooting](#troubleshooting)
29+
- [Implementation History](#implementation-history)
30+
- [Drawbacks](#drawbacks)
31+
- [Alternatives](#alternatives)
32+
<!-- /toc -->
33+
34+
## Release Signoff Checklist
35+
36+
37+
Items marked with (R) are required *prior to targeting to a milestone / release*.
38+
39+
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
40+
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
41+
- [ ] (R) Design details are appropriately documented
42+
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
43+
- [ ] (R) Graduation criteria is in place
44+
- [ ] (R) Production readiness review completed
45+
- [ ] Production readiness review approved
46+
- [ ] "Implementation History" section is up-to-date for milestone
47+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
48+
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
49+
50+
51+
[kubernetes.io]: https://kubernetes.io/
52+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
53+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
54+
[kubernetes/website]: https://git.k8s.io/website
55+
56+
## Summary
57+
58+
This feature lets an application suggest to the ReplicaSet controller which pod of a Deployment should be deleted first when a scale-down happens. This offers a simple way to avoid session disruption in stateful applications.
59+
60+
## Motivation
61+
62+
Some applications need to tell Kubernetes which pods can be deleted and which replicas must be protected. Such applications hold stateful sessions, and running them on Kubernetes is impractical today because an effectively "random" choice of pod on scale-down terminates sessions. If the application could tell Kubernetes which replicas hold no, few, or less important active sessions, this would solve many problems. The feature does not change the default behaviour: the deletion order only differs when the annotation is present.
63+
64+
### Goals
65+
66+
Recommend which pod of a ReplicaSet gets deleted next. This should help to avoid major rework of existing application architectures:
67+
* [45509](https://github.com/kubernetes/kubernetes/issues/45509) - Scale down a deployment by removing specific pods
68+
69+
70+
### Non-Goals
71+
72+
Guaranteed deletion of a selected replica (in contrast to the recommendation stated in Goals).
73+
74+
## Proposal
75+
76+
The application can set the `controller.kubernetes.io/pod-cost` annotation on a pod through the Kubernetes API. When a scale-down happens, the pod whose annotation carries the lowest priority value is deleted first. A pod of the Deployment that has no priority annotation set is treated as having the lowest priority.
77+
78+
If all pods have the same priority, the normal pod deletion decision applies unchanged. The same holds if the pod-cost annotation is not used at all.
79+
80+
The pod-cost annotation can be changed at runtime, for example when the workload changes or a new master is elected.
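
The ordering rule described above can be illustrated with a minimal sketch. It is not the controller's implementation: the helper names are hypothetical, the annotation value is assumed to be an integer encoded as a string, an unparsable value is assumed to count as unset, and the input order stands in for the controller's existing deletion ranking.

```python
from typing import Dict, List, Optional

POD_COST_ANNOTATION = "controller.kubernetes.io/pod-cost"


def pod_cost(annotations: Optional[Dict[str, str]]) -> Optional[int]:
    """Return the pod-cost as an int, or None if unset or unparsable (assumption)."""
    raw = (annotations or {}).get(POD_COST_ANNOTATION)
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def deletion_order(pods: List[dict]) -> List[dict]:
    """Order pods so that the first element is the preferred deletion candidate.

    Pods without a usable annotation sort before all annotated pods, and
    annotated pods sort by ascending cost. sorted() is stable, so pods with
    equal cost keep their input order.
    """
    def key(pod: dict):
        cost = pod_cost(pod.get("metadata", {}).get("annotations"))
        return (0, 0) if cost is None else (1, cost)

    return sorted(pods, key=key)
```

Because the sort is stable, a set of pods that all share one value (or all lack the annotation) comes back in its original order, which mirrors the statement that the normal decision is unaffected in that case.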
81+
82+
### User Stories (optional)
83+
84+
85+
#### Story 1
86+
87+
In an application environment with stateful worker (user) sessions, it is essential to keep user sessions alive as well as possible. On a scale-down event, the application has to tell the ReplicaSet controller which deletion would have the lowest impact on existing sessions.
88+
89+
#### Story 2
90+
91+
An application consists of identical server processes, but one of the replicas acts as the master and should be kept as long as possible. All other replicas can be treated as cattle workload. The master can set the priority annotation to a high value as soon as it has finished its startup process. The other replicas can either remain without any priority set or all share the same, lower priority. This ensures that the master replica of the Deployment is protected in a scale-down situation.
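
As a rough sketch of how a replica might prepare such an update, the snippet below builds a JSON merge patch body for the annotation. The function name is hypothetical and the integer value is an assumption. Sending the patch (for example with content type `application/merge-patch+json` against the pod resource) is left out because it depends on the client the application uses.

```python
import json
from typing import Optional

POD_COST_ANNOTATION = "controller.kubernetes.io/pod-cost"


def pod_cost_patch(cost: Optional[int]) -> str:
    """Build a JSON merge patch that sets the annotation, or removes it for None."""
    # Annotation values are strings in the Kubernetes API. In a JSON merge
    # patch (RFC 7386), a null value deletes the key.
    value = None if cost is None else str(cost)
    return json.dumps({"metadata": {"annotations": {POD_COST_ANNOTATION: value}}})
```

A newly elected master could send `pod_cost_patch(100)` for its own pod, and a replica that loses the role could send `pod_cost_patch(None)` to drop back to the default.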
92+
93+
94+
### Risks and Mitigations
95+
96+
On earlier ReplicaSet controller versions that do not implement the pod-cost annotation, an application might wrongly assume that a master instance or workers with open (user) sessions are protected. Because the pod-cost annotation is only a suggestion to the ReplicaSet controller, the application developer should still handle the case of a failed master instance or broken user sessions. The feature is an improvement, not a guarantee, because timing issues can occur between setting the annotation and the next scale-down performed by the controller.
97+
98+
## Design Details
99+
100+
101+
### Test Plan
102+
103+
* Unit tests in the kube-controller-manager package covering a variety of scenarios.
104+
* New E2E tests to validate that replicas get deleted as expected, e.g.:
105+
* Replicas with lower pod-cost before replicas with higher pod-cost
106+
* Replicas with no pod-cost annotation set before replicas with low priority
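
The scenarios listed above could be captured in a table-driven check along the following lines. This is only an illustrative model, not the proposed test code: `scale_down`, `run_scenarios`, and the scenario table are hypothetical, costs are assumed to be integers, and ties are assumed to fall back to input order.

```python
from typing import Dict, List, Optional

# Hypothetical scenario table: pod name -> pod-cost (None means no annotation).
SCENARIOS = [
    # (description, costs, desired replicas, expected deleted pods)
    ("lower cost goes first", {"a": 50, "b": 5, "c": 20}, 2, ["b"]),
    ("unannotated goes before low cost", {"a": 1, "b": None, "c": 2}, 2, ["b"]),
    ("equal costs fall back to input order", {"a": 7, "b": 7, "c": 7}, 1, ["a", "b"]),
]


def scale_down(costs: Dict[str, Optional[int]], replicas: int) -> List[str]:
    """Return the names of the pods deleted to reach `replicas`."""
    surplus = max(len(costs) - replicas, 0)
    # Unannotated pods rank lowest; ties keep insertion order (stable sort).
    ranked = sorted(costs, key=lambda n: (costs[n] is not None, costs[n] or 0))
    return ranked[:surplus]


def run_scenarios() -> List[str]:
    """Return descriptions of failing scenarios (an empty list means all pass)."""
    return [desc for desc, costs, replicas, want in SCENARIOS
            if scale_down(costs, replicas) != want]
```

Each row maps to one bullet of the plan, so further cases (for example changing a value between two scale-downs) can be added as rows without touching the helper.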
107+
108+
### Graduation Criteria
109+
110+
#### Alpha -> Beta Graduation
111+
* Implemented feedback from alpha testers
112+
* Thorough E2E and unit testing in place
113+
114+
#### Beta -> GA Graduation
115+
* Significant number of end-users are using the feature
116+
* We're confident that no further API changes will be needed to achieve the goals of the KEP
117+
* All known functional bugs have been fixed
118+
119+
### Upgrade / Downgrade Strategy
120+
121+
When upgrading, no changes are needed to maintain existing behaviour, as the feature is fully optional and inactive by default. To activate it, either a user annotates a pod in a Deployment by hand or the application annotates a pod in a Deployment through the API.
122+
123+
When downgrading, there is no need to change anything, as this is just a pod annotation, which is harmless.
124+
125+
### Version Skew Strategy
126+
127+
As this feature is based on pod annotations, mixing different Kubernetes versions causes no issue. Older versions that lack the feature do not honour the annotation, which may reduce the efficiency and reliability of applications that rely on it.
128+
129+
## Production Readiness Review Questionnaire
130+
131+
### Feature enablement and rollback
132+
133+
* **How can this feature be enabled / disabled in a live cluster?**
134+
- [x] Other
135+
- Set the pod-cost annotation on pods of a live Deployment
136+
137+
138+
* **Does enabling the feature change any default behavior?**
139+
- No
140+
141+
142+
* **Can the feature be disabled once it has been enabled (i.e. can we rollback
143+
the enablement)?**
144+
- One can either remove the annotations or downgrade to an older Kubernetes release
145+
146+
147+
* **What happens if we reenable the feature if it was previously rolled back?**
148+
- Then the feature will be reenabled. Nothing special to consider here.
149+
150+
151+
* **Are there any tests for feature enablement/disablement?**
152+
153+
154+
### Rollout, Upgrade and Rollback Planning
155+
156+
_This section must be completed when targeting beta graduation to a release._
157+
158+
* **How can a rollout fail? Can it impact already running workloads?**
159+
- As the feature is a simple annotation, the worst that can happen is that the annotation is lost or ignored. In that case, a pod with a higher priority may get deleted before a pod with a lower priority.
160+
161+
162+
* **What specific metrics should inform a rollback?**
163+
- None
164+
165+
166+
* **Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested?**
167+
- Yes, this was tested. The behaviour changed in both directions, as expected.
168+
169+
170+
* **Is the rollout accompanied by any deprecations and/or removals of features,
171+
APIs, fields of API types, flags, etc.?**
172+
- No. However, the pod-cost annotation key can no longer be used for any other purpose.
173+
174+
175+
### Monitoring requirements
176+
177+
_This section must be completed when targeting beta graduation to a release._
178+
179+
* **How can an operator determine if the feature is in use by workloads?**
180+
- Search for pods that carry the pod-cost annotation key.
181+
182+
183+
* **What are the SLIs (Service Level Indicators) an operator can use to
184+
determine the health of the service?**
185+
- A pod with a lower pod-cost annotation in a Deployment gets deleted first on a scale-down event.
186+
187+
188+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
189+
- All pods with a lower pod-cost annotation in a Deployment are deleted first on a scale-down event.
190+
191+
* **Are there any missing metrics that would be useful to have to improve
192+
observability of this feature?**
193+
- N/A
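
For the first question above, an operator could check for usage by filtering a pod listing for the annotation key. The sketch below assumes its input is a JSON pod list such as the output of `kubectl get pods --all-namespaces -o json`; the function name is hypothetical.

```python
import json
from typing import List

POD_COST_ANNOTATION = "controller.kubernetes.io/pod-cost"


def pods_using_pod_cost(pod_list_json: str) -> List[str]:
    """Return "namespace/name" for every pod that carries the annotation."""
    found = []
    for item in json.loads(pod_list_json).get("items", []):
        meta = item.get("metadata", {})
        # A pod without any annotations has no "annotations" key at all.
        if POD_COST_ANNOTATION in (meta.get("annotations") or {}):
            found.append(f'{meta.get("namespace", "default")}/{meta.get("name")}')
    return found
```

An empty result suggests that no workload in the listed namespaces currently relies on the feature.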
194+
195+
### Dependencies
196+
197+
_This section must be completed when targeting beta graduation to a release._
198+
199+
* **Does this feature depend on any specific services running in the cluster?**
200+
- The feature requires a running kube-controller-manager and the ability and permissions to set pod annotations.
201+
202+
203+
### Scalability
204+
205+
_For alpha, this section is encouraged: reviewers should consider these questions
206+
and attempt to answer them._
207+
208+
_For beta, this section is required: reviewers must answer these questions._
209+
210+
_For GA, this section is required: approvers should be able to confirm the
211+
previous answers based on experience in the field._
212+
213+
* **Will enabling / using this feature result in any new API calls?**
214+
- Whenever the application decides that a replica needs a different pod-cost, it sends an API request to set the appropriate pod annotation(s).
215+
216+
217+
* **Will enabling / using this feature result in introducing new API types?**
218+
- No.
219+
220+
221+
* **Will enabling / using this feature result in any new calls to cloud
222+
provider?**
223+
- No.
224+
225+
226+
* **Will enabling / using this feature result in increasing size or count
227+
of the existing API objects?**
228+
Describe them providing:
229+
- API type(s): Pod annotation
230+
- Estimated increase in size: Size of a new annotation
231+
- Estimated amount of new objects: new annotation for potentially every existing Pod
232+
233+
234+
* **Will enabling / using this feature result in increasing time taken by any
235+
operations covered by [existing SLIs/SLOs][]?**
236+
- The time it takes to set/delete/change a pod annotation
237+
238+
239+
* **Will enabling / using this feature result in non-negligible increase of
240+
resource usage (CPU, RAM, disk, IO, ...) in any components?**
241+
- The resources it takes to set/delete/change a pod annotation
242+
243+
244+
### Troubleshooting
245+
246+
_This section must be completed when targeting beta graduation to a release._
247+
248+
* **How does this feature react if the API server and/or etcd is unavailable?**
249+
- The pod annotation can't be set. The normal pod deletion behavior will be used for non-annotated pods in a Deployment.
250+
* **What are other known failure modes?**
251+
- None.
252+
253+
* **What steps should be taken if SLOs are not being met to determine the problem?**
254+
- N/A
255+
256+
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
257+
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
258+
259+
## Implementation History
260+
261+
262+
## Drawbacks
263+
264+
265+
## Alternatives
266+
267+
Similar behaviour can be achieved through the Operator Framework. However, that takes considerably more configuration and setup work and is not a built-in Kubernetes feature.

keps/sig-apps/2255-pod-cost/kep.yaml

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
title: Add pod-cost annotation for ReplicaSet
2+
kep-number: 2255
3+
authors:
4+
- "@drbugfinder-work"
5+
- "@ahg-g"
6+
- "@alculquicondor"
7+
owning-sig: sig-apps
8+
participating-sigs:
9+
status: provisional
10+
creation-date: 2021-01-12
11+
reviewers:
12+
- "@ahg-g"
13+
- "@janetkuo"
14+
- "@alculquicondor"
15+
approvers:
16+
- "@janetkuo"
17+
see-also:
18+
- https://github.com/kubernetes/kubernetes/issues/45509
19+
- https://github.com/kubernetes/enhancements/issues/2255
20+
replaces:
21+
22+
# The target maturity stage in the current dev cycle for this KEP.
23+
stage: alpha
24+
25+
# The most recent milestone for which work toward delivery of this KEP has been
26+
# done. This can be the current (upcoming) milestone, if it is being actively
27+
# worked on.
28+
latest-milestone: "v1.21"
29+
30+
# The milestone at which this feature was, or is targeted to be, at each stage.
31+
milestone:
32+
alpha: "v1.21"
33+
beta: "v1.22"
34+
stable: "v1.24"
35+
36+
# The following PRR answers are required at alpha release
37+
# List the feature gate name and the components for which it must be enabled
38+
#feature-gates:
39+
# - name: MyFeature
40+
# components:
41+
# - kube-apiserver
42+
# - kube-controller-manager
43+
disable-supported: true
44+
45+
# The following PRR answers are required at beta release
46+
#metrics:
47+
# - my_feature_metric
