Commit 281068e

Updates the ttl-to-finished KEP to graduate the feature to Beta.
Enhancement issue: kubernetes#592
1 parent 29fe6ba commit 281068e

3 files changed

+250
-58
lines changed

keps/prod-readiness/sig-apps/592.yaml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
kep-number: 592
2+
beta:
3+
approver: "@wojtek-t"

keps/sig-apps/592-ttl-after-finish/README.md

Lines changed: 211 additions & 54 deletions
@@ -16,10 +16,20 @@
1616
- [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
1717
- [TTL Controller](#ttl-controller)
1818
- [Finished Jobs](#finished-jobs)
19-
- [Finished Pods](#finished-pods)
2019
- [Owner References](#owner-references)
2120
- [Risks and Mitigations](#risks-and-mitigations)
2221
- [Graduation Criteria](#graduation-criteria)
22+
- [Alpha](#alpha)
23+
- [Alpha -> Beta](#alpha---beta)
24+
- [Beta -> GA](#beta---ga)
25+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
26+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
27+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
28+
- [Monitoring Requirements](#monitoring-requirements)
29+
- [Dependencies](#dependencies)
30+
- [Scalability](#scalability)
31+
- [Troubleshooting](#troubleshooting)
32+
- [Future Work](#future-work)
2333
- [Implementation History](#implementation-history)
2434
<!-- /toc -->
2535

@@ -106,24 +116,6 @@ This allows Jobs to be cleaned up after they finish and provides time for
106116
asynchronous clients to observe Jobs' final states before they are deleted.
107117

108118

109-
Similarly, we will add the following API fields to `PodSpec` (`Pod`'s `.spec`).
110-
111-
```go
112-
type PodSpec struct {
113-
// ttlSecondsAfterFinished limits the lifetime of a Pod that has finished
114-
// execution (either Succeeded or Failed). If this field is set, once the Pod
115-
// finishes, it will be deleted after ttlSecondsAfterFinished expires. When
116-
// the Pod is being deleted, its lifecycle guarantees (e.g. finalizers) will
117-
// be honored. If this field is unset, ttlSecondsAfterFinished will not
118-
// expire. If this field is set to zero, ttlSecondsAfterFinished expires
119-
// immediately after the Pod finishes.
120-
// This field is alpha-level and is only honored by servers that enable the
121-
// TTLAfterFinished feature.
122-
// +optional
123-
TTLSecondsAfterFinished *int32
124-
}
125-
```
126-
127119
##### Validation
128120

129121
Because Job controller depends on Pods to exist to work correctly. In Job
@@ -157,16 +149,16 @@ The steps are as easy as:
157149
### Implementation Details/Notes/Constraints
158150

159151
#### TTL Controller
160-
We will add a TTL controller for finished Jobs and finished Pods. We considered
152+
We will add a TTL controller for finished Jobs. We considered
161153
adding it in Job controller, but decided not to, for the following reasons:
162154

163155
1. Job controller should focus on managing Pods based on the Job's spec and pod
164156
template, but not cleaning up Jobs.
165-
1. We also need the TTL controller to clean up finished Pods, and we consider
157+
1. We also need the TTL controller to clean up finished Pods in the future, and we consider
166158
generalizing TTL controller later for custom resources.
167159

168-
The TTL controller utilizes informer framework, watches all Jobs and Pods, and
169-
read Jobs and Pods from a local cache.
160+
The TTL controller utilizes informer framework, watches all Jobs, and
161+
reads Jobs from a local cache.
170162

171163
#### Finished Jobs
172164

@@ -192,29 +184,6 @@ When a Job is created or updated:
192184
the Job after a computed amount of time when it will expire.
193185
1. Delete the Job if passing the sanity checks.
194186

195-
#### Finished Pods
196-
197-
When a Pod is created or updated:
198-
1. Check its `.status.phase` to see if it has finished (`Succeeded` or `Failed`).
199-
If it hasn't finished, do nothing.
200-
1. Otherwise, if the Pod has finished, check if Pod's
201-
`.spec.ttlSecondsAfterFinished` field is set. Do nothing if the TTL field is
202-
not set.
203-
1. Otherwise, if the TTL field is set, check if the TTL has expired, i.e.
204-
`.spec.ttlSecondsAfterFinished` + the time when the Pod finishes (max of all
205-
of its containers termination time
206-
`.containerStatuses.state.terminated.finishedAt`) > now.
207-
1. If the TTL hasn't expired, delay re-enqueuing the Pod after a computed amount
208-
of time when it will expire. The computed time period is:
209-
(`.spec.ttlSecondsAfterFinished` + the time when the Pod finishes - now).
210-
1. If the TTL has expired, `GET` the Pod from API server to do final sanity
211-
checks before deleting it.
212-
1. Check if the freshly got Pod's TTL has expired. This field may be updated
213-
before TTL controller observes the new value in its local cache.
214-
* If it hasn't expired, it is not safe to delete the Pod. Delay re-enqueue
215-
the Pod after a computed amount of time when it will expire.
216-
1. Delete the Pod if passing the sanity checks.
217-
218187
#### Owner References
219188

220189
We have considered making TTL controller leave a Job/Pod around even after its
@@ -250,17 +219,205 @@ Mitigations:
250219

251220
## Graduation Criteria
252221

253-
We want to implement this feature for Pods/Jobs first to gather feedback, and
254-
decide whether to generalize it to custom resources. This feature can be
255-
promoted to beta after we finalize the decision for whether to generalize it or
256-
not, and when it satisfies users' need for cleaning up finished resource
257-
objects, without regressions.
222+
### Alpha
258223

259-
This will be promoted to GA once it's gone a sufficient amount of time as beta
260-
with no changes.
224+
- For alpha graduation, the feature is implemented for Jobs; as future work it can be extended to Pods, but that should happen under a separate feature flag.
225+
- Unit and e2e tests
226+
227+
### Alpha -> Beta
228+
229+
- Appropriate metrics are agreed on and implemented
230+
- upgrade/rollback manually tested
231+
232+
### Beta -> GA
233+
234+
- Make a decision on whether or not the feature should be extended to pods
235+
- Enabled in Beta for at least two releases without complaints
261236

262237
[umbrella issues]: https://github.com/kubernetes/kubernetes/issues/42752
263238

264-
## Implementation History
239+
## Production Readiness Review Questionnaire
240+
241+
### Feature Enablement and Rollback
242+
243+
* **How can this feature be enabled / disabled in a live cluster?**
244+
- [x] Feature gate (also fill in values in `kep.yaml`)
245+
- Feature gate name: TTLAfterFinished
246+
- Components depending on the feature gate: kube-apiserver, kube-controller-manager
247+
- [ ] Other
248+
- Describe the mechanism:
249+
- Will enabling / disabling the feature require downtime of the control
250+
plane?
251+
- Will enabling / disabling the feature require downtime or reprovisioning
252+
of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
253+
254+
* **Does enabling the feature change any default behavior?**
255+
No.
256+
257+
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
258+
the enablement)?**
259+
Yes. One caveat here is that Jobs created with TTLSecondsAfterFinished set when
260+
the feature was enabled will continue to have that field set when the feature is disabled,
261+
but the field will not have any effect.
262+
263+
* **What happens if we reenable the feature if it was previously rolled back?**
264+
It should work as expected.
265+
266+
* **Are there any tests for feature enablement/disablement?**
267+
No.
268+
269+
### Rollout, Upgrade and Rollback Planning
270+
271+
* **How can a rollout fail? Can it impact already running workloads?**
272+
It shouldn't impact already running workloads. This is an opt-in feature since
273+
users need to explicitly set the TTLSecondsAfterFinished parameter in the job spec.
274+
If the feature is disabled, the field is preserved if it was already set in the
275+
persisted Job object; otherwise it is silently dropped.
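
The field handling described in this answer can be sketched in Go as follows. This is a minimal illustration and not the actual Kubernetes code: `jobSpec` and `dropDisabledTTL` are hypothetical names, and the state of the `TTLAfterFinished` gate is passed in as a plain boolean.

```go
package main

// jobSpec is a hypothetical, trimmed-down stand-in for the Job spec.
type jobSpec struct {
	TTLSecondsAfterFinished *int32
}

// dropDisabledTTL clears the TTL field on newSpec when the gate is disabled,
// unless the already persisted object (oldSpec) has the field set.
// oldSpec is nil when the Job is being created.
func dropDisabledTTL(gateEnabled bool, newSpec, oldSpec *jobSpec) {
	if gateEnabled {
		return // feature on: keep whatever the user supplied
	}
	if oldSpec != nil && oldSpec.TTLSecondsAfterFinished != nil {
		return // field already persisted: preserve it
	}
	newSpec.TTLSecondsAfterFinished = nil // otherwise drop it silently
}
```

Under these assumptions, a Job created while the gate is off loses the field, while an update to a Job that already carries the field keeps it.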
276+
277+
* **What specific metrics should inform a rollback?**
278+
- Unexpected restarts of kube-controller-manager
279+
- Extended 4xx/5xx on the Jobs endpoint from kube-apiserver
280+
281+
* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
282+
Manually tested. No issues were found.
283+
284+
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
285+
fields of API types, flags, etc.?**
286+
No
287+
288+
### Monitoring Requirements
289+
290+
_This section must be completed when targeting beta graduation to a release._
291+
292+
* **How can an operator determine if the feature is in use by workloads?**
293+
- The `workqueue_adds_total{name="ttl_jobs_to_delete"}` metric tracks the number of
294+
finished Jobs with ttlSecondsAfterFinished set.
295+
- Listing jobs in the cluster and checking if any has ttlSecondsAfterFinished field set.
296+
297+
* **What are the SLIs (Service Level Indicators) an operator can use to determine
298+
the health of the service?**
299+
- [x] Metrics
300+
- Components exposing the metric: `kube-controller-manager`
301+
- Metric name: `ttl_after_finished_controller_rate_limiter_use`
302+
- Metric name: `workqueue_adds_total{name="ttl_jobs_to_delete"}`
303+
- Metric name: `workqueue_depth{name="ttl_jobs_to_delete"}`
304+
- Metric name: `workqueue_queue_duration_seconds{name="ttl_jobs_to_delete"}`
305+
- Metric name: `workqueue_retries_total{name="ttl_jobs_to_delete"}`
306+
- Components exposing the metric: `kube-apiserver`
307+
- Metric name: `etcd_object_counts{resource="jobs.batch"}`
308+
309+
310+
We will also add the following new histogram metric exposed by kube-controller-manager:
311+
- `ttl_after_finished_controller_time_to_deletion_seconds` which tracks the time it took
312+
to delete the job since it became eligible (actual-delete-timestamp - (job-finished-timestamp + ttlAfterFinished)).
265313

314+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
315+
316+
99% of the jobs that need cleanup are deleted within X minutes.
317+
318+
This can be implemented using the `ttl_after_finished_controller_time_to_deletion_seconds`
319+
histogram.
320+
321+
* **Are there any missing metrics that would be useful to have to improve observability
322+
of this feature?**
323+
324+
No
325+
326+
### Dependencies
327+
328+
_This section must be completed when targeting beta graduation to a release._
329+
330+
* **Does this feature depend on any specific services running in the cluster?**
331+
No.
332+
333+
### Scalability
334+
335+
* **Will enabling / using this feature result in any new API calls?**
336+
- API call type: DELETE jobs
337+
- Estimated throughput: the upper bound is equal to Job creation rate.
338+
- originating component(s): kube-controller-manager
339+
340+
* **Will enabling / using this feature result in introducing new API types?**
341+
No.
342+
343+
* **Will enabling / using this feature result in any new calls to the cloud
344+
provider?**
345+
No.
346+
347+
* **Will enabling / using this feature result in increasing size or count of
348+
the existing API objects?**
349+
Yes. An int field is added to the Job object.
350+
351+
* **Will enabling / using this feature result in increasing time taken by any
352+
operations covered by [existing SLIs/SLOs]?**
353+
No.
354+
355+
* **Will enabling / using this feature result in non-negligible increase of
356+
resource usage (CPU, RAM, disk, IO, ...) in any components?**
357+
kube-controller-manager may consume more CPU depending on the number of jobs that require deletion in the system.
358+
359+
### Troubleshooting
360+
361+
_This section must be completed when targeting beta graduation to a release._
362+
363+
* **How does this feature react if the API server and/or etcd is unavailable?**
364+
The controller will not be notified of job updates and it can't delete existing ones.
365+
366+
* **What are other known failure modes?**
367+
None.
368+
369+
* **What steps should be taken if SLOs are not being met to determine the problem?**
266370
TBD
371+
372+
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
373+
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
374+
375+
## Future Work
376+
377+
As future work, ttl-after-finished can be added to Pods. The API is similar to the one for Jobs:
378+
379+
```go
380+
type PodSpec struct {
381+
// ttlSecondsAfterFinished limits the lifetime of a Pod that has finished
382+
// execution (either Succeeded or Failed). If this field is set, once the Pod
383+
// finishes, it will be deleted after ttlSecondsAfterFinished expires. When
384+
// the Pod is being deleted, its lifecycle guarantees (e.g. finalizers) will
385+
// be honored. If this field is unset, ttlSecondsAfterFinished will not
386+
// expire. If this field is set to zero, ttlSecondsAfterFinished expires
387+
// immediately after the Pod finishes.
388+
// This field is alpha-level and is only honored by servers that enable the
389+
// TTLAfterFinished feature.
390+
// +optional
391+
TTLSecondsAfterFinished *int32
392+
}
393+
```
394+
395+
The TTL controller can be changed to watch Pods in addition to Jobs.
396+
397+
When a Pod is created or updated:
398+
1. Check its `.status.phase` to see if it has finished (`Succeeded` or `Failed`).
399+
If it hasn't finished, do nothing.
400+
1. Otherwise, if the Pod has finished, check if Pod's
401+
`.spec.ttlSecondsAfterFinished` field is set. Do nothing if the TTL field is
402+
not set.
403+
1. Otherwise, if the TTL field is set, check if the TTL has expired, i.e.
404+
`.spec.ttlSecondsAfterFinished` + the time when the Pod finishes (max of all
405+
of its containers termination time
406+
`.containerStatuses.state.terminated.finishedAt`) <= now.
407+
1. If the TTL hasn't expired, delay re-enqueuing the Pod after a computed amount
408+
of time when it will expire. The computed time period is:
409+
(`.spec.ttlSecondsAfterFinished` + the time when the Pod finishes - now).
410+
1. If the TTL has expired, `GET` the Pod from API server to do final sanity
411+
checks before deleting it.
412+
1. Check if the freshly got Pod's TTL has expired. This field may be updated
413+
before TTL controller observes the new value in its local cache.
414+
* If it hasn't expired, it is not safe to delete the Pod. Delay re-enqueue
415+
the Pod after a computed amount of time when it will expire.
416+
1. Delete the Pod if passing the sanity checks.
417+
418+
## Implementation History
419+
- 2018-08-16: Initial KEP
420+
- 2021-01-08: KEP updated to
421+
- indicate that the feature will be graduated for Jobs, and that Pods will be done as future work under a separate flag
422+
- add production readiness questionnaire
423+
- mark the feature for Beta graduation for jobs.

keps/sig-apps/592-ttl-after-finish/kep.yaml

Lines changed: 36 additions & 4 deletions
@@ -2,21 +2,53 @@ title: TTL After Finished
22
kep-number: 592
33
authors:
44
- "@janetkuo"
5+
- "@ahg-g"
56
owning-sig: sig-apps
67
participating-sigs:
78
- sig-api-machinery
9+
status: implementable
10+
creation-date: 2018-08-16
811
reviewers:
912
- "@enisoc"
1013
- "@tnozicka"
1114
approvers:
1215
- "@kow3ns"
13-
editor: TBD
14-
creation-date: 2018-08-16
15-
last-updated: 2018-08-16
16-
status: provisional
16+
prr-approvers:
17+
- "@wojtek-t"
1718
see-also:
1819
- n/a
1920
replaces:
2021
- n/a
2122
superseded-by:
2223
- n/a
24+
25+
# The target maturity stage in the current dev cycle for this KEP.
26+
stage: beta
27+
28+
# The most recent milestone for which work toward delivery of this KEP has been
29+
# done. This can be the current (upcoming) milestone, if it is being actively
30+
# worked on.
31+
latest-milestone: "v1.21"
32+
33+
# The milestone at which this feature was, or is targeted to be, at each stage.
34+
milestone:
35+
alpha: "v1.12"
36+
beta: "v1.21"
37+
stable: "v1.23"
38+
39+
# The following PRR answers are required at alpha release
40+
# List the feature gate name and the components for which it must be enabled
41+
feature-gates:
42+
- name: TTLAfterFinished
43+
components:
44+
- kube-apiserver
45+
- kube-controller-manager
46+
disable-supported: true
47+
48+
metrics:
49+
- ttl_after_finished_controller_rate_limiter_use
50+
- ttl_after_finished_controller_time_to_deletion_seconds
51+
- workqueue_adds_total{name=ttl_jobs_to_delete}
52+
- workqueue_depth{name=ttl_jobs_to_delete}
53+
- workqueue_queue_duration_seconds{name=ttl_jobs_to_delete}
54+
- workqueue_retries_total{name=ttl_jobs_to_delete}
