- [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
- [TTL Controller](#ttl-controller)
- [Finished Jobs](#finished-jobs)
- - [Finished Pods](#finished-pods)
- [Owner References](#owner-references)
- [Risks and Mitigations](#risks-and-mitigations)
- [Graduation Criteria](#graduation-criteria)
+ - [Alpha](#alpha)
+ - [Alpha -> Beta](#alpha---beta)
+ - [Beta -> GA](#beta---ga)
+ - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+ - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+ - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+ - [Monitoring Requirements](#monitoring-requirements)
+ - [Dependencies](#dependencies)
+ - [Scalability](#scalability)
+ - [Troubleshooting](#troubleshooting)
+ - [Future Work](#future-work)
- [Implementation History](#implementation-history)
<!-- /toc -->

@@ -106,24 +116,6 @@ This allows Jobs to be cleaned up after they finish and provides time for
asynchronous clients to observe Jobs' final states before they are deleted.

- Similarly, we will add the following API fields to `PodSpec` (`Pod`'s `.spec`).
-
- ```go
- type PodSpec struct {
- 	// ttlSecondsAfterFinished limits the lifetime of a Pod that has finished
- 	// execution (either Succeeded or Failed). If this field is set, once the Pod
- 	// finishes, it will be deleted after ttlSecondsAfterFinished expires. When
- 	// the Pod is being deleted, its lifecycle guarantees (e.g. finalizers) will
- 	// be honored. If this field is unset, ttlSecondsAfterFinished will not
- 	// expire. If this field is set to zero, ttlSecondsAfterFinished expires
- 	// immediately after the Pod finishes.
- 	// This field is alpha-level and is only honored by servers that enable the
- 	// TTLAfterFinished feature.
- 	// +optional
- 	TTLSecondsAfterFinished *int32
- }
- ```
-
##### Validation

The Job controller depends on Pods existing in order to work correctly. In Job
@@ -157,16 +149,16 @@ The steps are as easy as:
### Implementation Details/Notes/Constraints

#### TTL Controller
- We will add a TTL controller for finished Jobs and finished Pods. We considered
+ We will add a TTL controller for finished Jobs. We considered
adding it in the Job controller, but decided not to, for the following reasons:

1. The Job controller should focus on managing Pods based on the Job's spec and pod
template, not on cleaning up Jobs.
- 1. We also need the TTL controller to clean up finished Pods, and we consider
+ 1. We also need the TTL controller to clean up finished Pods in the future, and we consider
generalizing the TTL controller later for custom resources.

- The TTL controller utilizes the informer framework, watches all Jobs and Pods, and
- reads Jobs and Pods from a local cache.
+ The TTL controller utilizes the informer framework, watches all Jobs, and
+ reads Jobs from a local cache.

#### Finished Jobs

@@ -192,29 +184,6 @@ When a Job is created or updated:
the Job after a computed amount of time when it will expire.
1. Delete the Job if passing the sanity checks.

- #### Finished Pods
-
- When a Pod is created or updated:
- 1. Check its `.status.phase` to see if it has finished (`Succeeded` or `Failed`).
-    If it hasn't finished, do nothing.
- 1. Otherwise, if the Pod has finished, check if the Pod's
-    `.spec.ttlSecondsAfterFinished` field is set. Do nothing if the TTL field is
-    not set.
- 1. Otherwise, if the TTL field is set, check if the TTL has expired, i.e.
-    `.spec.ttlSecondsAfterFinished` + the time when the Pod finishes (the latest
-    termination time among its containers,
-    `.containerStatuses.state.terminated.finishedAt`) <= now.
- 1. If the TTL hasn't expired, delay re-enqueuing the Pod by the computed amount
-    of time until it expires. The computed time period is:
-    (`.spec.ttlSecondsAfterFinished` + the time when the Pod finishes - now).
- 1. If the TTL has expired, `GET` the Pod from the API server to do final sanity
-    checks before deleting it.
- 1. Check if the freshly fetched Pod's TTL has expired. This field may be updated
-    before the TTL controller observes the new value in its local cache.
-    * If it hasn't expired, it is not safe to delete the Pod. Delay re-enqueuing
-      the Pod by the computed amount of time until it expires.
- 1. Delete the Pod if passing the sanity checks.
-

#### Owner References

We have considered making the TTL controller leave a Job/Pod around even after its
@@ -250,17 +219,205 @@ Mitigations:

## Graduation Criteria

- We want to implement this feature for Pods/Jobs first to gather feedback, and
- decide whether to generalize it to custom resources. This feature can be
- promoted to beta after we finalize the decision for whether to generalize it or
- not, and when it satisfies users' need for cleaning up finished resource
- objects, without regressions.
+ ### Alpha

- This will be promoted to GA once it's gone a sufficient amount of time as beta
- with no changes.
+ - For alpha graduation, the feature is implemented for Jobs; as future work, it can be extended to Pods, but that should happen under a separate feature flag.
+ - Unit and e2e tests
+
+ ### Alpha -> Beta
+
+ - Appropriate metrics are agreed on and implemented
+ - Upgrade/rollback manually tested
+
+ ### Beta -> GA
+
+ - Make a decision on whether or not the feature should be extended to Pods
+ - Enabled in Beta for at least two releases without complaints

[umbrella issues]: https://github.com/kubernetes/kubernetes/issues/42752

- ## Implementation History
+ ## Production Readiness Review Questionnaire
+
+ ### Feature Enablement and Rollback
+
+ * **How can this feature be enabled / disabled in a live cluster?**
+   - [x] Feature gate (also fill in values in `kep.yaml`)
+     - Feature gate name: TTLAfterFinished
+     - Components depending on the feature gate: kube-apiserver, kube-controller-manager
+   - [ ] Other
+     - Describe the mechanism:
+     - Will enabling / disabling the feature require downtime of the control
+       plane?
+     - Will enabling / disabling the feature require downtime or reprovisioning
+       of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
+
+ * **Does enabling the feature change any default behavior?**
+   No.
+
+ * **Can the feature be disabled once it has been enabled (i.e. can we roll back
+   the enablement)?**
+   Yes. One caveat is that Jobs created with TTLSecondsAfterFinished set while
+   the feature was enabled will keep that field set after the feature is disabled,
+   but it will have no effect.
+
+ * **What happens if we reenable the feature if it was previously rolled back?**
+   It should work as expected.
+
+ * **Are there any tests for feature enablement/disablement?**
+   No.
+
+ ### Rollout, Upgrade and Rollback Planning
+
+ * **How can a rollout fail? Can it impact already running workloads?**
+   It shouldn't impact already running workloads. This is an opt-in feature, since
+   users need to explicitly set the TTLSecondsAfterFinished field in the Job spec.
+   If the feature is disabled, the field is preserved if it was already set in the
+   persisted Job object; otherwise, it is silently dropped.
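A minimal Go sketch of the field-dropping behavior described in the rollout answer above. `JobSpec` here is a hypothetical stand-in carrying only the TTL field, `dropDisabledTTLField` is a hypothetical helper name rather than the actual kube-apiserver code, and the `gateEnabled` argument stands in for a TTLAfterFinished feature-gate lookup.

```go
package main

// JobSpec is a hypothetical stand-in for the Job spec; it carries only the
// field this sketch needs.
type JobSpec struct {
	TTLSecondsAfterFinished *int32
}

// dropDisabledTTLField clears TTLSecondsAfterFinished on an incoming spec when
// the feature gate is off, unless the persisted object (oldSpec) already had
// the field set. oldSpec is nil on create.
func dropDisabledTTLField(newSpec, oldSpec *JobSpec, gateEnabled bool) {
	if gateEnabled {
		return // feature on: accept the field as submitted
	}
	if oldSpec != nil && oldSpec.TTLSecondsAfterFinished != nil {
		return // field already persisted: preserve it so updates don't wipe it
	}
	newSpec.TTLSecondsAfterFinished = nil // feature off and not in use: silently drop
}
```

Consulting the old object is what lets a rollback keep values that were already written while ignoring newly submitted ones.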
+
+ * **What specific metrics should inform a rollback?**
+   - Unexpected restarts of kube-controller-manager
+   - Sustained 4xx/5xx responses from kube-apiserver on the Jobs endpoint
+
+ * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
+   Manually tested. No issues were found.
+
+ * **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
+   fields of API types, flags, etc.?**
+   No.
+
+ ### Monitoring Requirements
+
+ _This section must be completed when targeting beta graduation to a release._
+
+ * **How can an operator determine if the feature is in use by workloads?**
+   - The `workqueue_adds_total{name="ttl_jobs_to_delete"}` metric tracks the number of
+     finished Jobs with ttlSecondsAfterFinished set.
+   - Listing Jobs in the cluster and checking whether any has the ttlSecondsAfterFinished field set.
+
+ * **What are the SLIs (Service Level Indicators) an operator can use to determine
+   the health of the service?**
+   - [x] Metrics
+     - Components exposing the metric: `kube-controller-manager`
+       - Metric name: `ttl_after_finished_controller_rate_limiter_use`
+       - Metric name: `workqueue_adds_total{name="ttl_jobs_to_delete"}`
+       - Metric name: `workqueue_depth{name="ttl_jobs_to_delete"}`
+       - Metric name: `workqueue_queue_duration_seconds{name="ttl_jobs_to_delete"}`
+       - Metric name: `workqueue_retries_total{name="ttl_jobs_to_delete"}`
+     - Components exposing the metric: `kube-apiserver`
+       - Metric name: `etcd_object_counts{resource="jobs.batch"}`
+
+
+ We will also add the following new histogram metric exposed by kube-controller-manager:
+ - `ttl_after_finished_controller_time_to_deletion_seconds`, which tracks the time it took
+   to delete the Job after it became eligible (actual-delete-timestamp - (job-finished-timestamp + ttlAfterFinished)).

+ * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
+
+   99% of the Jobs that need cleanup are deleted within X minutes.
+
+   This can be measured using the `ttl_after_finished_controller_time_to_deletion_seconds`
+   histogram.
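Checking this SLO against the histogram can be sketched as follows, assuming Prometheus-style cumulative buckets (each count covers every observation at or below its upper bound) and a threshold X that lands on or above some bucket boundary. `Bucket` and `meetsSLO` are hypothetical names; in practice the same question would more likely be answered with a query over the exported `_bucket` and `_count` series.

```go
package main

// Bucket mirrors one cumulative histogram bucket: Count is the number of
// observations whose value is <= UpperBound (in seconds).
type Bucket struct {
	UpperBound float64
	Count      uint64
}

// meetsSLO reports whether at least `target` (e.g. 0.99) of all observed
// deletions happened within thresholdSeconds. buckets must be cumulative and
// sorted by UpperBound; total is the histogram's overall observation count.
func meetsSLO(buckets []Bucket, total uint64, thresholdSeconds, target float64) bool {
	if total == 0 {
		return true // no Job needed cleanup, so nothing violated the SLO
	}
	var within uint64
	for _, b := range buckets {
		if b.UpperBound > thresholdSeconds {
			break
		}
		within = b.Count // largest bucket that fits under the threshold
	}
	return float64(within)/float64(total) >= target
}
```

Taking the largest bucket bound that does not exceed X keeps the check conservative: observations that fall between that bound and X are counted as misses.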
+
+ * **Are there any missing metrics that would be useful to have to improve observability
+   of this feature?**
+
+   No.
+
+ ### Dependencies
+
+ _This section must be completed when targeting beta graduation to a release._
+
+ * **Does this feature depend on any specific services running in the cluster?**
+   No.
+
+ ### Scalability
+
+ * **Will enabling / using this feature result in any new API calls?**
+   - API call type: DELETE jobs
+   - Estimated throughput: the upper bound is equal to the Job creation rate.
+   - originating component(s): kube-controller-manager
+
+ * **Will enabling / using this feature result in introducing new API types?**
+   No.
+
+ * **Will enabling / using this feature result in any new calls to the cloud
+   provider?**
+   No.
+
+ * **Will enabling / using this feature result in increasing size or count of
+   the existing API objects?**
+   Yes. An int field is added to the Job object.
+
+ * **Will enabling / using this feature result in increasing time taken by any
+   operations covered by [existing SLIs/SLOs]?**
+   No.
+
+ * **Will enabling / using this feature result in non-negligible increase of
+   resource usage (CPU, RAM, disk, IO, ...) in any components?**
+   kube-controller-manager may consume more CPU depending on the number of Jobs that require deletion in the system.
+
+ ### Troubleshooting
+
+ _This section must be completed when targeting beta graduation to a release._
+
+ * **How does this feature react if the API server and/or etcd is unavailable?**
+   The controller will not be notified of Job updates, and it can't delete existing Jobs.
+
+ * **What are other known failure modes?**
+   None.
+
+ * **What steps should be taken if SLOs are not being met to determine the problem?**
TBD
+
+ [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
+ [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
+
+ ## Future Work
+
+ As future work, ttl-after-finished can be added to Pods. The API is similar to the Job's:
+
+ ```go
+ type PodSpec struct {
+ 	// ttlSecondsAfterFinished limits the lifetime of a Pod that has finished
+ 	// execution (either Succeeded or Failed). If this field is set, once the Pod
+ 	// finishes, it will be deleted after ttlSecondsAfterFinished expires. When
+ 	// the Pod is being deleted, its lifecycle guarantees (e.g. finalizers) will
+ 	// be honored. If this field is unset, ttlSecondsAfterFinished will not
+ 	// expire. If this field is set to zero, ttlSecondsAfterFinished expires
+ 	// immediately after the Pod finishes.
+ 	// This field is alpha-level and is only honored by servers that enable the
+ 	// TTLAfterFinished feature.
+ 	// +optional
+ 	TTLSecondsAfterFinished *int32
+ }
+ ```
+
+ The TTL controller can be changed to watch Pods in addition to Jobs.
+
+ When a Pod is created or updated:
+ 1. Check its `.status.phase` to see if it has finished (`Succeeded` or `Failed`).
+    If it hasn't finished, do nothing.
+ 1. Otherwise, if the Pod has finished, check if the Pod's
+    `.spec.ttlSecondsAfterFinished` field is set. Do nothing if the TTL field is
+    not set.
+ 1. Otherwise, if the TTL field is set, check if the TTL has expired, i.e.
+    `.spec.ttlSecondsAfterFinished` + the time when the Pod finishes (the latest
+    termination time among its containers,
+    `.containerStatuses.state.terminated.finishedAt`) <= now.
+ 1. If the TTL hasn't expired, delay re-enqueuing the Pod by the computed amount
+    of time until it expires. The computed time period is:
+    (`.spec.ttlSecondsAfterFinished` + the time when the Pod finishes - now).
+ 1. If the TTL has expired, `GET` the Pod from the API server to do final sanity
+    checks before deleting it.
+ 1. Check if the freshly fetched Pod's TTL has expired. This field may be updated
+    before the TTL controller observes the new value in its local cache.
+    * If it hasn't expired, it is not safe to delete the Pod. Delay re-enqueuing
+      the Pod by the computed amount of time until it expires.
+ 1. Delete the Pod if passing the sanity checks.
+
+ ## Implementation History
+ - 2018-08-16: Initial KEP
+ - 2021-01-08: KEP updated to
+   - indicate that the feature will be graduated for Jobs, and that Pods will be done as future work under a separate flag
+   - add the production readiness questionnaire
+   - mark the feature for Beta graduation for Jobs.