Commit b067305

Merge pull request #37595 from sftim/20221029_improve_ttl_after_finished_concept
Improve concept docs for TTL-after-finished controller / Job cleanup
2 parents 8f902f3 + e47d1c4 commit b067305

1 file changed

+45
-33
lines changed

@@ -1,75 +1,87 @@
 ---
 reviewers:
 - janetkuo
-title: Automatic Clean-up for Finished Jobs
+title: Automatic Cleanup for Finished Jobs
 content_type: concept
 weight: 70
+description: >-
+  A time-to-live mechanism to clean up old Jobs that have finished execution.
 ---

 <!-- overview -->

 {{< feature-state for_k8s_version="v1.23" state="stable" >}}

-TTL-after-finished {{<glossary_tooltip text="controller" term_id="controller">}} provides a
-TTL (time to live) mechanism to limit the lifetime of resource objects that
-have finished execution. TTL controller only handles
-{{< glossary_tooltip text="Jobs" term_id="job" >}}.
+When your Job has finished, it's useful to keep that Job in the API (and not immediately delete the Job)
+so that you can tell whether the Job succeeded or failed.
+
+Kubernetes' TTL-after-finished {{<glossary_tooltip text="controller" term_id="controller">}} provides a
+TTL (time to live) mechanism to limit the lifetime of Job objects that
+have finished execution.

 <!-- body -->

-## TTL-after-finished Controller
+## Cleanup for finished Jobs

-The TTL-after-finished controller is only supported for Jobs. A cluster operator can use this feature to clean
+The TTL-after-finished controller is only supported for Jobs. You can use this mechanism to clean
 up finished Jobs (either `Complete` or `Failed`) automatically by specifying the
 `.spec.ttlSecondsAfterFinished` field of a Job, as in this
 [example](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically).
-The TTL-after-finished controller will assume that a job is eligible to be cleaned up
-TTL seconds after the job has finished, in other words, when the TTL has expired. When the
+
+The TTL-after-finished controller assumes that a Job is eligible to be cleaned up
+TTL seconds after the Job has finished. The timer starts once the
+status condition of the Job changes to show that the Job is either `Complete` or `Failed`; once the TTL has
+expired, that Job becomes eligible for
+[cascading](/docs/concepts/architecture/garbage-collection/#cascading-deletion) removal. When the
 TTL-after-finished controller cleans up a job, it will delete it cascadingly, that is to say it will delete
-its dependent objects together with it. Note that when the job is deleted,
-its lifecycle guarantees, such as finalizers, will be honored.
+its dependent objects together with it.
+
+Kubernetes honors object lifecycle guarantees on the Job, such as waiting for
+[finalizers](/docs/concepts/overview/working-with-objects/finalizers/).

-The TTL seconds can be set at any time. Here are some examples for setting the
+You can set the TTL seconds at any time. Here are some examples for setting the
 `.spec.ttlSecondsAfterFinished` field of a Job:

-* Specify this field in the job manifest, so that a Job can be cleaned up
+* Specify this field in the Job manifest, so that a Job can be cleaned up
   automatically some time after it finishes.
-* Set this field of existing, already finished jobs, to adopt this new
-  feature.
+* Manually set this field of existing, already finished Jobs, so that they become eligible
+  for cleanup.
 * Use a
   [mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
-  to set this field dynamically at job creation time. Cluster administrators can
+  to set this field dynamically at Job creation time. Cluster administrators can
   use this to enforce a TTL policy for finished jobs.
 * Use a
   [mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
-  to set this field dynamically after the job has finished, and choose
-  different TTL values based on job status, labels, etc.
+  to set this field dynamically after the Job has finished, and choose
+  different TTL values based on job status, labels. For this case, the webhook needs
+  to detect changes to the `.status` of the Job and only set a TTL when the Job
+  is being marked as completed.
+* Write your own controller to manage the cleanup TTL for Jobs that match a particular
+  {{< glossary_tooltip term_id="selector" text="selector-selector" >}}.

-## Caveat
+## Caveats

-### Updating TTL Seconds
+### Updating TTL for finished Jobs

-Note that the TTL period, e.g. `.spec.ttlSecondsAfterFinished` field of Jobs,
-can be modified after the job is created or has finished. However, once the
-Job becomes eligible to be deleted (when the TTL has expired), the system won't
-guarantee that the Jobs will be kept, even if an update to extend the TTL
-returns a successful API response.
+You can modify the TTL period, e.g. `.spec.ttlSecondsAfterFinished` field of Jobs,
+after the job is created or has finished. If you extend the TTL period after the
+existing `ttlSecondsAfterFinished` period has expired, Kubernetes doesn't guarantee
+to retain that Job, even if an update to extend the TTL returns a successful API
+response.

-### Time Skew
+### Time skew

-Because TTL-after-finished controller uses timestamps stored in the Kubernetes jobs to
+Because the TTL-after-finished controller uses timestamps stored in the Kubernetes jobs to
 determine whether the TTL has expired or not, this feature is sensitive to time
-skew in the cluster, which may cause TTL-after-finish controller to clean up job objects
+skew in your cluster, which may cause the control plane to clean up Job objects
 at the wrong time.

 Clocks aren't always correct, but the difference should be
 very small. Please be aware of this risk when setting a non-zero TTL.

-
-
 ## {{% heading "whatsnext" %}}

-* [Clean up Jobs automatically](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically)
-
-* [Design doc](https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/592-ttl-after-finish/README.md)
+* Read [Clean up Jobs automatically](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically)

+* Refer to the [Kubernetes Enhancement Proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/592-ttl-after-finish/README.md)
+  (KEP) for adding this mechanism.
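
Reviewer note: the timing rule in the new text (the timer starts when the Job is marked `Complete` or `Failed`, and the Job becomes eligible for removal once the TTL has elapsed) can be illustrated with a small Python sketch. This is not the controller's implementation. The function names `cleanup_eligible_at` and `is_eligible` are hypothetical, and the sketch assumes the finish time and the observer's clock are both timezone-aware timestamps.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cleanup_eligible_at(finished_at: datetime,
                        ttl_seconds: Optional[int]) -> Optional[datetime]:
    """Return the instant a finished Job becomes eligible for cleanup.

    Hypothetical helper: `ttl_seconds` stands in for the Job's
    `.spec.ttlSecondsAfterFinished`; None means the field is unset,
    in which case the TTL mechanism never cleans up the Job.
    """
    if ttl_seconds is None:
        return None
    return finished_at + timedelta(seconds=ttl_seconds)


def is_eligible(finished_at: datetime,
                ttl_seconds: Optional[int],
                now: datetime) -> bool:
    """True if, according to the clock reading `now`, the TTL has expired."""
    expiry = cleanup_eligible_at(finished_at, ttl_seconds)
    return expiry is not None and now >= expiry


if __name__ == "__main__":
    finished = datetime(2022, 10, 29, 12, 0, 0, tzinfo=timezone.utc)
    true_now = finished + timedelta(seconds=90)

    # With an accurate clock, a 100-second TTL has not yet expired.
    print(is_eligible(finished, 100, true_now))

    # A clock running 30 seconds fast would treat the same Job as expired,
    # which is the time-skew risk the caveat describes.
    skewed_now = true_now + timedelta(seconds=30)
    print(is_eligible(finished, 100, skewed_now))
```

The second call shows why the time-skew caveat matters: the decision depends entirely on which clock supplies `now`.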
