Merge pull request #37595 from sftim/20221029_improve_ttl_after_finished_concept

k8s-ci-robot · web-flow · commit b067305d644b · 2023-01-28T11:38:32.000-08:00
Improve concept docs for TTL-after-finished controller / Job cleanup
diff --git a/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md b/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
@@ -1,75 +1,87 @@
 ---
 reviewers:
 - janetkuo
-title: Automatic Clean-up for Finished Jobs 
+title: Automatic Cleanup for Finished Jobs
 content_type: concept
 weight: 70
+description: >-
+  A time-to-live mechanism to clean up old Jobs that have finished execution.
 ---
 
 <!-- overview -->
 
 {{< feature-state for_k8s_version="v1.23" state="stable" >}}
 
-TTL-after-finished {{<glossary_tooltip text="controller" term_id="controller">}} provides a 
-TTL (time to live) mechanism to limit the lifetime of resource objects that 
-have finished execution. TTL controller only handles 
-{{< glossary_tooltip text="Jobs" term_id="job" >}}.
+When your Job has finished, it's useful to keep that Job in the API (and not immediately delete the Job)
+so that you can tell whether the Job succeeded or failed.
+
+Kubernetes' TTL-after-finished {{<glossary_tooltip text="controller" term_id="controller">}} provides a
+TTL (time to live) mechanism to limit the lifetime of Job objects that
+have finished execution.
 
 <!-- body -->
 
-## TTL-after-finished Controller
+## Cleanup for finished Jobs
 
-The TTL-after-finished controller is only supported for Jobs. A cluster operator can use this feature to clean
+The TTL-after-finished controller is only supported for Jobs. You can use this mechanism to clean
 up finished Jobs (either `Complete` or `Failed`) automatically by specifying the
 `.spec.ttlSecondsAfterFinished` field of a Job, as in this
 [example](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically).
-The TTL-after-finished controller will assume that a job is eligible to be cleaned up
-TTL seconds after the job has finished, in other words, when the TTL has expired. When the
+
+The TTL-after-finished controller assumes that a Job is eligible to be cleaned up
+TTL seconds after the Job has finished. The timer starts once the
+status condition of the Job changes to show that the Job is either `Complete` or `Failed`; once the TTL has
+expired, that Job becomes eligible for
+[cascading](/docs/concepts/architecture/garbage-collection/#cascading-deletion) removal. When the
 TTL-after-finished controller cleans up a job, it will delete it cascadingly, that is to say it will delete
-its dependent objects together with it. Note that when the job is deleted,
-its lifecycle guarantees, such as finalizers, will be honored.
+its dependent objects together with it.
+
+Kubernetes honors object lifecycle guarantees on the Job, such as waiting for
+[finalizers](/docs/concepts/overview/working-with-objects/finalizers/).
 
-The TTL seconds can be set at any time. Here are some examples for setting the
+You can set the TTL seconds at any time. Here are some examples for setting the
 `.spec.ttlSecondsAfterFinished` field of a Job:
 
-* Specify this field in the job manifest, so that a Job can be cleaned up
+* Specify this field in the Job manifest, so that a Job can be cleaned up
   automatically some time after it finishes.
-* Set this field of existing, already finished jobs, to adopt this new
-  feature.
+* Manually set this field of existing, already finished Jobs, so that they become eligible
+  for cleanup.
 * Use a
   [mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
-  to set this field dynamically at job creation time. Cluster administrators can
+  to set this field dynamically at Job creation time. Cluster administrators can
   use this to enforce a TTL policy for finished jobs.
 * Use a
   [mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
-  to set this field dynamically after the job has finished, and choose
-  different TTL values based on job status, labels, etc.
+  to set this field dynamically after the Job has finished, and choose
+  different TTL values based on Job status or labels. For this case, the webhook needs
+  to detect changes to the `.status` of the Job and only set a TTL when the Job
+  is being marked as completed.
+* Write your own controller to manage the cleanup TTL for Jobs that match a particular
+  {{< glossary_tooltip term_id="selector" text="selector" >}}.
 
-## Caveat
+## Caveats
 
-### Updating TTL Seconds
+### Updating TTL for finished Jobs
 
-Note that the TTL period, e.g. `.spec.ttlSecondsAfterFinished` field of Jobs,
-can be modified after the job is created or has finished. However, once the
-Job becomes eligible to be deleted (when the TTL has expired), the system won't
-guarantee that the Jobs will be kept, even if an update to extend the TTL
-returns a successful API response.
+You can modify the TTL period, that is, the `.spec.ttlSecondsAfterFinished` field of a Job,
+after the Job is created or has finished. If you extend the TTL period after the
+existing `ttlSecondsAfterFinished` period has expired, Kubernetes doesn't guarantee
+to retain that Job, even if an update to extend the TTL returns a successful API
+response.
 
-### Time Skew
+### Time skew
 
-Because TTL-after-finished controller uses timestamps stored in the Kubernetes jobs to
+Because the TTL-after-finished controller uses timestamps stored in Kubernetes Jobs to
 determine whether the TTL has expired or not, this feature is sensitive to time
-skew in the cluster, which may cause TTL-after-finish controller to clean up job objects
+skew in your cluster, which may cause the control plane to clean up Job objects
 at the wrong time.
 
 Clocks aren't always correct, but the difference should be
 very small. Please be aware of this risk when setting a non-zero TTL.
 
-
-
 ## {{% heading "whatsnext" %}}
 
-* [Clean up Jobs automatically](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically)
-
-* [Design doc](https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/592-ttl-after-finish/README.md)
+* Read [Clean up Jobs automatically](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically)
 
+* Refer to the [Kubernetes Enhancement Proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/592-ttl-after-finish/README.md)
+  (KEP) for adding this mechanism.
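
Note on the timer semantics described in this change: the sketch below illustrates how "eligible for cleanup TTL seconds after the Job has finished" could be evaluated, and why clock skew matters. It is a minimal illustration, not the controller's actual implementation. The helper names `finish_time` and `ttl_expired` are hypothetical, the Job is modelled as a plain dict shaped like the API object, and the sketch assumes the finish time is taken from the `lastTransitionTime` of the `Complete` or `Failed` condition.

```python
from datetime import datetime, timedelta
from typing import Optional


def finish_time(job: dict) -> Optional[datetime]:
    """Return when the Job finished, or None if it has not finished yet.

    Assumption: the finish time is the lastTransitionTime of a Complete or
    Failed condition whose status is "True".
    """
    for cond in job.get("status", {}).get("conditions", []):
        if cond.get("type") in ("Complete", "Failed") and cond.get("status") == "True":
            # Timestamps are RFC 3339 strings, for example "2023-01-28T19:38:32Z".
            return datetime.fromisoformat(cond["lastTransitionTime"].replace("Z", "+00:00"))
    return None


def ttl_expired(job: dict, now: datetime) -> bool:
    """Return True once `now` is at least TTL seconds past the finish time."""
    ttl = job.get("spec", {}).get("ttlSecondsAfterFinished")
    finished = finish_time(job)
    if ttl is None or finished is None:
        # No TTL set, or the Job is still running: never eligible.
        return False
    return now >= finished + timedelta(seconds=ttl)
```

Because the comparison uses whatever `now` the evaluating process observes, a clock that runs 30 seconds fast would treat a Job as expired 30 seconds early, which is the time skew caveat in the page.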