@@ -68,11 +76,15 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
-- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [x] e2e Tests for all Beta API Operations (endpoints)
+- [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
 - [x] (R) Graduation criteria is in place
+- [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [x] (R) Production readiness review completed
 - [x] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
+- [x] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
@@ -161,6 +173,25 @@ Changes in the status not produced by the Job controller in
 kube-controller-manager could affect the Job tracking. Cluster administrators
 should make sure to protect the Job status endpoint via RBAC.
 
+#### Jobs with legacy tracking
+
+Starting in 1.27, the job controller will ignore the annotation `batch.kubernetes.io/job-completion`
+and will start tracking every Job with finalizers.
+This means that terminated pods without finalizers will be ignored and
+replacement pods might be created (with finalizers). This behavior is similar
+to:
+- Having a low terminated pods threshold in the Pod GC or
+- Losing pods because of node upgrades.
+
+The impact should be minimal for the following reasons:
+- During 1.26, all new Jobs will be tracked with finalizers, as the feature
+cannot be disabled.
+- Most clusters would also have the feature enabled in 1.25, giving extra
+time for jobs to terminate.
+
+In other words, in most clusters Jobs will have 2 releases to terminate
+before getting their pods recreated.
+
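Note on the "Jobs with legacy tracking" text: the effect of ignoring terminated pods that lack a finalizer can be illustrated with a small model. This is a minimal sketch in Python, not the controller's Go code. The `Pod` class and the helper functions are hypothetical names introduced for illustration, and the arithmetic is deliberately simplified: it ignores parallelism and any counts already stored in the Job status. The finalizer name `batch.kubernetes.io/job-tracking` is the one the Job controller uses.

```python
from dataclasses import dataclass, field

# Finalizer the Job controller places on pods it tracks.
JOB_TRACKING_FINALIZER = "batch.kubernetes.io/job-tracking"
TERMINAL_PHASES = {"Succeeded", "Failed"}


@dataclass
class Pod:
    """Hypothetical, minimal stand-in for a Pod object."""
    name: str
    phase: str  # "Pending", "Running", "Succeeded" or "Failed"
    finalizers: list = field(default_factory=list)


def ignored_terminated(pods):
    """Terminated pods without the finalizer: skipped by finalizer-only tracking."""
    return [
        p for p in pods
        if p.phase in TERMINAL_PHASES and JOB_TRACKING_FINALIZER not in p.finalizers
    ]


def replacements_needed(pods, completions):
    """Simplified estimate of how many pods might still be created.

    Only succeeded pods that carry the finalizer count towards completions;
    pods that are still pending or running count as active.
    """
    succeeded = sum(
        1 for p in pods
        if p.phase == "Succeeded" and JOB_TRACKING_FINALIZER in p.finalizers
    )
    active = sum(1 for p in pods if p.phase not in TERMINAL_PHASES)
    return max(0, completions - succeeded - active)


pods = [
    Pod("a", "Succeeded"),                            # legacy pod, no finalizer
    Pod("b", "Succeeded", [JOB_TRACKING_FINALIZER]),
    Pod("c", "Running", [JOB_TRACKING_FINALIZER]),
]
print([p.name for p in ignored_terminated(pods)])  # ['a']
print(replacements_needed(pods, completions=3))    # 1, because pod "a" is not counted
```

In this model, pod `a` finished under legacy tracking and is no longer visible to the controller, so one replacement pod would be created. That is the same outcome as if Pod GC or a node upgrade had removed it.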
 ## Design Details
 
 ### API changes
@@ -307,21 +338,63 @@ finalizer.
 The job controller adds the finalizer in the same patch request that modifies
 the owner reference.
 
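Note on the context lines above: a single patch that sets the owner reference and adds the finalizer could look roughly like the body built below. This is a sketch under stated assumptions, not the controller's actual request. It assumes a strategic merge patch, where `ownerReferences` and `finalizers` entries are merged into the existing lists rather than replacing them. The function name `adoption_patch` and the example names and UIDs are hypothetical.

```python
import json

# Finalizer the Job controller places on pods it tracks.
JOB_TRACKING_FINALIZER = "batch.kubernetes.io/job-tracking"


def adoption_patch(job_name, job_uid, pod_uid):
    """Build one patch body carrying both the owner reference and the finalizer.

    Hypothetical helper: sending both fields together means a pod never ends
    up owned by the Job without also carrying the tracking finalizer.
    """
    return {
        "metadata": {
            # Including the pod UID makes the patch fail if the pod was
            # deleted and recreated under the same name.
            "uid": pod_uid,
            "ownerReferences": [
                {
                    "apiVersion": "batch/v1",
                    "kind": "Job",
                    "name": job_name,
                    "uid": job_uid,
                    "controller": True,
                    "blockOwnerDeletion": True,
                }
            ],
            "finalizers": [JOB_TRACKING_FINALIZER],
        }
    }


patch = adoption_patch("example-job", "job-uid-123", "pod-uid-456")
print(json.dumps(patch, indent=2))
```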
+## Monitoring Pods with finalizers
+
+Starting in 1.26, the metric `job_terminated_pod_tracking_finalizer` is a gauge
+that tracks the number of terminated pods (`.status.phase=(Succeeded|Failed)`)
+that currently have a job tracking finalizer.
+
+The job controller tracks this metric in its event handlers.
+
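Note on "Monitoring Pods with finalizers": one way to picture a gauge maintained from event handlers is the model below. It is a minimal Python sketch, not the controller's implementation. The class `TerminatedPodGauge`, its handler names, and the `make_pod` helper are hypothetical. Pods are plain dictionaries shaped like the relevant parts of the API object.

```python
# Finalizer the Job controller places on pods it tracks.
JOB_TRACKING_FINALIZER = "batch.kubernetes.io/job-tracking"


def is_tracked_terminated(pod):
    """True if the pod is Succeeded or Failed and still has the finalizer."""
    phase = pod.get("status", {}).get("phase")
    finalizers = pod.get("metadata", {}).get("finalizers", [])
    return phase in ("Succeeded", "Failed") and JOB_TRACKING_FINALIZER in finalizers


class TerminatedPodGauge:
    """Hypothetical gauge updated from add/update/delete pod events."""

    def __init__(self):
        self.value = 0

    def on_add(self, pod):
        self.value += int(is_tracked_terminated(pod))

    def on_update(self, old_pod, new_pod):
        # Only transitions into or out of the tracked state change the gauge.
        self.value += int(is_tracked_terminated(new_pod)) - int(is_tracked_terminated(old_pod))

    def on_delete(self, pod):
        self.value -= int(is_tracked_terminated(pod))


def make_pod(phase, with_finalizer=True):
    """Hypothetical helper that builds a minimal pod dictionary."""
    finalizers = [JOB_TRACKING_FINALIZER] if with_finalizer else []
    return {"metadata": {"finalizers": finalizers}, "status": {"phase": phase}}


gauge = TerminatedPodGauge()
running = make_pod("Running")
gauge.on_add(running)                    # still running: gauge stays at 0
succeeded = make_pod("Succeeded")
gauge.on_update(running, succeeded)      # terminated with finalizer: gauge is 1
released = make_pod("Succeeded", with_finalizer=False)
gauge.on_update(succeeded, released)     # finalizer removed: gauge is 0 again
print(gauge.value)
```

Under this model, a value that stays above zero for a long time would suggest that finalizers are not being removed from terminated pods.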
 ### Test Plan
 
-- Unit tests:
+[x] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+Already fulfilled at alpha and beta stages.
+
+##### Unit tests
+
 - Job sync with feature gate enabled.
 - Removal of finalizers when feature gate is disabled.
-- Tracking of terminating Pods.
-- Integration tests:
-- Job tracking with feature enabled.
-- Tracking of terminating Pods.
-- Transition from feature enabled to disabled and enabled again.
-- Clean up finalizers of Orphan Pods.
+- Tracking of terminating Pods for NonIndexed and Indexed Jobs.
+
+Coverage:
+
+- `pkg/controller/job`: 2022-08-06 - 90%
+- `pkg/apis/batch/validation`: 2022-08-06 - 96%
+- `pkg/apis/batch/v1`: 2022-08-06 - 85.2%
+- `pkg/registry/batch/job`: 2022-08-06 - 79.7%
+
+##### Integration tests
+
+Almost the entire [test suite](https://storage.googleapis.com/k8s-triage/index.html?job=ci-kubernetes-integration&test=test%2Fintegration%2Fjob) runs with finalizers.