Commit 6e0acd6

Simplify
1 parent 5887977 commit 6e0acd6

2 files changed

+24
-45
lines changed

keps/sig-apps/2307-job-tracking-without-lingering-pods/README.md

Lines changed: 24 additions & 42 deletions
@@ -12,6 +12,7 @@
 - [New API calls](#new-api-calls)
 - [Bigger Job status](#bigger-job-status)
 - [Unprotected Job status endpoint](#unprotected-job-status-endpoint)
+- [Jobs with legacy tracking](#jobs-with-legacy-tracking)
 - [Design Details](#design-details)
 - [API changes](#api-changes)
 - [Algorithm](#algorithm)
@@ -20,7 +21,6 @@
 - [Deleted Jobs](#deleted-jobs)
 - [Pod adoption](#pod-adoption)
 - [Monitoring Pods with finalizers](#monitoring-pods-with-finalizers)
-- [Migrating Jobs with legacy tracking](#migrating-jobs-with-legacy-tracking)
 - [Test Plan](#test-plan)
 - [Prerequisite testing updates](#prerequisite-testing-updates)
 - [Unit tests](#unit-tests)
@@ -84,7 +84,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [x] (R) Production readiness review completed
 - [x] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
+- [x] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

@@ -173,6 +173,25 @@ Changes in the status not produced by the Job controller in
 kube-controller-manager could affect the Job tracking. Cluster administrators
 should make sure to protect the Job status endpoint via RBAC.

+#### Jobs with legacy tracking
+
+Starting in 1.27, the job controller will ignore the annotation `batch.kubernetes.io/job-completion`
+and will start tracking every Job with finalizers.
+This means that terminated pods without finalizers will be ignored and
+replacement pods might be created (with finalizers). This behavior is similar
+to:
+- Having a low terminated pods threshold in the Pod GC or
+- Losing pods because of node upgrades.
+
+The impact should be minimal for the following reasons:
+- During 1.26, all new Jobs will be tracked with finalizers, as the feature
+cannot be disabled.
+- Most clusters would also have the feature enabled in 1.25, giving extra
+time for jobs to terminate.
+
+In other words, in most clusters Jobs will have 2 releases to terminate
+before getting their pods recreated.
+
 ## Design Details

 ### API changes
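
Note on the hunk above: the added "Jobs with legacy tracking" section says that, starting in 1.27, terminated pods without finalizers are ignored and replacement pods might be created. The counting rule can be sketched in Python as below. This is an illustrative model only, not the job controller's code: the `Pod` class and the functions `count_pods` and `replacements_needed` are hypothetical, the finalizer name is the one this README uses, and the sketch assumes that no finished pods were previously recorded in the Job status, which is the worst case for pod recreation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Name as written in this README; used here as the tracking finalizer.
TRACKING_FINALIZER = "batch.kubernetes.io/job-completion"


@dataclass
class Pod:
    """Hypothetical, minimal stand-in for a Pod object."""
    name: str
    phase: str  # "Pending", "Running", "Succeeded" or "Failed"
    finalizers: List[str] = field(default_factory=list)


def count_pods(pods: List[Pod]) -> Tuple[int, int, int]:
    """Return (active, succeeded, failed) under finalizer-only tracking."""
    active = succeeded = failed = 0
    for pod in pods:
        if pod.phase in ("Pending", "Running"):
            active += 1
        elif TRACKING_FINALIZER in pod.finalizers:
            if pod.phase == "Succeeded":
                succeeded += 1
            elif pod.phase == "Failed":
                failed += 1
        # Terminated pods without the finalizer are ignored, as if they
        # had been removed by the Pod GC or lost in a node upgrade.
    return active, succeeded, failed


def replacements_needed(completions: int, pods: List[Pod]) -> int:
    """Pods still to be created to reach `completions` (parallelism ignored)."""
    active, succeeded, _ = count_pods(pods)
    return max(0, completions - succeeded - active)


pods = [
    Pod("legacy-done", "Succeeded"),                      # no finalizer: ignored
    Pod("tracked-done", "Succeeded", [TRACKING_FINALIZER]),
    Pod("tracked-running", "Running", [TRACKING_FINALIZER]),
]
print(count_pods(pods))              # (1, 1, 0)
print(replacements_needed(3, pods))  # 1
```

In this model a Job with 3 completions gets one replacement pod, because the finished pod without a finalizer no longer counts towards the total.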
@@ -327,38 +346,6 @@ that currently have a job tracking finalizer.

 The job controller tracks this metric in its event handlers.

-## Migrating Jobs with legacy tracking
-
-Once `JobTrackingWithFinalizers` graduates to stable, Jobs that start in a
-kubernetes version where `JobTrackingWithFinalizer` is disabled need to be
-migrated to the new tracking. This migration mechanism will be initially guarded
-by the feature gate `MigrateJobLegacyTracking`, starting in 1.26,
-enabled by default.
-
-When the feature gate `MigrateJobLegacyTracking` is enabled, the job controller
-migrates jobs with legacy tracking to tracking with finalizers as described
-below:
-
-If a Job doesn't have the annotation `batch.kubernetes.io/job-completion`, it
-means that the Job is not currently tracked with finalizers. The job controller
-starts the following migration process:
-1. Add the finalizer `batch.kubernetes.io/job-completion` to all pods with
-`.status.phase=(Pending/Running)`.
-2. Ignore pods with `.status.phase=(Complete/Failed)` that don't have the `batch.kubernetes.io/job-completion`.
-They are considered to be already counted in `.status.(failed/succeeded)`.
-3. Add the annotation `batch.kubernetes.io/job-completion`.
-
-This might lead to extra pods being created, but this is acceptable because:
-
-- After the `JobTrackingWithFinalizers` feature was enabled for some time, the
-Job controller is already tracking most Jobs using finalizers.
-- For the remaining Jobs, the Job controller already accounted most of the
-finished Pods in the status. The controller might leave some
-finished Pods unaccounted, if they finish before the controller has a chance
-to add a finalizer. This situation is no worse than the legacy tracking
-were the controller doesn't account for Pods removed by garbage collection or
-other means.
-
 ### Test Plan

 [x] I/we understand the owners of the involved components may require updates to
@@ -432,8 +419,6 @@ for jobs with multiple sizes.

 #### Beta -> GA Graduation

-- [Migrate existing Jobs to tracking with finalizers](#migrating-jobs-with-legacy-tracking)
-under feature gate `MigrateJobLegacyTracking`, enabled by default.
 - Job E2E tests graduate to conformance.
 - Job tracking scales to 10^5 completions per Job processed within an order of
 minutes.
@@ -448,15 +433,12 @@ In 1.26:

 In 1.27:

-- Lock `MigrateJobLegacyTracking` to true.
 - Remove legacy tracking code.
+- Ignore annotation `batch.kubernetes.io/job-completion` and stop adding it.
+Mark the annotation as legacy in the documentation.

 In 1.28:
-
-- Stop adding annotation `batch.kubernetes.io/job-completion` and remove from
-documentation.
-- Remove feature gates `JobTrackingWithFinalizers` and `MigrateJobLegacyTracking`.
-- Remove legacy to finalizers migration code.
+- Remove feature gate `JobTrackingWithFinalizers`.

 ### Upgrade / Downgrade Strategy

keps/sig-apps/2307-job-tracking-without-lingering-pods/kep.yaml

Lines changed: 0 additions & 3 deletions
@@ -35,9 +35,6 @@ feature-gates:
 components:
 - kube-apiserver
 - kube-controller-manager
-- name: MigrateJobLegacyTracking
-components:
-- kube-controller-manager
 disable-supported: false

 # The following PRR answers are required at beta release
