Commit a51e50e

Add finalizer for Job tracking
and count deleted running Pods as failed.

Signed-off-by: Aldo Culquicondor <[email protected]>
1 parent 938b858 commit a51e50e

2 files changed, with 52 additions and 77 deletions.


keps/sig-apps/2307-job-tracking-wihout-lingering-pods/README.md renamed to keps/sig-apps/2307-job-tracking-without-lingering-pods/README.md

Lines changed: 47 additions & 72 deletions
@@ -1,63 +1,3 @@
-<!--
-**Note:** When your KEP is complete, all of these comment blocks should be removed.
-
-To get started with this template:
-
-- [ ] **Pick a hosting SIG.**
-Make sure that the problem space is something the SIG is interested in taking
-up. KEPs should not be checked in without a sponsoring SIG.
-- [ ] **Create an issue in kubernetes/enhancements**
-When filing an enhancement tracking issue, please make sure to complete all
-fields in that template. One of the fields asks for a link to the KEP. You
-can leave that blank until this KEP is filed, and then go back to the
-enhancement and add the link.
-- [ ] **Make a copy of this template directory.**
-Copy this template into the owning SIG's directory and name it
-`NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
-leading-zero padding) assigned to your enhancement above.
-- [ ] **Fill out as much of the kep.yaml file as you can.**
-At minimum, you should fill in the "Title", "Authors", "Owning-sig",
-"Status", and date-related fields.
-- [ ] **Fill out this file as best you can.**
-At minimum, you should fill in the "Summary" and "Motivation" sections.
-These should be easy if you've preflighted the idea of the KEP with the
-appropriate SIG(s).
-- [ ] **Create a PR for this KEP.**
-Assign it to people in the SIG who are sponsoring this process.
-- [ ] **Merge early and iterate.**
-Avoid getting hung up on specific details and instead aim to get the goals of
-the KEP clarified and merged quickly. The best way to do this is to just
-start with the high-level sections and fill out details incrementally in
-subsequent PRs.
-
-Just because a KEP is merged does not mean it is complete or approved. Any KEP
-marked as `provisional` is a working document and subject to change. You can
-denote sections that are under active debate as follows:
-
-```
-<<[UNRESOLVED optional short context or usernames ]>>
-Stuff that is being argued.
-<<[/UNRESOLVED]>>
-```
-
-When editing KEPS, aim for tightly-scoped, single-topic PRs to keep discussions
-focused. If you disagree with what is already in a document, open a new PR
-with suggested changes.
-
-One KEP corresponds to one "feature" or "enhancement" for its whole lifecycle.
-You do not need a new KEP to move from beta to GA, for example. If
-new details emerge that belong in the KEP, edit the KEP. Once a feature has become
-"implemented", major changes should get new KEPs.
-
-The canonical place for the latest set of instructions (and the likely source
-of this file) is [here](/keps/NNNN-kep-template/README.md).
-
-**Note:** Any PRs to move a KEP to `implementable`, or significant changes once
-it is marked `implementable`, must be approved by each of the KEP approvers.
-If none of those approvers are still appropriate, then changes to that list
-should be approved by the remaining approvers and/or the owning SIG (or
-SIG Architecture for cross-cutting KEPs).
--->
 # KEP-2307: Job tracking without lingering Pods

 <!-- toc -->
@@ -75,6 +15,7 @@ SIG Architecture for cross-cutting KEPs).
 - [Design Details](#design-details)
 - [API changes](#api-changes)
 - [Algorithm](#algorithm)
+- [Simplified algorithm for Indexed Jobs](#simplified-algorithm-for-indexed-jobs)
 - [Deleted Pods](#deleted-pods)
 - [Deleted Jobs](#deleted-jobs)
 - [Pod adoption](#pod-adoption)
@@ -238,11 +179,17 @@ could be stopped at any point and executed again from the first step without
 losing information. Generally, all the steps happen in a single Job sync
 cycle.

+0. The Job controller adds the `batch.kubernetes.io/job-completion` finalizer
+to the Job.
 1. The Job controller calculates the number of succeeded Pods as the sum of:
 - `.status.succeeded`,
 - the size of `job.status.uncountedTerminatedPods.succeeded` and
 - the number of finished Pods that are not in `job.status.uncountedTerminatedPods.succeeded`
 and have a finalizer.
+
+The Job controller calculates the number of failed Pods similarly, and the
+number of active Pods as Pods that don't have a Failed or Succeeded condition
+and have a finalizer.

 This number informs the creation of missing Pods to reach `.spec.completions`.
 The controller creates Pods for a Job with the finalizer
@@ -262,6 +209,9 @@ cycle.
 The counts increment the `.status.failed` and `.status.succeeded` and clears
 counted Pods from `.status.uncountedTerminatedPods` lists. The controller
 sends a status update.
+5. The Job controller removes the `batch.kubernetes.io/job-completion` finalizer
+from the Job if it has completed (succeeded or failed) and no Job Pods have
+finalizers.
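As a reading aid, steps 1 and 5 of the algorithm can be sketched in Python. This is a minimal sketch under stated assumptions, not the Job controller's code: the `Pod` dataclass, `count_pods`, and `can_remove_job_finalizer` are hypothetical names introduced here, and only the succeeded and active counts are modeled (the text says failed Pods are counted similarly).

```python
from dataclasses import dataclass, field

# Finalizer name as given in the KEP text.
FINALIZER = "batch.kubernetes.io/job-completion"


@dataclass
class Pod:
    """Hypothetical stand-in for the fields the algorithm inspects."""
    uid: str
    phase: str  # "Pending", "Running", "Succeeded" or "Failed"
    finalizers: set = field(default_factory=set)


def count_pods(status_succeeded, uncounted_succeeded, pods):
    """Step 1: return (succeeded, active) for one sync cycle."""
    # Only Pods that still carry the finalizer are candidates; Pods
    # without it are assumed to be already reflected in the counters.
    tracked = [p for p in pods if FINALIZER in p.finalizers]
    newly_succeeded = sum(
        1
        for p in tracked
        if p.phase == "Succeeded" and p.uid not in uncounted_succeeded
    )
    succeeded = status_succeeded + len(uncounted_succeeded) + newly_succeeded
    active = sum(1 for p in tracked if p.phase not in ("Succeeded", "Failed"))
    return succeeded, active


def can_remove_job_finalizer(job_finished, pods):
    """Step 5: the Job finalizer goes away only after every Pod's did."""
    return job_finished and not any(FINALIZER in p.finalizers for p in pods)
```

Because the sum uses the UID list to skip Pods that were already recorded, re-running the function after an interrupted sync does not count a Pod twice, which is the property the step list relies on.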

 Steps 2 to 4 might deal with a potentially big number of Pods. Thus, status
 updates can potentially stress the kube-apiserver. For this reason, the Job
@@ -280,20 +230,41 @@ Steps 2 to 4 might be skipped in the scenario where a status update happened
 too recently and the number of uncounted Pods is a small percentage of
 parallelism.

+Note that the `.status.uncountedTerminatedPods` struct makes it possible to
+uniquely identify finished Pods and avoid overcounting.
+
+#### Simplified algorithm for Indexed Jobs
+
+Pods in Indexed Jobs have a unique identifier: the completion index. Even if
+more than one Pod gets created for the same index, only one of them counts
+towards completions. The completed indexes are available in
+`.status.completedIndexes` in a compressed format.
+
+When tracking Indexed Jobs, the Job controller can use
+`.status.completedIndexes` in place of
+`.status.uncountedTerminatedPods.succeeded` in step 2 and completely skip step 4
+if there are no failed terminated pods in the same sync cycle. This saves one
+API call for a Job status update.
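The text above only says that `.status.completedIndexes` is compressed. Assuming the representation documented in the Job API (comma-separated decimal indexes in increasing order, with runs of three or more consecutive indexes written as `first-last`, for example `1,3-5,7`), a hypothetical helper named `expand_completed_indexes` could expand it like this:

```python
from typing import List


def expand_completed_indexes(compressed: str) -> List[int]:
    """Expand a string such as "1,3-5,7" into [1, 3, 4, 5, 7]."""
    indexes: List[int] = []
    for part in compressed.split(","):
        if not part:
            # An empty string means no index has completed yet.
            continue
        first, sep, last = part.partition("-")
        if sep:
            # "a-b" is an inclusive range of consecutive indexes.
            indexes.extend(range(int(first), int(last) + 1))
        else:
            indexes.append(int(first))
    return indexes
```

Since each index appears at most once in this field, its length can serve as the succeeded count for an Indexed Job, which is why the list of succeeded Pod UIDs is not needed in that case.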
+
 ### Deleted Pods

 In the case where a user or another controller removes a Pod, which sets a
 deletion timestamp, the Job controller treats it the same as any other Pod.
-That is, once it reaches Failed status, the controller accounts for the Pod and
-then removes the finalizer.
-
+Since deleted Pods with finalizers inevitably get marked as Failed, the
+Job controller already counts them as such and removes their finalizers.
 This is different from the legacy tracking, where the Job controller does not
 account for deleted Pods. This is a limitation that this KEP also wants to
 solve.

-However, if the Job controller deletes the Pod (when parallelism is decreased,
-for example), the controller removes the finalizer before deleting it. Thus,
-these deletions don't count towards the failures.
+One edge case is when there is a Node failure. If the Node is down long enough,
+its Pods become orphaned, and the garbage collector deletes them. Some of these
+deleted Pods might not have finished, but the algorithm described above treats
+them as failed.
+
+On the other hand, if the Job controller deletes the Pod (when the user
+decreases parallelism or suspends the Job, for example), the controller removes
+the finalizer before deleting it. Thus, these deletions don't count towards the
+failures.
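The difference between the two deletion paths can be illustrated with a small Python sketch. It rests on assumptions: Pods are modeled as plain dictionaries, and `delete_by_controller` and `counts_as_failed` are hypothetical helpers introduced for illustration, not controller functions.

```python
# Finalizer name as given in the KEP text.
FINALIZER = "batch.kubernetes.io/job-completion"


def delete_by_controller(pod: dict) -> dict:
    """Controller-initiated deletion: drop the finalizer, then delete."""
    remaining = [f for f in pod["finalizers"] if f != FINALIZER]
    return {**pod, "finalizers": remaining, "deleted": True}


def counts_as_failed(pod: dict) -> bool:
    """A Pod adds to the failure count only while it keeps the finalizer."""
    return FINALIZER in pod["finalizers"] and pod["phase"] == "Failed"
```

A Pod deleted by a user or by the garbage collector keeps the finalizer until the controller has seen it as Failed, so it is counted. A Pod removed on scale-down or suspension has already lost the finalizer and is ignored even if it later reports Failed.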

 ### Deleted Jobs

@@ -332,11 +303,11 @@ the owner reference.
 - Implementation:
 - Job tracking without lingering Pods
 - Removal of finalizer when feature gate is disabled.
+- Support for [Indexed Jobs](https://git.k8s.io/enhancements/keps/sig-apps/2214-indexed-job)
 - Tests: unit, integration, E2E

 #### Alpha -> Beta Graduation

-- Support for [Indexed Jobs](https://git.k8s.io/enhancements/keps/sig-apps/2214-indexed-job)
 - Processing 5000 Pods per minute across any number of Jobs, with Pod creation
 having higher priority than status updates. This might depend on
 [Priority and Fairness](https://git.k8s.io/enhancements/keps/sig-api-machinery/1040-priority-and-fairness).
@@ -353,7 +324,7 @@ the owner reference.

 ### Upgrade / Downgrade Strategy

-When the feature `JobTrackingWithoutLingeringPods` is enabled for the first
+When the feature `JobTrackingWithFinalizers` is enabled for the first
 time, the cluster can have Jobs whose Pods don't have the
 `batch.kubernetes.io/job-completion` finalizer. It would be hard to add the
 finalizer to all Pods while preventing race conditions.
@@ -363,9 +334,8 @@ was created after the feature was enabled. If this field is nil, the Job
 controller tracks Pods using the legacy tracking.

 The kube-apiserver sets `.status.uncountedTerminatedPods` to an empty struct
-when the feature gate `JobTrackingWithoutLingeringPods` is enabled, at Job
-creation. In alpha, apiserver leaves `.status.uncountedTerminatedPods = nil`
-for [Indexed Jobs](https://git.k8s.io/enhancements/keps/sig-apps/2214-indexed-job)
+when the feature gate `JobTrackingWithFinalizers` is enabled, at Job
+creation.
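The distinction between a nil field and an empty struct carries the whole decision, so it is worth a short Python sketch. The function name `uses_finalizer_tracking` is hypothetical, and the Job status is modeled as an optional dictionary, which is an assumption made for illustration.

```python
from typing import Optional


def uses_finalizer_tracking(uncounted_terminated_pods: Optional[dict]) -> bool:
    """Return True when the Job should be tracked with finalizers.

    An empty struct ({}) still selects the new tracking; only a missing
    field (None, the analogue of nil) falls back to legacy tracking.
    The check is therefore an identity test, not a truthiness test.
    """
    return uncounted_terminated_pods is not None
```

A truthiness check such as `bool(uncounted_terminated_pods)` would wrongly send freshly created Jobs, whose struct is empty, down the legacy path.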

 When the feature is disabled after being enabled for some time, the next time
 the Job controller syncs a Job:
@@ -384,7 +354,7 @@ _This section must be completed when targeting alpha to a release._

 * **How can this feature be enabled / disabled in a live cluster?**
 - [x] Feature gate (also fill in values in `kep.yaml`)
-- Feature gate name: JobTrackingWithoutLingeringPods
+- Feature gate name: JobTrackingWithFinalizers
 - Components depending on the feature gate:
 - kube-apiserver
 - kube-controller-manager
@@ -506,6 +476,9 @@ previous answers based on experience in the field._
 - estimated throughput: one per Pod created by the Job controller, when Pod
 finishes or is removed.
 - originating component: kube-controller-manager
+- PATCH Jobs, to add and remove finalizers.
+- estimated throughput: two calls for each Job created.
+- originating component: kube-controller-manager
 - PUT Job status, to keep track of uncounted Pods.
 - estimated throughput: at least one per Job sync. The job controller
 throttles additional calls at 1 per a few seconds (precise throughput TBD
@@ -526,6 +499,8 @@ the existing API objects?**

 - Pod
 - Estimated increase: new finalizer of 33 bytes.
+- Job
+- Estimated increase: new finalizer of 33 bytes.
 - Job status
 - Estimated increase: new array temporarily containing terminated Pod UIDs.
 The job controller caps the size of the array to less than 20kb.

keps/sig-apps/2307-job-tracking-wihout-lingering-pods/kep.yaml renamed to keps/sig-apps/2307-job-tracking-without-lingering-pods/kep.yaml

Lines changed: 5 additions & 5 deletions
@@ -20,18 +20,18 @@ stage: alpha
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.21"
+latest-milestone: "v1.22"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
-  alpha: "v1.21"
-  beta: "v1.22"
-  stable: "v1.24"
+  alpha: "v1.22"
+  beta: "v1.23"
+  stable: "v1.25"

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
 feature-gates:
-  - name: JobTrackingWithoutLingeringPods
+  - name: JobTrackingWithFinalizers
     components:
       - kube-apiserver
       - kube-controller-manager
