Skip to content

Commit 428a19e

Browse files
authored
KEP3850: graduate Backoff Limit Per Index for Job to stable (#5154)
* KEP3850: graduate Backoff Limit Per Index for Job to stable * review remarks
1 parent 06d6d0d commit 428a19e

File tree

3 files changed

+49
-35
lines changed

3 files changed

+49
-35
lines changed

keps/prod-readiness/sig-apps/3850.yaml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,3 +3,5 @@ alpha:
33
approver: "@wojtek-t"
44
beta:
55
approver: "@wojtek-t"
6+
stable:
7+
approver: "@wojtek-t"

keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs/README.md

Lines changed: 44 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -15,15 +15,15 @@
1515
- [Performance benchmark](#performance-benchmark)
1616
- [Risks and Mitigations](#risks-and-mitigations)
1717
- [The Job object too big](#the-job-object-too-big)
18-
- [Expotential backoff delay issue](#expotential-backoff-delay-issue)
18+
- [Exponential backoff delay issue](#exponential-backoff-delay-issue)
1919
- [Too fast Job status updates](#too-fast-job-status-updates)
2020
- [Design Details](#design-details)
2121
- [Job API](#job-api)
2222
- [Tracking the number of failures per index](#tracking-the-number-of-failures-per-index)
2323
- [Failed indexes format](#failed-indexes-format)
2424
- [Job completion](#job-completion)
2525
- [FailIndex action](#failindex-action)
26-
- [Expotential backoff delay per index](#expotential-backoff-delay-per-index)
26+
- [Exponential backoff delay per index](#exponential-backoff-delay-per-index)
2727
- [Test Plan](#test-plan)
2828
- [Prerequisite testing updates](#prerequisite-testing-updates)
2929
- [Unit tests](#unit-tests)
@@ -53,8 +53,8 @@
5353
- [Mutually exclusive backoffLimit and backoffLimitPerIndex](#mutually-exclusive-backofflimit-and-backofflimitperindex)
5454
- [Use bool field](#use-bool-field)
5555
- [Use enum field](#use-enum-field)
56-
- [Global expotential backoff delay](#global-expotential-backoff-delay)
57-
- [Expotential backoff delay with in-memory tracking](#expotential-backoff-delay-with-in-memory-tracking)
56+
- [Global exponential backoff delay](#global-exponential-backoff-delay)
57+
- [Exponential backoff delay with in-memory tracking](#exponential-backoff-delay-with-in-memory-tracking)
5858
- [Alternative ways to support high number of completions](#alternative-ways-to-support-high-number-of-completions)
5959
- [Keep failedIndexes field as a bitmap](#keep-failedindexes-field-as-a-bitmap)
6060
- [Keep the list of failed indexes in a dedicated API object](#keep-the-list-of-failed-indexes-in-a-dedicated-api-object)
@@ -72,15 +72,15 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
7272
- [x] (R) Design details are appropriately documented
7373
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
7474
- [ ] e2e Tests for all Beta API Operations (endpoints)
75-
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
76-
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
75+
- [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
76+
- [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
7777
- [x] (R) Graduation criteria is in place
7878
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
7979
- [x] (R) Production readiness review completed
8080
- [x] (R) Production readiness review approved
8181
- [x] "Implementation History" section is up-to-date for milestone
8282
- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
83-
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
83+
- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
8484

8585
[kubernetes.io]: https://kubernetes.io/
8686
[kubernetes/enhancements]: https://git.k8s.io/enhancements
@@ -272,7 +272,7 @@ BenchmarkLargeFailureHandling/job_with_backoffLimitPerIndex_with_failures;_size=
272272

273273
The above results show that the jobs using `.spec.backoffLimitPerIndex` are be
274274
slower for about 1% compared to regular indexed jobs. In practice the difference
275-
is expected to be covered by the expotential backoff delay due to pod failures.
275+
is expected to be covered by the exponential backoff delay due to pod failures.
276276

277277
<!--
278278
What are the caveats to the proposal?
@@ -338,9 +338,9 @@ We believe that the scalability limits should be enough for most of Job use-case
338338
For workloads requiring larger jobs users should be able to create multiple Jobs,
339339
orchestrated by the [JobSet](https://github.com/kubernetes-sigs/jobset).
340340

341-
### Expotential backoff delay issue
341+
### Exponential backoff delay issue
342342

343-
Currently, a pod is recreated by the Job controller with expotential backoff
343+
Currently, a pod is recreated by the Job controller with exponential backoff
344344
delay (10s, 20s, 40s ...), counted from the last failure time.
345345

346346
One complication is that the last failure time for failed pods may increase with
@@ -350,16 +350,16 @@ Thus, there is a risk that due to the presence
350350
of pods hitting the fallback the last failure time is continuously bumped,
351351
thus shifting the time to recreate the pod.
352352

353-
This risk is present both when computing the expotential backoff delay globally
353+
This risk is present both when computing the exponential backoff delay globally
354354
(as for regular indexed Jobs), or per-index as proposed in in this KEP
355-
(see [Expotential backoff delay per index](#expotential-backoff-delay-per-index)).
355+
(see [Exponential backoff delay per index](#exponential-backoff-delay-per-index)).
356356

357357
In order to mitigate this risk currently the time of last failure is recorded
358358
in-memory (globally for all pods within a Job). And a new failed pod may bump
359359
it only until it is added to the `uncountedTerminatedPods` structure.
360360

361361
However, tracking the last failure time per index might be costly for memory
362-
consumption (see [Expotential backoff delay with in-memory tracking](#expotential-backoff-delay-with-in-memory-tracking)).
362+
consumption (see [Exponential backoff delay with in-memory tracking](#exponential-backoff-delay-with-in-memory-tracking)).
363363

364364
Thus, in order to mitigate this risk we propose to compute the finish time for
365365
a pod as the first available value of the following (avoiding the ever-increasing
@@ -400,7 +400,7 @@ New items are added to the queue with a delay (1s for pod events, such as:
400400
delete, add, update). The delay allows for deduplication of the sync per Job.
401401

402402
One place to queue a new item in the queue, specific to this KEP, is when
403-
the expotential backoff delay hasn't elapsed for any index (allowing pod
403+
the exponential backoff delay hasn't elapsed for any index (allowing pod
404404
recreation), then we requeue the next Job status update. The delay is computed
405405
as minimum of all delays computed for all indexes requiring pod recreation,
406406
but not less that 1s.
@@ -570,12 +570,12 @@ we add the corresponding index to the set of failed indexes represented by
570570
`.status.failedIndexes`. This action can only be used if backoff limit per index
571571
is used.
572572

573-
### Expotential backoff delay per index
573+
### Exponential backoff delay per index
574574

575575
First, we solve the issue of increasing failure time for deleted pods when the
576576
finalizer removal is delayed, by modifying the definition of the pod finish time,
577577
to avoid fallback to `now`
578-
(see also [Expotential backoff delay issue](#expotential-backoff-delay-issue)).
578+
(see also [Exponential backoff delay issue](#exponential-backoff-delay-issue)).
579579

580580
Second, we compute the backoff delay within each index independently. The number
581581
of consecutive failures per-index can be derived from the
@@ -624,7 +624,7 @@ the following scenarios will be covered with unit tests:
624624
- marking of the Job as `Complete` only once all indexes are completed,
625625
- termination of Job execution and marking it as failed when
626626
`.spec.maxFailedIndexes` is exceeded.
627-
- calculation of the expotential backoff delay per index when `backoffLimitPerIndex`
627+
- calculation of the exponential backoff delay per index when `backoffLimitPerIndex`
628628
is used.
629629
- a fuzzer roundtrip test for API when `backoffLimit` is set to max int32.
630630

@@ -662,12 +662,14 @@ https://storage.googleapis.com/k8s-triage/index.html
662662
-->
663663

664664
The following scenarios will be covered with integration tests:
665-
- enabling, disabling and re-enabling of the `JobBackoffLimitPerIndex` feature gate
666-
- handling of the `.spec.backoffLimitPerIndex` when the `FailIndex` action is used,
667-
- handling of the `.spec.backoffLimitPerIndex` when `.spec.maxFailedIndexes` isn't set,
668-
- handling of the `.spec.backoffLimitPerIndex` when `.spec.maxFailedIndexes` is set,
669-
- handling of the `.spec.backoffLimit` when `.spec.backoffLimitPerIndex` is set,
670-
- handling of the expotential backoff delay per index when `.spec.backoffLimitPerIndex` is set.
665+
- enabling, disabling and re-enabling of the `JobBackoffLimitPerIndex` feature gate ([code](https://github.com/kubernetes/kubernetes/blob/20b12ad5c389ff74792988bf1e0c10fe2820d9a1/test/integration/job/job_test.go#L1030))
666+
- handling of the `.spec.backoffLimitPerIndex` when the `FailIndex` action is used ([code](https://github.com/kubernetes/kubernetes/blob/20b12ad5c389ff74792988bf1e0c10fe2820d9a1/test/integration/job/job_test.go#L1888)),
667+
- handling of the `.spec.backoffLimitPerIndex` when `.spec.maxFailedIndexes` isn't set ([code](https://github.com/kubernetes/kubernetes/blob/20b12ad5c389ff74792988bf1e0c10fe2820d9a1/test/integration/job/job_test.go#L1688)),
668+
- handling of the `.spec.backoffLimitPerIndex` when `.spec.maxFailedIndexes` is set ([code](https://github.com/kubernetes/kubernetes/blob/20b12ad5c389ff74792988bf1e0c10fe2820d9a1/test/integration/job/job_test.go#L1846)),
669+
- handling of the `.spec.backoffLimit` when `.spec.backoffLimitPerIndex` is set ([code](https://github.com/kubernetes/kubernetes/blob/master/test/integration/job/job_test.go#L1744)),
670+
- handling of the exponential backoff delay per index when `.spec.backoffLimitPerIndex` is set ([code](https://github.com/kubernetes/kubernetes/blob/20b12ad5c389ff74792988bf1e0c10fe2820d9a1/test/integration/job/job_test.go#L1120)).
671+
672+
The [k8s-triage] page for the [BackoffLimitPerIndex integration tests](https://storage.googleapis.com/k8s-triage/index.html?job=integration&test=BackoffLimitPerIndex).
671673

672674
More integration tests might be added to ensure good code coverage based on the
673675
actual implementation.
@@ -686,9 +688,11 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
686688

687689
The following scenario is covered with e2e tests for Beta:
688690
- [sig-apps#gce](https://testgrid.k8s.io/sig-apps#gce):
689-
- Job should execute all indexes despite some failing when using backoffLimitPerIndex
690-
- Job should terminate job execution when the number of failed indexes exceeds maxFailedIndexes
691-
- Job should mark indexes as failed when the FailIndex action is matched in podFailurePolicy
691+
- Job should execute all indexes despite some failing when using backoffLimitPerIndex ([code](https://github.com/kubernetes/kubernetes/blob/20b12ad5c389ff74792988bf1e0c10fe2820d9a1/test/e2e/apps/job.go#L602))
692+
- Job should terminate job execution when the number of failed indexes exceeds maxFailedIndexes ([code](https://github.com/kubernetes/kubernetes/blob/20b12ad5c389ff74792988bf1e0c10fe2820d9a1/test/e2e/apps/job.go#L635))
693+
- Job should mark indexes as failed when the FailIndex action is matched in podFailurePolicy ([code](https://github.com/kubernetes/kubernetes/blob/20b12ad5c389ff74792988bf1e0c10fe2820d9a1/test/e2e/apps/job.go#L670))
694+
695+
The [k8s-triage] page for the [BackoffLimitPerIndex e2e tests](https://storage.googleapis.com/k8s-triage/index.html?job=e2e&test=should%20mark%20indexes%20as%20failed%20when%20the%20FailIndex%20action%20is%20matched%20in%20podFailurePolicy%7Cshould%20terminate%20job%20execution%20when%20the%20number%20of%20failed%20indexes%20exceeds%20maxFailedIndexes%7Cshould%20execute%20all%20indexes%20despite%20some%20failing%20when%20using%20backoffLimitPerIndex).
692696

693697
### Graduation Criteria
694698

@@ -757,7 +761,7 @@ in back-to-back releases.
757761
#### Alpha
758762

759763
- the feature implemented behind the `JobBackoffLimitPerIndex` feature flag
760-
- change the logic of computing the expotential backoff delay (see [here](#expotential-backoff-delay-issue))
764+
- change the logic of computing the exponential backoff delay (see [here](#exponential-backoff-delay-issue))
761765
- user-facing documentation, including the warning for setting completions > 10^5
762766
- The `JobBackoffLimitPerIndex` feature flag disabled by default
763767
- Tests: unit and integration
@@ -781,7 +785,6 @@ in back-to-back releases.
781785
to use `FailIndex`
782786
- Graduate e2e tests as conformance tests
783787
- Lock the `JobBackoffLimitPerIndex` feature gate
784-
- Declare deprecation of the `JobBackoffLimitPerIndex` feature gate in documentation
785788

786789
### Upgrade / Downgrade Strategy
787790

@@ -1390,6 +1393,15 @@ Major milestones might include:
13901393
- 2023-07-18: Merge the Job Controller PR [Support BackoffLimitPerIndex in Jobs](https://github.com/kubernetes/kubernetes/pull/118009)
13911394
- 2023-08-04: Merge user-facing docs PR [Docs update for Job's backoff limit per index (alpha in 1.28)](https://github.com/kubernetes/website/pull/41921)
13921395
- 2023-08-06: Merge KEP update reflecting decisions during the implementation phase [Update for KEP3850 "Backoff Limit Per Index"](https://github.com/kubernetes/enhancements/pull/4123)
1396+
- 2023-10-02: [Update KEP-3850 "Backoff Limit Per Index" for Beta](https://github.com/kubernetes/enhancements/pull/4228)
1397+
- 2023-10-20: [Introduce the job_finished_indexes_total metric](https://github.com/kubernetes/kubernetes/pull/121292)
1398+
- 2023-10-23: [Graduate BackoffLimitPerIndex to Beta](https://github.com/kubernetes/kubernetes/pull/121356)
1399+
- 2023-10-24: [Indicate Job Backoff Limit Per Index reason consts are beta](https://github.com/kubernetes/kubernetes/pull/121471)
1400+
- 2023-10-25: [Backoff limit per index e2e test](https://github.com/kubernetes/kubernetes/pull/121368)
1401+
- 2023-11-02: [Add remaining e2e tests for Job BackoffLimitPerIndex based on KEP](https://github.com/kubernetes/kubernetes/pull/121633)
1402+
- 2023-11-02: [Benchmark job with backoff limit per index](https://github.com/kubernetes/kubernetes/pull/121393)
1403+
- 2023-11-02: [Update KEP3850 "BackoffLimitPerIndex for Indexed Jobs"](https://github.com/kubernetes/enhancements/pull/4321)
1404+
- 2025-02-07: [KEP3850: graduate Backoff Limit Per Index for Job to stable](https://github.com/kubernetes/enhancements/pull/5154)
13931405

13941406
## Drawbacks
13951407

@@ -1556,9 +1568,9 @@ not need to be as detailed as the proposal, but should include enough
15561568
information to express the idea and why it was not acceptable.
15571569
-->
15581570

1559-
### Global expotential backoff delay
1571+
### Global exponential backoff delay
15601572

1561-
We could also consider leaving the expotential backoff delay as global and
1573+
We could also consider leaving the exponential backoff delay as global and
15621574
be enabled by a dedicated API field in the future KEP, say `backoffDelayPerIndex`.
15631575

15641576
**Reasons for deferring / rejecting**
@@ -1568,9 +1580,9 @@ Thus, failures or successes in one index should not influence backoff delays
15681580
for another index. We are leaving the decision to the community feeback and
15691581
discussions though.
15701582

1571-
### Expotential backoff delay with in-memory tracking
1583+
### Exponential backoff delay with in-memory tracking
15721584

1573-
Instead of modifying the definition of pod's finish time (see [Expotential backoff delay issue](#expotential-backoff-delay-issue))
1585+
Instead of modifying the definition of pod's finish time (see [Exponential backoff delay issue](#exponential-backoff-delay-issue))
15741586
we could keep track of the "failure time" for failed pods in-memory.
15751587

15761588
**Reasons for deferring / rejecting**

keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs/kep.yaml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -14,18 +14,18 @@ approvers:
1414
- "@soltysh"
1515

1616
# The target maturity stage in the current dev cycle for this KEP.
17-
stage: alpha
17+
stage: stable
1818

1919
# The most recent milestone for which work toward delivery of this KEP has been
2020
# done. This can be the current (upcoming) milestone, if it is being actively
2121
# worked on.
22-
latest-milestone: "v1.29"
22+
latest-milestone: "v1.33"
2323

2424
# The milestone at which this feature was, or is targeted to be, at each stage.
2525
milestone:
2626
alpha: "v1.28"
2727
beta: "v1.29"
28-
stable:
28+
stable: "v1.33"
2929

3030
# The following PRR answers are required at alpha release
3131
# List the feature gate name and the components for which it must be enabled

0 commit comments

Comments
 (0)