Commit 97bedd0

Remove changes for APF configuration
1 parent 9125b3f commit 97bedd0

2 files changed: 32 additions & 29 deletions

keps/sig-apps/2307-job-tracking-without-lingering-pods/README.md

Lines changed: 20 additions & 19 deletions
@@ -30,7 +30,7 @@
 - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
 - [How can this feature be enabled / disabled in a live cluster?](#how-can-this-feature-be-enabled--disabled-in-a-live-cluster)
 - [Does enabling the feature change any default behavior?](#does-enabling-the-feature-change-any-default-behavior)
-- [Can the feature be disabled once it has been enabled (i.e. can we roll back](#can-the-feature-be-disabled-once-it-has-been-enabled-ie-can-we-roll-back)
+- [Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?](#can-the-feature-be-disabled-once-it-has-been-enabled-ie-can-we-roll-back-the-enablement)
 - [What happens if we reenable the feature if it was previously rolled back?**](#what-happens-if-we-reenable-the-feature-if-it-was-previously-rolled-back)
 - [Are there any tests for feature enablement/disablement?**](#are-there-any-tests-for-feature-enablementdisablement)
 - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
@@ -331,16 +331,16 @@ the owner reference.
 - Removal of finalizer when feature gate is disabled.
 - Support for [Indexed Jobs](https://git.k8s.io/enhancements/keps/sig-apps/2214-indexed-job)
 - Tests: unit, integration.
+
 #### Alpha -> Beta Graduation

-- Processing 5000 Pods per minute across any number of Jobs, with Pod creation
-having higher priority than status updates, using [Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control).
-We rename the existing workload-high priority to workload-medium and, when
-the feature gate JobTrackingWithFinalizers is enabled, we add a workload-high
-priority that matches the job-controller ServiceAccount for Pod creation.
+- Pod processing throughput per minute (mix of creating and counting finished Pods),
+assuming an average Job .spec.parallelism=10.
+- Up to 2500 Pods (~3000 queries) for a 50 QPS client limit for the job controller.
+- Up to 5000 (~6000 queries) Pods for a 100 QPS client limit for the job controller.
 - Ensure that tracking Jobs with big number of Pods doesn't cause starvation of
 smaller jobs.
-- Metrics for latency and errors
+- Metrics for latency, counting updates and errors.
 - Job E2E tests are in Testgrid with the feature enabled and linked in KEP

 #### Beta -> GA Graduation
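
The query figures in the Alpha -> Beta throughput criteria above are consistent with a simple per-minute budget: a client-side limit of 50 QPS allows at most 50 × 60 = 3000 API calls per minute, and 100 QPS allows 6000. A minimal Python sketch of that arithmetic follows. The function names are hypothetical and not part of the KEP, and the per-Pod ratio is only what the stated numbers imply, not a measured value.

```python
def query_budget_per_minute(qps_limit):
    """Upper bound on API calls per minute under a client-side QPS limit."""
    return qps_limit * 60


def implied_queries_per_pod(qps_limit, pods_per_minute):
    """Average number of API calls per Pod implied by a throughput target.

    Hypothetical helper: it divides the per-minute budget by the Pod target
    and says nothing about which calls the job controller actually makes.
    """
    return query_budget_per_minute(qps_limit) / pods_per_minute


if __name__ == "__main__":
    # Targets quoted in the diff: (client QPS limit, Pods per minute).
    for qps, pods in [(50, 2500), (100, 5000)]:
        print(qps, query_budget_per_minute(qps), implied_queries_per_pod(qps, pods))
```

Both targets leave the same headroom: the budget is 20% larger than the number of Pods processed.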
@@ -416,8 +416,7 @@ No implications to node runtime.
 - Pods removed by the user or other controllers count towards failures or
 completions.

-#### Can the feature be disabled once it has been enabled (i.e. can we roll back
-the enablement)?
+#### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

 Yes.
 The job controller removes finalizers in this case.
@@ -436,8 +435,6 @@ No implications to node runtime.
 ### Rollout, Upgrade and Rollback Planning

 #### How can a rollout fail? Can it impact already running workloads?
-Try to be as paranoid as possible - e.g., what if some components will restart
-mid-rollout?

 The change doesn't affect running Pods. If the component restarts
 mid-rollout into an older version, the Job controller switches to tracking
@@ -453,8 +450,10 @@ No implications to node runtime.
 #### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

 Integration tests cover feature gate disablement.
-A manual upgrade->downgrade->upgrade flow will be executed to ensure that a
-running Job falls back to tracking without finalizers.
+
+A manual upgrade->downgrade->upgrade flow will be executed prior to graduation
+to ensure that a running Job falls back to tracking without finalizers. The
+KEP will be updated with the findings of the test.

 #### Is the rollout accompanied by any deprecations and/or removals of features, APIs,
 fields of API types, flags, etc.?
@@ -465,10 +464,12 @@ fields of API types, flags, etc.?

 #### How can an operator determine if the feature is in use by workloads?

-There is no metric provided.
-Administrators can check for the existence of Job objects with the annotation
-`batch.kubernetes.io/job-completion` or Pods with the finalizer
-`batch.kubernetes.io/job-completion`.
+- The metric `job_pod_finished` (with a label result=failed/completed)
+increments when the job controller removes a Pod out of
+`.status.uncountedTerminatedPods` to increase the failed/completed counters.
+- Administrators can check for the existence of Job objects with the annotation
+`batch.kubernetes.io/job-completion` or Pods with the finalizer
+`batch.kubernetes.io/job-completion`.

 ###### How can someone using this feature know that it is working for their instance?

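
One way an administrator might perform the Pod check described in the hunk above, sketched in Python under stated assumptions: the input is a list of Pod objects already parsed from JSON (for example, the `items` array of a Pod list), and the helper name is hypothetical. The finalizer string is the one quoted in the diff.

```python
# Finalizer name as quoted in the README diff.
JOB_TRACKING_FINALIZER = "batch.kubernetes.io/job-completion"


def pods_with_tracking_finalizer(pods):
    """Return the names of Pods whose metadata.finalizers contains the finalizer.

    `pods` is assumed to be a list of dicts shaped like Kubernetes Pod objects.
    """
    names = []
    for pod in pods:
        metadata = pod.get("metadata", {})
        finalizers = metadata.get("finalizers") or []
        if JOB_TRACKING_FINALIZER in finalizers:
            names.append(metadata.get("name", "<unnamed>"))
    return names


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    example_pods = [
        {"metadata": {"name": "tracked", "finalizers": [JOB_TRACKING_FINALIZER]}},
        {"metadata": {"name": "untracked"}},
    ]
    print(pods_with_tracking_finalizer(example_pods))
```

A non-empty result suggests that at least one Job is being tracked with finalizers.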
@@ -482,8 +483,8 @@ fields of API types, flags, etc.?

 #### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

-- 99% percentile over day for Job syncs is <= 15s, assuming a client-side QPS
-limit of 50 calls per second.
+- 99% percentile over day for Job syncs is <= 15s for a client-side 50 QPS
+limit.

 #### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

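
The SLO above (99th percentile of Job sync duration over a day at or below 15s) can be illustrated with a small Python sketch. It assumes the sync durations are available as a plain list of seconds and uses the nearest-rank percentile definition. Both are assumptions made for illustration, since the KEP does not say how the percentile is computed, and the function names are hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_sync_slo(sync_durations_s, threshold_s=15.0, pct=99):
    """True when the pct-th percentile of sync durations is within the threshold."""
    return percentile_nearest_rank(sync_durations_s, pct) <= threshold_s


if __name__ == "__main__":
    # Made-up sample: 99 fast syncs and one slow outlier.
    durations = [1.0] * 99 + [20.0]
    print(meets_sync_slo(durations))
```

With this definition, a single slow sync out of 100 does not violate the SLO, but two do.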
keps/sig-apps/2307-job-tracking-without-lingering-pods/kep.yaml

Lines changed: 12 additions & 10 deletions
@@ -1,16 +1,16 @@
 title: Job tracking without lingering Pods
 kep-number: 2307
 authors:
-- "@alculquicondor"
+- "@alculquicondor"
 owning-sig: sig-apps
 status: implementable
 creation-date: 2020-01-21
 reviewers:
-- "@erictune"
-- "@lavalamp"
-- "@soltysh"
+- "@erictune"
+- "@lavalamp"
+- "@soltysh"
 approvers:
-- "@janetkuo"
+- "@janetkuo"
 prr-approvers:
 - "@wojtek-t"

@@ -31,12 +31,14 @@ milestone:
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
 feature-gates:
-- name: JobTrackingWithFinalizers
-components:
-- kube-apiserver
-- kube-controller-manager
+- name: JobTrackingWithFinalizers
+components:
+- kube-apiserver
+- kube-controller-manager
 disable-supported: true

 # The following PRR answers are required at beta release
 metrics:
-- TBD
+- 'job_sync_duration_seconds'
+- 'job_sync_total'
+- 'job_pod_finished'
