Add count of ready Pods in Job status

alculquicondor · alculquicondor · commit 0de12d1b60da · 2021-08-19T15:51:34.000-04:00
diff --git a/keps/prod-readiness/sig-apps/2879.yaml b/keps/prod-readiness/sig-apps/2879.yaml
@@ -0,0 +1,3 @@
+kep-number: 2879
+beta:
+  approver: "@ehashman"
diff --git a/keps/sig-apps/2879-ready-pods-job-status/README.md b/keps/sig-apps/2879-ready-pods-job-status/README.md
@@ -0,0 +1,348 @@
+# KEP-2879: Track ready Pods in Job status
+
+<!-- toc -->
+- [Release Signoff Checklist](#release-signoff-checklist)
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [API](#api)
+  - [Changes to the Job controller](#changes-to-the-job-controller)
+  - [Test Plan](#test-plan)
+  - [Graduation Criteria](#graduation-criteria)
+    - [Alpha](#alpha)
+    - [Beta](#beta)
+    - [GA](#ga)
+    - [Deprecation](#deprecation)
+  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+  - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+  - [Monitoring Requirements](#monitoring-requirements)
+  - [Dependencies](#dependencies)
+  - [Scalability](#scalability)
+  - [Troubleshooting](#troubleshooting)
+- [Implementation History](#implementation-history)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+<!-- /toc -->
+
+## Release Signoff Checklist
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+  - [ ] e2e Tests for all Beta API Operations (endpoints)
+  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [ ] (R) Graduation criteria is in place
+  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) 
+- [x] (R) Production readiness review completed
+- [x] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+The Job status has a field `active` which counts the number of Job Pods that
+are in `Running` or `Pending` phases. In this KEP, we add a field `ready` that
+counts the number of Job Pods that have a `Ready` condition, with the same
+best effort guarantees as the existing `active` field.
+
+## Motivation
+
+Job Pods can remain in the `Pending` phase for a long time in clusters with
+tight resources and when image pulls take long. Since the `Job.status.active`
+field includes `Pending` Pods, this can give a false impression of progress
+to end users or other controllers. This is more important when the pods serve
+as workers and need to communicate among themselves.
+
+A separate `Job.status.ready` field can provide more information for users
+and controllers, reducing the need to listen to Pod updates themselves.
+
+Note that other workload APIs (such as ReplicaSet and StatefulSet) have a
+similar field: `.status.readyReplicas`.
+
+### Goals
+
+- Add the field `Job.status.ready` that keeps a count of Job Pods with the
+  `Ready` condition.
+
+### Non-Goals
+
+- Provide strong guarantees for the accuracy of the count. Due to the
+  asynchronous nature of k8s, there can be more or fewer Pods currently
+  ready than what the count provides.
+
+## Proposal
+
+Add the field `.status.ready` to the Job API. The job controller updates the
+field based on the number of Pods that have the `Ready` condition.
+
+### Risks and Mitigations
+
+During upgrades, a cluster can have apiservers with version skew, or the
+administrator might decide to do a rollback. This can cause:
+
+- Loss of the new API field value
+
+  This is acceptable for the first release. The value is only informative: the
+  kubernetes control plane doesn't use the value to influence behavior.
+  
+- Repeated Job status updates.
+
+  If one apiserver populates the value and another apiserver (running an older
+  version) drops the field, the job controller might try to update the field
+  again, potentially causing subsequent updates. This can be mitigated by only
+  updating the field if the job controller is already updating the status due
+  to changes in other fields. This check is only necessary in the first release.
+  
+For both problems, in the first release, the API documentation can state that
+the field can remain at zero indefinitely, even if Pods have been ready for a
+long time.
+
+## Design Details
+
+### API
+
+```golang
+type JobStatus struct {
+	// ... existing fields elided
+	Active    int32
+	Ready     int32  // new field
+	Succeeded int32
+	Failed    int32
+}
+```
+
+### Changes to the Job controller
+
+The Job controller already lists the Pods to populate the `active`, `succeeded`
+and `failed` fields. To count `ready` pods, the job controller will filter the
+pods that have the `Ready` condition.
+
+In a first release, the Job controller counts the ready pods and updates the
+field if and only if:
+- The job controller is already updating other Job status fields.
+- The `JobReadyPods` feature gate is enabled.
+
+In the second release, the Job controller updates the field unconditionally.
+
+### Test Plan
+
+- Unit and integration tests covering:
+  - Count of ready pods.
+  - Not producing updates in the cases described in the design.
+- Verify passing existing E2E and conformance tests for Job.
+
+### Graduation Criteria
+
+#### Alpha
+
+This KEP proposes to skip this stage, for the following reasons:
+- The added calculation is trivial.
+- It is acceptable to report .status.ready as zero in the first release, as
+  the value is only informative.
+
+#### Beta
+
+- Ability to completely disable the feature, through a feature gate. The feature
+  gate is enabled by default.
+  
+In a first release:
+
+- The job controller only fills the field if there are other Job status updates.
+- Unit and integration tests.
+
+In a second release:
+
+- The job controller fills the field whenever the number of ready Pods changes.
+  The feature can still be disabled through the feature gate.
+
+#### GA
+
+- Every bug report is fixed.
+- The job controller ignores the feature gate.
+
+#### Deprecation
+
+N/A
+
+### Upgrade / Downgrade Strategy
+
+No changes are required for existing clusters to use the enhancement.
+
+### Version Skew Strategy
+
+The feature doesn't affect nodes.
+
+In the first release, a version skew between apiservers might cause the new field
+to remain at zero even if there are Pods ready.
+
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [x] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name: JobReadyPods
+  - Components depending on the feature gate: kube-controller-manager
+- [ ] Other
+  - Describe the mechanism:
+  - Will enabling / disabling the feature require downtime of the control
+    plane?
+  - Will enabling / disabling the feature require downtime or reprovisioning
+    of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
+
+###### Does enabling the feature change any default behavior?
+
+Yes, the Job controller might update the Job status more frequently to
+report ready pods.
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+Yes, the loss of information is acceptable, as the field is only informative.
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+The Job controller will start populating the field again.
+
+###### Are there any tests for feature enablement/disablement?
+
+Yes, at unit and integration level.
+
+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+The field is only informative; it doesn't affect running workloads.
+
+###### What specific metrics should inform a rollback?
+
+N/A
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+N/A
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+No
+
+### Monitoring Requirements
+
+###### How can an operator determine if the feature is in use by workloads?
+
+The feature applies to all Jobs, unless the feature gate is disabled.
+
+###### How can someone using this feature know that it is working for their instance?
+
+- [x] API .status
+  - Other field: `ready`
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
+The 99th percentile of Job status update latency is below 1s when the
+controller doesn't create new Pods or track finishing Pods.
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+- [x] Metrics
+  - Metric name: `job_sync_duration_seconds`, `job_sync_total`.
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+No.
+
+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+No.
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+
+
+- API: PUT Job/status
+
+  Estimated throughput: at most one API call for each Job Pod reaching Ready
+  condition.
+  
+  Originating component: job-controller
+
+###### Will enabling / using this feature result in introducing new API types?
+
+No.
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+No.
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+- API: Job/status
+
+  Estimated increase in size: New field of less than 10B.
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+No.
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+No.
+
+### Troubleshooting
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+No change from existing behavior of the Job controller.
+
+###### What are other known failure modes?
+
+- When the cluster has apiservers with skewed versions, the `Job.status.ready`
+  might remain zero.
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+1. Check reachability between kube-controller-manager and apiserver.
+1. If the `job_sync_duration_seconds` is too high, check for the number
+   of requests in apiserver coming from the kube-system/job-controller service
+   account. Consider increasing the number of inflight requests for
+   apiserver or tuning [API priority and fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/)
+   to give more priority for the job-controller requests.
+1. If the steps above are insufficient, disable the `JobReadyPods`
+   feature gate in kube-controller-manager and [report an issue](https://github.com/kubernetes/kubernetes/issues).
+
+## Implementation History
+
+- 2021-08-19: Proposed KEP starting in beta status.
+
+## Drawbacks
+
+The only drawback is an increase in API calls. However, this is capped by
+the number of times a Pod flips ready status. This is usually once for each
+Pod created.
+
+## Alternatives
+
+- Add `Job.status.running`, counting Pods with `Running` phase. The `Running`
+  phase doesn't take into account preparation work before the worker is ready
+  to accept connections. The `Ready` condition is configurable through a
+  readiness probe.
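
The counting step described under "Changes to the Job controller" can be illustrated with a minimal Go sketch. It is not the controller's actual code: `Pod`, `PodCondition`, `isPodReady` and `countReadyPods` are hypothetical, simplified stand-ins for the real API types, and the sketch assumes that "has the `Ready` condition" means a condition of type `Ready` with status `True`.

```go
package main

import "fmt"

// PodCondition and Pod are hypothetical, trimmed-down stand-ins for the
// real Pod API types; only the fields needed for the sketch are kept.
type PodCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

type Pod struct {
	Name       string
	Phase      string // "Pending", "Running", "Succeeded" or "Failed"
	Conditions []PodCondition
}

// isPodReady reports whether the Pod carries a Ready condition whose
// status is "True" (an assumption about what "ready" means here).
func isPodReady(p Pod) bool {
	for _, c := range p.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// countReadyPods filters the already-listed Pods and counts the ready ones.
func countReadyPods(pods []Pod) int32 {
	var ready int32
	for _, p := range pods {
		if isPodReady(p) {
			ready++
		}
	}
	return ready
}

func main() {
	pods := []Pod{
		{Name: "worker-0", Phase: "Running", Conditions: []PodCondition{{Type: "Ready", Status: "True"}}},
		{Name: "worker-1", Phase: "Running", Conditions: []PodCondition{{Type: "Ready", Status: "False"}}},
		{Name: "worker-2", Phase: "Pending"}, // no conditions reported yet
	}
	// All three Pods would count as active, but only worker-0 is ready.
	fmt.Println("active:", len(pods), "ready:", countReadyPods(pods))
}
```

The sketch shows why `active` and `ready` can diverge: a `Pending` Pod, or a `Running` Pod whose readiness probe has not passed, counts toward `active` but not toward `ready`.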
diff --git a/keps/sig-apps/2879-ready-pods-job-status/kep.yaml b/keps/sig-apps/2879-ready-pods-job-status/kep.yaml
@@ -0,0 +1,34 @@
+title: Track ready Pods in Job status
+kep-number: 2879
+authors:
+  - "@alculquicondor"
+owning-sig: sig-apps
+participating-sigs:
+status: implementable
+creation-date: 2021-08-19
+reviewers:
+  - "@soltysh"
+  - TBD API reviewer
+approvers:
+  - "@soltysh"
+
+see-also:
+replaces:
+
+stage: beta
+
+latest-milestone: "v1.23"
+
+milestone:
+  beta: "v1.23"
+  stable: "v1.25"
+
+feature-gates:
+  - name: JobReadyPods
+    components:
+    - kube-controller-manager
+disable-supported: true
+
+metrics:
+  - job_sync_duration_seconds
+  - job_sync_total
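
The `JobReadyPods` gate declared above combines with the first-release rule from the README (populate `ready` only when the gate is enabled and the controller is already writing other status fields). A minimal Go sketch of that decision follows. `JobStatus` here is a trimmed, hypothetical copy of the API struct, and `nextStatus` is an illustrative helper, not a function in the Job controller. The sketch also assumes that a disabled gate leaves the stored value untouched, which the KEP does not specify.

```go
package main

import "fmt"

// JobStatus is a hypothetical, trimmed copy of the counters shown in the
// KEP's API section.
type JobStatus struct {
	Active, Ready, Succeeded, Failed int32
}

// nextStatus applies the first-release rule: Ready is refreshed only if the
// gate is enabled AND some other counter changed, so the new field never
// triggers a status write by itself. It returns the status to store and
// whether a write is needed.
func nextStatus(old, observed JobStatus, gateEnabled bool) (JobStatus, bool) {
	next := old
	next.Active = observed.Active
	next.Succeeded = observed.Succeeded
	next.Failed = observed.Failed

	// Structs of comparable fields can be compared directly in Go.
	otherFieldsChanged := next != old
	if gateEnabled && otherFieldsChanged {
		next.Ready = observed.Ready
	}
	return next, otherFieldsChanged
}

func main() {
	old := JobStatus{Active: 2}

	// Only the ready count moved: no write, Ready stays at its old value.
	s, write := nextStatus(old, JobStatus{Active: 2, Ready: 1}, true)
	fmt.Println(s.Ready, write)

	// Active changed as well: the write happens and carries Ready along.
	s, write = nextStatus(old, JobStatus{Active: 3, Ready: 1}, true)
	fmt.Println(s.Ready, write)
}
```

Piggybacking on existing writes is what avoids the repeated-update loop described under "Risks and Mitigations": if an older apiserver drops the field, the controller does not issue a new write just to restore it.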
