Commit 0de12d1

Add count of ready Pods in Job status
1 parent 4e068bd commit 0de12d1

3 files changed

+385
-0
lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
kep-number: 2879
beta:
  approver: "@ehashman"
Lines changed: 348 additions & 0 deletions
@@ -0,0 +1,348 @@
# KEP-2879: Track ready Pods in Job status

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [API](#api)
  - [Changes to the Job controller](#changes-to-the-job-controller)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
    - [GA](#ga)
    - [Deprecation](#deprecation)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [x] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [x] (R) Production readiness review completed
- [x] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

The Job status has a field `active` which counts the number of Job Pods that
are in `Running` or `Pending` phases. In this KEP, we add a field `ready` that
counts the number of Job Pods that have a `Ready` condition, with the same
best effort guarantees as the existing `active` field.

## Motivation

Job Pods can remain in the `Pending` phase for a long time in clusters with
tight resources or when image pulls take a long time. Since the
`Job.status.active` field includes `Pending` Pods, this can give a false
impression of progress to end users or other controllers. This is more
important when the pods serve as workers and need to communicate among
themselves.

A separate `Job.status.ready` field can provide more information for users
and controllers, reducing the need to listen to Pod updates themselves.

Note that other workload APIs (such as ReplicaSet and StatefulSet) have a
similar field: `.status.readyReplicas`.

### Goals

- Add the field `Job.status.ready` that keeps a count of Job Pods with the
  `Ready` condition.

### Non-Goals

- Provide strong guarantees for the accuracy of the count. Due to the
  asynchronous nature of k8s, there can be more or fewer Pods currently
  ready than what the count provides.

## Proposal

Add the field `.status.ready` to the Job API. The job controller updates the
field based on the number of Pods that have the `Ready` condition.

### Risks and Mitigations

During upgrades, a cluster can have apiservers with version skew, or the
administrator might decide to do a rollback. This can cause:

- Loss of the new API field value

  This is acceptable for the first release. The value is only informative: the
  Kubernetes control plane doesn't use the value to influence behavior.

- Repeated Job status updates.

  If one apiserver populates the value and another apiserver (running an older
  version) drops the field, the job controller might try to update the field
  again, potentially causing subsequent updates. This can be mitigated by only
  updating the field if the job controller is already updating the status due
  to changes in other fields. This check is only necessary in the first release.

For both problems, in the first release, the API documentation can state that
the field can remain at zero indefinitely even if pods have been Ready for a long
time.
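
The mitigation for repeated updates can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the controller's actual code: `jobStatus` is a simplified stand-in for the real API type and `nextStatus` is a hypothetical helper.

```go
package main

import "fmt"

// jobStatus is a simplified stand-in for the Job status counters,
// not the real API type.
type jobStatus struct {
	Active, Ready, Succeeded, Failed int32
}

// nextStatus (hypothetical helper) returns the status to write and whether a
// write is needed. A change in the ready count alone never triggers a write.
func nextStatus(stored jobStatus, active, ready, succeeded, failed int32) (jobStatus, bool) {
	next := stored
	next.Active, next.Succeeded, next.Failed = active, succeeded, failed
	if next == stored {
		// No other field changed: skip the update, even if ready differs.
		return stored, false
	}
	// The status is being written anyway, so carry the ready count along.
	next.Ready = ready
	return next, true
}

func main() {
	stored := jobStatus{Active: 2} // ready was dropped by an older apiserver

	_, write := nextStatus(stored, 2, 2, 0, 0)
	fmt.Println(write) // false: only the ready count differs

	next, write := nextStatus(stored, 1, 1, 1, 0)
	fmt.Println(write, next.Ready) // true 1: a Pod succeeded, so ready is written too
}
```

With this rule, a dropped `ready` value cannot by itself cause a write loop, at the cost of the field possibly staying stale until another status field changes.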

## Design Details

### API

```golang
type JobStatus struct {
	...
	Active    int32
	Ready     int32 // new field
	Succeeded int32
	Failed    int32
}
```
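
As an illustration of how a consumer might read the new counter, the sketch below estimates how many active Pods are not yet ready. It assumes a local `JobStatus` type that mirrors only the counters above, and `notReady` is a hypothetical helper; because both counts are best effort, the result is clamped at zero.

```go
package main

import "fmt"

// JobStatus mirrors only the counters listed in the KEP; the real API type
// has more fields.
type JobStatus struct {
	Active    int32
	Ready     int32
	Succeeded int32
	Failed    int32
}

// notReady (hypothetical helper) estimates how many active Pods do not have
// the Ready condition yet, for example because they are still Pending.
func notReady(s JobStatus) int32 {
	if s.Ready > s.Active {
		// Both counts are best effort, so guard against a negative result.
		return 0
	}
	return s.Active - s.Ready
}

func main() {
	s := JobStatus{Active: 5, Ready: 3}
	fmt.Println(notReady(s)) // prints 2
}
```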

### Changes to the Job controller

The Job controller already lists the Pods to populate the `active`, `succeeded`
and `failed` fields. To count `ready` pods, the job controller will filter the
pods that have the `Ready` condition.

In a first release, the Job controller counts the ready pods and updates the
field if and only if:
- The job controller is already updating other Job status fields.
- The `JobReadyPods` feature gate is enabled.

In the second release, the Job controller updates the field unconditionally.
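
The filtering step could look roughly like the sketch below. It uses simplified local types rather than the real Pod API, and it assumes that "has the `Ready` condition" means a condition of type `Ready` whose status is `True`; the helper names are hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins for the Pod API types; only what the sketch needs.
type podCondition struct {
	Type   string
	Status string
}

type pod struct {
	Name       string
	Phase      string
	Conditions []podCondition
}

// isReady reports whether the Pod has a condition of type "Ready" with
// status "True" (the interpretation assumed in this sketch).
func isReady(p pod) bool {
	for _, c := range p.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// countReady filters an already listed set of Pods and counts the ready ones.
func countReady(pods []pod) int32 {
	var n int32
	for _, p := range pods {
		if isReady(p) {
			n++
		}
	}
	return n
}

func main() {
	pods := []pod{
		{Name: "w-0", Phase: "Running", Conditions: []podCondition{{Type: "Ready", Status: "True"}}},
		{Name: "w-1", Phase: "Running", Conditions: []podCondition{{Type: "Ready", Status: "False"}}},
		{Name: "w-2", Phase: "Pending"},
	}
	fmt.Println(countReady(pods)) // prints 1
}
```

In this example all three Pods would count towards `active` (they are `Running` or `Pending`), but only `w-0` counts towards `ready`.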

### Test Plan

- Unit and integration tests covering:
  - Count of ready pods.
  - Not producing updates in the cases described in the design.
- Verify that existing E2E and conformance tests for Job pass.

### Graduation Criteria

#### Alpha

This KEP proposes to skip this stage, for the following reasons:
- The added calculation is trivial.
- It is acceptable to report `.status.ready` as zero in the first release, as
  the value is only informative.

#### Beta

- Ability to completely disable the feature, through a feature gate. The feature
  gate is enabled by default.

In a first release:

- The job controller only fills the field if there are other Job status updates.
- Unit and integration tests.

In a second release:

- The job controller fills the field whenever the number of ready Pods changes.
  The feature can still be disabled through the feature gate.

#### GA

- Every bug report is fixed.
- The job controller ignores the feature gate.

#### Deprecation

N/A

### Upgrade / Downgrade Strategy

No changes are required for existing clusters to use the enhancement.

### Version Skew Strategy

The feature doesn't affect nodes.

In the first release, a version skew between apiservers might cause the new field
to remain at zero even if there are Pods ready.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- [x] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: JobReadyPods
  - Components depending on the feature gate: kube-controller-manager
- [ ] Other
  - Describe the mechanism:
  - Will enabling / disabling the feature require downtime of the control
    plane?
  - Will enabling / disabling the feature require downtime or reprovisioning
    of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).

###### Does enabling the feature change any default behavior?

Yes, the Job controller might update the Job status more frequently to
report ready pods.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, the loss of information is acceptable as the field is only informative.

###### What happens if we reenable the feature if it was previously rolled back?

The Job controller will start populating the field again.

###### Are there any tests for feature enablement/disablement?

Yes, at unit and integration level.

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

The field is only informative; it doesn't affect running workloads.

###### What specific metrics should inform a rollback?

N/A

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

N/A

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

The feature applies to all Jobs, unless the feature gate is disabled.

###### How can someone using this feature know that it is working for their instance?

- [x] API .status
  - Other field: `ready`

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

The 99th percentile of Job status update latency stays below 1s when the
controller doesn't create new Pods or track finishing Pods.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

- [x] Metrics
  - Metric name: `job_sync_duration_seconds`, `job_sync_total`.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

No.

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

No.

### Scalability

###### Will enabling / using this feature result in any new API calls?

- API: PUT Job/status

  Estimated throughput: at most one API call for each Job Pod reaching Ready
  condition.

  Originating component: job-controller

###### Will enabling / using this feature result in introducing new API types?

No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

No.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

- API: Job/status

  Estimated increase in size: New field of less than 10B.

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No.

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

No change from existing behavior of the Job controller.

###### What are other known failure modes?

- When the cluster has apiservers with skewed versions, the `Job.status.ready`
  might remain zero.

###### What steps should be taken if SLOs are not being met to determine the problem?

1. Check reachability between kube-controller-manager and apiserver.
1. If the `job_sync_duration_seconds` is too high, check for the number
   of requests in apiserver coming from the kube-system/job-controller service
   account. Consider increasing the number of inflight requests for
   apiserver or tuning [API priority and fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/)
   to give more priority for the job-controller requests.
1. If the steps above are insufficient, disable the `JobReadyPods`
   feature gate in kube-controller-manager and [report an issue](https://github.com/kubernetes/kubernetes/issues).

## Implementation History

- 2021-08-19: Proposed KEP starting in beta status.

## Drawbacks

The only drawback is an increase in API calls. However, this is capped by
the number of times a Pod flips ready status. This is usually once for each
Pod created.

## Alternatives

- Add `Job.status.running`, counting Pods with `Running` phase. The `Running`
  phase doesn't take into account preparation work before the worker is ready
  to accept connections. The `Ready` condition is configurable through a
  readiness probe.
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
title: Track ready Pods in Job status
kep-number: 2879
authors:
  - "@alculquicondor"
owning-sig: sig-apps
participating-sigs:
status: implementable
creation-date: 2021-08-19
reviewers:
  - "@soltysh"
  - TBD API reviewer
approvers:
  - "@soltysh"

see-also:
replaces:

stage: beta

latest-milestone: "v1.23"

milestone:
  beta: "v1.23"
  stable: "v1.25"

feature-gates:
  - name: JobReadyPods
    components:
      - kube-controller-manager
disable-supported: true

metrics:
  - job_sync_duration_seconds
  - job_sync_total
