Commit 85a0ee2

KEP-4026: Add new annotation for job creation timestamp (#4027)
* KEP-4026 for adding new job annotation
  Signed-off-by: Heba Elayoty <[email protected]>
* Address review comments
  Signed-off-by: Heba Elayoty <[email protected]>
* change annotation name
  Signed-off-by: Heba Elayoty <[email protected]>
* Add alternatives
  Signed-off-by: Heba Elayoty <[email protected]>
* run update-toc
  Signed-off-by: Heba Elayoty <[email protected]>
* fix status to metadata
  Signed-off-by: Heba Elayoty <[email protected]>
* address code review comments
  Signed-off-by: Heba Elayoty <[email protected]>
* update timestamp in the KEP to use scheduled
  Signed-off-by: Heba Elayoty <[email protected]>
* Update timezone to use spec.timeZone then UTC if nil
  Signed-off-by: Heba Elayoty <[email protected]>
* Update risk section and rename feature flag
  Signed-off-by: Heba Elayoty <[email protected]>
* rename kep folder
  Signed-off-by: Heba Elayoty <[email protected]>
* Code review comments
  Signed-off-by: Heba Elayoty <[email protected]>
* Code review comments
  Signed-off-by: Heba Elayoty <[email protected]>
* update feature gate name
  Signed-off-by: Heba Elayoty <[email protected]>
* Code review comments
  Signed-off-by: Heba Elayoty <[email protected]>
* Code review comments
  Signed-off-by: Heba Elayoty <[email protected]>
* Update toc
  Signed-off-by: Heba Elayoty <[email protected]>
* Add Monitoring Requirements title
  Signed-off-by: Heba Elayoty <[email protected]>
* Remove line
  Signed-off-by: Heba Elayoty <[email protected]>
* Remove N/A
  Signed-off-by: Heba Elayoty <[email protected]>
* Add titles
  Signed-off-by: Heba Elayoty <[email protected]>
* Add titles
  Signed-off-by: Heba Elayoty <[email protected]>

---------
Signed-off-by: Heba Elayoty <[email protected]>
1 parent ff6aeac commit 85a0ee2

3 files changed

+327
-0
lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
kep-number: 4026
2+
beta:
3+
approver: "@wojtek-t"
Lines changed: 290 additions & 0 deletions
@@ -0,0 +1,290 @@
1+
# KEP-4026: Add job creation timestamp to job annotations
2+
3+
<!-- toc -->
4+
- [Release Signoff Checklist](#release-signoff-checklist)
5+
- [Summary](#summary)
6+
- [Motivation](#motivation)
7+
- [Goals](#goals)
8+
- [Non-Goals](#non-goals)
9+
- [Proposal](#proposal)
10+
- [User Stories (Optional)](#user-stories-optional)
11+
- [Story 1](#story-1)
12+
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
13+
- [Risks and Mitigations](#risks-and-mitigations)
14+
- [Design Details](#design-details)
15+
- [Test Plan](#test-plan)
16+
- [Prerequisite testing updates](#prerequisite-testing-updates)
17+
- [Unit tests](#unit-tests)
18+
- [Integration tests](#integration-tests)
19+
- [e2e tests](#e2e-tests)
20+
- [Graduation Criteria](#graduation-criteria)
21+
- [Beta](#beta)
22+
- [GA](#ga)
23+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
24+
- [Version Skew Strategy](#version-skew-strategy)
25+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
26+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
27+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
28+
- [Monitoring Requirements](#monitoring-requirements)
29+
- [Dependencies](#dependencies)
30+
- [Scalability](#scalability)
31+
- [Troubleshooting](#troubleshooting)
32+
- [Implementation History](#implementation-history)
33+
- [Drawbacks](#drawbacks)
34+
- [Alternatives](#alternatives)
35+
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
36+
<!-- /toc -->
37+
38+
## Release Signoff Checklist
39+
40+
Items marked with (R) are required *prior to targeting to a milestone / release*.
41+
42+
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
43+
- [X] (R) KEP approvers have approved the KEP status as `implementable`
44+
- [X] (R) Design details are appropriately documented
45+
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
46+
- [ ] e2e Tests for all Beta API Operations (endpoints)
47+
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
48+
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
49+
- [X] (R) Graduation criteria is in place
50+
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
51+
- [ ] (R) Production readiness review completed
52+
- [ ] (R) Production readiness review approved
53+
- [ ] "Implementation History" section is up-to-date for milestone
54+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
55+
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
56+
57+
[kubernetes.io]: https://kubernetes.io/
58+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
59+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
60+
[kubernetes/website]: https://git.k8s.io/website
61+
62+
## Summary
63+
64+
Currently, there is no supported way to get the originally scheduled (expected) timestamp of a Job created from a CronJob. This KEP proposes setting the original scheduled time as an annotation in the Job metadata.
65+
66+
## Motivation
67+
68+
### Goals
69+
70+
- Set job scheduled timestamp as an annotation on the job.
71+
- Adding the annotation should not be disruptive to existing workloads.
72+
73+
### Non-Goals
74+
75+
## Proposal
76+
77+
At a high level, the proposal is to modify the CronJob controller to set the job scheduled timestamp as a job annotation. The details of this are outlined in the Design Details section below.
78+
79+
Job scheduled timestamp annotation: `batch.kubernetes.io/cronjob-scheduled-timestamp`
80+
81+
### User Stories (Optional)
82+
83+
#### Story 1
84+
85+
As a user, I would like to know the time at which a job was originally scheduled to run.
86+
87+
### Notes/Constraints/Caveats (Optional)
88+
89+
### Risks and Mitigations
90+
91+
The CronJob controller always works on the assumption that a change applies only to Jobs created after the change. The annotation will therefore be injected only into Jobs newly created from CronJobs while the feature is on. This plays nicely with downgrade and doesn't introduce unnecessary complexity.
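
The behavior described above can be illustrated with a minimal Go sketch. It is not the controller's actual code: `buildJobAnnotations` and its `gateEnabled` parameter are hypothetical names used only for illustration, and plain maps stand in for the Job metadata.

```go
package main

import "fmt"

// Annotation key proposed in this KEP.
const scheduledTimestampAnnotation = "batch.kubernetes.io/cronjob-scheduled-timestamp"

// buildJobAnnotations is a hypothetical helper. It copies the annotations of
// the CronJob's job template and adds the scheduled timestamp only when the
// feature gate is enabled. Jobs that already exist are never touched, because
// the helper only runs when a new Job is being built.
func buildJobAnnotations(template map[string]string, gateEnabled bool, scheduled string) map[string]string {
	out := make(map[string]string, len(template)+1)
	for k, v := range template {
		out[k] = v
	}
	if gateEnabled {
		out[scheduledTimestampAnnotation] = scheduled
	}
	return out
}

func main() {
	template := map[string]string{"example.com/team": "batch"} // assumed template annotation
	on := buildJobAnnotations(template, true, "2023-06-06T12:00:00Z")
	off := buildJobAnnotations(template, false, "2023-06-06T12:00:00Z")
	fmt.Println(len(on), len(off)) // prints "2 1"
}
```

With the gate off, the resulting Job looks the same as one created before the feature existed, which is why a downgrade needs no cleanup.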
92+
93+
## Design Details
94+
95+
The CronJob controller will only need a minor update to the [getJobFromTemplate2](https://github.com/kubernetes/kubernetes/blob/7024beeeeb1f2e4cde93805a137cd7ad92fec466/pkg/controller/cronjob/utils.go#L188) function, to add the job scheduled timestamp as the job annotation `batch.kubernetes.io/cronjob-scheduled-timestamp`. The scheduled timestamp is represented in `RFC3339`.
96+
97+
For the scheduled timestamp's timezone, the initial thought was to use `UTC`, since it is the primary timezone and causes the least confusion. However, since the CronJob object has a `spec.timeZone` field, it is better to use that same timezone. If the CronJob's `spec.timeZone` is not set (`nil`), the annotation will use the `UTC` timezone as a default.
98+
99+
### Test Plan
100+
101+
- [X] I/we understand the owners of the involved components may require updates to
102+
existing tests to make this code solid enough prior to committing the changes necessary
103+
to implement this enhancement.
104+
105+
##### Prerequisite testing updates
106+
107+
108+
##### Unit tests
109+
110+
- `k8s.io/kubernetes/pkg/controller/cronjob`: `05/22/2023` - `96.2%`
111+
112+
##### Integration tests
113+
114+
- Unit tests will ensure the new annotation is correctly added to jobs.
115+
- The integration test should ensure the annotation is present when the feature is on and missing when it is off. It will also verify that the annotation is only added to newly created Jobs, not to existing ones.
116+
117+
##### e2e tests
118+
119+
E2E tests will not provide any additional coverage that isn't already covered by unit + integration tests, since we are simply adding an annotation, so no e2e tests will be necessary for this change.
120+
121+
### Graduation Criteria
122+
123+
The feature will be released directly in Beta. There is no benefit in having an alpha release, because we are simply adding a new annotation and there is very little risk.
124+
125+
#### Beta
126+
127+
- Feature implemented behind the `CronJobsScheduledAnnotation` feature gate.
128+
- Unit and integration tests passing.
129+
130+
#### GA
131+
132+
Fix any potentially reported bugs.
133+
134+
### Upgrade / Downgrade Strategy
135+
136+
No changes required to existing cluster to use this feature.
137+
138+
### Version Skew Strategy
139+
140+
N/A. This feature doesn't require coordination between control plane components;
141+
the changes to each controller are self-contained.
142+
143+
## Production Readiness Review Questionnaire
144+
145+
146+
### Feature Enablement and Rollback
147+
148+
149+
###### How can this feature be enabled / disabled in a live cluster?
150+
151+
152+
- [X] Feature gate (also fill in values in `kep.yaml`)
153+
- Feature gate name: `CronJobsScheduledAnnotation`
154+
- Components depending on the feature gate: `kube-controller-manager`
155+
- [ ] Other
156+
- Describe the mechanism: N/A.
157+
- Will enabling / disabling the feature require downtime of the control
158+
plane? No
159+
- Will enabling / disabling the feature require downtime or re-provisioning of a node? No
160+
161+
###### Does enabling the feature change any default behavior?
162+
163+
Jobs newly created by the CronJob controller will contain a new annotation, `batch.kubernetes.io/cronjob-scheduled-timestamp`.
164+
165+
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
166+
167+
Yes. If the feature gate is disabled, the CronJob controller will not add the
168+
scheduled timestamp as an annotation.
169+
170+
###### What happens if we reenable the feature if it was previously rolled back?
171+
172+
The CronJob controller will begin adding the scheduled timestamp as an annotation to jobs created while the feature is enabled, and existing jobs will be unaffected.
173+
174+
###### Are there any tests for feature enablement/disablement?
175+
176+
Given the feature results in adding an annotation only to newly created objects, those tests won't really be different from the actual feature tests.
177+
178+
### Rollout, Upgrade and Rollback Planning
179+
180+
###### How can a rollout or rollback fail? Can it impact already running workloads?
181+
182+
This change does not introduce new ways for a rollout or rollback to fail. It also will not impact already running workloads.
183+
184+
###### What specific metrics should inform a rollback?
185+
186+
- Users can monitor the CronJob metrics `job_creation_skew_duration_seconds`, `cronjob_controller_rate_limiter_use`, and `cronjob_job_creation_skew`.
187+
188+
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
189+
190+
The feature will be tested manually prior to beta launch.
191+
192+
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
193+
194+
No.
195+
196+
### Monitoring Requirements
197+
198+
199+
###### How can an operator determine if the feature is in use by workloads?
200+
201+
Spot-checking Jobs created by CronJobs for the `batch.kubernetes.io/cronjob-scheduled-timestamp` annotation is sufficient. For monitoring purposes, we can rely on pre-existing metrics that track both the CronJob queue and the job creation skew, which should provide sufficient signal that the controller is working as expected. In small clusters, checking for the annotation is enough to determine whether the feature is in use.
202+
203+
###### How can someone using this feature know that it is working for their instance?
204+
205+
- [ ] Events
206+
- Event Reason:
207+
- [X] API .metadata
208+
- Condition name:
209+
- Other field:
210+
- `.metadata.annotations['batch.kubernetes.io/cronjob-scheduled-timestamp']`
211+
212+
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
213+
214+
- 99th percentile over a day for Job syncs is <= 15s for a client-side 50 QPS limit.
215+
216+
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
217+
218+
- [X] Metrics
219+
- Metric name: cronjob_job_creation_skew
220+
- Components exposing the metric: kube-controller-manager
221+
- Metric name: job_creation_skew_duration_seconds
222+
- Components exposing the metric: kube-controller-manager
223+
224+
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
225+
226+
No.
227+
228+
### Dependencies
229+
230+
###### Does this feature depend on any specific services running in the cluster?
231+
232+
No.
233+
234+
### Scalability
235+
236+
###### Will enabling / using this feature result in any new API calls?
237+
238+
No.
239+
240+
###### Will enabling / using this feature result in introducing new API types?
241+
242+
No.
243+
244+
###### Will enabling / using this feature result in any new calls to the cloud provider?
245+
246+
No.
247+
248+
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
249+
250+
Yes, each job created by the CronJob controller will have an additional annotation containing an `RFC3339` timestamp, which together with the annotation name results in ~70B per job object.
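
The ~70B figure can be checked with a short Go sketch. `annotationBytes` is a hypothetical helper that counts only the key and value, ignoring any serialization overhead such as quotes or separators.

```go
package main

import "fmt"

// Annotation key proposed in this KEP.
const scheduledTimestampAnnotation = "batch.kubernetes.io/cronjob-scheduled-timestamp"

// annotationBytes is a hypothetical helper that returns the combined length
// in bytes of an annotation key and its value.
func annotationBytes(key, value string) int {
	return len(key) + len(value)
}

func main() {
	// UTC timestamps end in "Z"; zoned timestamps carry a numeric offset.
	fmt.Println(annotationBytes(scheduledTimestampAnnotation, "2023-06-06T12:00:00Z"))      // 67
	fmt.Println(annotationBytes(scheduledTimestampAnnotation, "2023-06-06T21:00:00+09:00")) // 72
}
```

The key is 47 bytes and the value is 20 or 25 bytes depending on the offset form, which brackets the estimate above.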
251+
252+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
253+
254+
No.
255+
256+
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
257+
258+
No.
259+
260+
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
261+
262+
No.
263+
264+
### Troubleshooting
265+
266+
###### How does this feature react if the API server and/or etcd is unavailable?
267+
268+
No change compared to existing failure modes.
269+
270+
###### What are other known failure modes?
271+
272+
N/A
273+
274+
###### What steps should be taken if SLOs are not being met to determine the problem?
275+
276+
277+
## Implementation History
278+
- 2023-06-06: KEP published
279+
280+
## Drawbacks
281+
282+
## Alternatives
283+
284+
- Add label instead of annotation
285+
- A label is unnecessary because the data will not be used for searching or for matching conditions.
286+
287+
- Add a status field
288+
- The object already has the `CreationTimestamp` field, but that field is set to the time the Job is actually created, not the time it was scheduled for. The point of the new annotation is to pass the original/expected scheduled timestamp information.
289+
290+
## Infrastructure Needed (Optional)
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
title: KEP Template
2+
kep-number: 4026
3+
authors:
4+
- "@helayoty"
5+
owning-sig: sig-apps
6+
participating-sigs:
7+
status: implementable
8+
creation-date: 2023-06-06
9+
reviewers:
10+
- "@soltysh"
11+
approvers:
12+
- "@soltysh"
13+
14+
# The target maturity stage in the current dev cycle for this KEP.
15+
stage: beta
16+
17+
latest-milestone: "v1.28"
18+
19+
# The milestone at which this feature was, or is targeted to be, at each stage.
20+
milestone:
21+
alpha: ""
22+
beta: "v1.28"
23+
stable: ""
24+
25+
# The following PRR answers are required at alpha release
26+
# List the feature gate name and the components for which it must be enabled
27+
feature-gates:
28+
- name: CronJobsScheduledAnnotation
29+
components:
30+
- kube-controller-manager
31+
disable-supported: true
32+
33+
# The following PRR answers are required at beta release
34+
metrics:
