Commit 893cd43

KEP3998: Update the metrics and condition reason for the Beta graduation
Signed-off-by: Yuki Iwai <[email protected]>
1 parent 382fe4b commit 893cd43

2 files changed: +35 -12 lines changed

keps/sig-apps/3998-job-success-completion-policy/README.md

Lines changed: 33 additions & 11 deletions
@@ -225,6 +225,9 @@ However, we are going to extend the scope of the condition to the scenario when
 the Job completes by reaching the `.spec.completions`, as part of fixing
 (issue #123775)[https://github.com/kubernetes/kubernetes/issues/123775].

+Additionally, we introduce a new `CompletionsReached` condition reason for the `Complete` and `SuccessCriteriaMet` condition
+so that we can represent the place where the `SuccessCriteriaMet` condition when the number of succeeded Job Pods reached the `.spec.completions`.
+
 See more details in the
 [Job API managed-by mechanism](https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/4368-support-managed-by-for-batch-jobs/README.md).

@@ -311,6 +314,20 @@ const (
     JobSucceessCriteriaMet JobConditionType = "SuccessCriteriaMet"
     ...
 )
+...
+
+const (
+    ...
+    // JobReasonSuccessPolicy reason indicates SuccessCriteriaMet condition is added due to
+    // a Job met successPolicy.
+    // https://kep.k8s.io/3998
+    JobReasonSuccessPolicy string = "SuccessPolicy"
+    // JobReasonCompletionsReached reason indicates SuccessCriteriaMet condition is added due to
+    // a number of succeeded Job Pods met completions.
+    // https://kep.k8s.io/3998
+    JobReasonCompletionsReached string = "CompletionsReached"
+)
+...
 ```

 Moreover, we validate the following constraints for the `rules` and `status.conditions`:
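
Reviewer note (illustrative, not part of the commit): the two reason constants added in this hunk can be read as a two-way choice made when the `SuccessCriteriaMet` condition is added. The Go sketch below shows one way that choice might look. The helper `successReason`, its parameters, and the decision to check the success policy before `.spec.completions` are assumptions for illustration only; none of them is taken from the job-controller.

```go
package main

import "fmt"

// Reason strings copied from the constants added in this hunk.
const (
	JobReasonSuccessPolicy      = "SuccessPolicy"
	JobReasonCompletionsReached = "CompletionsReached"
)

// successReason is a hypothetical helper, not job-controller code.
// It returns the reason to attach to a SuccessCriteriaMet condition and
// whether the condition should be added at all.
// Assumption: a matched success policy takes precedence over completions.
func successReason(metSuccessPolicy bool, succeeded, completions int32) (string, bool) {
	switch {
	case metSuccessPolicy:
		return JobReasonSuccessPolicy, true
	case completions > 0 && succeeded >= completions:
		return JobReasonCompletionsReached, true
	default:
		return "", false
	}
}

func main() {
	// Three of three pods succeeded and no success policy matched.
	reason, ok := successReason(false, 3, 3)
	fmt.Println(reason, ok)
}
```

With these inputs the sketch reports `CompletionsReached`, which matches the case this commit adds a dedicated reason for.
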
@@ -341,8 +358,8 @@ Every time the pod condition are updated, the job-controller evaluates the succe
 - `succeededIndexes`: the job-controller evaluates `.status.completedIndexes` to see if a set of indexes is there.
 - `succeededCount`: the job-controller evaluates `.status.succeeded` to see if the value is `succeededCount` or more.

-After that, the job-controller adds a `SuccessCriteriaMet` condition instead of a `Failed` condition to `.status.conditions`
-and the job-controller terminates the lingering pods. At that time, `JobSuccessPolicy` is set to the `status.reason` field.
+After that, the job-controller adds a `SuccessCriteriaMet` condition instead of a `FailureTarget` condition to `.status.conditions`
+and the job-controller terminates the lingering pods. At that time, `SuccessPolicy` is set to the `status.reason` field.

 Note that when the job meets one of successPolicies, other successPolicies are ignored.
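
Reviewer note (illustrative, not part of the commit): the evaluation described in this hunk can be sketched as a first-match loop over the rules. The Go sketch below is a simplified model under stated assumptions: the `rule` type, `firstRuleMet`, and `allCompleted` are hypothetical names, indexes are passed as an already parsed slice rather than the API's string form, and rules that set both fields are deliberately left out.

```go
package main

import "fmt"

// rule is a simplified, hypothetical stand-in for one entry of
// .spec.successPolicy.rules. Exactly one field is expected to be set.
type rule struct {
	succeededIndexes []int // empty means unset
	succeededCount   int32 // zero means unset
}

// allCompleted reports whether every wanted index appears in the completed set.
func allCompleted(completed map[int]bool, want []int) bool {
	for _, idx := range want {
		if !completed[idx] {
			return false
		}
	}
	return true
}

// firstRuleMet returns the position of the first satisfied rule, or -1.
// Later rules are not evaluated once one rule is met.
func firstRuleMet(rules []rule, completed map[int]bool, succeeded int32) int {
	for i, r := range rules {
		switch {
		case len(r.succeededIndexes) > 0 && r.succeededCount == 0:
			if allCompleted(completed, r.succeededIndexes) {
				return i
			}
		case len(r.succeededIndexes) == 0 && r.succeededCount > 0:
			if succeeded >= r.succeededCount {
				return i
			}
		}
		// Rules that set both fields are not modeled in this sketch.
	}
	return -1
}

func main() {
	rules := []rule{{succeededIndexes: []int{0, 1}}, {succeededCount: 2}}
	completed := map[int]bool{1: true, 3: true}
	// Index 0 has not completed, so the first rule fails; two pods
	// succeeded, so the second rule is met.
	fmt.Println(firstRuleMet(rules, completed, 2))
}
```

Returning on the first match mirrors the note that once the job meets one success policy rule, the remaining rules are ignored.
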

@@ -431,9 +448,9 @@ to implement this enhancement.
 ##### e2e tests

 - Test scenarios:
-  - handling of successPolicy when all indexes succeeded
-  - handling of the `.spec.successPolicy.rules.succeededIndexes` when some indexes remain pending
-  - handling of the `.spec.successPolicy.rules.succeededCount` when some indexes remain pending
+  - [handling of successPolicy when all indexes succeeded](https://github.com/kubernetes/kubernetes/blob/3a8a60eba29940e26ac8db52329a91ba87305114/test/e2e/apps/job.go#L524-L530)
+  - [handling of the `.spec.successPolicy.rules.succeededIndexes` when some indexes remain pending](https://github.com/kubernetes/kubernetes/blob/3a8a60eba29940e26ac8db52329a91ba87305114/test/e2e/apps/job.go#L563-L569)
+  - [handling of the `.spec.successPolicy.rules.succeededCount` when some indexes remain pending](https://github.com/kubernetes/kubernetes/blob/3a8a60eba29940e26ac8db52329a91ba87305114/test/e2e/apps/job.go#L602-L608)

 ### Graduation Criteria

@@ -445,7 +462,8 @@ to implement this enhancement.
 #### Beta

 - E2E tests passed as designed in [TestPlan](#test-plan).
-- Introduced a new `job_succeeded_total` metric in [Monitoring Requirements](#monitoring-requirements).
+- Introduced new `CompletionsReached` and `SuccessPolicy` reason labels to the `jobs_finished_total` metric in [Monitoring Requirements](#monitoring-requirements).
+- Introduced a new `CompletionsReached` condition reason for the `Complete` and `SuccessCriteriaMet` condition type.
 - Feature is enabled by default.
 - Address all issues reported by users.

@@ -614,16 +632,19 @@ No.

 ###### How can an operator determine if the feature is in use by workloads?

-We will introduce the new `job_succeeded_total` metric with `JobSuccessPolicy` and `Completions` reasons,
-which indicates the following situations:
+We will introduce the new `CompletionsReached` and `SuccessPolicy` reason labels to the `jobs_finished_total`,
+which indicates the following situations:
+
+- `CompletionsReached` indicates a job is declared as `Complete` because the number of succeeded job pods meet `.spec.completions`.
+- `SuccessPolicy` indicates a job is declared as `Complete` because the job meets `.spec.successPolicy`.

-- `JobSuccessPolicy` indicates a job is declared as `SuccessCriteriaMet` because the job meets `spec.succesPolicy`.
-- `Completions` indicates a job is declared as `SuccessCriteriaMet` because the job meets `spec.completions`.
+As we discussed in [this thread](https://github.com/kubernetes/kubernetes/pull/126075#discussion_r1677411216),
+the new `CompletionsReached` reason label is used to count the successful jobs instead of existing "" reason label.

 ###### How can someone using this feature know that it is working for their instance?

 - [x] Job API .status
-  - The Job controller will add a condition with `JobSuccessPolicy` reason to `conditions`.
+  - The Job controller will add a `SuccessCriteriaMet` condition with `SuccessPolicy` reason to `conditions`.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
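
Reviewer note (illustrative, not part of the commit): the labeling change in this hunk amounts to counting each successful Job under either `CompletionsReached` or `SuccessPolicy` rather than under an empty reason. The Go sketch below models that with a plain map. The `finishedCounter` type and `observeSucceeded` method are hypothetical stand-ins and do not reflect how the job-controller registers or increments `jobs_finished_total`; the real metric may carry further labels that are omitted here.

```go
package main

import "fmt"

// finishedCounter is a hypothetical in-memory stand-in for the
// jobs_finished_total counter, keyed only by its reason label.
type finishedCounter map[string]int

// observeSucceeded records one successful Job under the reason label
// described in this hunk; the empty reason is never used for successes.
func (c finishedCounter) observeSucceeded(metSuccessPolicy bool) {
	reason := "CompletionsReached"
	if metSuccessPolicy {
		reason = "SuccessPolicy"
	}
	c[reason]++
}

func main() {
	c := finishedCounter{}
	c.observeSucceeded(false)
	c.observeSucceeded(false)
	c.observeSucceeded(true)
	// Counts for CompletionsReached, SuccessPolicy, and the empty reason.
	fmt.Println(c["CompletionsReached"], c["SuccessPolicy"], c[""])
}
```

Under this model, an operator could tell whether the feature is in use by checking whether the `SuccessPolicy` count is non-zero.
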

@@ -717,6 +738,7 @@ consider tuning the parameters for [APF](https://kubernetes.io/docs/concepts/clu
 - 2024.02.07: API is finalized for the alpha stage.
 - 2024.03.09: "Criteria" is replaced with "Rules".
 - 2024.06.11: Beta Graduation.
+- 2024.07.26: "CompletionsReached" reason is added and new reason labels are added to the "jobs_finished_total" metric.

 ## Drawbacks

keps/sig-apps/3998-job-success-completion-policy/kep.yaml

Lines changed: 2 additions & 1 deletion
@@ -45,4 +45,5 @@ disable-supported: true

 # The following PRR answers are required at beta release
 metrics:
-  - job_succeeded_total
+  - job_sync_duration_seconds
+  - jobs_finished_total
