@@ -198,7 +198,8 @@ know that this has succeeded?
### Non-Goals

- Change the existing behavior of `NonIndexed` mode.
- - Change the existing behavior of `Indexed` mode for jobs that never mutate.
+ - Change the existing behavior of `Indexed` mode for jobs that never mutate
+ `.spec.completions` and `.spec.parallelism` together.

<!--
What is out of scope for this KEP? Listing non-goals helps to focus discussion
@@ -248,7 +249,7 @@ The manager communicates with the workers to manage their load; examples of such
and distributed ML training. Examples of frameworks include Horovod with MPI, Spark and Ray.

This workload can be modeled as two `Indexed` Jobs, one for the manager and one for the workers. A
- headless service is created to set up stable hostnames for the worker pods . The workers job is
+ headless service is created to set up stable hostnames for the worker pods. The workers job is
scaled up/down by updating `spec.completions` and `spec.parallelism` in tandem.

The success semantics of these workloads are tied to the manager, not the workers. However, in
@@ -406,7 +407,7 @@ Note that if tests reveals a required change that invalidates the above understa
revert and start in Alpha.
#### Beta
- - Validtion logic in place to allow mutating `spec.completions`
+ - Validation logic in place to allow mutating `spec.completions` in tandem with `.spec.parallelism`.

#### GA
- Fix any potentially reported bugs.
@@ -534,6 +535,8 @@ you need any help or guidance.
### Feature Enablement and Rollback
+ The feature can be safely rolled back.
+
<!--
This section must be completed when targeting alpha to a release.
-->
@@ -572,6 +575,8 @@ automations, so be extremely careful here.
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. If disabled, kube-apiserver will reject mutating requests to `spec.completions` for `Indexed` jobs.
+ For Jobs that were previously mutated, the only implication is that future mutating requests will be
+ rejected.

<!--
Describe the consequences on existing workloads (e.g., if this is a runtime
@@ -865,6 +870,10 @@ Create requests will be rejected.
In a multi-master setup, when the cluster has skewed apiservers, some create requests
may get accepted and some may get rejected.

+ Detection: failed update requests; metric that an operator can monitor is apiserver_request_total[resource=job, group=batch, verb=UPDATE, code=403]
+ Diagnostics: apiserver logs indicating rejected/accepted job update requests
+ Testing: no testing, this is a transient state until all instances are on the same k8s version.
+
<!--
For each of them, fill in the following information by copying the below template:
- [Failure mode brief description]