Commit 1b024fb

updates
1 parent 616f12c commit 1b024fb

File tree

1 file changed

+12
-3
lines changed
  • keps/sig-apps/3715-elastic-indexed-job

keps/sig-apps/3715-elastic-indexed-job/README.md

Lines changed: 12 additions & 3 deletions
@@ -198,7 +198,8 @@ know that this has succeeded?
 ### Non-Goals
 
 - Change the existing behavior of `NonIndexed` mode.
-- Change the existing behavior of `Indexed` mode for jobs that never mutate.
+- Change the existing behavior of `Indexed` mode for jobs that never mutate
+`.spec.completions` and `.spec.parallelism` together.
 
 <!--
 What is out of scope for this KEP? Listing non-goals helps to focus discussion
@@ -248,7 +249,7 @@ The manager communicates with the workers to manage their load; examples of such
 and districuted ML training. Examples of frameworks include Horovod with MPI, Spark and Ray.
 
 This workload can be modeled as two `Indexed` Jobs, one for the manager and one for the workers. A
-headless service is created to set up stable hostnames for the worker pods . The workers job is
+headless service is created to set up stable hostnames for the worker pods. The workers job is
 scaled up/down by updating `spec.completions` and `spec.parallelism` in tandem.
 
 The success semantics of these workloads are tied to the manager, not the workers. However, in
@@ -406,7 +407,7 @@ Note that if tests reveals a required change that invalidates the above understa
 revert and start in Alpha.
 
 #### Beta
-- Validtion logic in place to allow mutating `spec.completions`
+- Validation logic in place to allow mutating `spec.completions` in tandem with `.spec.parallelism`.
 
 #### GA
 - Fix any potentially reported bugs.
@@ -534,6 +535,8 @@ you need any help or guidance.
 
 ### Feature Enablement and Rollback
 
+The feature can be safely rolled back.
+
 <!--
 This section must be completed when targeting alpha to a release.
 -->
@@ -572,6 +575,8 @@ automations, so be extremely careful here.
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 Yes. If disabled, kube-apiserver will reject mutating requests to `spec.completions` for `Indexed` jobs.
+For Jobs that were previously mutated, then the only implication is that future mutating requests will be
+rejected.
 
 <!--
 Describe the consequences on existing workloads (e.g., if this is a runtime
@@ -865,6 +870,10 @@ Create requests will be rejected.
 In a multi-master setup, when the cluster has skewed apiservers, some create requests
 may get accepted and some may get rejected.
 
+Detection: failed update requests; metric that an operator can monitor is apiserver_request_total[resource=job, group=batch, verb=UPDATE, code=403]
+Diagnostics: apiserver logs indicating rejected/accepted job update requests
+Testing: no testing, this is a transient state until all instances are on the same k8s version.
+
 <!--
 For each of them, fill in the following information by copying the below template:
 - [Failure mode brief description]
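
The Beta criterion and the rollback answer in this diff together describe a validation rule: `spec.completions` of an `Indexed` Job may be mutated only in tandem with `spec.parallelism`, and every mutation is rejected once the feature is disabled. The following is a minimal sketch of that rule, not the kube-apiserver implementation. The `JobSpec` class, the `validate_completions_update` function, the `elastic_enabled` flag, and the error strings are hypothetical names introduced for illustration, and "in tandem" is assumed here to mean that the updated `completions` equals the updated `parallelism`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobSpec:
    """Hypothetical, trimmed-down stand-in for the Job spec fields discussed here."""
    completion_mode: str          # "Indexed" or "NonIndexed"
    completions: Optional[int]
    parallelism: Optional[int]


def validate_completions_update(old: JobSpec, new: JobSpec,
                                elastic_enabled: bool) -> List[str]:
    """Return validation errors for an update that may touch spec.completions."""
    # An update that leaves completions untouched is not affected by this rule.
    if new.completions == old.completions:
        return []
    # Feature disabled (rollback case): any mutation of completions is rejected.
    if not elastic_enabled:
        return ["spec.completions: field is immutable"]
    # NonIndexed behavior is unchanged (see Non-Goals): completions stays immutable.
    if new.completion_mode != "Indexed":
        return ["spec.completions: field is immutable"]
    # Assumption: "in tandem" means the new completions equals the new parallelism.
    if new.completions is None or new.completions != new.parallelism:
        return ["spec.completions: can only be modified in tandem with spec.parallelism"]
    return []
```

Under these assumptions, scaling an `Indexed` Job from 3 to 5 workers by setting both fields to 5 passes, while changing `completions` alone, or changing it with the feature disabled, yields an error.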
