Commit 5730f1b

Merge pull request kubernetes#3249 from alculquicondor/ready-pods
Add load test results for JobReadyPods
2 parents 5b83201 + d101e5a commit 5730f1b

1 file changed: +15 -4 lines changed
  • keps/sig-apps/2879-ready-pods-job-status

keps/sig-apps/2879-ready-pods-job-status/README.md

Lines changed: 15 additions & 4 deletions
@@ -96,7 +96,7 @@ field based on the number of Pods that have the `Ready` condition.
 
 - An increase in Job status updates. To mitigate this, the job controller holds
 the Pod updates that happen in X ms before syncing a Job.
-From experiments using integration tests, X=500ms was found to be a reasonable
+From experiments using E2E load tests, X=1s was found to be a reasonable
 value.
 
 ## Design Details
@@ -139,16 +139,27 @@ pods that have the `Ready` condition.
 
 - Feature gate enabled by default.
 - Existing [E2E] and [conformance] tests passing.
-- Scalability tests for Jobs of varying sizes, up to 500 parallelism, that keep
-track of metric `job_sync_duration_seconds`. There should be no significant
-degradation after enabling the feature gate.
+- Scalability tests for Jobs of varying sizes, up to 500 parallelism. There
+should be no significant degradation in E2E time after enabling the feature
+gate.
+
+Using a [clusterloader test](https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/batch/config.yaml)
+that creates 338 jobs (total of ~3000 pods) on a 100 nodes cluster, with 100
+QPS for the job controller, where each pod sleeps for 30s, I obtained the
+following results (averaged for 3 runs, from the time the first job got created):
+- Feature disabled, no batching of pod updates : 68s
+- Feature enabled, batching pod updates for 0.5s: 72s (+5.9%)
+- Feature enabled, batching pod updates for 1s: 71s (+4.4%)
 
 [E2E]: https://testgrid.k8s.io/sig-apps#gce&include-filter-by-regex=apps%5C%5D%20Job
 [Conformance]: https://testgrid.k8s.io/conformance-all#Conformance%20-%20GCE%20-%20master&include-filter-by-regex=sig-apps&include-filter-by-regex=Job&exclude-filter-by-regex=CronJob
 
 #### GA
 
 - Every bug report is fixed.
+- Explore setting different batch periods for regular pod updates versus
+finished pod updates, so we can do less pod readiness updates without
+compromising how fast we can declare a job finished.
 - The job controller ignores the feature gate.
 
 #### Deprecation
