@@ -96,7 +96,7 @@ field based on the number of Pods that have the `Ready` condition.
- An increase in Job status updates. To mitigate this, the job controller holds
the Pod updates that happen in X ms before syncing a Job.
- From experiments using integration tests, X=500ms was found to be a reasonable
+ From experiments using E2E load tests, X=1s was found to be a reasonable
value.
## Design Details
@@ -139,16 +139,27 @@ pods that have the `Ready` condition.
- Feature gate enabled by default.
- Existing [E2E] and [conformance] tests passing.
- - Scalability tests for Jobs of varying sizes, up to 500 parallelism, that keep
- track of metric `job_sync_duration_seconds`. There should be no significant
- degradation after enabling the feature gate.
+ - Scalability tests for Jobs of varying sizes, up to 500 parallelism. There
+ should be no significant degradation in E2E time after enabling the feature
+ gate.
+
+ Using a [clusterloader test](https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/batch/config.yaml)
+ that creates 338 jobs (total of ~3000 pods) on a 100-node cluster, with 100
+ QPS for the job controller, where each pod sleeps for 30s, I obtained the
+ following results (averaged over 3 runs, measured from the time the first job was created):
+ - Feature disabled, no batching of pod updates: 68s
+ - Feature enabled, batching pod updates for 0.5s: 72s (+5.9%)
+ - Feature enabled, batching pod updates for 1s: 71s (+4.4%)
[E2E]: https://testgrid.k8s.io/sig-apps#gce&include-filter-by-regex=apps%5C%5D%20Job
[Conformance]: https://testgrid.k8s.io/conformance-all#Conformance%20-%20GCE%20-%20master&include-filter-by-regex=sig-apps&include-filter-by-regex=Job&exclude-filter-by-regex=CronJob
#### GA
- Every bug report is fixed.
+ - Explore setting different batch periods for regular pod updates versus
+ finished pod updates, so we can do fewer pod readiness updates without
+ compromising how fast we can declare a job finished.
- The job controller ignores the feature gate.
#### Deprecation