@@ -94,8 +94,10 @@ field based on the number of Pods that have the `Ready` condition.
### Risks and Mitigations
- - An increase in Job status updates. This is capped by the number of times Pods
- reach the ready State, usually once in their lifetime.
+ - An increase in Job status updates. To mitigate this, the Job controller batches
+ the Pod updates that arrive within X ms before syncing a Job. X will be determined
+ from experiments on integration tests, but we expect it to be between 500ms
+ and 1s.
## Design Details
@@ -189,7 +191,7 @@ The Job controller will start populating the field again.
###### Are there any tests for feature enablement/disablement?
- Yes, at unit and integration level.
+ Yes, there will be tests at unit and integration level.
### Rollout, Upgrade and Rollback Planning
@@ -222,8 +224,8 @@ The feature applies to all Jobs, unless the feature gate is disabled.
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
- The 99% percentile of Job status updates below 1s, when the controller doesn't
- create new Pods or tracks finishing Pods.
+ The 99th percentile of Job status sync (processing+API calls) is below 2s, when
+ the controller doesn't create new Pods or track finishing Pods.
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?