@@ -210,12 +214,30 @@ This feature is already implemented for Jobs ([KEP-3939](https://github.com/kube
### Risks and Mitigations

#### Feature Impact

Deployment rollouts might be slower when using the `TerminationComplete` PodReplacementPolicy.

Deployment rollouts might consume excessive resources when using the `TerminationStarted` PodReplacementPolicy.

Both risks are mitigated by making this feature opt-in.

#### kubectl Skew
The `deployment.kubernetes.io/replicaset-replicas-before-scale` annotation should be removed during
deployment rollback when annotations are copied from the ReplicaSet to the Deployment. Support for
this removal will be added to kubectl in the same release as this feature. Therefore, rollback using
an older kubectl will not be supported until one minor release after the feature first reaches
alpha. The documentation for Deployments will include a notice about this.

If an older kubectl version is used, the impact should be minimal. The deployment may end up with an
unnecessary `deployment.kubernetes.io/replicaset-replicas-before-scale` annotation. The deployment
controller then synchronizes Deployment annotations back to the ReplicaSet, but it will ignore this
new annotation if the feature gate is on.

The bug should be mainly visual (an extra annotation on the Deployment), unless the feature is turned
on and off in quick succession. In this case, incorrect annotations could end up on a ReplicaSet, which
would affect the scaling proportions during a rollout.
## Design Details
### Deployment Behavior Changes
@@ -242,7 +264,8 @@ Scaling logic:
ReplicaSets as well and not spawn more replicas than the Deployment's
`.spec.replicas + .spec.strategy.rollingUpdate.maxSurge`. This will be implemented by
checking the ReplicaSet's `.spec.replicas`, `.status.replicas` and `.status.terminatingReplicas`
to determine the number of pods. See [Deployment Scaling Changes and a New Annotation for ReplicaSets](#deployment-scaling-changes-and-a-new-annotation-for-replicasets)
for more details. A sketch of this check is shown below the list.
- Changing the scale-down logic is not necessary, and we can scale down as many pods as we want
because the policy does not affect scale-down, since we are not replacing the pods.
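
A minimal sketch of that scale-up check, assuming simplified stand-in types rather than the real API structs (the names `replicaSetCounts` and `allowedScaleUp` are illustrative, and `terminatingReplicas` is the status field proposed by this KEP):

```go
// Simplified stand-ins for the fields the check needs; the real types live in
// k8s.io/api/apps/v1 and .status.terminatingReplicas is the field proposed here.
type replicaSetCounts struct {
	specReplicas        int32 // .spec.replicas
	statusReplicas      int32 // .status.replicas
	terminatingReplicas int32 // proposed .status.terminatingReplicas
}

// allowedScaleUp returns how many additional replicas may still be created
// without exceeding .spec.replicas + maxSurge of the Deployment, counting
// terminating pods as still occupying capacity (TerminationComplete policy).
func allowedScaleUp(deploymentReplicas, maxSurge int32, allRSs []replicaSetCounts) int32 {
	maxAllowed := deploymentReplicas + maxSurge

	var current int32
	for _, rs := range allRSs {
		// Take the higher of desired and observed replicas so that pods which
		// were requested but not yet created are not missed.
		current += max(rs.specReplicas, rs.statusReplicas)
		// Terminating pods still count against the surge budget.
		current += rs.terminatingReplicas
	}

	if current >= maxAllowed {
		return 0
	}
	return maxAllowed - current
}
```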
@@ -256,6 +279,101 @@ To satisfy the requirement for tracking terminating pods, and for implementation
we propose adding a new field `.status.terminatingReplicas` to the ReplicaSet's and Deployment's
status.
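
For illustration, the addition might look roughly as follows on the ReplicaSet status (the analogous field would be added to the Deployment status; the exact shape and doc comments are subject to API review):

```go
// Sketch of the proposed ReplicaSetStatus addition; existing fields are elided.
type ReplicaSetStatus struct {
	// Replicas is the most recently observed number of replicas.
	Replicas int32 `json:"replicas"`

	// TerminatingReplicas is the number of pods owned by this ReplicaSet that
	// have a deletionTimestamp set but have not yet been removed.
	// Populated only when the feature gate for this feature is enabled.
	// +optional
	TerminatingReplicas *int32 `json:"terminatingReplicas,omitempty"`

	// ... other existing status fields ...
}
```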
### Deployment Scaling Changes and a New Annotation for ReplicaSets
Currently, scaling is done proportionally over all ReplicaSets to mitigate the risk of losing
availability during a rolling update.

To calculate the new ReplicaSet size, we need to know:

- `replicasBeforeScale`: the `.spec.replicas` of the ReplicaSet before the scaling began.
- `deploymentMaxReplicas`: equal to `.spec.replicas + .spec.strategy.rollingUpdate.maxSurge` of
  the current Deployment.
- `deploymentMaxReplicasBeforeScale`: equal to
  `.spec.replicas + .spec.strategy.rollingUpdate.maxSurge` of the old Deployment. This information
  is stored in the `deployment.kubernetes.io/max-replicas` annotation in each ReplicaSet.

Then we can calculate a new size for each ReplicaSet proportionally as follows:
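
A sketch of the core calculation in Go, using the variable names above; each ReplicaSet is effectively scaled by the ratio `deploymentMaxReplicas / deploymentMaxReplicasBeforeScale`, while the real implementation also handles a missing annotation, scaling to zero, and the rounding of leftovers:

```go
import "math"

// proportionalSize sketches how a single ReplicaSet is resized proportionally,
// using the variables defined above. The real code (getReplicaSetFraction, see
// below) also handles a missing annotation, scaling to zero, and leftovers.
func proportionalSize(replicasBeforeScale, deploymentMaxReplicas, deploymentMaxReplicasBeforeScale int32) int32 {
	scaleRatio := float64(deploymentMaxReplicas) / float64(deploymentMaxReplicasBeforeScale)
	return int32(math.Round(float64(replicasBeforeScale) * scaleRatio))
}
```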
This is currently done in the [getReplicaSetFraction](https://github.com/kubernetes/kubernetes/blob/1cfaa95cab0f69ecc62ad9923eec2ba15f01fc2a/pkg/controller/deployment/util/deployment_util.go#L492-L512)
function. The leftover pods are added to the newest ReplicaSet.

This results in the following scaling behavior.

The first scale operation occurs at T2 and the second scale at T3.

| Time | Terminating Pods | RS1 Replicas | RS2 Replicas | RS3 Replicas | All RS Total | Deployment .spec.replicas | Deployment .spec.replicas + MaxSurge | Scale ratio |
- At T2, a full scale was done for RS1 with a ratio of 1.182. RS1 can then use the new scale ratio
  at T3 with a value of 1.077.
- RS2 has been partially scaled (1.182 ratio) and RS3 has not been scaled at all at T2 due to the
  terminating pods. When a new scale occurs at T3, RS2 and RS3 have not yet completed the first
  scale, so their annotations still point to the T1 state. A new ratio of 1.273 is calculated and
  used for the second scale.

As we can see, we will get a slightly different result when compared to the first table. This is
due to the consecutive scales and the fact that the last scale is not yet fully completed.

The consecutive partial scaling behavior is a best effort. We still adhere to all deployment
constraints and have a bias toward scaling the newest ReplicaSet. To implement this properly, we
would have to introduce a full scaling history, which is probably not worth the added complexity.
### kubectl Changes
Similar to `deployment.kubernetes.io/max-replicas`, the
`deployment.kubernetes.io/replicaset-replicas-before-scale` annotation has to be added to [annotationsToSkip](https://github.com/kubernetes/kubernetes/blob/9e2075b3c87061d25759b0ad112266f03601afd8/staging/src/k8s.io/kubectl/pkg/polymorphichelpers/rollback.go#L184)
so that kubectl removes it during rollbacks.
See [kubectl Skew](#kubectl-skew) for more details.
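
For illustration, the kubectl change would amount to one more entry in that set (a sketch only; the real code refers to these annotation keys through named constants):

```go
// Sketch: the set of annotations that kubectl skips when copying ReplicaSet
// annotations back to the Deployment during a rollback; only the entries
// relevant here are shown.
var annotationsToSkip = map[string]bool{
	"deployment.kubernetes.io/max-replicas": true,
	// New entry, so the annotation introduced by this KEP is dropped on rollback.
	"deployment.kubernetes.io/replicaset-replicas-before-scale": true,
}
```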