Commit c1353e4

feat: support dynamic scaling of stable ReplicaSet as inverse of canary weight (argoproj#1430)
Signed-off-by: Jesse Suen <[email protected]>
1 parent bab546d commit c1353e4

47 files changed, with 2271 additions and 603 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 dist/
 *.iml
 # delve debug binaries
+__debug_bin
 cmd/**/debug
 debug.test
 coverage.out

docs/features/canary.md

Lines changed: 53 additions & 5 deletions
@@ -66,7 +66,7 @@ If no `duration` is specified for a pause step, the rollout will be paused indef
 kubectl argo rollouts promote <rollout>
 ```
 
-## Controlling Canary Scale
+## Dynamic Canary Scale (with Traffic Routing)
 
 By default, the rollout controller will scale the canary to match the current trafficWeight of the
 current step. For example, if the current weight is 25%, and there are four replicas, then the
@@ -109,11 +109,59 @@ spec:
           matchTrafficWeight: true
 ```
 
-If no `duration` is specified for a pause step, the rollout will be paused indefinitely. To unpause, use the [argo kubectl plugin](kubectl-plugin.md) `promote` command.
+When using `setCanaryScale` with explicit values for either replicas or weight, one must be careful
+if used in conjunction with the `setWeight` step. If done incorrectly, an imbalanced amount of traffic
+may be directed to the canary (in proportion to the Rollout's scale). For example, the following set
+of steps would cause 90% of traffic to only be served by 10% of pods:
 
-```shell
-# promote to the next step
-kubectl argo rollouts promote <rollout>
+```yaml
+spec:
+  replicas: 10
+  strategy:
+    canary:
+      steps:
+      - setCanaryScale:
+          weight: 10
+      - setWeight: 90
+      - pause: {}
+```
+
+## Dynamic Stable Scale (with Traffic Routing)
+
+!!! important
+    Available since v1.1
+
+When using traffic routing, by default the stable ReplicaSet is left scaled to 100% during the update.
+This has the advantage that if an abort occurs, traffic can be immediately shifted back to the
+stable ReplicaSet without delay. However, it has the disadvantage that during the update, there will
+eventually exist double the number of replica pods running (similar to in a blue-green deployment),
+since the stable ReplicaSet is left scaled up for the full duration of the update.
+
+It is possible to dynamically reduce the scale of the stable ReplicaSet during an update such that
+it scales down as the traffic weight increases to canary. This would be desirable in scenarios where
+the Rollout has a high replica count and resource cost is a concern, or in bare-metal situations
+where it is not possible to create additional node capacity to accommodate double the replicas.
+
+The ability to dynamically scale the stable ReplicaSet can be enabled by setting the
+`canary.dynamicStableScale` flag to true:
+
+```yaml
+spec:
+  strategy:
+    canary:
+      dynamicStableScale: true
+```
+
+NOTE: that if `dynamicStableScale` is set, and the rollout is aborted, the canary ReplicaSet will
+dynamically scale down as traffic shifts back to stable. If you wish to leave the canary ReplicaSet
+scaled up while aborting, an explicit value for `abortScaleDownDelaySeconds` can be set:
+
+```yaml
+spec:
+  strategy:
+    canary:
+      dynamicStableScale: true
+      abortScaleDownDelaySeconds: 600
 ```
 
 
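
The docs diff above describes two pieces of percentage arithmetic: sizing the stable ReplicaSet as the inverse of the canary weight, and the traffic imbalance that a fixed `setCanaryScale` combined with `setWeight` can cause. The Go sketch below illustrates both under stated assumptions. `weightToReplicas`, `dynamicScale`, and `trafficPerPod` are hypothetical helpers, not functions from the Argo Rollouts controller, and rounding fractional pod counts up is an assumption rather than documented controller behavior.

```go
package main

import "fmt"

// weightToReplicas is a hypothetical helper: it converts a traffic weight
// (0-100) into a pod count for a Rollout with the given spec.replicas.
// Assumption: fractional results are rounded up (integer ceiling division),
// so any non-zero weight gets at least one pod.
func weightToReplicas(replicas, weight int32) int32 {
	return (replicas*weight + 99) / 100
}

// dynamicScale sketches the dynamicStableScale idea: the canary is sized to
// the canary weight and the stable ReplicaSet to the remaining (inverse) weight.
func dynamicScale(replicas, canaryWeight int32) (canary, stable int32) {
	return weightToReplicas(replicas, canaryWeight),
		weightToReplicas(replicas, 100-canaryWeight)
}

// trafficPerPod returns the share of total traffic (in percent) that each pod
// of a ReplicaSet handles when that ReplicaSet receives the given weight.
func trafficPerPod(weight, pods int32) float64 {
	if pods == 0 {
		return 0
	}
	return float64(weight) / float64(pods)
}

func main() {
	const replicas int32 = 10

	// The stable ReplicaSet shrinks as the canary weight grows.
	for _, w := range []int32{0, 25, 50, 90, 100} {
		c, s := dynamicScale(replicas, w)
		fmt.Printf("weight=%3d%% canary=%2d stable=%2d total=%2d\n", w, c, s, c+s)
	}

	// The imbalanced docs example: setCanaryScale weight 10 pins the canary at
	// 1 of 10 pods, then setWeight 90 sends it 90% of traffic while the stable
	// ReplicaSet (left at full scale by default) receives the remaining 10%.
	canaryPods := weightToReplicas(replicas, 10)
	fmt.Printf("each canary pod: %.1f%% of traffic\n", trafficPerPod(90, canaryPods))
	fmt.Printf("each stable pod: %.1f%% of traffic\n", trafficPerPod(10, replicas))
}
```

Under this round-up assumption, a 25% canary weight on 10 replicas gives 3 canary and 8 stable pods, so the total can momentarily exceed `spec.replicas` by one. In the fixed-scale example, each canary pod carries 90% of all traffic versus 1% for each stable pod.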

go.mod

Lines changed: 1 addition & 2 deletions
@@ -6,8 +6,7 @@ require (
 	github.com/antonmedv/expr v1.8.9
 	github.com/argoproj/notifications-engine v0.2.1-0.20210525191332-e8e293898477
 	github.com/argoproj/pkg v0.9.0
-	github.com/aws/aws-sdk-go-v2/config v1.0.0
-	github.com/aws/aws-sdk-go-v2/internal/ini v1.2.1 // indirect
+	github.com/aws/aws-sdk-go-v2/config v1.8.1
 	github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.5.0
 	github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2 v1.6.1
 	github.com/blang/semver v3.5.1+incompatible

go.sum

Lines changed: 18 additions & 16 deletions
@@ -176,30 +176,32 @@ github.com/aws/aws-sdk-go v1.31.13/go.mod h1:5zCpMtNQVjRREroY7sYe8lOMRSxkhG6MZve
 github.com/aws/aws-sdk-go v1.33.16/go.mod h1:5zCpMtNQVjRREroY7sYe8lOMRSxkhG6MZveU8YkpAk0=
 github.com/aws/aws-sdk-go v1.35.24/go.mod h1:tlPOdRjfxPBpNIwqDj61rmsnA85v9jc0Ps9+muhnW+k=
 github.com/aws/aws-sdk-go-v2 v0.18.0/go.mod h1:JWVYvqSMppoMJC0x5wdwiImzgXTI9FuZwxzkQq9wy+g=
-github.com/aws/aws-sdk-go-v2 v1.0.0/go.mod h1:smfAbmpW+tcRVuNUjo3MOArSZmW72t62rkCzc2i0TWM=
 github.com/aws/aws-sdk-go-v2 v1.7.0/go.mod h1:tb9wi5s61kTDA5qCkcDbt3KRVV74GGslQkl/DRdX/P4=
-github.com/aws/aws-sdk-go-v2 v1.8.1 h1:GcFgQl7MsBygmeeqXyV1ivrTEmsVz/rdFJaTcltG9ag=
 github.com/aws/aws-sdk-go-v2 v1.8.1/go.mod h1:xEFuWz+3TYdlPRuo+CqATbeDWIWyaT5uAPwPaWtgse0=
-github.com/aws/aws-sdk-go-v2/config v1.0.0 h1:x6vSFAwqAvhYPeSu60f0ZUlGHo3PKKmwDOTL8aMXtv4=
-github.com/aws/aws-sdk-go-v2/config v1.0.0/go.mod h1:WysE/OpUgE37tjtmtJd8GXgT8s1euilE5XtUkRNUQ1w=
-github.com/aws/aws-sdk-go-v2/credentials v1.0.0 h1:0M7netgZ8gCV4v7z1km+Fbl7j6KQYyZL7SS0/l5Jn/4=
-github.com/aws/aws-sdk-go-v2/credentials v1.0.0/go.mod h1:/SvsiqBf509hG4Bddigr3NB12MIpfHhZapyBurJe8aY=
-github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.0.0 h1:lO7fH5n7Q1dKcDBpuTmwJylD1bOQiRig8LI6TD9yVQk=
-github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.0.0/go.mod h1:wpMHDCXvOXZxGCRSidyepa8uJHY4vaBGfY2/+oKU/Bc=
-github.com/aws/aws-sdk-go-v2/internal/ini v1.2.1 h1:IkqRRUZTKaS16P2vpX+FNc2jq3JWa3c478gykQp4ow4=
-github.com/aws/aws-sdk-go-v2/internal/ini v1.2.1/go.mod h1:Pv3WenDjI0v2Jl7UaMFIIbPOBbhn33RmmAmGgkXDoqY=
+github.com/aws/aws-sdk-go-v2 v1.9.0 h1:+S+dSqQCN3MSU5vJRu1HqHrq00cJn6heIMU7X9hcsoo=
+github.com/aws/aws-sdk-go-v2 v1.9.0/go.mod h1:cK/D0BBs0b/oWPIcX/Z/obahJK1TT7IPVjy53i/mX/4=
+github.com/aws/aws-sdk-go-v2/config v1.8.1 h1:AcAenV2NVwOViG+3ts73uT08L1olN4NBNNz7lUlHSUo=
+github.com/aws/aws-sdk-go-v2/config v1.8.1/go.mod h1:AQtpYfVYjuuft4Dgh0jGSkPQJ9MvmK9vXfSub7oSXlI=
+github.com/aws/aws-sdk-go-v2/credentials v1.4.1 h1:oDiUP50hKRwC6xAgESAj46lgL2prJRZQWnCBzn+TU/c=
+github.com/aws/aws-sdk-go-v2/credentials v1.4.1/go.mod h1:dgGR+Qq7Wjcd4AOAW5Rf5Tnv3+x7ed6kETXyS9WCuAY=
+github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.5.0 h1:OxTAgH8Y4BXHD6PGCJ8DHx2kaZPCQfSTqmDsdRZFezE=
+github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.5.0/go.mod h1:CpNzHK9VEFUCknu50kkB8z58AH2B5DvPP7ea1LHve/Y=
+github.com/aws/aws-sdk-go-v2/internal/ini v1.2.2 h1:d95cddM3yTm4qffj3P6EnP+TzX1SSkWaQypXSgT/hpA=
+github.com/aws/aws-sdk-go-v2/internal/ini v1.2.2/go.mod h1:BQV0agm+JEhqR+2RT5e1XTFIDcAAV0eW6z2trp+iduw=
 github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.5.0 h1:XO1uX7dQKWfD0WzycEfz+bL/7rl0SsQ05VJwLPWGzGM=
 github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.5.0/go.mod h1:acH3+MQoiMzozT/ivU+DbRg7Ooo2298RdRaWcOv+4vM=
 github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2 v1.6.1 h1:mGc8UvJS4XJv8Tp7Doxlx2p3vfwPx46K9zg+9s9szPE=
 github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2 v1.6.1/go.mod h1:lGKz4aJbqGX+pgyXG47ZBAJPjwrlA5+TJsAuJ2+aE2g=
-github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.0.0 h1:IAutMPSrynpvKOpHG6HyWHmh1xmxWAmYOK84NrQVqVQ=
-github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.0.0/go.mod h1:3jExOmpbjgPnz2FJaMOfbSk1heTkZ66aD3yNtVhnjvI=
-github.com/aws/aws-sdk-go-v2/service/sts v1.0.0 h1:6XCgxNfE4L/Fnq+InhVNd16DKc6Ue1f3dJl3IwwJRUQ=
-github.com/aws/aws-sdk-go-v2/service/sts v1.0.0/go.mod h1:5f+cELGATgill5Pu3/vK3Ebuigstc+qYEHW5MvGWZO4=
-github.com/aws/smithy-go v1.0.0/go.mod h1:EzMw8dbp/YJL4A5/sbhGddag+NPT7q084agLbB9LgIw=
+github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.3.0 h1:VNJ5NLBteVXEwE2F1zEXVmyIH58mZ6kIQGJoC7C+vkg=
+github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.3.0/go.mod h1:R1KK+vY8AfalhG1AOu5e35pOD2SdoPKQCFLTvnxiohk=
+github.com/aws/aws-sdk-go-v2/service/sso v1.4.0 h1:sHXMIKYS6YiLPzmKSvDpPmOpJDHxmAUgbiF49YNVztg=
+github.com/aws/aws-sdk-go-v2/service/sso v1.4.0/go.mod h1:+1fpWnL96DL23aXPpMGbsmKe8jLTEfbjuQoA4WS1VaA=
+github.com/aws/aws-sdk-go-v2/service/sts v1.7.0 h1:1at4e5P+lvHNl2nUktdM2/v+rpICg/QSEr9TO/uW9vU=
+github.com/aws/aws-sdk-go-v2/service/sts v1.7.0/go.mod h1:0qcSMCyASQPN2sk/1KQLQ2Fh6yq8wm0HSDAimPhzCoM=
 github.com/aws/smithy-go v1.5.0/go.mod h1:SObp3lf9smib00L/v3U2eAKG8FyQ7iLrJnQiAmR5n+E=
-github.com/aws/smithy-go v1.7.0 h1:+cLHMRrDZvQ4wk+KuQ9yH6eEg6KZEJ9RI2IkDqnygCg=
 github.com/aws/smithy-go v1.7.0/go.mod h1:SObp3lf9smib00L/v3U2eAKG8FyQ7iLrJnQiAmR5n+E=
+github.com/aws/smithy-go v1.8.0 h1:AEwwwXQZtUwP5Mz506FeXXrKBe0jA8gVM+1gEcSRooc=
+github.com/aws/smithy-go v1.8.0/go.mod h1:SObp3lf9smib00L/v3U2eAKG8FyQ7iLrJnQiAmR5n+E=
 github.com/aybabtme/rgbterm v0.0.0-20170906152045-cc83f3b3ce59/go.mod h1:q/89r3U2H7sSsE2t6Kca0lfwTK8JdoNGS/yzM/4iH5I=
 github.com/beevik/ntp v0.2.0/go.mod h1:hIHWr+l3+/clUnF44zdK+CWW7fO8dR5cIylAQ76NRpg=
 github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973/go.mod h1:Dwedo/Wpr24TaqPxmxbtue+5NUziq4I4S80YR8gNf3Q=

manifests/crds/rollout-crd.yaml

Lines changed: 48 additions & 0 deletions
@@ -305,6 +305,8 @@ spec:
     type: object
   canaryService:
     type: string
+  dynamicStableScale:
+    type: boolean
   maxSurge:
     anyOf:
     - type: integer
@@ -2799,6 +2801,52 @@ spec:
       - name
       - status
       type: object
+    weights:
+      properties:
+        additional:
+          items:
+            properties:
+              podTemplateHash:
+                type: string
+              serviceName:
+                type: string
+              weight:
+                format: int32
+                type: integer
+            required:
+            - weight
+            type: object
+          type: array
+        canary:
+          properties:
+            podTemplateHash:
+              type: string
+            serviceName:
+              type: string
+            weight:
+              format: int32
+              type: integer
+          required:
+          - weight
+          type: object
+        stable:
+          properties:
+            podTemplateHash:
+              type: string
+            serviceName:
+              type: string
+            weight:
+              format: int32
+              type: integer
+          required:
+          - weight
+          type: object
+        verified:
+          type: boolean
+      required:
+      - canary
+      - stable
+      type: object
   type: object
 collisionCount:
   format: int32

manifests/install.yaml

Lines changed: 48 additions & 0 deletions
@@ -10183,6 +10183,8 @@ spec:
     type: object
   canaryService:
     type: string
+  dynamicStableScale:
+    type: boolean
   maxSurge:
     anyOf:
     - type: integer
@@ -12677,6 +12679,52 @@ spec:
       - name
       - status
       type: object
+    weights:
+      properties:
+        additional:
+          items:
+            properties:
+              podTemplateHash:
+                type: string
+              serviceName:
+                type: string
+              weight:
+                format: int32
+                type: integer
+            required:
+            - weight
+            type: object
+          type: array
+        canary:
+          properties:
+            podTemplateHash:
+              type: string
+            serviceName:
+              type: string
+            weight:
+              format: int32
+              type: integer
+          required:
+          - weight
+          type: object
+        stable:
+          properties:
+            podTemplateHash:
+              type: string
+            serviceName:
+              type: string
+            weight:
+              format: int32
+              type: integer
+          required:
+          - weight
+          type: object
+        verified:
+          type: boolean
+      required:
+      - canary
+      - stable
+      type: object
   type: object
 collisionCount:
   format: int32

manifests/namespace-install.yaml

Lines changed: 48 additions & 0 deletions
@@ -10183,6 +10183,8 @@ spec:
     type: object
   canaryService:
     type: string
+  dynamicStableScale:
+    type: boolean
   maxSurge:
     anyOf:
     - type: integer
@@ -12677,6 +12679,52 @@ spec:
       - name
       - status
       type: object
+    weights:
+      properties:
+        additional:
+          items:
+            properties:
+              podTemplateHash:
+                type: string
+              serviceName:
+                type: string
+              weight:
+                format: int32
+                type: integer
+            required:
+            - weight
+            type: object
+          type: array
+        canary:
+          properties:
+            podTemplateHash:
+              type: string
+            serviceName:
+              type: string
+            weight:
+              format: int32
+              type: integer
+          required:
+          - weight
+          type: object
+        stable:
+          properties:
+            podTemplateHash:
+              type: string
+            serviceName:
+              type: string
+            weight:
+              format: int32
+              type: integer
+          required:
+          - weight
+          type: object
+        verified:
+          type: boolean
+      required:
+      - canary
+      - stable
+      type: object
   type: object
 collisionCount:
   format: int32

pkg/apiclient/rollout/rollout.swagger.json

Lines changed: 51 additions & 0 deletions
@@ -701,6 +701,10 @@
         "currentExperiment": {
           "type": "string",
           "title": "CurrentExperiment indicates the running experiment"
+        },
+        "weights": {
+          "$ref": "#/definitions/github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.TrafficWeights",
+          "title": "Weights records the weights which have been set on traffic provider. Only valid when using traffic routing"
         }
       },
       "title": "CanaryStatus status fields that only pertain to the canary rollout"
@@ -792,6 +796,10 @@
           "type": "integer",
           "format": "int32",
           "title": "AbortScaleDownDelaySeconds adds a delay in second before scaling down the canary pods when update\nis aborted for canary strategy with traffic routing (not applicable for basic canary).\n0 means canary pods are not scaled down.\nDefault is 30 seconds.\n+optional"
+        },
+        "dynamicStableScale": {
+          "type": "boolean",
+          "description": "DynamicStableScale is a traffic routing feature which dynamically scales the stable\nReplicaSet to minimize total pods which are running during an update. This is calculated by\nscaling down the stable as traffic is increased to canary. When disabled (the default behavior)\nthe stable ReplicaSet remains fully scaled to support instantaneous aborts."
         }
       },
       "title": "CanaryStrategy defines parameters for a Replica Based Canary"
@@ -1419,6 +1427,49 @@
       },
       "description": "TLSRoute holds the information on the virtual service's TLS/HTTPS routes that are desired to be matched for changing weights."
     },
+    "github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.TrafficWeights": {
+      "type": "object",
+      "properties": {
+        "canary": {
+          "$ref": "#/definitions/github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.WeightDestination",
+          "title": "Canary is the current traffic weight split to canary ReplicaSet"
+        },
+        "stable": {
+          "$ref": "#/definitions/github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.WeightDestination",
+          "title": "Stable is the current traffic weight split to stable ReplicaSet"
+        },
+        "additional": {
+          "type": "array",
+          "items": {
+            "$ref": "#/definitions/github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.WeightDestination"
+          },
+          "title": "Additional holds the weights split to additional ReplicaSets such as experiment ReplicaSets"
+        },
+        "verified": {
+          "type": "boolean",
+          "title": "Verified is an optional indicator that the weight has been verified to have taken effect.\nThis is currently only applicable to ALB traffic router"
+        }
+      },
+      "title": "TrafficWeights describes the current status of how traffic has been split"
+    },
+    "github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.WeightDestination": {
+      "type": "object",
+      "properties": {
+        "weight": {
+          "type": "integer",
+          "format": "int32",
+          "title": "Weight is an percentage of traffic being sent to this destination"
+        },
+        "serviceName": {
+          "type": "string",
+          "title": "ServiceName is the Kubernetes service name traffic is being sent to"
+        },
+        "podTemplateHash": {
+          "type": "string",
+          "title": "PodTemplateHash is the pod template hash label for this destination"
+        }
+      }
+    },
     "google.protobuf.Any": {
       "type": "object",
       "properties": {
