Summary
Add a new optional method to the Traffic Router Plugin interface that allows plugins to signal whether a ReplicaSet can be safely scaled down. This would enable traffic routers that manage external systems with their own drain/shutdown semantics to prevent premature pod termination.
Use Cases
We're using Argo Rollouts with a custom traffic router plugin for Temporal Worker Versioning. Temporal manages traffic routing to workers externally based on deployment versions, and has its own "drain" lifecycle - when a version is superseded, existing workflows must complete before workers can be safely terminated.
We've encountered two scenarios where pods are terminated before Temporal reports the version as drained:
Scenario 1: Full Promote
- Version A is stable (100% traffic)
- Version B is deployed as canary, progresses through steps
- User clicks "Promote Full"
- Argo shifts 100% traffic to B, B becomes new stable
- Argo starts the `scaleDownDelaySeconds` timer for A's ReplicaSet
- Timer expires → A's pods are deleted
- Problem: Temporal workflows are still running on A's workers
The `scaleDownDelaySeconds` timer runs independently and is not gated by traffic router plugin responses. Even if our plugin's `VerifyWeight` is waiting for drain to complete, the scale-down proceeds when the timer expires.
Scenario 2: Rainbow Deployment Abort
- Version A is stable (100% traffic)
- Version B is deployed as canary, reaches 25%
- Version C is deployed before B completes
- Argo correctly starts draining B's traffic and marks B's ReplicaSet for scale-down
- Problem: Argo scales down B's ReplicaSet immediately, killing pods while Temporal workflows are still running on those workers
In both scenarios, workflows can run for hours, so time-based solutions don't work for our case.
What We've Tried
| Approach | Why It Doesn't Work |
|---|---|
| `scaleDownDelaySeconds` | Delays scale-down but doesn't wait for actual drain completion. The timer is fixed and not gated by any external condition. |
| `terminationGracePeriodSeconds` + preStop hook | We use KEDA to scale workers based on queue depth. Pods in the Terminating state can't be "un-terminated" if KEDA needs to scale up. |
| Traffic router plugin `VerifyWeight` | Only called for traffic operations, not before ReplicaSet scale-down. The `scaleDownDelaySeconds` timer runs independently. |
| Traffic router plugin `UpdateHash` | Returning an error here blocks the new rollout from proceeding, creating a deadlock where old pods stay but the new rollout can't progress. |
Proposed Solution
Add an optional method to the Traffic Router Plugin interface:
```go
// CanScaleDown is called before scaling down a ReplicaSet.
// Plugins can return false to delay scale-down until an external condition is met.
// This is useful for traffic routers that manage external systems with drain semantics.
//
// Parameters:
//   - rollout: The rollout being processed
//   - replicaSetHash: The hash of the ReplicaSet being considered for scale-down
//
// Returns:
//   - canScaleDown: true if the ReplicaSet can be safely scaled down
//   - message: optional message explaining why scale-down is delayed (for status/events)
//   - error: if an error occurred checking the condition
type CanScaleDown func(
	rollout *v1alpha1.Rollout,
	replicaSetHash string,
) (canScaleDown bool, message string, err error)
```

The Argo Rollouts controller would call this method before scaling down any ReplicaSet managed by a traffic router plugin. If `canScaleDown` returns `false`, the controller would (see the sketch after this list):
- Skip scaling down that ReplicaSet for this reconciliation cycle
- Optionally surface the message in rollout status/events
- Retry on the next reconciliation
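For concreteness, here is a rough sketch of how that gating could look on the controller side. This is illustrative only; the `trafficRouter` interface and `shouldScaleDown` helper are placeholders, not existing Argo Rollouts code:

```go
// A rough sketch only, not actual Argo Rollouts controller code: how the
// reconciler could consult the plugin before scaling an old ReplicaSet to
// zero. The trafficRouter interface and shouldScaleDown helper are
// placeholders for whatever the controller already uses internally.
package controller

import (
	"log"

	"github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1"
	appsv1 "k8s.io/api/apps/v1"
)

// trafficRouter stands in for the plugin client; CanScaleDown is the proposed
// optional method.
type trafficRouter interface {
	CanScaleDown(rollout *v1alpha1.Rollout, replicaSetHash string) (bool, string, error)
}

// shouldScaleDown returns false when the plugin asks to keep the ReplicaSet
// around, so the controller skips the scale-down for this reconcile and
// retries later.
func shouldScaleDown(router trafficRouter, ro *v1alpha1.Rollout, rs *appsv1.ReplicaSet) (bool, error) {
	// "rollouts-pod-template-hash" is the label Argo Rollouts stamps on the
	// ReplicaSets it manages.
	hash := rs.Labels["rollouts-pod-template-hash"]

	ok, msg, err := router.CanScaleDown(ro, hash)
	if err != nil {
		return false, err
	}
	if !ok {
		// The message could be surfaced as a rollout event/condition instead.
		log.Printf("delaying scale-down of ReplicaSet %s: %s", rs.Name, msg)
		return false, nil
	}
	return true, nil
}
```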
This check would occur after `scaleDownDelaySeconds` expires but before actually scaling down the ReplicaSet, giving plugins the final say on whether scale-down is safe.
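On the plugin side, a plugin like ours might implement the hook roughly as sketched below; the `TemporalRouterPlugin` struct and the `temporalVersionDrained` helper are hypothetical and stand in for a real query against Temporal's API:

```go
// A sketch of a plugin-side implementation of the proposed hook. The plugin
// struct and the temporalVersionDrained helper are hypothetical; a real
// plugin would map the ReplicaSet hash to a Temporal worker deployment
// version and query its drain status.
package plugin

import (
	"fmt"

	"github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1"
)

type TemporalRouterPlugin struct {
	// Temporal SDK client, config, etc. omitted.
}

// temporalVersionDrained reports whether Temporal considers the worker
// version associated with this ReplicaSet fully drained (hypothetical helper).
func (p *TemporalRouterPlugin) temporalVersionDrained(rollout *v1alpha1.Rollout, rsHash string) (bool, error) {
	// ... query Temporal for running workflows pinned to this version ...
	return false, nil
}

// CanScaleDown returns false (with a reason) until the external drain
// completes, so the controller keeps the old ReplicaSet and retries on the
// next reconciliation.
func (p *TemporalRouterPlugin) CanScaleDown(rollout *v1alpha1.Rollout, replicaSetHash string) (bool, string, error) {
	drained, err := p.temporalVersionDrained(rollout, replicaSetHash)
	if err != nil {
		return false, "", fmt.Errorf("checking Temporal drain status: %w", err)
	}
	if !drained {
		return false, fmt.Sprintf("Temporal drain in progress for ReplicaSet %s", replicaSetHash), nil
	}
	return true, "", nil
}
```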
Alternative: Annotation-based delay
A simpler alternative would be an annotation that plugins can set on ReplicaSets to prevent scale-down:
```yaml
metadata:
  annotations:
    rollouts.argoproj.io/scale-down-blocked: "true"
    rollouts.argoproj.io/scale-down-blocked-reason: "Temporal drain in progress"
```

The controller would skip scaling down ReplicaSets with this annotation. Traffic router plugins would be responsible for adding/removing it.
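For this variant, a plugin could add and remove the annotations on the old ReplicaSet along these lines; the `setScaleDownBlocked` helper and the assumption that the plugin holds a Kubernetes clientset are illustrative, not an existing plugin API:

```go
// A minimal sketch, assuming the annotation-based alternative: the plugin
// patches the blocking annotations onto the old ReplicaSet while the external
// drain is in progress, and removes them once the drain completes. The helper
// and how the plugin obtains a clientset are assumptions for illustration.
package plugin

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

const (
	blockedAnnotation       = "rollouts.argoproj.io/scale-down-blocked"
	blockedReasonAnnotation = "rollouts.argoproj.io/scale-down-blocked-reason"
)

// setScaleDownBlocked adds or removes the blocking annotations on a ReplicaSet.
func setScaleDownBlocked(ctx context.Context, client kubernetes.Interface, namespace, rsName string, blocked bool, reason string) error {
	var patch []byte
	if blocked {
		patch = []byte(fmt.Sprintf(
			`{"metadata":{"annotations":{"%s":"true","%s":%q}}}`,
			blockedAnnotation, blockedReasonAnnotation, reason))
	} else {
		// Setting the values to null removes the annotations via a strategic merge patch.
		patch = []byte(fmt.Sprintf(
			`{"metadata":{"annotations":{"%s":null,"%s":null}}}`,
			blockedAnnotation, blockedReasonAnnotation))
	}
	_, err := client.AppsV1().ReplicaSets(namespace).Patch(
		ctx, rsName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```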
Scenarios Addressed
This hook would cover all ReplicaSet scale-down scenarios:
| Scenario | Current Behavior | With Hook |
|---|---|---|
| Full Promote | `scaleDownDelaySeconds` timer runs independently of traffic router drain status | Hook blocks scale-down until drain completes |
| Rainbow Abort | Old canary scaled down immediately when new canary starts | Hook blocks until old canary is drained |
| Rollback | Canary scaled down immediately | Hook can verify canary is drained |
| Normal Completion | Works with `VerifyWeight` at 0% | Hook provides additional safety |
Impact
This would enable Argo Rollouts to integrate with external systems that have their own lifecycle management, such as:
- Temporal worker versioning
- Systems with long-running connections/sessions
- Message queue consumers that need to finish processing
- Any traffic router where "drained" is determined by an external system rather than time
I'm happy to look at contributing if this seems like a viable solution, thanks!
Message from the maintainers:
Need this enhancement? Give it a 👍. We prioritize the issues with the most 👍.