[Serve] Skip steady-state per-tick work in DeploymentState via dirty flags #60840
abrarsheikh wants to merge 3 commits into master
Conversation
…via dirty flag Signed-off-by: abrar <abrar@anyscale.com>
Code Review
This pull request introduces a significant performance optimization for `broadcast_running_replicas_if_changed()` by using a dirty flag, `_replicas_changed`. This avoids unnecessary work in steady state, leading to impressive speedups, especially for deployments with a large number of replicas. The implementation is clean and the new tests cover the main scenarios.
I found one potential issue where a lightweight configuration update (like changing `max_ongoing_requests`) might not set the dirty flag, leading to a missed broadcast. I've added a specific comment with a suggested fix. Other than that, the changes look great.
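To make the concern concrete, here is a hypothetical sketch of the failure mode; every name except `_replicas_changed`, `max_ongoing_requests`, and `broadcast_running_replicas_if_changed()` is illustrative, not from the PR:

```python
# Hypothetical sketch of the missed-broadcast scenario; method and
# attribute names below are illustrative, not from the PR.
class DeploymentState:
    def _apply_lightweight_update(self, new_info) -> None:
        # A lightweight update (e.g. changing max_ongoing_requests) does
        # not restart replicas, so no replica state transition fires...
        self._target_info = new_info
        # ...so without this line the dirty flag stays False, the next
        # broadcast_running_replicas_if_changed() call early-returns, and
        # routers keep using the stale max_ongoing_requests value.
        self._replicas_changed = True
```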
broadcast_running_replicas_if_changed() in steady state via dirty flag

```python
self._replica_constructor_retry_counter = 0
# Deployment is in steady state: all replicas RUNNING at
# target version, no pending operations.
self._needs_reconciliation = False
```
PENDING_MIGRATION replicas stuck due to incomplete steady-state check
Medium Severity
The _needs_reconciliation flag is cleared at line 3058 when the deployment is considered "in steady state," but the pending operations check at lines 3032-3040 only includes STARTING, UPDATING, RECOVERING, and STOPPING states — not PENDING_MIGRATION. This allows the flag to be cleared while PENDING_MIGRATION replicas still exist. Combined with the new fast path in migrate_replicas_on_draining_nodes (lines 3506-3507), these replicas become stuck and never transition back to RUNNING or get stopped when their node stops draining.
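A minimal sketch of the implied fix, assuming the steady-state check counts replicas by state; the enum below is a local mirror for illustration, not Ray's actual definition:

```python
from enum import Enum

class ReplicaState(Enum):
    # Local mirror of Ray Serve's ReplicaState, for illustration only.
    STARTING = "STARTING"
    UPDATING = "UPDATING"
    RECOVERING = "RECOVERING"
    STOPPING = "STOPPING"
    RUNNING = "RUNNING"
    PENDING_MIGRATION = "PENDING_MIGRATION"

# States treated as "pending operations" by the steady-state check. The
# reported bug: PENDING_MIGRATION is absent, so _needs_reconciliation can
# be cleared while migrations are still in flight.
PENDING_STATES = [
    ReplicaState.STARTING,
    ReplicaState.UPDATING,
    ReplicaState.RECOVERING,
    ReplicaState.STOPPING,
    # Implied fix: count migrating replicas as pending so the flag stays
    # set until they return to RUNNING or are stopped.
    ReplicaState.PENDING_MIGRATION,
]
```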


The Serve controller runs an update loop every tick for every deployment. In steady state (all replicas healthy, nothing scaling), nearly all of this work is pure waste that scales linearly with replica count:
- `broadcast_running_replicas_if_changed()` constructs N `RunningReplicaInfo` objects and builds two sets just to compare them — every tick, for every deployment.
- `check_curr_status()` (called twice per tick), `scale_deployment_replicas()`, and `migrate_replicas_on_draining_nodes()` all pop, count, and re-add replicas unconditionally, even when nothing has changed.

This directly addresses an existing TODO in the code.
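To make that per-tick cost concrete, here is a simplified sketch of the pre-change broadcast path. Names are approximate (`_last_broadcast_infos` and `_notify_running_replicas_changed` are hypothetical), and this is not the actual Ray implementation:

```python
from ray.serve._private.common import ReplicaState  # private path, may change

class DeploymentState:
    def broadcast_running_replicas_if_changed(self) -> None:
        # Constructs N RunningReplicaInfo objects every tick...
        running_replica_infos = [
            replica.get_running_replica_info()
            for replica in self._replicas.get(states=[ReplicaState.RUNNING])
        ]
        # ...and builds two sets just to discover that nothing changed.
        if set(running_replica_infos) == set(self._last_broadcast_infos):
            return
        self._notify_running_replicas_changed(running_replica_infos)  # hypothetical
        self._last_broadcast_infos = running_replica_infos
```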
What
Two complementary dirty flags on `DeploymentState`:

- `_replicas_changed` — guards `broadcast_running_replicas_if_changed()`. When False (and `_request_routing_info_updated` is also False), the method returns immediately: zero object construction, zero hashing, zero set comparison. Set when replicas transition state, routing stats change, availability-related fields change, or a config update requires a long-poll broadcast.
- `_needs_reconciliation` — guards the expensive reconciliation methods. When False, the deployment is in steady state (all replicas RUNNING at target version, status HEALTHY). The following methods early-return:
  - `check_curr_status()` → returns `(False, False)`
  - `scale_deployment_replicas()` → returns `([], None)`
  - `migrate_replicas_on_draining_nodes()` → returns `{}` immediately
  - `check_and_update_replicas()` → skipped

Health checks on RUNNING/PENDING_MIGRATION replicas and the rank consistency check always run regardless of flag state; the guarded fast path is sketched below.
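In sketch form (simplified from the description above, not a verbatim copy of the diff; `_do_broadcast` is a hypothetical stand-in for the existing method body):

```python
# Sketch of the guarded fast path on the broadcast method.
class DeploymentState:
    def broadcast_running_replicas_if_changed(self) -> None:
        # Fast path: no replica transitioned and no routing info changed,
        # so the previously broadcast replica set is still valid. No
        # object construction, no hashing, no set comparison.
        if not self._replicas_changed and not self._request_routing_info_updated:
            return
        # Slow path (pre-existing behavior): build RunningReplicaInfo
        # objects, compare with the last broadcast, notify long-poll
        # clients. Elided here.
        self._do_broadcast()  # hypothetical stand-in for the existing body
        # Assumed here: the flags are reset once the broadcast work runs.
        self._replicas_changed = False
        self._request_routing_info_updated = False
```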
Cleared only when `check_curr_status()` confirms: no pending ops, all replicas RUNNING at target version, at target count.

Flag-setting sites (both flags are set together at each site; see the helper sketch after this list):

- `_set_target_state()` — deploy, redeploy, autoscale, manual replica count change
- `_set_target_state_deleting()` — deployment deletion
- `_stop_replica()` — replica stopped (health check failure, scaling down, version update)
- `_check_startup_replicas()` — replica enters RUNNING
- `_stop_or_update_outdated_version_replicas()` — RUNNING→UPDATING, or `requires_long_poll_broadcast`
- `record_replica_startup_failure()` — startup failure (may change availability)
- `check_curr_status()` — `_replica_has_started` flips True
- `migrate_replicas_on_draining_nodes()` — RUNNING→PENDING_MIGRATION
- `_stop_one_running_replica_for_testing()` — test helper
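Because the two flags always travel together at these sites, the pattern reduces to something like the following hypothetical helper (the actual PR may simply inline the two assignments at each call site):

```python
# Hypothetical helper capturing the "both flags set together" pattern.
class DeploymentState:
    def _mark_dirty(self) -> None:
        self._replicas_changed = True
        self._needs_reconciliation = True

    def _stop_replica(self, replica, graceful_stop: bool = True) -> None:
        # Stopping a replica both changes the broadcast set and takes the
        # deployment out of steady state.
        self._mark_dirty()
        # ... existing stop logic elided ...
```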
Refactoring: Extracted `_check_and_update_transitioning_replicas()` from `check_and_update_replicas()` so the startup/stopping sections can be cleanly guarded by `_needs_reconciliation` while the rank consistency check remains accessible.
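A sketch of the resulting control flow; only `_check_and_update_transitioning_replicas()` is named in the PR, and the other two helper names are illustrative:

```python
class DeploymentState:
    def check_and_update_replicas(self) -> None:
        # Always runs: health checks on RUNNING / PENDING_MIGRATION
        # replicas must not be skipped, even in steady state.
        self._check_replica_health()  # illustrative name
        # Skippable: startup/stopping bookkeeping only matters while
        # some replica is actually transitioning.
        if self._needs_reconciliation:
            self._check_and_update_transitioning_replicas()
        # Always runs: rank consistency check (illustrative name).
        self._check_rank_consistency()
```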
Benchmark results

Broadcast optimization — steady-state `broadcast_running_replicas_if_changed()`:

Reconciliation optimization — skippable methods (`check_curr_status` ×2, `scale_deployment_replicas`, `migrate`, `broadcast`):

Both fast paths are constant-time (~0.2–0.5 µs) regardless of replica count. At 4,096 replicas, the optimization eliminates ~98 ms of per-deployment per-tick overhead that was previously pure waste in steady state.
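The constant-time claim can be sanity-checked with a micro-benchmark of roughly this shape (illustrative only; `make_steady_state_deployment` is a hypothetical test fixture, and the PR's actual benchmark harness is not shown here):

```python
import timeit

# Time the guarded fast path at several replica counts; if the dirty-flag
# guard works, the per-call cost should stay flat (sub-microsecond)
# instead of growing linearly with the number of replicas.
for num_replicas in (64, 1024, 4096):
    state = make_steady_state_deployment(num_replicas)  # hypothetical fixture
    total = timeit.timeit(
        state.broadcast_running_replicas_if_changed, number=100_000
    )
    print(f"{num_replicas:>5} replicas: {total / 100_000 * 1e6:.2f} us/call")
```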