
[Serve] downscale_delay_s / upscale_delay_s effective wall-clock delay ignores run_control_loop_step duration #62004

@Wodswos

Description


What happened + What you expected to happen

The default autoscaling policy interprets downscale_delay_s and upscale_delay_s by counting control-loop iterations and comparing to int(delay_s / CONTROL_LOOP_INTERVAL_S).

if decision_counter < -int(delay_s / CONTROL_LOOP_INTERVAL_S):

That implicitly assumes each iteration corresponds to roughly CONTROL_LOOP_INTERVAL_S seconds of wall time. In reality, the controller spends time executing run_control_loop_step() before sleeping for CONTROL_LOOP_INTERVAL_S, so the interval between two policy invocations is approximately loop_step_duration + CONTROL_LOOP_INTERVAL_S.
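The mismatch can be sketched in a few lines of pure Python (no Ray required). `CONTROL_LOOP_INTERVAL_S` mirrors the name of the Serve constant, but the 0.25 s value and the `effective_delay_s` helper are illustrative only, chosen to keep the float division exact:

```python
# Sketch: iteration-counting delay vs. actual wall-clock delay.
# CONTROL_LOOP_INTERVAL_S mirrors the Serve constant's name; 0.25 s is
# an illustrative value (the division below stays exact in floats).
CONTROL_LOOP_INTERVAL_S = 0.25

def effective_delay_s(delay_s: float, step_duration_s: float) -> float:
    """Wall-clock seconds before the policy acts, given a per-iteration
    run_control_loop_step() duration of step_duration_s."""
    # Iterations the policy waits, as in the default policy's comparison.
    required_iterations = int(delay_s / CONTROL_LOOP_INTERVAL_S)
    # Each iteration really takes step duration + sleep interval of wall time.
    return required_iterations * (step_duration_s + CONTROL_LOOP_INTERVAL_S)

# Negligible step: observed delay matches the configured delay.
print(effective_delay_s(30.0, 0.0))  # 30.0
# A 0.5 s step triples the observed delay for the same setting.
print(effective_delay_s(30.0, 0.5))  # 90.0
```

In other words, the observed delay scales by roughly `(step_duration + interval) / interval`, which is exactly the blow-up described below.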

When loop_step_duration is not negligible (e.g. large clusters, heavy deployment state), the observed time before scaling actions can be much larger than the configured delay, even though the parameter name and docs read as seconds.

The same counting logic applies to upscale (upscale_delay_s).

Versions / Dependencies

ray=2.54

Reproduction script

Adding an extra await asyncio.sleep(0.5) inside async def run_control_loop_step to simulate a time-consuming step increased the actual downscale time by a factor of 5.
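The effect can be reproduced without Ray. The loop below is a stand-in for the controller, not the real implementation: `run_control_loop_step` and the counter/threshold mimic the default policy's iteration counting, and the interval and delay values are shortened so the demo runs in seconds:

```python
import asyncio
import time

CONTROL_LOOP_INTERVAL_S = 0.1  # shortened stand-in for the Serve constant
DOWNSCALE_DELAY_S = 0.5        # shortened stand-in for downscale_delay_s

async def run_control_loop_step(step_duration_s: float) -> None:
    # Stand-in for the controller's per-iteration work; the issue simulates
    # this with an extra asyncio.sleep inside the real function.
    await asyncio.sleep(step_duration_s)

async def time_until_downscale(step_duration_s: float) -> float:
    """Wall-clock seconds until the iteration counter crosses the threshold."""
    threshold = int(DOWNSCALE_DELAY_S / CONTROL_LOOP_INTERVAL_S)
    start = time.monotonic()
    counter = 0
    while counter < threshold:
        await run_control_loop_step(step_duration_s)
        await asyncio.sleep(CONTROL_LOOP_INTERVAL_S)
        counter += 1
    return time.monotonic() - start

fast = asyncio.run(time_until_downscale(0.0))   # ~DOWNSCALE_DELAY_S
slow = asyncio.run(time_until_downscale(0.25))  # several times longer
print(f"negligible step: {fast:.2f}s, heavy step: {slow:.2f}s")
```

With a negligible step the measured time tracks the configured delay; with a 0.25 s step it grows by roughly `(0.25 + 0.1) / 0.1 ≈ 3.5x`, matching the blow-up reported above.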

Issue Severity

Low: It annoys or frustrates me.
