Description
What happened + What you expected to happen
The default autoscaling policy interprets downscale_delay_s and upscale_delay_s by counting control-loop iterations and comparing to int(delay_s / CONTROL_LOOP_INTERVAL_S).
ray/python/ray/serve/autoscaling_policy.py, line 101 (commit 23e587f):

    if decision_counter < -int(delay_s / CONTROL_LOOP_INTERVAL_S):
That implicitly assumes each iteration corresponds to roughly CONTROL_LOOP_INTERVAL_S seconds of wall time. In reality, the controller spends time executing run_control_loop_step() before sleeping for CONTROL_LOOP_INTERVAL_S, so the interval between two policy invocations is approximately loop_step_duration + CONTROL_LOOP_INTERVAL_S.
When loop_step_duration is not negligible (e.g. large clusters, heavy deployment state), the observed wall-clock time before a scaling action can be much larger than the configured delay, even though the parameter name and documentation describe it as seconds.
The same counting logic applies to upscale (upscale_delay_s).
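The effect is easy to see with back-of-the-envelope arithmetic. A minimal sketch (not Ray's actual code; the per-step duration below is a hypothetical value chosen for illustration):

```python
# How counting loop iterations inflates the effective delay when each
# control-loop step takes non-negligible time.
CONTROL_LOOP_INTERVAL_S = 0.1   # Ray Serve's default loop sleep
downscale_delay_s = 1.0         # what the user configures

# The policy converts the delay into a number of loop iterations:
required_iterations = int(downscale_delay_s / CONTROL_LOOP_INTERVAL_S)  # 10

# But each iteration actually takes loop_step_duration + the sleep:
loop_step_duration = 0.4        # hypothetical per-step work on a large cluster
actual_interval_s = loop_step_duration + CONTROL_LOOP_INTERVAL_S        # 0.5

effective_delay_s = required_iterations * actual_interval_s
print(effective_delay_s)  # 5.0 seconds observed instead of the configured 1.0
```

In other words, the observed delay scales with `1 + loop_step_duration / CONTROL_LOOP_INTERVAL_S`, not with the configured value alone.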
Versions / Dependencies
Ray 2.54
Reproduction script
Adding an extra await asyncio.sleep(0.5) inside async def run_control_loop_step to simulate a time-consuming step increased the observed downscale time by roughly a factor of 5 over the configured downscale_delay_s.
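The reproduction can be approximated without Ray by simulating the controller's loop shape (do work, then sleep for the fixed interval) and measuring how long the counted iterations actually take. The step duration and shortened delay below are hypothetical values, not Ray defaults:

```python
import asyncio
import time

CONTROL_LOOP_INTERVAL_S = 0.1   # Ray Serve's default loop sleep
DOWNSCALE_DELAY_S = 0.5         # hypothetical configured delay (shortened to keep the demo fast)
STEP_DURATION_S = 0.4           # hypothetical expensive control-loop step

async def run_control_loop_step():
    # Stand-in for the controller's per-iteration work.
    await asyncio.sleep(STEP_DURATION_S)

async def measure_observed_delay() -> float:
    # The policy waits int(delay_s / CONTROL_LOOP_INTERVAL_S) iterations
    # before acting; measure how much wall time those iterations take.
    required_iterations = int(DOWNSCALE_DELAY_S / CONTROL_LOOP_INTERVAL_S)
    start = time.monotonic()
    for _ in range(required_iterations):
        await run_control_loop_step()
        await asyncio.sleep(CONTROL_LOOP_INTERVAL_S)
    return time.monotonic() - start

if __name__ == "__main__":
    observed = asyncio.run(measure_observed_delay())
    print(f"configured: {DOWNSCALE_DELAY_S}s, observed: {observed:.2f}s")
```

With these numbers, 5 iterations of roughly 0.5 s each yield about 2.5 s of observed delay for a 0.5 s configuration, mirroring the slowdown reported above.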
Issue Severity
Low: It annoys or frustrates me.