[Serve LLM] Fix OpenAiIngress scale-to-zero when all models set min_replicas=0#60836
[Serve LLM] Fix OpenAiIngress scale-to-zero when all models set min_replicas=0#60836thjung123 wants to merge 1 commit intoray-project:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses an issue where OpenAiIngress would not scale to zero even if all underlying LLM models were configured with min_replicas=0. The fix correctly inspects the llm_configs and propagates min_replicas=0 to the ingress deployment options when appropriate. The changes are well-implemented and include a comprehensive set of tests covering various scenarios. My review includes a couple of suggestions to improve code conciseness and maintainability.
| options = { | ||
| k: (v.copy() if isinstance(v, dict) else v) | ||
| for k, v in DEFAULT_INGRESS_OPTIONS.items() | ||
| } |
There was a problem hiding this comment.
For creating a mutable copy of DEFAULT_INGRESS_OPTIONS, using copy.deepcopy() is more idiomatic and robust against future changes to the structure of DEFAULT_INGRESS_OPTIONS. This avoids potential bugs if the dictionary becomes more deeply nested.
You'll need to add import copy at the top of the file.
options = copy.deepcopy(DEFAULT_INGRESS_OPTIONS)There was a problem hiding this comment.
I intentionally used a shallow dict comprehension here since
DEFAULT_INGRESS_OPTIONS is only two levels deep and all nested values are plain dicts.
Using copy.deepcopy would also work, but feels a bit overkill for the current structure.
Happy to switch if maintainers prefer it.
There was a problem hiding this comment.
yes let's do deepcopy to avoid future bugs that might get introduced if overlooked. The cost is minimal since these dicts don't get that big and are not on the data path.
There was a problem hiding this comment.
Agreed, thanks! I've updated this to use copy.deepcopy() as suggested.
38ff781 to
4d394fb
Compare
4d394fb to
a324643
Compare
kouroshHakha
left a comment
There was a problem hiding this comment.
Just the nit. LGTM otherwise. Thanks.
| options = { | ||
| k: (v.copy() if isinstance(v, dict) else v) | ||
| for k, v in DEFAULT_INGRESS_OPTIONS.items() | ||
| } |
There was a problem hiding this comment.
yes let's do deepcopy to avoid future bugs that might get introduced if overlooked. The cost is minimal since these dicts don't get that big and are not on the data path.
…icas=0 Signed-off-by: thjung123 <jeothen@gmail.com>
a324643 to
f084aeb
Compare
Description
Summary
OpenAiIngressdoes not scale to zero when all LLM modelsare configured with
min_replicas=0get_deployment_options()to inspectllm_configsand propagatemin_replicas=0to the ingress when appropriateOpenAiIngressProblem
OpenAiIngress.get_deployment_options()previously returnedDEFAULT_INGRESS_OPTIONSas-is, ignoring the providedllm_configs.Because
DEFAULT_INGRESS_OPTIONSdoes not include amin_replicasfield,Ray Serve’s
AutoscalingConfigdefaults tomin_replicas=1, which preventsthe Serve application from fully scaling to zero even when all model
deployments have scaled down.
Solution
autoscaling_config.min_replicas=0min_replicas=0to the ingress deployment optionsingress_deployment_configFixes #60664