[Serve LLM] Fix OpenAiIngress scale-to-zero when all models set min_replicas=0 #60836

Open
thjung123 wants to merge 1 commit into ray-project:master from thjung123:fix-ingress-scale-to-zero

Conversation

@thjung123
Description

Summary

  • Fix an issue where OpenAiIngress does not scale to zero when all LLM models
    are configured with min_replicas=0
  • Update get_deployment_options() to inspect llm_configs and propagate
    min_replicas=0 to the ingress when appropriate
  • Applies to all ingress types inheriting from OpenAiIngress

Problem

OpenAiIngress.get_deployment_options() previously returned
DEFAULT_INGRESS_OPTIONS as-is, ignoring the provided llm_configs.
Because DEFAULT_INGRESS_OPTIONS does not include a min_replicas field,
Ray Serve’s AutoscalingConfig defaults to min_replicas=1, which prevents
the Serve application from fully scaling to zero even when all model
deployments have scaled down.

Solution

  • Determine whether all LLM deployments explicitly configure
    autoscaling_config.min_replicas=0
  • If so, propagate min_replicas=0 to the ingress deployment options
  • Otherwise, preserve the existing default behavior
  • Continue to give precedence to user-provided ingress_deployment_config
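The decision rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: plain dicts stand in for the LLM config objects, the `deployment_config` nesting, the contents of `DEFAULT_INGRESS_OPTIONS`, and the helper names `all_models_scale_to_zero` and `get_ingress_options` are all assumptions made for the example. Treating an empty `llm_configs` list as "keep the default" is also an assumption.

```python
import copy
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the real DEFAULT_INGRESS_OPTIONS; the actual
# keys and values in Ray Serve LLM may differ.
DEFAULT_INGRESS_OPTIONS: Dict[str, Any] = {
    "autoscaling_config": {"target_ongoing_requests": 10},
}


def all_models_scale_to_zero(llm_configs: List[Dict[str, Any]]) -> bool:
    """Return True only if every config explicitly sets min_replicas to 0."""
    if not llm_configs:
        # Assumption: with no models configured, keep the default behavior.
        return False
    for cfg in llm_configs:
        deployment = cfg.get("deployment_config") or {}
        autoscaling = deployment.get("autoscaling_config") or {}
        # A missing min_replicas does not count as an explicit 0.
        if autoscaling.get("min_replicas") != 0:
            return False
    return True


def get_ingress_options(
    llm_configs: List[Dict[str, Any]],
    ingress_deployment_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # Copy the defaults so the module-level dict is never mutated.
    options = copy.deepcopy(DEFAULT_INGRESS_OPTIONS)
    if all_models_scale_to_zero(llm_configs):
        options.setdefault("autoscaling_config", {})["min_replicas"] = 0
    # User-provided ingress settings are applied last, so they win.
    for key, value in (ingress_deployment_config or {}).items():
        if isinstance(value, dict) and isinstance(options.get(key), dict):
            options[key].update(value)
        else:
            options[key] = value
    return options
```

Under these assumptions, a single model without an explicit `min_replicas=0` is enough to keep the ingress at its default, and an explicit `min_replicas` in `ingress_deployment_config` overrides the propagated value.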

Fixes #60664

@thjung123 thjung123 requested a review from a team as a code owner February 7, 2026 18:08
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an issue where OpenAiIngress would not scale to zero even if all underlying LLM models were configured with min_replicas=0. The fix correctly inspects the llm_configs and propagates min_replicas=0 to the ingress deployment options when appropriate. The changes are well-implemented and include a comprehensive set of tests covering various scenarios. My review includes a couple of suggestions to improve code conciseness and maintainability.

Comment on lines 797 to 796
        options = {
            k: (v.copy() if isinstance(v, dict) else v)
            for k, v in DEFAULT_INGRESS_OPTIONS.items()
        }
Contributor

medium

For creating a mutable copy of DEFAULT_INGRESS_OPTIONS, using copy.deepcopy() is more idiomatic and robust against future changes to the structure of DEFAULT_INGRESS_OPTIONS. This avoids potential bugs if the dictionary becomes more deeply nested.

You'll need to add import copy at the top of the file.

        options = copy.deepcopy(DEFAULT_INGRESS_OPTIONS)

Author

I intentionally used a shallow dict comprehension here since
DEFAULT_INGRESS_OPTIONS is only two levels deep and all nested values are plain dicts.

Using copy.deepcopy would also work, but feels a bit overkill for the current structure.
Happy to switch if maintainers prefer it.

Contributor

Yes, let's use deepcopy to avoid future bugs that could slip in if this is overlooked. The cost is minimal since these dicts stay small and are not on the data path.

Author

Agreed, thanks! I've updated this to use copy.deepcopy() as suggested.
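To make the trade-off in this thread concrete, here is a small sketch of how a one-level copy can leak mutations if the defaults ever gain a third level of nesting. The `DEFAULTS` dict and its keys are hypothetical and chosen only for illustration; they are not the real DEFAULT_INGRESS_OPTIONS.

```python
import copy

# Hypothetical defaults with a third level of nesting
# (not the real DEFAULT_INGRESS_OPTIONS).
DEFAULTS = {
    "max_ongoing_requests": 100,
    "autoscaling_config": {"policy": {"upscale_delay_s": 30}},
}

# One-level copy, as in the original dict comprehension.
shallow = {k: (v.copy() if isinstance(v, dict) else v) for k, v in DEFAULTS.items()}
shallow["autoscaling_config"]["min_replicas"] = 0  # does not touch DEFAULTS
shallow["autoscaling_config"]["policy"]["upscale_delay_s"] = 5  # shared inner dict
leaked_value = DEFAULTS["autoscaling_config"]["policy"]["upscale_delay_s"]

# Restore the default, then repeat the same mutation on a deep copy.
DEFAULTS["autoscaling_config"]["policy"]["upscale_delay_s"] = 30
deep = copy.deepcopy(DEFAULTS)
deep["autoscaling_config"]["policy"]["upscale_delay_s"] = 5
preserved_value = DEFAULTS["autoscaling_config"]["policy"]["upscale_delay_s"]
```

With the comprehension, the write to the innermost dict changes `DEFAULTS` as well; with `copy.deepcopy()`, `DEFAULTS` is left untouched. For a structure that is only two levels deep, both approaches behave the same.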

@thjung123 thjung123 force-pushed the fix-ingress-scale-to-zero branch from 38ff781 to 4d394fb on February 7, 2026 18:14
@thjung123 thjung123 changed the title from "[Serve] Fix OpenAIIngress scale-to-zero when all models have min_repl…" to "[Serve] Fix OpenAiIngress scale-to-zero when all models set min_replicas=0" on Feb 7, 2026
@thjung123 thjung123 force-pushed the fix-ingress-scale-to-zero branch from 4d394fb to a324643 on February 7, 2026 18:49
@ray-gardener ray-gardener bot added the community-contribution label (Contributed by the community) on Feb 7, 2026
Contributor

@kouroshHakha kouroshHakha left a comment

Just the nit. LGTM otherwise. Thanks.

@kouroshHakha kouroshHakha changed the title from "[Serve] Fix OpenAiIngress scale-to-zero when all models set min_replicas=0" to "[Serve LLM] Fix OpenAiIngress scale-to-zero when all models set min_replicas=0" on Feb 7, 2026
…icas=0

Signed-off-by: thjung123 <jeothen@gmail.com>
@thjung123 thjung123 force-pushed the fix-ingress-scale-to-zero branch from a324643 to f084aeb on February 8, 2026 02:43
Development

Successfully merging this pull request may close these issues.

[ServeLLM] Downscaling models to 0 should also downscale OpenAIIngress to 0

2 participants