
[Feature] Support launch_kwargs for local models #9100

@GR4HAM


What feature would you like to see?

I propose adding a small extension to LocalProvider.launch() that lets users pass arbitrary SGLang server flags through launch_kwargs:

lm.launch_kwargs = {"extra_args": "..."}

This enables use cases like controlling CUDA graph batch sizes, enabling/disabling graph capture, tuning KV cache behavior, etc., without modifying dspy internals.

Motivation

SGLang exposes many runtime configuration flags that matter for performance, such as --cuda-graph-max-bs, --cuda-graph-bs, and --disable-cuda-graph.
Currently, dspy offers no way to pass these flags through when launching a local model. This limits advanced performance tuning, especially for GPU memory-constrained deployments such as Modal instances.

Implementation Details

A small patch inside LocalProvider.launch():

         timeout = launch_kwargs.get("timeout", 1800)
         command = f"python -m sglang.launch_server --model-path {model} --port {port} --host 0.0.0.0"
    +    # Allow user to supply extra CLI arguments, e.g. CUDA graph flags
    +    extra_args = launch_kwargs.get("extra_args")
    +    if extra_args:
    +        command = f"{command} {extra_args}"

         # We will manually stream & capture logs.
         process = subprocess.Popen(
             command.replace("\\\n", " ").replace("\\", " ").split(),
    

Example Usage

lm.launch_kwargs = {
    "timeout": 1800,
    "extra_args": "--cuda-graph-max-bs 96 --cuda-graph-bs 1 8 32 64 96",
}
lm.launch()
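
For reference, here is a small sketch of how the example string above would be tokenized by the whitespace-based `.split()` used in the patch, compared with `shlex.split` from the standard library. The two agree for simple flags like these and only differ once a value is quoted and contains spaces. `--some-flag` is a made-up placeholder for illustration, not a real SGLang option.

```python
import shlex

extra_args = "--cuda-graph-max-bs 96 --cuda-graph-bs 1 8 32 64 96"

# Plain whitespace splitting, as the patch does: fine for simple flags.
tokens = extra_args.split()
same_as_shlex = tokens == shlex.split(extra_args)

# A quoted value containing spaces is where the two approaches diverge.
# "--some-flag" is a hypothetical placeholder, not a real SGLang option.
quoted = "--some-flag 'a b'"
plain_tokens = quoted.split()       # quotes are kept and the value is cut in two
shell_tokens = shlex.split(quoted)  # quotes are honored, value stays one token
```

If quoted values ever need to be supported, tokenizing `extra_args` with `shlex.split` might be worth considering, but that is a design choice beyond this request.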

This change should be fully backwards compatible: when "extra_args" is absent or empty, the launch command is exactly what it is today, so existing users are unaffected.
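
To make the compatibility claim concrete, here is a minimal standalone sketch of the command construction. `build_launch_command` is a hypothetical helper written for this issue, not dspy's actual code, and `"my-model"` and port `30000` are placeholder values. It builds an argument list instead of a string, which is an assumption on my side rather than what the patch above does.

```python
import shlex


def build_launch_command(model, port, launch_kwargs=None):
    """Hypothetical helper mirroring the proposed patch; not dspy's real API."""
    launch_kwargs = launch_kwargs or {}
    argv = [
        "python", "-m", "sglang.launch_server",
        "--model-path", str(model),
        "--port", str(port),
        "--host", "0.0.0.0",
    ]
    # Append user-supplied flags only when "extra_args" is present and non-empty.
    extra_args = launch_kwargs.get("extra_args")
    if extra_args:
        argv.extend(shlex.split(extra_args))
    return argv


# Without "extra_args" the command is identical to the baseline.
baseline = build_launch_command("my-model", 30000)
unchanged = build_launch_command("my-model", 30000, {"timeout": 1800})

# With "extra_args" the extra flags are appended after the existing ones.
tuned = build_launch_command(
    "my-model", 30000, {"extra_args": "--disable-cuda-graph"}
)
```

Here `baseline` and `unchanged` are the same list, and `tuned` only differs by the trailing `--disable-cuda-graph`.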

Would you like to contribute?

  • Yes, I'd like to help implement this.
  • No, I just want to request it.

Additional Context

No response

Labels: enhancement (New feature or request)
