SGLang

Use the Latest Release

We recommend using the latest stable release of Dynamo to avoid breaking changes.

Dynamo SGLang integrates SGLang engines into Dynamo's distributed runtime, enabling disaggregated serving, KV-aware routing, and request cancellation while maintaining full compatibility with SGLang's native engine arguments. It supports LLM inference, embedding models, multimodal vision models, and diffusion-based generation (LLM, image, video).

Installation

Install Latest Release

We recommend using uv to install:

uv venv --python 3.12 --seed
uv pip install "ai-dynamo[sglang]"

This installs Dynamo with the compatible SGLang version.

Install for Development

Requires Rust and the CUDA toolkit (`nvcc`).
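
A quick preflight check can confirm these prerequisites before starting the build. This is a minimal sketch: `missing_tools` is a hypothetical helper, not part of Dynamo, and it assumes that "Rust" means `cargo` is on your `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Dynamo): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Example usage (assumes cargo, nvcc, and uv are the tools you need):
#   missing="$(missing_tools cargo nvcc uv)"
#   [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```

An empty result means every listed tool was found.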

# install dynamo
uv venv --python 3.12 --seed
uv pip install maturin nixl
cd $DYNAMO_HOME/lib/bindings/python
maturin develop --uv
cd $DYNAMO_HOME
uv pip install -e .
# install sglang
git clone https://github.com/sgl-project/sglang.git
cd sglang && uv pip install -e "python"

This setup also works well for coding agents. Give the agent the paths to both repositories and the virtual environment, and have it rerun these commands as it makes changes.

Docker

```bash
cd $DYNAMO_ROOT
python container/render.py --framework sglang --output-short-filename
docker build -f container/rendered.Dockerfile -t dynamo:latest-sglang .
```

docker run \
    --gpus all -it --rm \
    --network host --shm-size=10G \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    --ulimit nofile=65536:65536 \
    --cap-add CAP_SYS_PTRACE --ipc host \
    dynamo:latest-sglang

Feature Support Matrix

| Feature | Status | Notes |
|---------|--------|-------|
| Disaggregated Serving | ✅ | Prefill/decode separation with NIXL KV transfer |
| KV-Aware Routing | ✅ | |
| SLA-Based Planner | ✅ | |
| Multimodal Support | ✅ | Image via EPD, E/PD, E/P/D patterns |
| Diffusion Models | ✅ | LLM diffusion, image, and video generation |
| Request Cancellation | ✅ | Aggregated full; disaggregated decode-only |
| Graceful Shutdown | ✅ | Discovery unregister + grace period |
| Observability | ✅ | Metrics, tracing, and Grafana dashboards |
| KVBM | ❌ | Planned |

Quick Start

Python / CLI Deployment

Start infrastructure services for local development:

docker compose -f deploy/docker-compose.yml up -d

Launch an aggregated serving deployment:

cd $DYNAMO_HOME/examples/backends/sglang
./launch/agg.sh
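
The model may take some time to load, so requests sent immediately after launch can fail. One way to handle this is to poll from a second terminal until the frontend responds. The sketch below uses a hypothetical `retry_until` helper, not part of Dynamo. The `/v1/models` path in the usage comment is an assumption based on the OpenAI-compatible API, so adjust it if your frontend exposes a different readiness endpoint.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Dynamo): run a command up to
# <attempts> times, sleeping <delay> seconds between failed tries.
# Returns 0 on the first success, 1 if every attempt fails.
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example usage (endpoint path is an assumption):
#   retry_until 60 2 curl -sf localhost:8000/v1/models \
#     && echo "frontend is ready"
```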

Verify the deployment:

curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Explain why Roger Federer is considered one of the greatest tennis players of all time"}],
    "stream": true,
    "max_tokens": 30
  }'
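
Because the request sets `"stream": true`, the response should arrive as server-sent events. In the OpenAI-style streaming format, each chunk is a `data: {...}` line and the stream ends with `data: [DONE]`. The following sketch assumes the Dynamo frontend follows that format. `sse_payloads` is a hypothetical helper that strips the framing so that only the JSON chunks remain.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Dynamo): read an SSE stream on stdin
# and print the JSON payload of each "data:" line, stopping at the
# [DONE] sentinel. Blank lines and other fields are ignored.
sse_payloads() {
  local line
  while IFS= read -r line; do
    line="${line%$'\r'}"            # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") break ;;
      "data: "*) printf '%s\n' "${line#data: }" ;;
    esac
  done
}

# Example usage (pipe the curl command above through the filter;
# -N disables curl's output buffering):
#   curl -sN localhost:8000/v1/chat/completions ... | sse_payloads
```

Each output line is one JSON chunk, which you can pass to a JSON tool of your choice.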

Kubernetes Deployment

You can deploy SGLang with Dynamo on Kubernetes using a DynamoGraphDeployment. For more details, see the SGLang Kubernetes Deployment Guide.

Next Steps

- Reference Guide: Worker types, architecture, and configuration
- Examples: All deployment patterns with launch scripts
- Disaggregation: P/D architecture and KV transfer details
- Diffusion: LLM, image, and video diffusion models
- Observability: Metrics, tracing, and Grafana dashboards
- Deploying SGLang with Dynamo on Kubernetes: Kubernetes deployment guide
