# Diffusion
Dynamo SGLang supports three types of diffusion-based generation: LLM diffusion (text generation via iterative refinement), image diffusion (text-to-image), and video generation (text-to-video). Each uses a different worker flag and handler, but all integrate with SGLang's DiffGenerator.
| Type | Worker Flag | API Endpoint |
|---|---|---|
| LLM Diffusion | `--dllm-algorithm <algo>` | `/v1/chat/completions`, `/v1/completions` |
| Image Diffusion | `--image-diffusion-worker` | `/v1/images/generations` |
| Video Generation | `--video-generation-worker` | `/v1/videos` |
## LLM Diffusion

Diffusion Language Models generate text through iterative refinement rather than autoregressive token-by-token generation. The model starts with masked tokens and progressively replaces them with predictions, refining low-confidence tokens at each step.
LLM diffusion is auto-detected: when --dllm-algorithm is set, the worker automatically uses DiffusionWorkerHandler without needing a separate flag. For more details on diffusion algorithms, see the SGLang Diffusion Language Models documentation.
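To make the iterative-refinement idea concrete, here is a toy sketch in Python. It is not the SGLang/Dynamo implementation — the `predict` stand-in and the confidence threshold are illustrative assumptions — but it shows the loop structure: start fully masked, accept confident predictions, re-mask the rest, repeat.

```python
# Toy illustration of diffusion-style text generation (NOT the SGLang
# implementation). `predict` is a hypothetical stand-in for a model
# forward pass that proposes a token + confidence per masked position.
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def predict(tokens):
    """Stand-in model: propose a (token, confidence) for every position;
    already-unmasked tokens are kept with full confidence."""
    return [(random.choice(VOCAB), random.random()) if t == MASK else (t, 1.0)
            for t in tokens]

def generate(length=5, steps=4, threshold=0.5):
    tokens = [MASK] * length                      # start fully masked
    for _ in range(steps):
        proposals = predict(tokens)
        # keep confident predictions, re-mask low-confidence positions
        tokens = [tok if conf >= threshold else MASK
                  for tok, conf in proposals]
    # final pass: accept a prediction for anything still masked
    return [tok if tok != MASK else predict([MASK])[0][0] for tok in tokens]

print(generate())
```

Each step refines the whole sequence in parallel, which is what distinguishes this scheme from left-to-right autoregressive decoding.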
```bash
cd $DYNAMO_HOME/examples/backends/sglang
./launch/diffusion_llada.sh
```

See the launch script for configuration options.
```bash
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "inclusionAI/LLaDA2.0-mini-preview",
    "messages": [{"role": "user", "content": "Explain why Roger Federer is considered one of the greatest tennis players of all time"}],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```

## Image Diffusion

Image diffusion workers generate images from text prompts using SGLang's DiffGenerator. Generated images are returned as either URLs (when using `--media-output-fs-url` for storage) or base64 data, in an OpenAI-compatible response format.
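A client needs to handle both response formats. The sketch below assumes the OpenAI-compatible shape described above (a `data` list whose items carry either `b64_json` or `url`); the sample payload is fabricated for illustration, not captured from a live server.

```python
# Sketch: consume an OpenAI-compatible images response that may contain
# either base64 data or URLs. The sample response below is illustrative.
import base64
import json

sample = json.dumps({
    "data": [{"b64_json": base64.b64encode(b"\x89PNG fake bytes").decode()}]
})

def extract_images(response_text):
    """Return a (kind, value) pair per image: decoded bytes for base64
    responses, or the URL string for url responses."""
    results = []
    for item in json.loads(response_text)["data"]:
        if "b64_json" in item:                 # "response_format": "b64_json"
            results.append(("bytes", base64.b64decode(item["b64_json"])))
        elif "url" in item:                    # "response_format": "url"
            results.append(("url", item["url"]))
    return results

print(extract_images(sample))
```

With `"response_format": "url"` the returned value is a link into the configured storage; with base64 the bytes can be written straight to a file.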
```bash
cd $DYNAMO_HOME/examples/backends/sglang
./launch/image_diffusion.sh
```

Supports local storage (`--fs-url file:///tmp/images`) and S3 (`--fs-url s3://bucket`). Pass `--http-url` to set the base URL for serving stored images. See the launch script for all configuration options.
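As a rough mental model of how these two flags relate — and this is a hypothetical illustration, the real mapping is defined by the worker, not this snippet — a stored file under the `--fs-url` directory gets served at a path under the `--http-url` base:

```python
# Hypothetical sketch only: how a stored image's public URL could be
# composed from --fs-url and --http-url. The actual worker behavior
# may differ; this just illustrates the relationship between the flags.
from urllib.parse import urlparse

def served_url(fs_url, http_url, filename):
    local_dir = urlparse(fs_url).path          # e.g. /tmp/images
    public = f"{http_url.rstrip('/')}/{filename}"
    local = f"{local_dir}/{filename}"
    return public, local

print(served_url("file:///tmp/images", "http://localhost:8000", "img.png"))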
```bash
curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "black-forest-labs/FLUX.1-dev",
    "prompt": "Explain why Roger Federer is considered one of the greatest tennis players of all time",
    "size": "1024x1024",
    "response_format": "url",
    "nvext": {
      "num_inference_steps": 15
    }
  }'
```

## Video Generation

Video generation workers produce videos from text or image prompts using SGLang's DiffGenerator with frame-to-video encoding. Both text-to-video (T2V) and image-to-video (I2V) workflows are supported.
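The request can of course be built programmatically instead of via curl. This sketch uses only the fields shown in the example on this page, posted with the Python standard library (the `urlopen` call is commented out since it requires a running worker):

```python
# Build the /v1/videos request from the fields used in this page's curl
# example; posting uses stdlib urllib and assumes a worker on :8000.
import json
import urllib.request

payload = {
    "prompt": "Roger Federer winning his 19th grand slam",
    "model": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    "seconds": 2,
    "size": "832x480",
    "response_format": "url",
    "nvext": {"fps": 8, "num_frames": 17, "num_inference_steps": 50},
}

req = urllib.request.Request(
    "http://localhost:8000/v1/videos",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)   # uncomment with a running worker
```

Passing `data` makes urllib issue a POST, matching the curl example below.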
```bash
cd $DYNAMO_HOME/examples/backends/sglang
./launch/text-to-video-diffusion.sh
```

Use `--wan-size 1b` (default, 1 GPU) or `--wan-size 14b` (2 GPUs). See the launch script for all configuration options.
```bash
curl http://localhost:8000/v1/videos \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Roger Federer winning his 19th grand slam",
    "model": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    "seconds": 2,
    "size": "832x480",
    "response_format": "url",
    "nvext": {
      "fps": 8,
      "num_frames": 17,
      "num_inference_steps": 50
    }
  }'
```

- Examples: Launch scripts for all deployment patterns
- Reference Guide: Worker types and argument reference
- SGLang Diffusion LMs (upstream): SGLang diffusion documentation