---
title: "SGLang Diffusion: Accelerating Video and Image Generation"
author: "The SGLang Diffusion Team"
date: "November 7, 2025"
previewImg: /images/blog/sgl-diffusion/sgl-diffusion-banner-16-9.png
---

We are excited to introduce SGLang Diffusion, which brings SGLang's state-of-the-art performance to image and video generation with diffusion models.
SGLang Diffusion supports major open-source video and image generation models (Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux) while providing fast inference and ease of use via multiple API entry points (OpenAI-compatible API, CLI, Python interface). It delivers a 1.2x–5.9x speedup across diverse workloads.
In collaboration with the FastVideo team, we provide a complete ecosystem for diffusion models, from post-training to production serving. The code is available [here](https://github.com/sgl-project/sglang/tree/main/python/sglang/multimodal_gen).

<iframe
  width="600"
  height="371"
  seamless
  frameborder="0"
  scrolling="no"
  src="https://docs.google.com/spreadsheets/d/e/2PACX-1vT3u_F1P6TIUItyXdTctVV4pJVEcBuyPBTqmrdXR3KeQuiN1OdkIhjVNpZyHUDPw_5ZIKe88w2Xz6Dd/pubchart?oid=1360546403&format=interactive"
  style="display:block; margin:15px auto 0 auto;">
</iframe>

<p style="color:gray; text-align: center;">SGL Diffusion Performance Benchmark on an H100 GPU.</p>

## Why Diffusion in SGLang?

With diffusion models becoming the backbone of state-of-the-art image and video generation, we have heard strong community demand for bringing SGLang's signature performance and seamless user experience to these new modalities. We built SGLang Diffusion to answer this call, providing a unified, high-performance engine for both language and diffusion tasks.

This unified approach is crucial, as the future of generation lies in combining architectures. Pioneering models are already fusing the strengths of autoregressive (AR) and diffusion-based approaches: from models like ByteDance's [Bagel](https://github.com/ByteDance-Seed/Bagel) and Meta's [Transfusion](https://arxiv.org/abs/2408.11039), which use a single transformer for both tasks, to NVIDIA's [Fast-dLLM v2](https://nvlabs.github.io/Fast-dLLM/v2/), which adapts AR models for parallel generation.

SGLang Diffusion is designed to be a future-proof, high-performance solution ready to power these innovative systems.

## Architecture

SGLang Diffusion is engineered for both performance and flexibility, built on SGLang's battle-tested serving architecture. It inherits the powerful SGLang scheduler and reuses the highly optimized sgl-kernel library for maximum efficiency.

At its core, the architecture is designed to accommodate the diverse structures of modern diffusion models. We introduce `ComposedPipelineBase`, a flexible abstraction that orchestrates a series of modular `PipelineStage`s. Each stage encapsulates a common diffusion function, such as the denoising loop in `DenoisingStage` or VAE decoding in `DecodingStage`, allowing developers to easily combine and reuse these components to construct complex, customized pipelines.

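As an illustration of how such stages compose, here is a minimal, framework-free sketch. Only the names `ComposedPipelineBase`, `PipelineStage`, `DenoisingStage`, and `DecodingStage` come from the design described above; the method signatures, state format, and toy math are hypothetical simplifications, not the actual SGLang Diffusion API.

```python
# Illustrative sketch of a stage-based diffusion pipeline.
# The stage/pipeline class names come from the post; everything else
# (method names, state dict, toy math) is a hypothetical simplification.

class PipelineStage:
    def forward(self, state: dict) -> dict:
        raise NotImplementedError

class DenoisingStage(PipelineStage):
    def __init__(self, num_steps: int):
        self.num_steps = num_steps

    def forward(self, state):
        # Toy "denoising loop": shrink each latent value toward zero per step.
        latent = state["latent"]
        for _ in range(self.num_steps):
            latent = [x * 0.5 for x in latent]
        state["latent"] = latent
        return state

class DecodingStage(PipelineStage):
    def forward(self, state):
        # Toy "VAE decode": map latents to an integer pixel-like range.
        state["image"] = [round(100 * x) for x in state["latent"]]
        return state

class ComposedPipelineBase:
    def __init__(self, stages):
        self.stages = stages

    def forward(self, state):
        # Each modular stage transforms the shared state in order.
        for stage in self.stages:
            state = stage.forward(state)
        return state

pipeline = ComposedPipelineBase([DenoisingStage(num_steps=2), DecodingStage()])
out = pipeline.forward({"latent": [0.8, -0.4]})
```

A full text-to-video pipeline would slot additional stages (text encoding, latent preparation, and so on) behind the same interface, which is what makes the components reusable across models.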
To achieve state-of-the-art speed, we integrate advanced parallelism techniques. The engine supports Unified Sequence Parallelism (USP), a combination of Ulysses-SP and Ring-Attention, for the core transformer blocks, alongside CFG parallelism and tensor parallelism (TP) for other model components.

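To make the Ulysses-SP idea concrete, here is a small, framework-free simulation of the all-to-all reshard at its core: before attention, each rank holds a slice of the sequence with all heads; after the exchange, it holds the full sequence for a slice of the heads. All names are illustrative, and plain lists stand in for tensors; this is not the SGLang Diffusion implementation.

```python
def ulysses_all_to_all(seq_shards, num_ranks, num_heads):
    """Simulate the Ulysses-SP reshard on plain lists.

    seq_shards[r][t] is the per-head activation vector for the t-th
    local token on rank r (sequence-parallel layout, all heads local).
    Returns head_shards[r]: the full sequence, restricted to rank r's
    contiguous slice of attention heads (head-parallel layout).
    """
    heads_per_rank = num_heads // num_ranks
    # Concatenating the local chunks in rank order recovers the full sequence.
    full_seq = [tok for shard in seq_shards for tok in shard]
    head_shards = []
    for r in range(num_ranks):
        lo, hi = r * heads_per_rank, (r + 1) * heads_per_rank
        head_shards.append([tok[lo:hi] for tok in full_seq])
    return head_shards

# Two ranks, four tokens, four heads: each rank starts with two tokens.
shards = [[[0, 1, 2, 3], [10, 11, 12, 13]],
          [[20, 21, 22, 23], [30, 31, 32, 33]]]
resharded = ulysses_all_to_all(shards, num_ranks=2, num_heads=4)
```

After this exchange, each rank can run ordinary (or Ring) attention over its heads for the whole sequence, and a mirrored all-to-all restores the sequence-parallel layout afterwards.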
To accelerate development and foster a powerful ecosystem, our system is built on an enhanced fork of **FastVideo**, and we are collaborating closely with their team. This partnership allows SGLang Diffusion to focus on delivering cutting-edge inference speed, while **FastVideo** provides comprehensive support for training-related tasks like model distillation.

## Model Support

We support many popular open-source video and image generation models, including:

- Video models: Wan series, FastWan, Hunyuan
- Image models: Qwen-Image, Qwen-Image-Edit, Flux

For the full list of supported models, see the [support matrix](https://github.com/sgl-project/sglang/blob/main/python/sglang/multimodal_gen/docs/support_matrix.md).

## Usage

For a seamless user experience, we provide a suite of familiar interfaces, including a CLI, a Python engine API, and an OpenAI-compatible API, allowing users to integrate diffusion generation into their workflows with minimal effort.

### Install

SGLang Diffusion can be installed in several ways:

```bash
# with pip or uv
uv pip install 'sglang[diffusion]' --prerelease=allow

# from source
git clone https://github.com/sgl-project/sglang.git
cd sglang
uv pip install -e "python[diffusion]" --prerelease=allow
```

### CLI

Launch a server, then send requests:

```bash
# Terminal 1: launch the server
sglang serve --model-path black-forest-labs/FLUX.1-dev

# Terminal 2: send a request
curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > meme.png) \
  -X POST "$OPENAI_API_BASE/images/edits" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F "model=Qwen/Qwen-Image-Edit" \
  -F "image[][email protected]" \
  -F 'prompt=Create a meme based on the image provided'
```

Or generate an image without launching a server:

```bash
sglang generate --model-path black-forest-labs/FLUX.1-dev \
  --prompt "A Logo With Bold Large Text: SGL Diffusion" \
  --save-output
```

See the [install guide](https://github.com/sgl-project/sglang/blob/main/python/sglang/multimodal_gen/docs/install.md) and [CLI guide](https://github.com/sgl-project/sglang/blob/main/python/sglang/multimodal_gen/docs/cli.md) for more installation methods and usage details.

### Demo

#### Text to Video: Wan-AI/Wan2.1

```bash
sglang generate --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A curious raccoon" \
  --save-output
```

<video width="800" controls poster="https://via.placeholder.com/800x450?text=Video+Preview">
  <source src="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/T2V.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
<p>Fallback link: <a href="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/T2V.mp4">Download the video</a></p>

#### Image to Video: Wan-AI/Wan2.1-I2V

```bash
sglang generate --model-path=Wan-AI/Wan2.1-I2V-14B-480P-Diffusers \
  --prompt="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
  --image-path="https://github.com/Wan-Video/Wan2.2/blob/990af50de458c19590c245151197326e208d7191/examples/i2v_input.JPG?raw=true" \
  --save-output --num-gpus 2 --enable-cfg-parallel
```

<video width="800" controls poster="https://via.placeholder.com/800x450?text=Video+Preview">
  <source src="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/TI2V.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>

<p>Fallback link: <a href="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/TI2V.mp4">Download the video</a></p>

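The `--enable-cfg-parallel` flag above exploits the structure of classifier-free guidance (CFG): the conditional and unconditional denoising passes are independent, so they can run on separate GPUs and be combined afterwards. A minimal, framework-free sketch of the combination step follows; the function name, inputs, and guidance scale are illustrative, not SGLang Diffusion internals.

```python
def cfg_combine(uncond, cond, guidance_scale):
    # Classifier-free guidance: push the prediction away from the
    # unconditional output, toward the text-conditioned one.
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

# With CFG parallelism, uncond and cond come from two GPUs running the
# two passes concurrently, instead of one GPU running the model twice.
uncond = [0.1, 0.2, 0.3]   # illustrative noise predictions
cond = [0.3, 0.1, 0.6]
guided = cfg_combine(uncond, cond, guidance_scale=2.0)
```

Because the combine step is cheap relative to the denoising passes, splitting the two passes across GPUs can nearly halve the per-step latency.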
#### Text to Image: FLUX

```bash
sglang generate --model-path black-forest-labs/FLUX.1-dev \
  --prompt "A Logo With Bold Large Text: SGL Diffusion" \
  --save-output
```

<img src="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/T2I_FLUX.jpg" alt="Text to Image: FLUX" style="display:block; margin-top: 20px; width: 65%;">

#### Text to Image: Qwen-Image

```bash
sglang generate --model-path=Qwen/Qwen-Image \
  --prompt='A curious raccoon' \
  --width=720 --height=720 \
  --save-output
```

<img src="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/T2I_Qwen_Image.jpg" alt="Text to Image: Qwen-Image" style="display:block; margin-top: 20px; width: 65%;">

#### Image to Image: Qwen-Image-Edit

```bash
sglang generate --model-path=Qwen/Qwen-Image-Edit \
  --prompt="Convert 2D style to 3D style" \
  --image-path="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/TI2I_Qwen_Image_Edit_Input.jpg" \
  --width=1024 --height=1536 --save-output
```

<div style="display: flex; justify-content: center; gap: 20px;">
  <div style="text-align: center;">
    <img src="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/TI2I_Qwen_Image_Edit_Input.jpg" alt="Input" style="max-width: 100%; height: auto; border: 1px solid #ccc;">
    <div style="margin-top: -25px;">Input</div>
  </div>
  <div style="text-align: center;">
    <img src="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/TI2I_Qwen_Image_Edit_Output.jpg" alt="Output" style="max-width: 100%; height: auto; border: 1px solid #ccc;">
    <div style="margin-top: -25px;">Output</div>
  </div>
</div>

## Performance Benchmark

We benchmarked SGLang Diffusion against the popular open-source baseline, Hugging Face Diffusers. As shown in the chart at the top of this post, SGLang Diffusion delivers state-of-the-art performance, significantly accelerating both image and video generation.

## Roadmap and Diffusion Ecosystem

Our vision is to build a comprehensive diffusion ecosystem in collaboration with the **FastVideo** team, providing an end-to-end solution from model training to high-performance inference.

The SGLang Diffusion team is focused on continuous innovation in performance and model support:

- Model support and optimizations
  - Optimize the Wan, FastWan, Hunyuan, Qwen-Image series, and FLUX
  - Support LongCat-Video
- Kernel support and fusions
  - Quantization kernels
  - Rotary embedding kernels
  - Flash Attention 4 integration in sgl-kernel for Blackwell
- More server features
  - Configurable cloud storage upload of generated files
  - Batching support
  - More parallelism methods
  - Quantization
- General architecture
  - Simplify the effort of supporting new models
  - Enhance cache and attention backend support

Building this ecosystem is a community effort, and we welcome and encourage all forms of contribution. Join us in shaping the future of open-source diffusion generation.

## Acknowledgment

SGLang Diffusion Team: [Yuhao Yang](https://github.com/yhyang201), [Xinyuan Tong](https://github.com/JustinTong0323), [Yi Zhang](https://github.com/yizhang2077), [Bao Ke](https://github.com/ispobock), [Ji Li](https://github.com/GeLee-Q/GeLee-Q), [Xi Chen](https://github.com/RubiaCx), [Laixin Xie](https://github.com/laixinn), [Yikai Zhu](https://github.com/zyksir), [Mick](https://github.com/mickqian)

FastVideo Team: [Zhang Peiyuan](https://github.com/jzhang38), [William Lin](https://github.com/SolitaryThinker), [BrianChen1129](https://github.com/BrianChen1129), [kevin314](https://github.com/kevin314), [Edenzzzz](https://github.com/Edenzzzz), [JerryZhou54](https://github.com/JerryZhou54), [rlsu9](https://github.com/rlsu9), [Eigensystem](https://github.com/Eigensystem), [foreverpiano](https://github.com/foreverpiano), [RandNMR73](https://github.com/RandNMR73), [PorridgeSwim](https://github.com/PorridgeSwim), [Gary-ChenJL](https://github.com/Gary-ChenJL), [Hao Zhang](https://cseweb.ucsd.edu/~haozhang/)

## Learn more

- Roadmap: [Diffusion (2025 Q4)](https://github.com/sgl-project/sglang/issues/12799)
- Slack channel: [#diffusion](https://sgl-fru7574.slack.com/archives/C09P0HTKE6A) (join via slack.sglang.ai)