
[Diffusion] Add PCG support for diffusion models#19828

Closed
BBuf wants to merge 11 commits into main from try_to_add_pcg

Conversation

@BBuf
Collaborator

@BBuf BBuf commented Mar 4, 2026

Created by Codex (gpt5.3-codex-high, about $2 cost).

main with torch compile:

sglang generate --model-path=black-forest-labs/FLUX.1-dev  --prompt="A futuristic cyberpunk city at night, neon lights reflecting on wet streets, highly detailed, 8k" --width=1024 --height=1024 --num-inference-steps=50 --guidance-scale=4.0 --seed=42 --save-output  --warmup=True --enable-torch-compile

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:06<00:00,  7.25it/s]
[03-04 03:52:28] [DenoisingStage] average time per step: 0.1379 seconds
[03-04 03:52:28] [DenoisingStage] finished in 6.8999 seconds
[03-04 03:52:28] [DecodingStage] started...
[03-04 03:52:28] [DecodingStage] finished in 0.0307 seconds
[03-04 03:52:28] Peak GPU memory: 31.51 GB, Peak allocated: 27.30 GB, Memory pool overhead: 4.21 GB (13.4%), Remaining GPU memory at peak: 108.89 GB. Components that could stay resident (based on the last request workload): ['text_encoder', 'text_encoder_2', 'transformer']. Related offload server args to disable: --dit-cpu-offload, --text-encoder-cpu-offload
[03-04 03:52:29] Output saved to outputs/A_futuristic_cyberpunk_city_at_night_neon_lights_reflecting_on_wet_streets_highly_detailed_8k_20260304-035129_e4d47a2b.png
[03-04 03:52:29] Pixel data generated successfully in 59.57 seconds
[03-04 03:52:29] Completed batch processing. Generated 1 outputs in 59.57 seconds
[03-04 03:52:29] Warmed-up request processed in 7.14 seconds (with warmup excluded)
[03-04 03:52:29] Memory usage - Max peak: 32268.00 MB, Avg peak: 32268.00 MB

main without torch compile:

sglang generate --model-path=black-forest-labs/FLUX.1-dev  --prompt="A futuristic cyberpunk city at night, neon lights reflecting on wet streets, highly detailed, 8k" --width=1024 --height=1024 --num-inference-steps=50 --guidance-scale=4.0 --seed=42 --save-output  --warmup=True 

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:07<00:00,  6.69it/s]
[03-04 04:01:00] [DenoisingStage] average time per step: 0.1495 seconds
[03-04 04:01:00] [DenoisingStage] finished in 7.4808 seconds
[03-04 04:01:00] [DecodingStage] started...
[03-04 04:01:00] [DecodingStage] finished in 0.0339 seconds
[03-04 04:01:00] Peak GPU memory: 31.29 GB, Peak allocated: 27.30 GB, Memory pool overhead: 3.99 GB (12.8%), Remaining GPU memory at peak: 109.11 GB. Components that could stay resident (based on the last request workload): ['text_encoder', 'text_encoder_2', 'transformer']. Related offload server args to disable: --dit-cpu-offload, --text-encoder-cpu-offload
[03-04 04:01:00] Output saved to outputs/A_futuristic_cyberpunk_city_at_night_neon_lights_reflecting_on_wet_streets_highly_detailed_8k_20260304-040043_50a75896.png
[03-04 04:01:00] Pixel data generated successfully in 17.61 seconds
[03-04 04:01:00] Completed batch processing. Generated 1 outputs in 17.61 seconds
[03-04 04:01:00] Warmed-up request processed in 7.73 seconds (with warmup excluded)
[03-04 04:01:00] Memory usage - Max peak: 32044.00 MB, Avg peak: 32044.00 MB

pr with PCG:

sglang generate --model-path=black-forest-labs/FLUX.1-dev  --prompt="A futuristic cyberpunk city at night, neon lights reflecting on wet streets, highly detailed, 8k" --width=1024 --height=1024 --num-inference-steps=50 --guidance-scale=4.0 --seed=42 --save-output --enable-piecewise-cuda-graph

[03-04 04:03:51] [InputValidationStage] started...
[03-04 04:03:51] [InputValidationStage] finished in 0.0001 seconds
[03-04 04:03:51] [TextEncodingStage] started...
[03-04 04:03:51] [TextEncodingStage] finished in 0.4045 seconds
[03-04 04:03:52] [TimestepPreparationStage] started...
[03-04 04:03:52] [TimestepPreparationStage] finished in 0.0050 seconds
[03-04 04:03:52] [LatentPreparationStage] started...
[03-04 04:03:52] [LatentPreparationStage] finished in 0.0016 seconds
[03-04 04:03:52] [DenoisingStage] started...
[03-04 04:03:52] Pre-capturing diffusion PCG before denoising loop (target_models=1)
[03-04 04:03:59] Enable diffusion PCG for FluxTransformer2DModel with 58 capture buckets (max=8192)
[03-04 04:03:59] install_torch_compiled
[03-04 04:03:59] Diffusion PCG init for FluxTransformer2DModel (raw_seq=4096, static_seq=4096)
[03-04 04:04:05] Initializing SGLangBackend
[03-04 04:04:05] SGLangBackend __call__
[03-04 04:04:07] Compiling a graph for dynamic shape takes 0.00 s
[03-04 04:04:07] Computation graph saved to /root/.cache/sglang/torch_compile_cache/rank_0_0/backbone/computation_graph_1772597047.1930342.py
[03-04 04:04:10] Pre-capture finished for 1 model(s) before formal denoising
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:07<00:00,  6.66it/s]
[03-04 04:04:18] [DenoisingStage] average time per step: 0.1521 seconds
[03-04 04:04:18] [DenoisingStage] finished in 25.9847 seconds
[03-04 04:04:18] [DecodingStage] started...
[03-04 04:04:18] [DecodingStage] finished in 0.4051 seconds
[03-04 04:04:18] Peak GPU memory: 32.10 GB, Peak allocated: 27.33 GB, Memory pool overhead: 4.77 GB (14.9%), Remaining GPU memory at peak: 108.30 GB. Components that could stay resident (based on the last request workload): ['text_encoder', 'text_encoder_2', 'transformer']. Related offload server args to disable: --dit-cpu-offload, --text-encoder-cpu-offload
[03-04 04:04:18] Output saved to outputs/A_futuristic_cyberpunk_city_at_night_neon_lights_reflecting_on_wet_streets_highly_detailed_8k_20260304-040351_d677fd0f.png
[03-04 04:04:18] Pixel data generated successfully in 27.39 seconds
[03-04 04:04:18] Completed batch processing. Generated 1 outputs in 27.39 seconds
[03-04 04:04:18] Memory usage - Max peak: 32874.00 MB, Avg peak: 32874.00 MB
[03-04 04:04:18] Generator was garbage collected without being shut down. Attempting to shut down the local server and client.
[03-04 04:04:26] Worker 0: Shutdown complete.

Average step time: 0.1495 s (main, eager) vs 0.1521 s (this PR with PCG); main with torch compile reaches 0.1379 s.
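For context, a quick back-of-envelope comparison of the three runs (the numbers are copied from the logs above; this is just arithmetic over the reported timings, not a claim about the implementation):

```python
# Per-step times for the three runs above (seconds per step, 50 steps each).
compile_step = 6.8999 / 50   # main + --enable-torch-compile
eager_step   = 7.4808 / 50   # main, eager
pcg_step     = 0.1521        # this PR, steady-state step time with PCG

# PCG is currently slightly slower per step than eager, and behind torch.compile.
slowdown_vs_eager = (pcg_step - eager_step) / eager_step * 100

# The PCG DenoisingStage total (25.9847 s) also includes one-time
# capture/compile work before the 50-step loop; the steady-state loop
# itself accounts for only about 50 * 0.1521 ~= 7.6 s of that total.
capture_overhead = 25.9847 - 50 * pcg_step

print(f"compile: {compile_step:.4f} s/step")
print(f"eager:   {eager_step:.4f} s/step")
print(f"pcg:     {pcg_step:.4f} s/step ({slowdown_vs_eager:+.1f}% vs eager)")
print(f"one-time PCG capture overhead ~= {capture_overhead:.1f} s")
```

So the steady-state PCG step is within about 2% of eager, and the bulk of the wall-clock regression in this run is the one-time capture before the loop.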

PCG does not show a speedup yet; we need more profiling and development, so this PR is converted to a draft for now.
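The "58 capture buckets (max=8192)" log line suggests runtime sequence lengths are padded up to a fixed set of pre-captured sizes so a captured graph can be replayed instead of re-captured. A hypothetical sketch of that bucketing follows; the bucket schedule and function names here are made up for illustration and are not sglang's actual schedule:

```python
# Hypothetical sketch of CUDA-graph capture bucketing: pad a runtime
# sequence length up to the nearest pre-captured bucket. The schedule
# below (fine steps for small sizes, coarse for large) is an assumption
# for illustration only; the PR's real schedule has 58 buckets, max=8192.
import bisect


def make_buckets(max_seq: int) -> list[int]:
    """Build a sorted list of capture sizes up to max_seq."""
    return list(range(16, 256, 16)) + list(range(256, max_seq + 1, 256))


def pad_to_bucket(seq_len: int, buckets: list[int]) -> int:
    """Return the smallest captured bucket that fits seq_len."""
    i = bisect.bisect_left(buckets, seq_len)
    if i == len(buckets):
        raise ValueError(f"seq_len {seq_len} exceeds max bucket {buckets[-1]}")
    return buckets[i]


buckets = make_buckets(8192)
# The FLUX run above reports raw_seq=4096, which lands exactly on a bucket.
print(pad_to_bucket(4096, buckets))  # -> 4096
```

Padding to a bucket wastes some compute on the padded tokens but lets one captured graph serve a range of input sizes, which is the usual trade-off behind piecewise CUDA graph capture.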

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@BBuf BBuf marked this pull request as draft March 4, 2026 04:23
@github-actions github-actions bot added the diffusion SGLang Diffusion label Mar 4, 2026
@zhaochenyang20
Collaborator

Stop reviewing codex PR. Review mine please:

#18806
#19152
#19225

@BBuf BBuf closed this Mar 4, 2026
@BBuf BBuf deleted the try_to_add_pcg branch March 4, 2026 07:29