[Diffusion] Add native `/v1/diffusion/generate` endpoint for trajectory metadata #19892
Open
Godmook wants to merge 2 commits into sgl-project:main
Conversation
Contributor
Related Issue: #19827
Motivation
SGLang-D currently only exposes OpenAI-compatible endpoints (`/v1/images/generations`, `/v1/videos`). The OpenAI image/video API schema has no standard fields for `latents` or `log_probs`, so these values are silently dropped before the HTTP response, even though the pipeline already computes them when requested.

This is a blocking issue for RL training workloads: every RL pipeline runs against the server, and without HTTP-level access to trajectory latents and log probs, the `log_prob` feature added in #18806 is effectively unusable in production.

This PR introduces a native SGLang-D API at `POST /v1/diffusion/generate`, following the same pattern as SGLang's native LLM API (`/generate`), which coexists with the OpenAI-compatible endpoints.

Modifications
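To make the changes concrete, here is a hypothetical client request to the new endpoint. Only the two flag names (`get_latents`, `get_log_probs`) come from this PR; the remaining request fields and the `base_url` are assumptions, not the actual schema.

```python
import json

# Hypothetical request body for POST /v1/diffusion/generate.
# Field names other than the two metadata flags are illustrative only.
payload = {
    "prompt": "a red panda walking through snow",
    "num_inference_steps": 50,
    "get_latents": True,      # request per-step trajectory latents
    "get_log_probs": False,   # accepted today; returns null until #18806 lands
}
body = json.dumps(payload)

# A client would then POST the body, e.g. with the requests library:
# requests.post(f"{base_url}/v1/diffusion/generate", data=body,
#               headers={"Content-Type": "application/json"})
```

The actual request schema is defined in `diffusion_api.py` below.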
New file: `python/sglang/multimodal_gen/runtime/entrypoints/diffusion_api.py`

Adds a native generation endpoint with two extended metadata flags, `get_latents` and `get_log_probs`. Tensors are serialized as base64-encoded NumPy `.npy` blobs for client deserialization.

Modified file: `python/sglang/multimodal_gen/runtime/entrypoints/http_server.py`

Added 3 lines in `create_app()` to register the new router. No existing code changed.

Design notes
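On the wire format: the client-side deserialization of the base64-encoded `.npy` blobs mentioned above can be sketched as follows. The helper name and the round-trip demo are illustrative, not the PR's actual code.

```python
import base64
import io

import numpy as np


def decode_npy_b64(blob: str) -> np.ndarray:
    """Decode a base64-encoded NumPy .npy blob back into an array."""
    return np.load(io.BytesIO(base64.b64decode(blob)), allow_pickle=False)


# Round-trip demo: simulate the server-side encoding of a latents tensor,
# then decode it the way an RL client would.
latents = np.random.randn(2, 4, 8).astype(np.float32)
buf = io.BytesIO()
np.save(buf, latents)
blob = base64.b64encode(buf.getvalue()).decode("ascii")

decoded = decode_npy_b64(blob)
assert decoded.shape == (2, 4, 8)
assert np.array_equal(decoded, latents)
```

The `.npy` container preserves dtype and shape, so no side-channel metadata is needed to reconstruct the tensor.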
- `/v1/images/generations`, `/v1/videos`, and `/v1/meshes` are completely untouched.
- Both flags default to `false`: trajectory data is never collected or serialized unless explicitly requested.
- The `get_log_probs` structure is ready: the field is accepted and returns `null` today. Once PR #18806 ([diffusion] feat: add rollout `log_prob` with flow-matching SDE/CPS support) adds `trajectory_log_probs` to `OutputBatch`, enabling it requires uncommenting ~3 lines.
- The request-handling logic (`build_sampling_params`, `prepare_request`, `process_generation_batch`) is reused from existing code.

Accuracy Tests
This PR adds a new endpoint and does not modify any model forward code, kernels, or pipeline logic, so no accuracy regression is possible.
Functional test on an NVIDIA A40 with `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`: the server log confirms two successful `200 OK` responses.
200 OKresponses:Benchmarking and Profiling
No performance impact on the existing OpenAI endpoints. The new endpoint is only invoked when explicitly called.
For the native endpoint itself, observed on an A40 with `Wan2.1-T2V-1.3B` (81 frames, 50 steps):

- baseline (`get_latents=false`)
- with trajectory capture (`get_latents=true`)

Peak GPU memory: 12.02 GB (peak allocated 8.67 GB). When `get_latents=false`, trajectory serialization is skipped entirely, adding no overhead.

Checklist