
[Diffusion] Add native /v1/diffusion/generate endpoint for trajectory metadata#19892

Open
Godmook wants to merge 2 commits into sgl-project:main from Godmook:feat/native-sglang

Conversation


@Godmook Godmook commented Mar 4, 2026

Related Issue: #19827

Motivation

SGLang-D currently only exposes OpenAI-compatible endpoints (/v1/images/generations, /v1/videos). The OpenAI image/video API schema has no standard fields for latents or log_probs, so these values are silently dropped before the HTTP response — even though the pipeline already computes them when requested.

This is a blocking issue for RL training workloads: every RL pipeline runs against the server, and without HTTP-level access to trajectory latents and log probs, the log_prob feature added in #18806 is effectively unusable in production.

This PR introduces a native SGLang-D API at POST /v1/diffusion/generate, following the same pattern as SGLang's native LLM API (/generate) that coexists with the OpenAI-compatible endpoints.

Modifications

New file: python/sglang/multimodal_gen/runtime/entrypoints/diffusion_api.py

Adds a native generation endpoint with two extended metadata flags:

# Request
{
  "prompt": "A cat walking",
  "get_latents": false,   # default: false — no latency impact when unused
  "get_log_probs": false  # default: false — populated after PR #18806 lands
}

# Response
{
  "id": "...",
  "output_b64": "<base64-encoded mp4/png>",
  "output_format": "mp4",
  "peak_memory_mb": 12304.0,
  "inference_time_s": 336.46,
  "trajectory": {
    "latents": "<base64-encoded .npy blob>",
    "latents_shape": [1, 50, 16, 21, 60, 104],
    "latents_dtype": "torch.float32",
    "timesteps": ["<b64>", ...],
    "log_probs": null,       # populated after PR #18806
    "log_probs_shape": null
  }
}

Tensors are serialized as base64-encoded NumPy .npy blobs. Client deserialization:

import base64, io, numpy as np
arr = np.load(io.BytesIO(base64.b64decode(response["trajectory"]["latents"])))
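
Since the .npy format stores shape and dtype alongside the data, the encoding round-trips exactly and the latents_shape/latents_dtype fields in the response are convenience metadata rather than required for decoding. A minimal sketch of both directions (the encode_array/decode_array helper names are illustrative, not from the PR):

```python
import base64
import io

import numpy as np


def encode_array(arr: np.ndarray) -> str:
    """Serialize an array to .npy bytes, then base64-encode for JSON transport."""
    buf = io.BytesIO()
    np.save(buf, arr)
    return base64.b64encode(buf.getvalue()).decode("ascii")


def decode_array(b64: str) -> np.ndarray:
    """Inverse of encode_array: base64 -> .npy bytes -> ndarray."""
    return np.load(io.BytesIO(base64.b64decode(b64)))


# Round-trip preserves values, shape, and dtype bit-for-bit.
latents = np.random.randn(1, 4, 8).astype(np.float32)
restored = decode_array(encode_array(latents))
assert np.array_equal(restored, latents) and restored.dtype == latents.dtype
```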

Modified file: python/sglang/multimodal_gen/runtime/entrypoints/http_server.py

Added 3 lines in create_app() to register the new router; no existing lines were modified.

from sglang.multimodal_gen.runtime.entrypoints import diffusion_api
app.include_router(diffusion_api.router)

Design notes

  • Zero impact on existing endpoints: /v1/images/generations, /v1/videos, /v1/meshes are completely untouched.
  • No latency overhead when flags are false: trajectory data is never collected or serialized unless explicitly requested.
  • get_log_probs structure is ready: the field is accepted and returns null today. Once PR #18806 ([diffusion] feat: add rollout log_prob with flow-matching SDE/CPS support) adds trajectory_log_probs to OutputBatch, enabling it requires uncommenting ~3 lines.
  • All pipeline plumbing (build_sampling_params, prepare_request, process_generation_batch) is reused from existing code.
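
The "no overhead when flags are false" behavior can be sketched as a small gating helper (names are hypothetical, not the PR's actual code):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiffusionGenerateRequest:
    # Mirrors the request JSON above; both flags default to off,
    # so the default path collects no trajectory data at all.
    prompt: str
    get_latents: bool = False
    get_log_probs: bool = False


def build_trajectory(req: DiffusionGenerateRequest,
                     latents_b64: Optional[str]) -> Optional[dict]:
    # Returns None ("trajectory": null in the response) unless a flag
    # is set, so nothing is collected or serialized by default.
    if not (req.get_latents or req.get_log_probs):
        return None
    return {
        "latents": latents_b64 if req.get_latents else None,
        "log_probs": None,  # populated once PR #18806 lands
    }
```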

Accuracy Tests

This PR adds a new endpoint and does not modify any model forward code, kernels, or pipeline logic, so no accuracy regression is expected.

Functional test on NVIDIA A40, Wan-AI/Wan2.1-T2V-1.3B-Diffusers:

# Basic generation (no trajectory)
curl -X POST http://localhost:30000/v1/diffusion/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat walking"}'
# -> HTTP 200, output_b64 present, trajectory: null

# With latents
curl -X POST http://localhost:30000/v1/diffusion/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat walking", "get_latents": true}'
# -> HTTP 200, trajectory.latents present (base64 npy)

Server log confirms two successful 200 OK responses:

[2026-03-04 08:55:48] INFO: "POST /v1/diffusion/generate HTTP/1.1" 200 OK
[2026-03-04 09:00:44] INFO: "POST /v1/diffusion/generate HTTP/1.1" 200 OK

Benchmarking and Profiling

No performance impact on the existing OpenAI endpoints. The new endpoint is only invoked when explicitly called.

For the native endpoint itself, observed on A40 with Wan2.1-T2V-1.3B (81 frames, 50 steps):

Run                          Denoising  Decoding  Total
1 (cold, get_latents=false)  305.6 s    17.4 s    336.5 s
2 (warm, get_latents=true)   261.0 s    12.1 s    282.4 s

Peak GPU memory: 12.02 GB (peak allocated 8.67 GB). When get_latents=false, trajectory serialization is skipped entirely — no overhead.

Checklist

@github-actions github-actions bot added the diffusion SGLang Diffusion label Mar 4, 2026
