
build(deps): bump sglang from 0.5.2 to 0.5.10 #5927

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/sglang-0.5.10

Conversation


@dependabot dependabot bot commented on behalf of github Apr 8, 2026

Bumps sglang from 0.5.2 to 0.5.10.
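If the pin lives in a requirements-style file (an assumption about this repo's layout — the exact file is not shown in the PR), the bump amounts to a one-line change:

```diff
-sglang==0.5.2
+sglang==0.5.10
```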

Release notes

Sourced from sglang's releases.

v0.5.10

Highlights

  • Piecewise CUDA Graph Enabled by Default: Piecewise CUDA graph capture is now the default execution mode, reducing memory overhead and improving throughput for models with complex control flow patterns: #16331

  • Elastic EP for Partial Failure Tolerance: Integrate Elastic NIXL-EP into SGLang, enabling partial failure tolerance for DeepSeek MoE deployments — when a GPU fails, the system redistributes expert weights and continues serving without full restart: #19248, #17374, #12068 blog

  • GPU Staging Buffer for PD Disaggregation: Gathers scattered head slices into contiguous memory for bulk RDMA transfer, reducing RDMA request count on GQA models by ~1000x. TPS/GPU on large concurrency increased by ~5x with Prefill TP4+Decode DEP4 on Qwen3.5: #19890
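The gather-then-transfer idea above can be sketched in plain NumPy. This is an illustrative toy, not sglang's implementation: scattered head slices are copied into one contiguous staging buffer so a single bulk transfer can replace many small per-slice requests.

```python
import numpy as np

# Toy sketch of a staging buffer (not sglang's actual code).
num_heads, head_dim = 8, 16
kv = np.random.rand(num_heads, head_dim).astype(np.float32)
head_ids = [1, 3, 6]  # hypothetical scattered head slices to ship

# Naive path: one RDMA request per slice -> len(head_ids) requests.
# Staged path: gather slices into contiguous memory, transfer once.
staging = np.empty((len(head_ids), head_dim), dtype=np.float32)
for i, h in enumerate(head_ids):
    staging[i] = kv[h]  # gather into the contiguous staging buffer

# Freshly allocated arrays are C-contiguous, so the whole buffer
# can go out in a single bulk transfer.
assert staging.flags["C_CONTIGUOUS"]
print(staging.shape)  # (3, 16)
```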

  • HiSparse for Sparse Attention: Integrate HiSparse sparse attention backend for efficient long-context inference with reduced compute through sparsity-aware attention: #20343

  • SGLang-Diffusion Update:

    • Model support: LTX-2, Hunyuan3D-2, Helios
    • Performance improvements: Qwen-image and Z-image throughput increased by 1.5x
    • New platform: macOS
    • New feature: improved diffusers backend performance by integrating all optimizations from Cache-DiT
    • Skills: explore the curated skill for developing and optimizing sglang-diffusion!
  • FlashInfer MXFP8 Kernel Support: Integrate FlashInfer mxfp8 kernels for GEMM and MoE operations, enabling mixed-precision FP8 inference with higher accuracy through microscaling for RL and general workloads: #19537

  • Transformers 5.3.0 Upgrade: Major upgrade from transformers 4.57.1 to 5.3.0, unlocking support for the latest model architectures and features from HuggingFace. The GLM-5 model is now supported in this image instead of the custom-built image: #17784

  • DeepSeek V3.2 / GLM-5 Optimization: GLM-5 runnable on main branch (with upgraded transformers). Fused Triton kernel for prefill KV cache fetching, NSA fuse store indexer for K cache, TRT-LLM prefill/decode DSA kernels as default on SM100/SM103, and IndexCache for improved throughput by more than 10% on high workloads: #19319, #19148, #20062, #21914, #21405

  • Qwen3.5 GDN/KDA Optimization: Transpose linear attention state layout from [N, HV, K, V] to [N, HV, V, K] and fuse split/reshape/cat ops in GDN projection with Triton kernel, plus CuTeDSL KDA decode kernel support for improved Qwen3.5 performance: #20283, #21019, #21203
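The layout change described above (transposing the state from `[N, HV, K, V]` to `[N, HV, V, K]`) can be illustrated with a minimal NumPy sketch — the data is unchanged, only the memory layout differs:

```python
import numpy as np

# Hypothetical shapes for illustration only.
N, HV, K, V = 2, 4, 64, 128
state_kv = np.arange(N * HV * K * V, dtype=np.float32).reshape(N, HV, K, V)

# Swap the last two axes to get the [N, HV, V, K] layout, then
# materialize it contiguously so downstream kernels see the new strides.
state_vk = np.ascontiguousarray(state_kv.transpose(0, 1, 3, 2))

print(state_vk.shape)  # (2, 4, 128, 64)
# Same values, different layout: each [V, K] slab is the transpose
# of the corresponding [K, V] slab.
print(np.array_equal(state_vk[0, 0], state_kv[0, 0].T))  # True
```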

  • LoRA Support for MoE Layers: Add LoRA fine-tuning support for Mixture-of-Experts layers with JIT alignment kernels, fused Triton kernels, TP support, CUDA graph support, and auto-detection of LoRA target modules — enabling efficient adapter-based tuning on MoE models like DeepSeek: #19710, #19711, #14105, #21439, #21647
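For readers unfamiliar with the adapter math behind the bullet above, here is a generic low-rank adapter sketch (illustrative only — it does not reflect sglang's fused MoE LoRA kernels): the adapted weight is `W' = W + B @ A` with rank `r` much smaller than the layer dimensions, and `B` zero-initialized so training starts from the base model's behavior.

```python
import numpy as np

# Hypothetical dimensions for illustration.
d_in, d_out, r = 64, 32, 4
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-init

x = rng.normal(size=(d_in,))
# Adapter path adds B @ (A @ x) on top of the frozen base output.
y = W @ x + B @ (A @ x)

# With B zero-initialized, the adapted output equals the base output,
# so the adapter starts as a no-op and learns a low-rank delta.
print(np.allclose(y, W @ x))  # True
```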

  • Prefill Context Parallel for MHA (Qwen3): Enable context parallelism during prefill for multi-head attention models like Qwen3 MoE, distributing long sequences across GPUs to reduce per-GPU memory and accelerate prefill: #18233

  • Flash Attention 4 Official Library Support: Upgrade to the official FlashAttention 4 package, bringing the latest attention optimizations and Blackwell GPU support: #20303

  • Skip-Softmax Attention for FlashInfer TRT-LLM Kernels: Reduce computation overhead in attention layers by skipping redundant softmax normalization: #19089

  • Speculative Decoding with FA4 Backend: Enable speculative decoding for the FA4 attention backend, combining speculative inference with next-generation flash attention for faster generation: #21080

  • MM Attention FA4 Default on SM100: Multi-modal attention now uses FA4 by default on Blackwell hardware for improved VLM performance: #21595

  • Stronger Transformers Modeling Backend: Enhanced transformers backend with full TP, PP, MoE, VLM support, and torch.compile compatibility: #19163

  • sglang-kernel 0.4.1: Major kernel package release with renamed package (sgl-kernel → sglang-kernel), consolidated kernels, and cleanup of deprecated ops: #20440, #22009

  • Native MLX Backend for Apple Silicon: Add native MLX execution backend enabling SGLang to run inference directly on Apple Silicon Macs without CUDA: #20342

New Model Support

  • Nemotron-3-Super (bf16/fp8/nvfp4): #20407, cookbook
  • Mistral Small 4 (Pixtral): #20708
  • LFM2-VL (Liquid Foundation Model 2 Vision-Language): #21230
  • Voxtral (speech-to-text): #21635
  • GLM-5: Supported on main branch with transformers 5.3.0

... (truncated)

Commits
  • 1519acf [Hotfix] Fix router gemm on sm103 (#22134)
  • c1927e1 fix: TRT-LLM MHA CUDA illegal address with EAGLE v2 + DP attention (#21649)
  • 07f57fc Enable IndexCache for DeepSeek V3.2 (#21405)
  • 164bc0a [Fix] Fix nightly tests (#22140)
  • 43654ef [diffusion] CI: improve diffusion comparison benchmark setting for realistic ...
  • 1ad6839 [Feature] Add Reasoning Tokens Usage (#15562)
  • bf984ae Revert "[Bugfix] Temporarily skip TRTLLM attention on (G)B300 (SM103) to avoi...
  • 46bf19c chore: bump flashinfer version to 0.6.7.post2 (#22097)
  • 2476325 [Speculative Decoding] Add FA4-based Spec Support (#21080)
  • 34d5765 [VLM] Chunk-aware ViT encoding with per-image cache and lazy device transfer ...
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [sglang](https://github.com/sgl-project/sglang) from 0.5.2 to 0.5.10.
- [Release notes](https://github.com/sgl-project/sglang/releases)
- [Commits](sgl-project/sglang@v0.5.2...v0.5.10)

---
updated-dependencies:
- dependency-name: sglang
  dependency-version: 0.5.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels on Apr 8, 2026