chore(deps): bump vllm from 0.20.0 to 0.20.2 by dependabot[bot] · Pull Request #486 · NVIDIA-NeMo/Safe-Synthesizer

dependabot · 2026-05-12T13:02:38Z

Bumps vllm from 0.20.0 to 0.20.2.

Release notes

v0.20.2

vLLM v0.20.2

Highlights

This release features 6 commits from 6 contributors (0 new)!

This is a small patch release with bug fixes for DeepSeek V4, gpt-oss, and Qwen3-VL.

Bug Fixes

DeepSeek V4 sparse attention: Re-enable the persistent topk path on Hopper and ensure the memset kernel runs at CUDA graph capture time regardless of max_seq_len, fixing the MTP=1 hang on DeepSeek V4 (#41665, revert of #41605).

DeepSeek V4 KV cache: Fixed a "failure to allocate KV blocks" error in the V1 engine KV cache manager (#41282).

gpt-oss MXFP4 + torch.compile: Plumbed hidden_dim_unpadded through the moe_forward fake op so MXFP4 works under torch.compile on v0.20.x (#42002, backport of #41646).

Qwen3-VL: Removed an invalid deepstack boundary check that could fail under heavy load (#40932).

Contributors

@ywang96, @zyongye, @stecasta, @wzhao18, @Isotr0py, @khluu

v0.20.1

vLLM v0.20.1

This is a patch release on top of v0.20.0 primarily focused on DeepSeek V4 stabilization and performance improvements, along with several important bug fixes.

DeepSeek V4

Base model support (#41006).

Multi-stream pre-attention GEMM (#41061), configurable pre-attn GEMM knob (#41443), and tuned default VLLM_MULTI_STREAM_GEMM_TOKEN_THRESHOLD (#41526).

BF16 and MXFP8 all-to-all support for FlashInfer one-sided communication (#40960).

PTX cvt instruction for faster FP32->FP4 conversion (#41015).

Integrated tile kernels (head_compute_mix_kernel) for optimized head computation (#41255).

Guard megamoe flag with Pure TP (#41522).

Fixed a persistent topk cooperative deadlock at TopK=1024 (#41189) and an inter-CTA init race on RadixRowState (#41444); persistent topk was also temporarily disabled as a workaround (#41442).

Fixed import error due to AOT compile cache loading (#41090).

Fixed torch inductor error (#41135).

Fixed repeated RoPE cache initialization (#41148).

Fixed missing type conversion for non-streaming tool calls in DSV3.2/V4 (#41198).

Bug Fixes

Fixed max_num_batched_token not being captured in CUDA graph (#40734).

Fixed num_gpu_blocks_override not accounted for in max_model_len checks (#41069).

Auto-disable expandable_segments around cumem memory pool (#40812).

Fixed BailingMoE linear layer (#40859) and MLA RoPE rotation for BailingMoE V2.5 (#41185).

Fixed reasoning parser kwargs not being passed to structured output (#41199).

[ROCm] Fixed input_ids and expert_map args for Quark W4A8 GPT-OSS (#41165).

List of contributors

@BugenZhao, @chaunceyjiang, @gau-nernst, @ghphotoframe, @Isotr0py, @jeejeelee, @khluu, @njhill, @Rohan138, @wzhao18, @youkaichao, @ywang96, @ZJY0516, @zixi-qi, @zyongye

Commits

bc150f5 [CI] Automate Docker Hub release image publishing (#40415)
9bc5a0d [Bugfix] Remove invalid deepstack boundary check for Qwen3-VL (#40932)
fa8acca [Bugfix] Fix failure to allocate KV blocks error (#41282)
637495c [Bugfix] Plumb hidden_dim_unpadded through moe_forward fake to fix gpt-oss MX...
fbd51e3 [Bugfix] Fix condition to clear persistent topk so that it can be captured re...
75b3867 Revert "Temporary disable persistent topk for Hopper (#41605)"
132765e Revert "[DSv4] Use cvt PTX for FP32->FP4 conversion (#41015)"
43a21e6 Temporary disable persistent topk for Hopper (#41605)
f98b274 [DSv4] Tune default value of VLLM_MULTI_STREAM_GEMM_TOKEN_THRESHOLD (#41526)
228d225 [DSV4] Guard megamoe flag with Pure TP (#41522)
Additional commits viewable in compare view

Bumps [vllm](https://github.com/vllm-project/vllm) from 0.20.0 to 0.20.2.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.20.0...v0.20.2)

---
updated-dependencies:
- dependency-name: vllm
  dependency-version: 0.20.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
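For reviewers who want to sanity-check the size of a version bump like this one, here is a minimal sketch. It assumes plain three-part `MAJOR.MINOR.PATCH` version strings. `parse_version` and `classify_bump` are hypothetical helper names for illustration, not part of Dependabot or vLLM. Under this simple rule, the current range (0.20.0 to 0.20.2) differs only in the patch component, while the range in this PR's earlier title (0.18.0 to 0.20.2) differs in the minor component.

```python
def parse_version(text):
    """Split a plain 'MAJOR.MINOR.PATCH' string (optional leading 'v') into ints."""
    parts = text.strip().lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    return tuple(int(p) for p in parts)


def classify_bump(old, new):
    """Return 'major', 'minor', 'patch', or 'none' for an old -> new version change."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        return "none"  # not an upgrade
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


print(classify_bump("0.20.0", "0.20.2"))  # patch
print(classify_bump("0.18.0", "0.20.2"))  # minor
```

This sketch deliberately ignores pre-release and local version suffixes. A real check would more likely rely on a dedicated version-parsing library.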

codecov · 2026-05-14T18:01:55Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

dependabot Bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels May 12, 2026

dependabot Bot requested a review from a team as a code owner May 12, 2026 13:02

dependabot Bot changed the title ~~chore(deps): bump vllm from 0.18.0 to 0.20.2~~ chore(deps): bump vllm from 0.20.0 to 0.20.2 May 14, 2026

dependabot Bot force-pushed the dependabot/pip/vllm-0.20.2 branch from 97c1c5d to 8b37936 Compare May 14, 2026 17:57

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/vllm-0.20.2
