
chore(deps): bump vllm from 0.20.0 to 0.20.2#486

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/vllm-0.20.2

@dependabot dependabot Bot commented on behalf of github May 12, 2026

Bumps vllm from 0.20.0 to 0.20.2.
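
A reviewer who wants to confirm that an environment picked up this bump could use something like the sketch below. It is a minimal illustration, not part of this PR: the helper names `parse_version`, `meets_minimum`, and `installed_vllm_version` are hypothetical, and the parser assumes plain dotted release strings (optionally with a `+local` suffix), so pre-release tags such as `rc1` are not handled.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted release string such as '0.20.2' into (0, 20, 2).

    Assumption: only numeric components; a '+local' suffix is dropped.
    """
    release = text.split("+")[0]
    return tuple(int(part) for part in release.split("."))


def meets_minimum(installed: str, minimum: str = "0.20.2") -> bool:
    """Return True when `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def installed_vllm_version() -> Optional[str]:
    """Return the installed vllm version string, or None if vllm is absent."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vllm is not installed in this environment")
    else:
        print(found, "meets minimum:", meets_minimum(found))
```

For anything beyond a quick check, a full version parser such as the one in the `packaging` library is the safer choice, since it understands pre-release and post-release tags.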

Release notes

Sourced from vllm's releases.

v0.20.2

vLLM v0.20.2

Highlights

This release features 6 commits from 6 contributors (0 new)!

This is a small patch release with bug fixes for DeepSeek V4, gpt-oss, and Qwen3-VL.

Bug Fixes

  • DeepSeek V4 sparse attention: Re-enable the persistent topk path on Hopper and ensure the memset kernel runs at CUDA graph capture time regardless of max_seq_len, fixing the MTP=1 hang on DeepSeek V4 (#41665, revert of #41605).
  • DeepSeek V4 KV cache: Fixed a "failure to allocate KV blocks" error in the V1 engine KV cache manager (#41282).
  • gpt-oss MXFP4 + torch.compile: Plumbed hidden_dim_unpadded through the moe_forward fake op so MXFP4 works under torch.compile on v0.20.x (#42002, backport of #41646).
  • Qwen3-VL: Removed an invalid deepstack boundary check that could fail under heavy load (#40932).

Contributors

@ywang96, @zyongye, @stecasta, @wzhao18, @Isotr0py, @khluu

v0.20.1

vLLM v0.20.1

This is a patch release on top of v0.20.0 primarily focused on DeepSeek V4 stabilization and performance improvements, along with several important bug fixes.

DeepSeek V4

  • Base model support (#41006).
  • Multi-stream pre-attention GEMM (#41061), configurable pre-attn GEMM knob (#41443), and tuned default VLLM_MULTI_STREAM_GEMM_TOKEN_THRESHOLD (#41526).
  • BF16 and MXFP8 all-to-all support for FlashInfer one-sided communication (#40960).
  • PTX cvt instruction for faster FP32->FP4 conversion (#41015).
  • Integrated tile kernels (head_compute_mix_kernel) for optimized head computation (#41255).
  • Guard megamoe flag with Pure TP (#41522).
  • Fixed persistent topk cooperative deadlock at TopK=1024 (#41189) and inter-CTA init race on RadixRowState (#41444), with temporary disable of persistent topk as a workaround (#41442).
  • Fixed import error due to AOT compile cache loading (#41090).
  • Fixed torch inductor error (#41135).
  • Fixed repeated RoPE cache initialization (#41148).
  • Fixed missing type conversion for non-streaming tool calls in DSV3.2/V4 (#41198).

Bug Fixes

  • Fixed max_num_batched_token not being captured in CUDA graph (#40734).
  • Fixed num_gpu_blocks_override not accounted for in max_model_len checks (#41069).
  • Auto-disable expandable_segments around cumem memory pool (#40812).
  • Fixed BailingMoE linear layer (#40859) and MLA RoPE rotation for BailingMoE V2.5 (#41185).
  • Fixed reasoning parser kwargs not being passed to structured output (#41199).
  • [ROCm] Fixed input_ids and expert_map args for Quark W4A8 GPT-OSS (#41165).

List of contributors

@BugenZhao, @chaunceyjiang, @gau-nernst, @ghphotoframe, @Isotr0py, @jeejeelee, @khluu, @njhill, @Rohan138, @wzhao18, @youkaichao, @ywang96, @ZJY0516, @zixi-qi, @zyongye

Commits
  • bc150f5 [CI] Automate Docker Hub release image publishing (#40415)
  • 9bc5a0d [Bugfix] Remove invalid deepstack boundary check for Qwen3-VL (#40932)
  • fa8acca [Bugfix] Fix failure to allocate KV blocks error (#41282)
  • 637495c [Bugfix] Plumb hidden_dim_unpadded through moe_forward fake to fix gpt-oss MX...
  • fbd51e3 [Bugfix] Fix condition to clear persistent topk so that it can be captured re...
  • 75b3867 Revert "Temporary disable persistent topk for Hopper (#41605)"
  • 132765e Revert "[DSv4] Use cvt PTX for FP32->FP4 conversion (#41015)"
  • 43a21e6 Temporary disable persistent topk for Hopper (#41605)
  • f98b274 [DSv4] Tune default value of VLLM_MULTI_STREAM_GEMM_TOKEN_THRESHOLD (#41526)
  • 228d225 [DSV4] Guard megamoe flag with Pure TP (#41522)
  • Additional commits viewable in compare view

@dependabot dependabot Bot added dependencies Pull requests that update a dependency file python Pull requests that update python code labels May 12, 2026
@dependabot dependabot Bot requested a review from a team as a code owner May 12, 2026 13:02
Bumps [vllm](https://github.com/vllm-project/vllm) from 0.20.0 to 0.20.2.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.20.0...v0.20.2)

---
updated-dependencies:
- dependency-name: vllm
  dependency-version: 0.20.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot Bot changed the title chore(deps): bump vllm from 0.18.0 to 0.20.2 chore(deps): bump vllm from 0.20.0 to 0.20.2 May 14, 2026
@dependabot dependabot Bot force-pushed the dependabot/pip/vllm-0.20.2 branch from 97c1c5d to 8b37936 Compare May 14, 2026 17:57
codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
