Update vllm requirement from <=0.6.3 to <=0.9.2 by dependabot[bot] · Pull Request #13 · THU-KEG/VerIF

dependabot · 2025-07-14T19:25:02Z

Updates the requirements on vllm to permit the latest version.
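To make the effect of the change concrete, here is a minimal sketch of how an upper-bound requirement such as `<=0.9.2` admits or rejects a given release. It is a simplified comparison for plain numeric versions only, not a full PEP 440 implementation. The helper names `parse_release` and `permitted` are hypothetical and do not come from this repository or from pip.

```python
import re


def parse_release(version: str) -> tuple:
    """Parse a plain numeric version such as '0.9.2' into a tuple of ints.

    Assumption: only dotted integers are handled (no rc/post/dev suffixes).
    """
    if not re.fullmatch(r"\d+(\.\d+)*", version):
        raise ValueError(f"unsupported version string: {version!r}")
    return tuple(int(part) for part in version.split("."))


def permitted(version: str, upper_bound: str) -> bool:
    """Return True if `version` satisfies the constraint `<=upper_bound`."""
    v = parse_release(version)
    bound = parse_release(upper_bound)
    # Pad the shorter tuple with zeros so that '0.9' compares equal to '0.9.0'.
    width = max(len(v), len(bound))
    v += (0,) * (width - len(v))
    bound += (0,) * (width - len(bound))
    return v <= bound


OLD_BOUND = "0.6.3"  # constraint before this pull request
NEW_BOUND = "0.9.2"  # constraint after this pull request

if __name__ == "__main__":
    for candidate in ("0.6.3", "0.9.1", "0.9.2", "0.10.0"):
        print(candidate, permitted(candidate, OLD_BOUND), permitted(candidate, NEW_BOUND))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.0" would sort before "0.9.2", which would wrongly admit a release above the bound.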

Release notes

v0.9.2

Highlights

This release contains 452 commits from 167 contributors (31 new!)

NOTE: This is the last version where V0 engine code and features stay intact. We highly recommend migrating to V1 engine.
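As a hedged illustration of pinning the engine choice during a migration, the sketch below builds a process environment with the engine selected explicitly. It assumes the `VLLM_USE_V1` environment variable that vLLM releases in this range read at startup; the helper name `engine_env` is hypothetical, and the release notes above do not prescribe this approach.

```python
import os
from typing import Dict, Optional


def engine_env(use_v1: bool = True, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the vLLM engine selected explicitly.

    Assumption: vLLM reads VLLM_USE_V1 ("1" for V1, "0" for V0) at startup.
    """
    env = dict(os.environ if base is None else base)
    env["VLLM_USE_V1"] = "1" if use_v1 else "0"
    return env


if __name__ == "__main__":
    # Pass the result as `env=` to subprocess.run when launching a server or job.
    print(engine_env(use_v1=True, base={})["VLLM_USE_V1"])
```

Returning a copy instead of mutating `os.environ` keeps the setting scoped to the process being launched.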

Engine Core

The V1 engine now supports priority scheduling (#19057), embedding models (#16188), and Mamba2 (#19327).

Full CUDA‑Graph execution is now available for all FlashAttention v3 (FA3) and FlashMLA paths, including prefix‑caching. CUDA graph capture now shows a live progress bar, which makes debugging easier (#20301, #18581, #19617, #19501).

FlexAttention update – any head size, FP32 fallback (#20467, #19754).

Shared CachedRequestData objects and cached sampler‑ID stores deliver perf enhancements (#20232, #20291).

Model Support

New families: Ernie 4.5 (+MoE) (#20220), MiniMax‑M1 (#19677, #20297), Slim‑MoE “Phi‑tiny‑MoE‑instruct” (#20286), Tencent HunYuan‑MoE‑V1 (#20114), Keye‑VL‑8B‑Preview (#20126), GLM‑4.1 V (#19331), Gemma‑3 (text‑only, #20134), Tarsier 2 (#19887), Qwen 3 Embedding & Reranker (#19260), dots1 (#18254), GPT‑2 for Sequence Classification (#19663).

Granite hybrid MoE configurations with shared experts are fully supported (#19652).

Large‑Scale Serving & Engine Improvements

Expert‑Parallel Load Balancer (EPLB) has been added! (#18343, #19790, #19885).

Disaggregated serving enhancements: avoid stranding blocks in P (prefill) when a request is aborted in D's (decode) waiting queue (#19223); let the toy proxy handle /chat/completions (#19730).

Native xPyD P2P NCCL transport as a base case for native PD without external dependency (#18242, #20246).

Hardware & Performance

NVIDIA Blackwell

SM120: CUTLASS W8A8/FP8 kernels and related tuning, added to Dockerfile (#17280, #19566, #20071, #19794)

SM100: block‑scaled‑group GEMM, INT8/FP8 vectorization, deep‑GEMM kernels, activation‑chunking for MoE, and group‑size 64 for Machete (#19757, #19572, #19168, #19085, #20290, #20331).

Intel GPU (V1) backend with Flash‑Attention support (#19560).

AMD ROCm: full‑graph capture for TritonAttention, quick All‑Reduce, and chunked pre‑fill (#19158, #19744, #18596).

Split‑KV support landed in the unified Triton Attention kernel, boosting long‑context throughput (#19152).

Full‑graph mode enabled in ROCm AITER MLA V1 decode path (#20254).

TPU: dynamic‑grid KV‑cache updates, head‑dim less than 128, tuned paged‑attention kernels, and KV‑padding fixes (#19928, #20235, #19620, #19813, #20048, #20339).

Added a support matrix of models and features (#20230).

Quantization

Calibration‑free RTN INT4/INT8 pipeline for effortless, accurate compression (#18768).

Compressed‑Tensor NVFP4 (including MoE) + emulation; FP4 emulation removed on < SM100 devices (#19879, #19990, #19563).

Dynamic MoE‑layer quant (Marlin/GPTQ) and INT8 vectorization primitives (#19395, #20331, #19233).

Bits‑and‑Bytes 0.45+ with improved double‑quant logic and AWQ quality (#20424, #20033, #19431, #20076).

API · CLI · Frontend

API Server: Eliminate api_key and x_request_id headers middleware overhead (#19946)

New OpenAI‑compatible endpoints: /v1/audio/translations & revamped /v1/audio/transcriptions (#19615, #20179, #19597).

Token‑level progress bar for LLM.beam_search and cached template‑resolution speed‑ups (#19301, #20065).

Image‑object support in llm.chat, tool‑choice expansion, and custom‑arg passthroughs enrich multi‑modal agents (#19635, #17177, #16862).

CLI QoL: better parsing for -O/--compilation-config, batch‑size‑sweep benchmarking, richer --help, faster startup (#20156, #20516, #20430, #19941).

Metrics: Deprecate metrics with gpu_ prefix for non GPU specific metrics (#18354), Export NaNs in logits to scheduler_stats if output is corrupted (#18777)

Platform & Deployment

No‑privileged CPU / Docker / K8s mode (#19241) and custom default max‑tokens for hosted platforms (#18557).

Security hardening – runtime (cloud)pickle imports forbidden (#18018).

... (truncated)

Commits

a5dd03c Revert "[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)"
c18b3b8 [Bugfix] Add use_cross_encoder flag to use correct activation in `Classifie...
9528e3a [BugFix][Spec Decode] Fix spec token ids in model runner (#20530)
9fb52e5 [V1] Support any head size for FlexAttention backend (#20467)
e202dd2 [V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)
43813e6 [Misc] call the pre-defined func (#20518)
cede942 [Benchmark] Add support for multiple batch size benchmark through CLI in `ben...
fe1e924 [Frontend] Support image object in llm.chat (#19635)
4548c03 [TPU][Bugfix] fix the MoE OOM issue (#20339)
40b86aa [BugFix] Fix: ImportError when building on hopper systems (#20513)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [vllm](https://github.com/vllm-project/vllm) to permit the latest version.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.1.0...v0.9.2)

---
updated-dependencies:
- dependency-name: vllm
  dependency-version: 0.9.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot · 2025-07-28T19:13:27Z

Superseded by #22.

dependabot bot added the dependencies and python labels Jul 14, 2025

dependabot bot mentioned this pull request Jul 14, 2025

Update vllm requirement from <=0.6.3 to <=0.9.1 #7

Closed

dependabot bot closed this Jul 28, 2025

dependabot bot deleted the dependabot/pip/vllm-lte-0.9.2 branch July 28, 2025 19:13

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/vllm-lte-0.9.2
