Update vllm requirement from <=0.6.3 to <=0.9.2 #13

Closed
dependabot[bot] wants to merge 1 commit into main from
dependabot/pip/vllm-lte-0.9.2

dependabot bot commented on behalf of github Jul 14, 2025

Updates the requirements on vllm to permit the latest version.
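To illustrate what widening the upper bound from `<=0.6.3` to `<=0.9.2` permits, here is a minimal sketch that compares plain dotted release numbers. The helper names `parse_version` and `satisfies_upper_bound` are hypothetical and exist only for this example. It assumes simple numeric versions with no pre-release or local suffixes, and it is not how pip evaluates specifiers.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain dotted release string such as '0.9.2' into (0, 9, 2)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_upper_bound(candidate: str, bound: str) -> bool:
    """Return True when candidate <= bound, comparing segments numerically."""
    a, b = parse_version(candidate), parse_version(bound)
    width = max(len(a), len(b))
    # Pad the shorter version with zeros so that 0.9 compares equal to 0.9.0.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a <= b


OLD_BOUND = "0.6.3"  # upper bound before this PR
NEW_BOUND = "0.9.2"  # upper bound proposed by this PR

if __name__ == "__main__":
    for candidate in ("0.6.3", "0.6.4", "0.9.2", "0.10.0"):
        print(
            candidate,
            "old bound:", satisfies_upper_bound(candidate, OLD_BOUND),
            "new bound:", satisfies_upper_bound(candidate, NEW_BOUND),
        )
```

Segments are compared as integers rather than strings, so a hypothetical `0.10.0` correctly falls outside the `<=0.9.2` bound even though `"0.10.0" < "0.9.2"` as text.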

Release notes

Sourced from vllm's releases.

v0.9.2

Highlights

This release contains 452 commits from 167 contributors (31 new!)

NOTE: This is the last version where V0 engine code and features stay intact. We highly recommend migrating to V1 engine.

Engine Core

  • Priority Scheduling is now implemented in V1 engine (#19057), embedding models in V1 (#16188), Mamba2 in V1 (#19327).
  • Full CUDA‑Graph execution is now available for all FlashAttention v3 (FA3) and FlashMLA paths, including prefix‑caching. CUDA graph capture now shows a live progress bar, which makes debugging easier (#20301, #18581, #19617, #19501).
  • FlexAttention update – any head size, FP32 fallback (#20467, #19754).
  • Shared CachedRequestData objects and cached sampler‑ID stores deliver perf enhancements (#20232, #20291).

Model Support

  • New families: Ernie 4.5 (+MoE) (#20220), MiniMax‑M1 (#19677, #20297), Slim‑MoE “Phi‑tiny‑MoE‑instruct” (#20286), Tencent HunYuan‑MoE‑V1 (#20114), Keye‑VL‑8B‑Preview (#20126), GLM‑4.1 V (#19331), Gemma‑3 (text‑only, #20134), Tarsier 2 (#19887), Qwen 3 Embedding & Reranker (#19260), dots1 (#18254), GPT‑2 for Sequence Classification (#19663).
  • Granite hybrid MoE configurations with shared experts are fully supported (#19652).

Large‑Scale Serving & Engine Improvements

  • Expert‑Parallel Load Balancer (EPLB) has been added! (#18343, #19790, #19885).
  • Disaggregated serving enhancements: Avoid stranding blocks in P when aborted in D's waiting queue (#19223), let toy proxy handle /chat/completions (#19730)
  • Native xPyD P2P NCCL transport as a base case for native PD without external dependency (#18242, #20246).

Hardware & Performance

  • NVIDIA Blackwell
    • SM120: CUTLASS W8A8/FP8 kernels and related tuning, added to Dockerfile (#17280, #19566, #20071, #19794)
    • SM100: block‑scaled‑group GEMM, INT8/FP8 vectorization, deep‑GEMM kernels, activation‑chunking for MoE, and group‑size 64 for Machete (#19757, #19572, #19168, #19085, #20290, #20331).
  • Intel GPU (V1) backend with Flash‑Attention support (#19560).
  • AMD ROCm: full‑graph capture for TritonAttention, quick All‑Reduce, and chunked pre‑fill (#19158, #19744, #18596).
    • Split‑KV support landed in the unified Triton Attention kernel, boosting long‑context throughput (#19152).
    • Full‑graph mode enabled in ROCm AITER MLA V1 decode path (#20254).
  • TPU: dynamic‑grid KV‑cache updates, head‑dim less than 128, tuned paged‑attention kernels, and KV‑padding fixes (#19928, #20235, #19620, #19813, #20048, #20339).
    • Added a support matrix for models and features (#20230).

Quantization

  • Calibration‑free RTN INT4/INT8 pipeline for effortless, accurate compression (#18768).
  • Compressed‑Tensor NVFP4 (including MoE) + emulation; FP4 emulation removed on < SM100 devices (#19879, #19990, #19563).
  • Dynamic MoE‑layer quant (Marlin/GPTQ) and INT8 vectorization primitives (#19395, #20331, #19233).
  • Bits‑and‑Bytes 0.45+ with improved double‑quant logic and AWQ quality (#20424, #20033, #19431, #20076).

API · CLI · Frontend

  • API Server: Eliminate api_key and x_request_id headers middleware overhead (#19946)
  • New OpenAI‑compatible endpoints: /v1/audio/translations & revamped /v1/audio/transcriptions (#19615, #20179, #19597).
  • Token‑level progress bar for LLM.beam_search and cached template‑resolution speed‑ups (#19301, #20065).
  • Image‑object support in llm.chat, tool‑choice expansion, and custom‑arg passthroughs enrich multi‑modal agents (#19635, #17177, #16862).
  • CLI QoL: better parsing for -O/--compilation-config, batch‑size‑sweep benchmarking, richer --help, faster startup (#20156, #20516, #20430, #19941).
  • Metrics: Deprecate metrics with gpu_ prefix for non GPU specific metrics (#18354), Export NaNs in logits to scheduler_stats if output is corrupted (#18777)

Platform & Deployment

  • No‑privileged CPU / Docker / K8s mode (#19241) and custom default max‑tokens for hosted platforms (#18557).
  • Security hardening – runtime (cloud)pickle imports forbidden (#18018).

... (truncated)

Commits
  • a5dd03c Revert "[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)"
  • c18b3b8 [Bugfix] Add use_cross_encoder flag to use correct activation in `Classifie...
  • 9528e3a [BugFix][Spec Decode] Fix spec token ids in model runner (#20530)
  • 9fb52e5 [V1] Support any head size for FlexAttention backend (#20467)
  • e202dd2 [V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)
  • 43813e6 [Misc] call the pre-defined func (#20518)
  • cede942 [Benchmark] Add support for multiple batch size benchmark through CLI in `ben...
  • fe1e924 [Frontend] Support image object in llm.chat (#19635)
  • 4548c03 [TPU][Bugfix] fix the MoE OOM issue (#20339)
  • 40b86aa [BugFix] Fix: ImportError when building on hopper systems (#20513)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [vllm](https://github.com/vllm-project/vllm) to permit the latest version.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.1.0...v0.9.2)

---
updated-dependencies:
- dependency-name: vllm
  dependency-version: 0.9.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot added the dependencies and python labels Jul 14, 2025

dependabot bot commented on behalf of github Jul 28, 2025

Superseded by #22.

dependabot bot closed this Jul 28, 2025
dependabot bot deleted the dependabot/pip/vllm-lte-0.9.2 branch July 28, 2025 19:13

Labels

  • dependencies: Pull requests that update a dependency file
  • python: Pull requests that update python code
