
Update vllm requirement from <=0.6.3 to <=0.9.1 #7

Closed

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/vllm-lte-0.9.1

Conversation


dependabot[bot] commented on behalf of GitHub on Jun 16, 2025

Updates the requirements on vllm to permit the latest version.
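What the widened pin permits can be illustrated with the `packaging` library (a sketch for illustration only; the specifiers mirror this PR's old and new ranges, but the snippet itself is not part of the change):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Old and new requirement ranges from this PR.
old_spec = SpecifierSet("<=0.6.3")
new_spec = SpecifierSet("<=0.9.1")

latest = Version("0.9.1")
print(latest in old_spec)  # False: 0.9.1 was excluded by the old pin
print(latest in new_spec)  # True: the widened range permits it
```

Dependabot performs exactly this kind of specifier update: it rewrites the upper bound so the latest release satisfies the requirement, without forcing an immediate upgrade of already-installed older versions.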

Release notes

Sourced from vllm's releases.

v0.9.1

Highlights

This release features 274 commits, from 123 contributors (27 new contributors!)

  • Progress in large scale serving
    • DP Attention + Expert Parallelism: CUDA graph support (#18724), DeepEP dispatch-combine kernel (#18434), batched/masked DeepGEMM kernel (#19111), CUTLASS MoE kernel with PPLX (#18762)
    • Heterogeneous TP (#18833), NixlConnector Enable FlashInfer backend (#19090)
    • DP: API-server scaleout with many-to-many server-engine comms (#17546), Support DP with Ray (#18779), allow AsyncLLMEngine.generate to target a specific DP rank (#19102), data parallel rank to KVEventBatch (#18925)
    • Tooling: Simplify EP kernels installation (#19412)
  • RLHF workflow: Support inplace model weights loading (#18745)
  • Initial full support for Hybrid Memory Allocator (#17996), support cross-layer KV sharing (#18212)
  • Add FlexAttention to vLLM V1 (#16078)
  • Various production hardening related to full cuda graph mode (#19171, #19106, #19321)

Model Support

  • Support Magistral (#19193), LoRA support for InternVL (#18842), minicpm eagle support (#18943), NemotronH support (#18863, #19249)
  • Enable data parallel for Llama4 vision encoder (#18368)
  • Add DeepSeek-R1-0528 function call chat template (#18874)

Hardware Support & Performance Optimizations

  • Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205), Qwen3-235B-A22B (#19315)
  • Blackwell: Add Cutlass MLA backend (#17625), Tunings for SM100 FP8 CUTLASS kernel (#18778), Use FlashInfer by default on Blackwell GPUs (#19118), Tune scaled_fp8_quant by increasing vectorization (#18844)
  • FP4: Add compressed-tensors NVFP4 support (#18312), FP4 MoE kernel optimization (#19110)
  • CPU: V1 support for the CPU backend (#16441)
  • ROCm: Add AITER grouped topk for DeepSeekV2 (#18825)
  • POWER: Add IBM POWER11 Support to CPU Extension Detection (#19082)
  • TPU: Initial support of model parallelism with single worker using SPMD (#18011), Multi-LoRA Optimizations for the V1 TPU backend (#15655)
  • Neuron: Add multi-LoRA support for Neuron (#18284), Add Multi-Modal model support for Neuron (#18921), Support quantization on neuron (#18283)
  • Platform: Make torch distributed process group extendable (#18763)

Engine features

  • Add Lora Support to Beam Search (#18346)
  • Add rerank support to run_batch endpoint (#16278)
  • CLI: add run batch (#18804)
  • Server: custom logging (#18403), allowed_token_ids in ChatCompletionRequest (#19143)
  • LLM API: make use_tqdm accept a callable for custom progress bars (#19357)
  • perf: [KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437)

API Deprecations

  • Disallow pos-args other than model when initializing LLM (#18802)
  • Remove inputs arg fallback in Engine classes (#18799)
  • Remove fallbacks for Embeddings API (#18795)
  • Remove mean pooling default for Qwen2EmbeddingModel (#18913)
  • Require overriding get_dummy_text and get_dummy_mm_data (#18796)
  • Remove metrics that were deprecated in 0.8 (#18837)

Documentation

  • Add CLI doc (#18871)
  • Update SECURITY.md with link to our security guide (#18961), Add security warning to bug report template (#19365)

... (truncated)

Commits
  • b6553be [Misc] Slight improvement of the BNB (#19418)
  • 64a9af5 Simplify ep kernels installation (#19412)
  • e424884 [BugFix][CPU] Fix CPU CI by ignore collecting test_pixtral (#19411)
  • 467bef1 [BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected...
  • 5f1ac1e Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404)
  • 9368cc9 Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. (#17930)
  • 32b3946 Add clear documentation around the impact of debugging flag (#19369)
  • 6b1391c [Misc] refactor neuron_multimodal and profiling (#19397)
  • a3f66e7 Add security warning to bug report template (#19365)
  • 319cb1e [Core] Batch multi modal input using pinned memory (#19169)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [vllm](https://github.com/vllm-project/vllm) to permit the latest version.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.1.0...v0.9.1)

---
updated-dependencies:
- dependency-name: vllm
  dependency-version: 0.9.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels on Jun 16, 2025
dependabot[bot] commented on behalf of GitHub on Jul 14, 2025

Superseded by #13.

dependabot[bot] closed this on Jul 14, 2025
dependabot[bot] deleted the dependabot/pip/vllm-lte-0.9.1 branch on July 14, 2025 at 19:25