
upgrade vllm to v0.14.0 #3345

Merged
winglian merged 8 commits into main from vllm-2026
Jan 22, 2026

Conversation

@winglian
Collaborator

@winglian winglian commented Jan 6, 2026

Description

Motivation and Context

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

  • Chores
    • Updated vllm dependency to version 0.13.0
    • Added vllm to the CI test matrices so vllm support is exercised in the CI/CD pipelines

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Contributor

coderabbitai bot commented Jan 6, 2026

📝 Walkthrough

Walkthrough

This PR updates CI workflow matrices to include vllm in the build extras for specific test configurations and bumps the vllm version constraint in setup.py from 0.11.1 to 0.13.0 for Python 3.11+ environments.

Changes

  • CI Workflow Matrices (.github/workflows/multi-gpu-e2e.yml, .github/workflows/tests.yml): Adds vllm to the axolotl_extras field. multi-gpu-e2e.yml updates the second matrix entry to include vllm alongside fbgemm-gpu; tests.yml adds vllm as an extra for the CUDA 128, Python 3.11, PyTorch 2.9.1 configuration.
  • Dependency Version (setup.py): Updates the vllm version constraint from vllm==0.11.1 to vllm==0.13.0 in the parse_requirements logic for PyTorch (major, minor) >= (2, 9).
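The version gating described here can be sketched as follows. This is a hypothetical reconstruction, not the actual setup.py code: the helper name and the fallback pin for older torch are assumptions; only the 0.13.0 pin and the (2, 9) cut-off come from the diff under review.

```python
def vllm_pin(torch_version: str) -> list[str]:
    """Hypothetical sketch of the parse_requirements gating described
    above: choose the vllm pin based on the torch version string."""
    # Strip a local tag like "+cu128" before parsing major/minor.
    major, minor = (int(p) for p in torch_version.split("+")[0].split(".")[:2])
    if (major, minor) >= (2, 9):
        return ["vllm==0.13.0"]  # this PR's new pin for torch >= 2.9
    return ["vllm==0.11.1"]  # assumed fallback for older torch
```

A version-tuple comparison like this is the usual way to avoid lexicographic surprises (e.g. "2.10" sorting before "2.9" as a string).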

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested labels

ready to merge

Suggested reviewers

  • djsaunde
  • SalmanMohammadi
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (⚠️ Warning): The PR title states 'upgrade vllm to v0.14.0', but the actual changes in setup.py update vllm to v0.13.0, a factual mismatch between the title and the implementation. Update the PR title to 'upgrade vllm to v0.13.0' to accurately reflect the version change in the code.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7bf6f70 and 12f4e57.

📒 Files selected for processing (3)
  • .github/workflows/multi-gpu-e2e.yml
  • .github/workflows/tests.yml
  • setup.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: PyTest from Source Dist (3.11, 2.9.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.9.0)
  • GitHub Check: PyTest (3.11, 2.9.1)
  • GitHub Check: PyTest (3.11, 2.9.0)
  • GitHub Check: PyTest (3.11, 2.8.0)
  • GitHub Check: test-axolotl-multigpu (128, 12.8.1, 3.11, 2.8.0, fbgemm-gpu, 2, true)
🔇 Additional comments (2)
.github/workflows/multi-gpu-e2e.yml (1)

43-43: LGTM! Consistent addition of vllm to multi-GPU test configuration.

The addition of vllm to the extras list for the PyTorch 2.9.1 test configuration aligns with the version constraint update in setup.py.

.github/workflows/tests.yml (1)

318-318: LGTM! Consistent addition of vllm to single-GPU test configuration.

The addition of vllm to the extras list for the PyTorch 2.9.1 test configuration ensures vllm functionality is tested in the CI pipeline.

```diff
 extras_require_map.pop("fbgemm-gpu")
 extras_require_map["fbgemm-gpu"] = ["fbgemm-gpu-genai==1.4.1"]
-extras_require_map["vllm"] = ["vllm==0.11.1"]
+extras_require_map["vllm"] = ["vllm==0.13.0"]
```
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check vllm version availability on PyPI
echo "=== Checking vllm versions on PyPI ==="
curl -s https://pypi.org/pypi/vllm/json | jq -r '.releases | keys[]' | grep -E '^0\.(11|12|13)\.' | sort -V
```

Repository: axolotl-ai-cloud/axolotl

Length of output: 141
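For reference, the same PyPI check can be done without curl/jq using only stdlib Python. This is a sketch: note that plain sorted() is lexicographic, so it only approximates the script's sort -V for versions in this range.

```python
import json
import urllib.request

def matching_releases(releases, prefixes=("0.11.", "0.12.", "0.13.")):
    """Filter PyPI release keys down to the 0.11-0.13 version lines
    that the shell script greps for. Lexicographic sort only, so treat
    the ordering as an approximation of `sort -V`."""
    return sorted(v for v in releases if v.startswith(prefixes))

if __name__ == "__main__":
    # Same endpoint the shell check hits.
    with urllib.request.urlopen("https://pypi.org/pypi/vllm/json") as resp:
        data = json.load(resp)
    print("\n".join(matching_releases(data["releases"])))
```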


🌐 Web query:

vllm 0.13.0 release notes breaking changes compatibility

💡 Result:

Here are the key breaking changes and compatibility notes for vLLM v0.13.0 (summary from the project release/changelog):

Breaking changes / API/config changes

  • PyTorch / environment dependency changes: vLLM requires newer runtime versions (PyTorch 2.8.0 / updated CUDA toolchain noted as a breaking dependency in the project changelog). [2]
  • Removed/deprecated V0 features and backends (examples in the changelog: V0 Neuron backend, V0 pooling model support, V0 FlashInfer attention backend; several API deprecations such as prompt_token_ids fallback removal). [2]
  • Quantization changes: some older/legacy quantization formats/hacks were removed (users must migrate to supported quantization paths). [2]
  • Config/key rename and removals (vLLM Ascend notes): chunked_prefill_for_mla in additional_config was dropped; dump_config renamed to dump_config_path and changed from dict → string. [1]
  • vLLM Ascend: vLLM version upgraded to 0.13.0 and support for v0.12.0 was dropped (Ascend-specific). [1]

Compatibility / dependency requirements

  • vLLM v0.13.0 packages published on PyPI (0.13.0 release artifacts available). [3]
  • vLLM Ascend compatibility matrix (for the Ascend build of vLLM v0.13.0rc1): Python >=3.10 and <3.12; tied to vLLM v0.13.0; specific Stable CANN / torch_npu versions listed for Ascend deployments. (See the matrix for exact platform bindings). [1]
  • General project notes: environment updates (CUDA / driver / PyTorch) are required for the breaking dependency changes introduced in recent 0.13.x line — check your CUDA/PyTorch/tooling versions before upgrading. [2]

If you want, I can:

  • fetch the full v0.13.0 changelog (raw GitHub release/compare) and extract the top 10 breaking/API changes with line-by-line source links; or
  • check specific compatibility for your platform (CUDA, Python, PyTorch) — tell me your OS, Python and PyTorch/CUDA versions.

Sources:
[1] vLLM Ascend release notes & compatibility matrix (v0.13.0rc1).
[2] vLLM project GitHub release changelog (breaking changes summary).
[3] vLLM 0.13.0 artifacts on PyPI.


🌐 Web query:

vllm 0.13.0 PyTorch 2.9 compatibility

💡 Result:

Short answer: vLLM 0.13.0 is not officially compatible with PyTorch 2.9. vLLM 0.13.0 packages and third‑party plugins reference torch 2.7.x (e.g., 2.7.1), and multiple GitHub issues/threads report breakages and binary/CUDA incompatibilities when using PyTorch 2.9. If you need PyTorch 2.9 you must expect to build vLLM from source and debug CUDA/toolchain mismatches — it’s not a drop‑in supported configuration. [1][2][3]

Sources:

  • vllm‑musa plugin compatibility matrix (lists vLLM 0.13.0 with torch 2.7.1). [1]
  • vLLM installation notes (binary builds tied to specific CUDA/PyTorch; recommends building from source for nonstandard PyTorch/CUDA). [2]
  • vLLM GitHub issues reporting problems and requests around PyTorch 2.9 (users report bugs/incompatibilities). [3]

vllm 0.13.0 is incompatible with PyTorch >= 2.9 — do not merge without resolution.

vllm 0.13.0 officially supports PyTorch 2.7.x, not 2.9+. PyPI builds are compiled against torch 2.7.1, and multiple reports confirm breakages when used with PyTorch 2.9. Building from source is required for PyTorch 2.9 compatibility, which is not a supported or tested configuration.

Additionally, vllm 0.13.0 introduces breaking changes from 0.11.1:

  • Removed V0 backends and features
  • Deprecated quantization formats
  • Config/API changes (e.g., dump_config renamed to dump_config_path)
  • Minimum PyTorch requirement is 2.8.0

Either pin vllm to 0.12.0 (or stay at 0.11.1), reduce PyTorch to 2.7.x, or document that vllm 0.13.0 requires custom source builds for PyTorch 2.9.
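One way to make such a mismatch fail fast is a pre-install guard along these lines. This is a hypothetical helper, not axolotl code; the torch 2.9 / vllm 0.13.0 boundary comes from the review findings above.

```python
def vllm_torch_compatible(torch_version: str, vllm_version: str) -> bool:
    """Return False for the combination flagged in this review:
    vllm >= 0.13.0 PyPI wheels with torch >= 2.9 (the wheels are
    built against torch 2.7.x/2.8)."""
    # Strip a local tag like "+cu128" before parsing the torch version.
    t = tuple(int(p) for p in torch_version.split("+")[0].split(".")[:2])
    v = tuple(int(p) for p in vllm_version.split(".")[:3])
    if v >= (0, 13, 0) and t >= (2, 9):
        return False  # would need a custom source build, per the review
    return True
```

Calling this in CI before the install step would turn a silent binary incompatibility into an explicit failure.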

@codecov

codecov bot commented Jan 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@winglian winglian force-pushed the vllm-2026 branch 2 times, most recently from 09d46ad to 0b54d75 on January 21, 2026 19:20
@github-actions
Contributor

github-actions bot commented Jan 21, 2026

📖 Documentation Preview: https://6971554939be4b2227e10f30--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit e5fe9b8

@winglian winglian changed the title from "use vllm v0.13.0" to "upgrade vllm to v0.14.0" on Jan 21, 2026
@winglian winglian merged commit a531e9d into main Jan 22, 2026
36 checks passed
@winglian winglian deleted the vllm-2026 branch January 22, 2026 01:00
