3 changes: 2 additions & 1 deletion .github/workflows/schedule_image_build_and_push.yaml
@@ -30,11 +30,12 @@ on:
type: choice
options:
- main
- v0.18.0rc1
- v0.17.0rc1
- v0.16.0rc1
- v0.15.0rc1
- v0.14.0rc1
- v0.13.0rc3
- v0.13.0

jobs:
image_build:
3 changes: 2 additions & 1 deletion .github/workflows/schedule_release_code_and_wheel.yml
@@ -33,11 +33,12 @@ on:
type: choice
options:
- main
- v0.18.0rc1
- v0.17.0rc1
- v0.16.0rc1
- v0.15.0rc1
- v0.14.0rc1
- v0.13.0rc3
- v0.13.0

jobs:
build_and_release_code:
6 changes: 3 additions & 3 deletions README.md
@@ -53,7 +53,7 @@ By using vLLM Ascend plugin, popular open-source models, including Transformer-l
- OS: Linux
- Software:
- Python >= 3.10, < 3.12
- CANN == 8.5.0 (Ascend HDK version refers to [here](https://www.hiascend.com/document/detail/zh/canncommercial/83RC2/releasenote/releasenote_0000.html))
- CANN == 8.5.1 (Ascend HDK version refers to [here](https://www.hiascend.com/document/detail/zh/canncommercial/83RC2/releasenote/releasenote_0000.html))
- PyTorch == 2.9.0, torch-npu == 2.9.0
- vLLM (the same version as vllm-ascend)
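The Python bound in the requirement list above can be checked mechanically. A minimal sketch, assuming only the documented range `>= 3.10, < 3.12`; the helper name is illustrative and not part of vllm-ascend:

```python
import sys

def python_version_supported(major: int, minor: int) -> bool:
    """Return True if (major, minor) falls in the documented range >= 3.10, < 3.12."""
    return (3, 10) <= (major, minor) < (3, 12)

if __name__ == "__main__":
    # Report whether the current interpreter satisfies the documented bound.
    ok = python_version_supported(sys.version_info.major, sys.version_info.minor)
    print("Python OK" if ok else "Unsupported Python version")
```

Tuple comparison makes the half-open range check a one-liner; the same pattern extends to other pinned dependencies.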

@@ -63,7 +63,7 @@ Please use the following recommended versions to get started quickly:

| Version | Release type | Doc |
|------------|--------------|--------------------------------------|
| v0.17.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details |
| v0.18.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details |
| v0.13.0 | Latest stable version | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/v0.13.0/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/v0.13.0/installation.html) for more details |

## Contributing
@@ -86,7 +86,7 @@ Below are the maintained branches:

| Branch | Status | Note |
|------------|--------------|--------------------------------------|
| main | Maintained | CI commitment for vLLM main branch and vLLM v0.17.0 tag |
| main | Maintained | CI commitment for vLLM main branch and vLLM v0.18.0 tag |
| v0.7.1-dev | Unmaintained | Only doc fixes are allowed |
| v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version, only bug fixes are allowed, and no new release tags anymore. |
| v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
6 changes: 3 additions & 3 deletions README.zh.md
@@ -47,7 +47,7 @@ vLLM Ascend plugin (`vllm-ascend`) is a community-maintained plugin that lets vLLM run on Ascend NP
- OS: Linux
- Software:
- Python >= 3.10, < 3.12
- CANN == 8.5.0 (Ascend HDK version refers to [here](https://www.hiascend.com/document/detail/zh/canncommercial/83RC2/releasenote/releasenote_0000.html))
- CANN == 8.5.1 (Ascend HDK version refers to [here](https://www.hiascend.com/document/detail/zh/canncommercial/83RC2/releasenote/releasenote_0000.html))
- PyTorch == 2.9.0, torch-npu == 2.9.0
- vLLM (the same version as vllm-ascend)

@@ -57,7 +57,7 @@ vLLM Ascend plugin (`vllm-ascend`) is a community-maintained plugin that lets vLLM run on Ascend NP

| Version | Release type | Doc |
|------------|--------------|--------------------------------------|
| v0.17.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details |
| v0.18.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details |
| v0.13.0 | Latest stable version | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/v0.13.0/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/v0.13.0/installation.html) for more details |

## Contributing
@@ -80,7 +80,7 @@ vllm-ascend has a main branch and dev branches.

| Branch | Status | Note |
|------------|------------|---------------------|
| main | Maintained | CI commitment for vLLM main branch and the latest vLLM release (v0.17.0) |
| main | Maintained | CI commitment for vLLM main branch and the latest vLLM release (v0.18.0) |
| v0.7.1-dev | Unmaintained | Only doc fixes are allowed |
| v0.7.3-dev | Maintained | CI commitment for vLLM v0.7.3 version, only bug fixes are allowed, and no new release tags anymore |
| v0.9.1-dev | Maintained | CI commitment for vLLM v0.9.1 version |
22 changes: 20 additions & 2 deletions docs/source/community/contributors.md
@@ -20,14 +20,32 @@
| HeXiang Wang| [@whx-sjtu](https://github.com/whx-sjtu) | 2026/01 |

## Contributors
<!-- last_commit: 23bf5d4d48e6ec09e2b4f726279591a1b42f033b -->
<!-- last_commit: 8e3f8bab57cff0a98dc75ad43d8bf5bb4113f34e -->

Every release of vLLM Ascend would not have been possible without the following contributors:

Updated on 2026-03-09:
Updated on 2026-03-25:

| Number | Contributor | Date | Commit ID |
|:------:|:-----------:|:-----:|:---------:|
| 363 | [@GoMarck](https://github.com/GoMarck) | 2026/03/25 | [17da966](https://github.com/vllm-project/vllm-ascend/commit/17da96658f0b53a7e9b5932e64ced69a334f035c) |
| 362 | [@drizzlezyk](https://github.com/drizzlezyk) | 2026/03/24 | [5487946](https://github.com/vllm-project/vllm-ascend/commit/54879467c41784a446aa5b486a391d9bfbf488fa) |
| 361 | [@liuhy1213-cell](https://github.com/liuhy1213-cell) | 2026/03/23 | [fb283b5](https://github.com/vllm-project/vllm-ascend/commit/fb283b5820effe930d7f60952aca48177d710e94) |
| 360 | [@ZhuQi-seu](https://github.com/ZhuQi-seu) | 2026/03/23 | [e942b62](https://github.com/vllm-project/vllm-ascend/commit/e942b62d742ebc5bf128e85bc086d728df8d4935) |
| 359 | [@ksiyuan](https://github.com/ksiyuan) | 2026/03/20 | [a16c991](https://github.com/vllm-project/vllm-ascend/commit/a16c99141b0830240eeff0cbe01bfc3c833c62fb) |
| 358 | [@idouba](https://github.com/idouba) | 2026/03/20 | [f39f566](https://github.com/vllm-project/vllm-ascend/commit/f39f566e22b87ee75bd1205f982e4255a882c3a4) |
| 357 | [@yesyue-w](https://github.com/yesyue-w) | 2026/03/20 | [c860535](https://github.com/vllm-project/vllm-ascend/commit/c860535246cc751b6be7d1da2092e4380013598c) |
| 356 | [@jiangmengyu18](https://github.com/jiangmengyu18) | 2026/03/18 | [305820f](https://github.com/vllm-project/vllm-ascend/commit/305820f1a982ed9597932778891b5da64ecccae9) |
| 355 | [@SparrowMu](https://github.com/SparrowMu) | 2026/03/18 | [fb8e22e](https://github.com/vllm-project/vllm-ascend/commit/fb8e22ec00aef2b2d42a5f2d3ae7267848ec5016) |
| 354 | [@ppppeng](https://github.com/ppppeng) | 2026/03/17 | [a457d0f](https://github.com/vllm-project/vllm-ascend/commit/a457d0f0e8d91060c62d7ff2b1741bfc74d79560) |
| 353 | [@asunxiao](https://github.com/asunxiao) | 2026/03/17 | [a370dfa](https://github.com/vllm-project/vllm-ascend/commit/a370dfa9623e648439b724569931988a852e462e) |
| 352 | [@GGGGua](https://github.com/GGGGua) | 2026/03/16 | [b1a7888](https://github.com/vllm-project/vllm-ascend/commit/b1a78886a928cd7b5881026302fba79609972bd2) |
| 351 | [@bazingazhou233-hub](https://github.com/bazingazhou233-hub) | 2026/03/14 | [9e6c547](https://github.com/vllm-project/vllm-ascend/commit/9e6c547d9808eb5fa532d49102969c91b79be905) |
| 350 | [@tfhddd](https://github.com/tfhddd) | 2026/03/12 | [21fea86](https://github.com/vllm-project/vllm-ascend/commit/21fea86b08edf4a016749a0d637d18cf7017dd2a) |
| 349 | [@ZRJ026](https://github.com/ZRJ026) | 2026/03/10 | [a398fa6](https://github.com/vllm-project/vllm-ascend/commit/a398fa6a0b024f59aaa823c483529bcf2357540f) |
| 348 | [@xmpp777](https://github.com/xmpp777) | 2026/03/10 | [9216e1b](https://github.com/vllm-project/vllm-ascend/commit/9216e1b0505c7e290d8c02cc64cb8817bfdd49f5) |
| 347 | [@wanghuanjun2113](https://github.com/wanghuanjun2113) | 2026/03/09 | [dec04ec](https://github.com/vllm-project/vllm-ascend/commit/dec04ec8d884a45f1946b72dea129bc686cc2f44) |
| 346 | [@liuchenbing2026](https://github.com/liuchenbing2026) | 2026/03/09 | [542258a](https://github.com/vllm-project/vllm-ascend/commit/542258ac9d9229aab4e8822de42443245a93f001) |
| 345 | [@chenxi-hh](https://github.com/chenxi-hh) | 2026/03/09 | [737dfcf](https://github.com/vllm-project/vllm-ascend/commit/737dfcf638eae71d6c24c340dee20ff205f21ed9) |
| 344 | [@xiaocongtou6](https://github.com/xiaocongtou6) | 2026/03/06 | [bc0fd7c](https://github.com/vllm-project/vllm-ascend/commit/bc0fd7ca7217498d5faa91504b0e8c3f822a5cc6) |
| 343 | [@wanghengkang](https://github.com/wanghengkang) | 2026/03/06 | [c49ce18](https://github.com/vllm-project/vllm-ascend/commit/c49ce18ea544970510ebb04fff49a484533fe2a3) |
6 changes: 4 additions & 2 deletions docs/source/community/versioning_policy.md
@@ -23,6 +23,7 @@ The table below is the release compatibility matrix for vLLM Ascend release.

| vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | Triton Ascend |
|-------------|-------------------|-----------------|-------------|---------------------------------|---------------|
| v0.18.0rc1 | v0.18.0 | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 | 3.2.0 |
| v0.17.0rc1 | v0.17.0 | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 | 3.2.0 |
| v0.16.0rc1 | v0.16.0 | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 | 3.2.0 |
| v0.15.0rc1 | v0.15.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 |
@@ -59,14 +60,15 @@ For main branch of vLLM Ascend, we usually make it compatible with the latest vL

| vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
|-------------|--------------|------------------|-------------|--------------------|
| main | ed359c497a728f08b5b41456c07a688ccd510fbc, v0.18.0 tag | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 |
| main | ed359c497a728f08b5b41456c07a688ccd510fbc, v0.18.0 tag | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 |

## Release cadence

### Release window

| Date | Event |
|------------|-------------------------------------------|
| 2026.03.27 | Release candidates, v0.18.0rc1 |
| 2026.03.15 | Release candidates, v0.17.0rc1 |
| 2026.03.10 | Release candidates, v0.16.0rc1 |
| 2026.02.27 | Release candidates, v0.15.0rc1 |
@@ -126,7 +128,7 @@ Usually, each minor version of vLLM (such as 0.7) corresponds to a vLLM Ascend v

| Branch | State | Note |
| ---------- | ------------ | -------------------------------------------------------- |
| main | Maintained | CI commitment for vLLM main branch and vLLM 0.16.0 tag |
| main | Maintained | CI commitment for vLLM main branch and vLLM 0.18.0 tag |
| releases/v0.13.0 | Maintained | CI commitment for vLLM 0.13.0 version |
| v0.11.0-dev| Maintained | CI commitment for vLLM 0.11.0 version |
| v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
8 changes: 4 additions & 4 deletions docs/source/conf.py
@@ -65,15 +65,15 @@
# the branch of vllm, used in vllm clone
# - main branch: 'main'
# - vX.Y.Z branch: 'vX.Y.Z'
"vllm_version": "v0.17.0",
"vllm_version": "v0.18.0",
# the branch of vllm-ascend, used in vllm-ascend clone and image tag
# - main branch: 'main'
# - vX.Y.Z branch: latest vllm-ascend release tag
"vllm_ascend_version": "v0.17.0rc1",
"vllm_ascend_version": "v0.18.0rc1",
# the newest release version of vllm-ascend and matched vLLM, used in pip install.
# This value should be updated when cut down release.
"pip_vllm_ascend_version": "0.17.0rc1",
"pip_vllm_version": "0.17.0",
"pip_vllm_ascend_version": "0.18.0rc1",
"pip_vllm_version": "0.18.0",
# CANN image tag
"cann_image_tag": "8.5.1-910b-ubuntu22.04-py3.11",
# vllm version in ci
3 changes: 2 additions & 1 deletion docs/source/faqs.md
@@ -2,6 +2,7 @@

## Version Specific FAQs

- [[v0.18.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/7633)
- [[v0.17.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/7173)
- [[v0.13.0] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6583)

@@ -104,7 +105,7 @@ If all above steps are not working, feel free to submit a GitHub issue.

### 7. How vllm-ascend work with vLLM?

`vllm-ascend` is a hardware plugin for vLLM. The version of `vllm-ascend` is the same as the version of `vllm`. For example, if you use `vllm` 0.9.1, you should use vllm-ascend 0.9.1 as well. For the main branch, we ensure that `vllm-ascend` and `vllm` are compatible at every commit.
`vllm-ascend` is a hardware plugin for vLLM. Stable releases usually align with the same vLLM version, while RC releases may use the corresponding vLLM final release version. For example, `vllm-ascend` `v0.18.0rc1` matches vLLM `v0.18.0`. For the main branch, we ensure that `vllm-ascend` and `vllm` are compatible at every commit.
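The version-matching rule above (an rc tag such as `v0.18.0rc1` pairs with the vLLM final release `v0.18.0`) can be sketched as a small check. The helper is purely illustrative, assuming the tag formats shown in this FAQ; it is not an API of either package:

```python
def versions_compatible(vllm_version: str, vllm_ascend_version: str) -> bool:
    """True if a vllm-ascend tag pairs with the given vLLM release.

    Strips an 'rcN' suffix, so 'v0.18.0rc1' reduces to its base 'v0.18.0';
    stable tags like 'v0.13.0' are compared as-is.
    """
    base = vllm_ascend_version.split("rc")[0]
    return base == vllm_version
```

Example: `versions_compatible("v0.18.0", "v0.18.0rc1")` holds, matching the pairing described above.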

### 8. Does vllm-ascend support Prefill Disaggregation feature?

40 changes: 40 additions & 0 deletions docs/source/user_guide/release_notes.md
@@ -1,5 +1,45 @@
# Release Notes

## v0.18.0rc1 - 2026.03.27

This is the first release candidate of v0.18.0 for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.

### Highlights

- Balance scheduling is now supported via `VLLM_ASCEND_BALANCE_SCHEDULING` for better data-parallel load balancing. [#7611](https://github.com/vllm-project/vllm-ascend/pull/7611)
- Flash Comm V1 now supports VL models with MLA, removing a previous limitation for multimodal serving. [#7390](https://github.com/vllm-project/vllm-ascend/pull/7390)
- DeepSeek models are now supported on A5 through new MLA operators. [#7232](https://github.com/vllm-project/vllm-ascend/pull/7232)
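Balance scheduling is toggled through the environment variable named in the first highlight. A minimal sketch of enabling it before launch, assuming a conventional `"1"`-means-on boolean flag; the helper is illustrative and how vllm-ascend consumes the variable internally is not shown here:

```python
import os

# Set the flag before the engine starts; the variable name comes from
# release highlight #7611, the "1" convention is an assumption.
os.environ["VLLM_ASCEND_BALANCE_SCHEDULING"] = "1"

def balance_scheduling_enabled() -> bool:
    """Illustrative helper mirroring a typical boolean env-var check."""
    return os.environ.get("VLLM_ASCEND_BALANCE_SCHEDULING", "0") == "1"
```

Exporting the variable in the launch shell (`export VLLM_ASCEND_BALANCE_SCHEDULING=1`) achieves the same effect for a server started from the command line.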

### Features

- Support separate attention backends for target and draft models in speculative decoding, allowing finer backend tuning per model. [#7342](https://github.com/vllm-project/vllm-ascend/pull/7342)
- VL MoE models now support SP, and `sp_threshold` is removed in favor of `sp_min_token_num` from vLLM. [#7044](https://github.com/vllm-project/vllm-ascend/pull/7044)
- Qwen VL models now support `w8a8_mxfp8` quantization. [#7417](https://github.com/vllm-project/vllm-ascend/pull/7417)
- LayerwiseConnector now supports virtual push on decode nodes in PD deployment. [#7361](https://github.com/vllm-project/vllm-ascend/pull/7361)

### Performance

- Optimized the Qwen3.5 and Qwen3-Next GDN prefill path by prebuilding chunk metadata, reducing host-device synchronization overhead. [#7487](https://github.com/vllm-project/vllm-ascend/pull/7487)
- Simplified the FIA prefill context merge path for better runtime efficiency. [#7293](https://github.com/vllm-project/vllm-ascend/pull/7293)

### Dependencies

- vLLM is upgraded to v0.18.0 for docker and release flows. [#7523](https://github.com/vllm-project/vllm-ascend/pull/7523) [#7502](https://github.com/vllm-project/vllm-ascend/pull/7502)

### Documentation

- Added configuration documentation for `enable_sparse_c8`. [#7600](https://github.com/vllm-project/vllm-ascend/pull/7600)
- Refreshed deployment and model docs for Kimi-K2.5, GLM-4.7, DeepSeek-V3.2, MiniMax-M2.5, and PD disaggregation guides. [#7371](https://github.com/vllm-project/vllm-ascend/pull/7371) [#7403](https://github.com/vllm-project/vllm-ascend/pull/7403) [#7292](https://github.com/vllm-project/vllm-ascend/pull/7292) [#7296](https://github.com/vllm-project/vllm-ascend/pull/7296) [#7300](https://github.com/vllm-project/vllm-ascend/pull/7300)

### Others

- Lowered the log level in PD disaggregation to reduce noisy deployment logs. [#7589](https://github.com/vllm-project/vllm-ascend/pull/7589)
- Fixed a PD separation issue where decode nodes could get stuck because shapes were not aligned across DP nodes. [#7534](https://github.com/vllm-project/vllm-ascend/pull/7534)
- Fixed a regression where hybrid attention plus mamba models on Ascend could start with an incorrect block size after the v0.18.0 upgrade. [#7528](https://github.com/vllm-project/vllm-ascend/pull/7528)
- Fixed multi-instance serving OOM calculation on single-card deployments. [#7427](https://github.com/vllm-project/vllm-ascend/pull/7427)
- Fixed the speculative decoding proposer path for v0.18.0. [#7544](https://github.com/vllm-project/vllm-ascend/pull/7544)
- Fixed DeepSeek v3.1 C8 when overlaying MTP with full decode and full graph modes. [#7571](https://github.com/vllm-project/vllm-ascend/pull/7571)

## v0.17.0rc1 - 2026.03.15

This is the first release candidate of v0.17.0 for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.