diff --git a/.github/workflows/schedule_image_build_and_push.yaml b/.github/workflows/schedule_image_build_and_push.yaml
index 99a8cf99e16..f012a7b7ac2 100644
--- a/.github/workflows/schedule_image_build_and_push.yaml
+++ b/.github/workflows/schedule_image_build_and_push.yaml
@@ -30,11 +30,12 @@ on:
         type: choice
         options:
           - main
+          - v0.18.0rc1
           - v0.17.0rc1
           - v0.16.0rc1
           - v0.15.0rc1
           - v0.14.0rc1
-          - v0.13.0rc3
+          - v0.13.0
 
 jobs:
   image_build:
diff --git a/.github/workflows/schedule_release_code_and_wheel.yml b/.github/workflows/schedule_release_code_and_wheel.yml
index 8a2eb6a4460..fcc2cc7f9f9 100644
--- a/.github/workflows/schedule_release_code_and_wheel.yml
+++ b/.github/workflows/schedule_release_code_and_wheel.yml
@@ -33,11 +33,12 @@ on:
         type: choice
         options:
           - main
+          - v0.18.0rc1
           - v0.17.0rc1
           - v0.16.0rc1
           - v0.15.0rc1
           - v0.14.0rc1
-          - v0.13.0rc3
+          - v0.13.0
 
 jobs:
   build_and_release_code:
diff --git a/README.md b/README.md
index fce5d82419d..5a4ca70dbd5 100644
--- a/README.md
+++ b/README.md
@@ -53,7 +53,7 @@ By using vLLM Ascend plugin, popular open-source models, including Transformer-l
 - OS: Linux
 - Software:
   - Python >= 3.10, < 3.12
-  - CANN == 8.5.0 (Ascend HDK version refers to [here](https://www.hiascend.com/document/detail/zh/canncommercial/83RC2/releasenote/releasenote_0000.html))
+  - CANN == 8.5.1 (Ascend HDK version refers to [here](https://www.hiascend.com/document/detail/zh/canncommercial/83RC2/releasenote/releasenote_0000.html))
   - PyTorch == 2.9.0, torch-npu == 2.9.0
   - vLLM (the same version as vllm-ascend)
 
@@ -63,7 +63,7 @@ Please use the following recommended versions to get started quickly:
 
 | Version | Release type | Doc |
 |------------|--------------|--------------------------------------|
-| v0.17.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details |
+| v0.18.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details |
 | v0.13.0 | Latest stable version | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/v0.13.0/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/v0.13.0/installation.html) for more details |
 
 ## Contributing
@@ -86,7 +86,7 @@ Below are the maintained branches:
 
 | Branch | Status | Note |
 |------------|--------------|--------------------------------------|
-| main | Maintained | CI commitment for vLLM main branch and vLLM v0.17.0 tag |
+| main | Maintained | CI commitment for vLLM main branch and vLLM v0.18.0 tag |
 | v0.7.1-dev | Unmaintained | Only doc fixes are allowed |
 | v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version, only bug fixes are allowed, and no new release tags anymore. |
 | v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
diff --git a/README.zh.md b/README.zh.md
index 4ce3e12775e..fef22b45682 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -47,7 +47,7 @@ vLLM 昇腾插件 (`vllm-ascend`) 是一个由社区维护的让vLLM在Ascend NP
 - 操作系统：Linux
 - 软件：
   - Python >= 3.10, < 3.12
-  - CANN == 8.5.0 (Ascend HDK 版本参考[这里](https://www.hiascend.com/document/detail/zh/canncommercial/83RC2/releasenote/releasenote_0000.html))
+  - CANN == 8.5.1 (Ascend HDK 版本参考[这里](https://www.hiascend.com/document/detail/zh/canncommercial/83RC2/releasenote/releasenote_0000.html))
   - PyTorch == 2.9.0, torch-npu == 2.9.0
   - vLLM (与vllm-ascend版本一致)
 
@@ -57,7 +57,7 @@ vLLM 昇腾插件 (`vllm-ascend`) 是一个由社区维护的让vLLM在Ascend NP
 
 | Version | Release type | Doc |
 |------------|--------------|--------------------------------------|
-|v0.17.0rc1| 最新RC版本 |请查看[快速开始](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html)和[安装指南](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)了解更多|
+|v0.18.0rc1| 最新RC版本 |请查看[快速开始](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html)和[安装指南](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)了解更多|
 |v0.13.0| 最新正式/稳定版本 |[快速开始](https://docs.vllm.ai/projects/ascend/en/v0.13.0/quick_start.html) and [安装指南](https://docs.vllm.ai/projects/ascend/en/v0.13.0/installation.html)了解更多|
 
 ## 贡献
@@ -80,7 +80,7 @@ vllm-ascend有主干分支和开发分支。
 
 | 分支 | 状态 | 备注 |
 |------------|------------|---------------------|
-| main | Maintained | 基于vLLM main分支和vLLM最新版本(v0.17.0)CI看护 |
+| main | Maintained | 基于vLLM main分支和vLLM最新版本(v0.18.0)CI看护 |
 | v0.7.1-dev | Unmaintained | 只允许文档修复 |
 | v0.7.3-dev | Maintained | 基于vLLM v0.7.3版本CI看护, 只允许Bug修复，不会再发布新版本 |
 | v0.9.1-dev | Maintained | 基于vLLM v0.9.1版本CI看护 |
diff --git a/docs/source/community/contributors.md b/docs/source/community/contributors.md
index fdafc51611b..d08cc8ecbea 100644
--- a/docs/source/community/contributors.md
+++ b/docs/source/community/contributors.md
@@ -20,14 +20,32 @@
 | HeXiang Wang| [@whx-sjtu](https://github.com/whx-sjtu) | 2026/01 |
 
 ## Contributors
-
+
 Every release of vLLM Ascend would not have been possible without the following contributors:
 
-Updated on 2026-03-09:
+Updated on 2026-03-25:
 
 | Number | Contributor | Date | Commit ID |
 |:------:|:-----------:|:-----:|:---------:|
+| 363 | [@GoMarck](https://github.com/GoMarck) | 2026/03/25 | [17da966](https://github.com/vllm-project/vllm-ascend/commit/17da96658f0b53a7e9b5932e64ced69a334f035c) |
+| 362 | [@drizzlezyk](https://github.com/drizzlezyk) | 2026/03/24 | [5487946](https://github.com/vllm-project/vllm-ascend/commit/54879467c41784a446aa5b486a391d9bfbf488fa) |
+| 361 | [@liuhy1213-cell](https://github.com/liuhy1213-cell) | 2026/03/23 | [fb283b5](https://github.com/vllm-project/vllm-ascend/commit/fb283b5820effe930d7f60952aca48177d710e94) |
+| 360 | [@ZhuQi-seu](https://github.com/ZhuQi-seu) | 2026/03/23 | [e942b62](https://github.com/vllm-project/vllm-ascend/commit/e942b62d742ebc5bf128e85bc086d728df8d4935) |
+| 359 | [@ksiyuan](https://github.com/ksiyuan) | 2026/03/20 | [a16c991](https://github.com/vllm-project/vllm-ascend/commit/a16c99141b0830240eeff0cbe01bfc3c833c62fb) |
+| 358 | [@idouba](https://github.com/idouba) | 2026/03/20 | [f39f566](https://github.com/vllm-project/vllm-ascend/commit/f39f566e22b87ee75bd1205f982e4255a882c3a4) |
+| 357 | [@yesyue-w](https://github.com/yesyue-w) | 2026/03/20 | [c860535](https://github.com/vllm-project/vllm-ascend/commit/c860535246cc751b6be7d1da2092e4380013598c) |
+| 356 | [@jiangmengyu18](https://github.com/jiangmengyu18) | 2026/03/18 | [305820f](https://github.com/vllm-project/vllm-ascend/commit/305820f1a982ed9597932778891b5da64ecccae9) |
+| 355 | [@SparrowMu](https://github.com/SparrowMu) | 2026/03/18 | [fb8e22e](https://github.com/vllm-project/vllm-ascend/commit/fb8e22ec00aef2b2d42a5f2d3ae7267848ec5016) |
+| 354 | [@ppppeng](https://github.com/ppppeng) | 2026/03/17 | [a457d0f](https://github.com/vllm-project/vllm-ascend/commit/a457d0f0e8d91060c62d7ff2b1741bfc74d79560) |
+| 353 | [@asunxiao](https://github.com/asunxiao) | 2026/03/17 | [a370dfa](https://github.com/vllm-project/vllm-ascend/commit/a370dfa9623e648439b724569931988a852e462e) |
+| 352 | [@GGGGua](https://github.com/GGGGua) | 2026/03/16 | [b1a7888](https://github.com/vllm-project/vllm-ascend/commit/b1a78886a928cd7b5881026302fba79609972bd2) |
+| 351 | [@bazingazhou233-hub](https://github.com/bazingazhou233-hub) | 2026/03/14 | [9e6c547](https://github.com/vllm-project/vllm-ascend/commit/9e6c547d9808eb5fa532d49102969c91b79be905) |
+| 350 | [@tfhddd](https://github.com/tfhddd) | 2026/03/12 | [21fea86](https://github.com/vllm-project/vllm-ascend/commit/21fea86b08edf4a016749a0d637d18cf7017dd2a) |
+| 349 | [@ZRJ026](https://github.com/ZRJ026) | 2026/03/10 | [a398fa6](https://github.com/vllm-project/vllm-ascend/commit/a398fa6a0b024f59aaa823c483529bcf2357540f) |
+| 348 | [@xmpp777](https://github.com/xmpp777) | 2026/03/10 | [9216e1b](https://github.com/vllm-project/vllm-ascend/commit/9216e1b0505c7e290d8c02cc64cb8817bfdd49f5) |
+| 347 | [@wanghuanjun2113](https://github.com/wanghuanjun2113) | 2026/03/09 | [dec04ec](https://github.com/vllm-project/vllm-ascend/commit/dec04ec8d884a45f1946b72dea129bc686cc2f44) |
+| 346 | [@liuchenbing2026](https://github.com/liuchenbing2026) | 2026/03/09 | [542258a](https://github.com/vllm-project/vllm-ascend/commit/542258ac9d9229aab4e8822de42443245a93f001) |
 | 345 | [@chenxi-hh](https://github.com/chenxi-hh) | 2026/03/09 | [737dfcf](https://github.com/vllm-project/vllm-ascend/commit/737dfcf638eae71d6c24c340dee20ff205f21ed9) |
 | 344 | [@xiaocongtou6](https://github.com/xiaocongtou6) | 2026/03/06 | [bc0fd7c](https://github.com/vllm-project/vllm-ascend/commit/bc0fd7ca7217498d5faa91504b0e8c3f822a5cc6) |
 | 343 | [@wanghengkang](https://github.com/wanghengkang) | 2026/03/06 | [c49ce18](https://github.com/vllm-project/vllm-ascend/commit/c49ce18ea544970510ebb04fff49a484533fe2a3) |
diff --git a/docs/source/community/versioning_policy.md b/docs/source/community/versioning_policy.md
index 24ada4750cb..fea5ee40e71 100644
--- a/docs/source/community/versioning_policy.md
+++ b/docs/source/community/versioning_policy.md
@@ -23,6 +23,7 @@ The table below is the release compatibility matrix for vLLM Ascend release.
 
 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | Triton Ascend |
 |-------------|-------------------|-----------------|-------------|---------------------------------|---------------|
+| v0.18.0rc1 | v0.18.0 | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 | 3.2.0 |
 | v0.17.0rc1 | v0.17.0 | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 | 3.2.0 |
 | v0.16.0rc1 | v0.16.0 | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 | 3.2.0 |
 | v0.15.0rc1 | v0.15.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 |
@@ -59,7 +60,7 @@ For main branch of vLLM Ascend, we usually make it compatible with the latest vL
 
 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |-------------|--------------|------------------|-------------|--------------------|
-| main | ed359c497a728f08b5b41456c07a688ccd510fbc, v0.18.0 tag | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 |
+| main | ed359c497a728f08b5b41456c07a688ccd510fbc, v0.18.0 tag | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 |
 
 ## Release cadence
 
@@ -67,6 +68,7 @@ For main branch of vLLM Ascend, we usually make it compatible with the latest vL
 
 | Date | Event |
 |------------|-------------------------------------------|
+| 2026.03.27 | Release candidates, v0.18.0rc1 |
 | 2026.03.15 | Release candidates, v0.17.0rc1 |
 | 2026.03.10 | Release candidates, v0.16.0rc1 |
 | 2026.02.27 | Release candidates, v0.15.0rc1 |
@@ -126,7 +128,7 @@ Usually, each minor version of vLLM (such as 0.7) corresponds to a vLLM Ascend v
 
 | Branch | State | Note |
 | ---------- | ------------ | -------------------------------------------------------- |
-| main | Maintained | CI commitment for vLLM main branch and vLLM 0.16.0 tag |
+| main | Maintained | CI commitment for vLLM main branch and vLLM 0.18.0 tag |
 | releases/v0.13.0 | Maintained | CI commitment for vLLM 0.13.0 version |
 | v0.11.0-dev| Maintained | CI commitment for vLLM 0.11.0 version |
 | v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 1042dad701e..e58da70d260 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -65,15 +65,15 @@
     # the branch of vllm, used in vllm clone
     # - main branch: 'main'
    # - vX.Y.Z branch: 'vX.Y.Z'
-    "vllm_version": "v0.17.0",
+    "vllm_version": "v0.18.0",
     # the branch of vllm-ascend, used in vllm-ascend clone and image tag
     # - main branch: 'main'
     # - vX.Y.Z branch: latest vllm-ascend release tag
-    "vllm_ascend_version": "v0.17.0rc1",
+    "vllm_ascend_version": "v0.18.0rc1",
     # the newest release version of vllm-ascend and matched vLLM, used in pip install.
     # This value should be updated when cut down release.
-    "pip_vllm_ascend_version": "0.17.0rc1",
-    "pip_vllm_version": "0.17.0",
+    "pip_vllm_ascend_version": "0.18.0rc1",
+    "pip_vllm_version": "0.18.0",
     # CANN image tag
     "cann_image_tag": "8.5.1-910b-ubuntu22.04-py3.11",
     # vllm version in ci
diff --git a/docs/source/faqs.md b/docs/source/faqs.md
index 8900b514e15..678a8b6dc72 100644
--- a/docs/source/faqs.md
+++ b/docs/source/faqs.md
@@ -2,6 +2,7 @@
 
 ## Version Specific FAQs
 
+- [[v0.18.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/7633)
 - [[v0.17.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/7173)
 - [[v0.13.0] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6583)
 
@@ -104,7 +105,7 @@ If all above steps are not working, feel free to submit a GitHub issue.
 
 ### 7. How vllm-ascend work with vLLM?
 
-`vllm-ascend` is a hardware plugin for vLLM. The version of `vllm-ascend` is the same as the version of `vllm`. For example, if you use `vllm` 0.9.1, you should use vllm-ascend 0.9.1 as well. For the main branch, we ensure that `vllm-ascend` and `vllm` are compatible at every commit.
+`vllm-ascend` is a hardware plugin for vLLM. Stable releases usually share the same version as vLLM, while release candidates pair with the corresponding final vLLM release. For example, `vllm-ascend` `v0.18.0rc1` matches vLLM `v0.18.0`. For the main branch, we ensure that `vllm-ascend` and `vllm` are compatible at every commit.
 
 ### 8. Does vllm-ascend support Prefill Disaggregation feature?
diff --git a/docs/source/user_guide/release_notes.md b/docs/source/user_guide/release_notes.md
index 1dc7f740720..3466b955ef7 100644
--- a/docs/source/user_guide/release_notes.md
+++ b/docs/source/user_guide/release_notes.md
@@ -1,5 +1,37 @@
 # Release Notes
 
+## v0.18.0rc1 - 2026.03.27
+
+This is the first release candidate of v0.18.0 for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.
+
+### Highlights
+
+- C8 (INT8 KV cache) is now supported for GQA attention models, and for DeepSeek-V3.1 in the PD disaggregation scenario. [#7474](https://github.com/vllm-project/vllm-ascend/pull/7474), [#7222](https://github.com/vllm-project/vllm-ascend/pull/7222)
+- DeepSeek models are now supported on A5 through new MLA operators. [#7232](https://github.com/vllm-project/vllm-ascend/pull/7232)
+
+### Features
+
+- Flash Comm V1 now supports VL models with MLA, removing a previous limitation for multimodal serving. [#7390](https://github.com/vllm-project/vllm-ascend/pull/7390)
+- Support separate attention backends for the target and draft models in speculative decoding, allowing finer backend tuning per model. [#7342](https://github.com/vllm-project/vllm-ascend/pull/7342)
+- VL MoE models now support SP, and `sp_threshold` is removed in favor of `sp_min_token_num` from vLLM. [#7044](https://github.com/vllm-project/vllm-ascend/pull/7044)
+- Qwen VL models now support `w8a8_mxfp8` quantization. [#7417](https://github.com/vllm-project/vllm-ascend/pull/7417)
+
+### Performance
+
+- Optimized the Qwen3.5 and Qwen3-Next GDN prefill path by prebuilding chunk metadata, reducing host-device synchronization overhead. [#7487](https://github.com/vllm-project/vllm-ascend/pull/7487)
+- Simplified the FIA prefill context merge path for better runtime efficiency. [#7293](https://github.com/vllm-project/vllm-ascend/pull/7293)
+
+### Documentation
+
+- Refreshed deployment and model docs for Kimi-K2.5, GLM-4.7, DeepSeek-V3.2, MiniMax-M2.5, and the PD disaggregation guides. [#7371](https://github.com/vllm-project/vllm-ascend/pull/7371), [#7403](https://github.com/vllm-project/vllm-ascend/pull/7403), [#7292](https://github.com/vllm-project/vllm-ascend/pull/7292), [#7296](https://github.com/vllm-project/vllm-ascend/pull/7296), [#7300](https://github.com/vllm-project/vllm-ascend/pull/7300)
+
+### Others
+
+- Fixed a PD disaggregation issue where decode nodes could get stuck because shapes were not aligned across DP nodes. [#7534](https://github.com/vllm-project/vllm-ascend/pull/7534)
+- Fixed a regression where hybrid attention plus mamba models on Ascend could start with an incorrect block size after the v0.18.0 upgrade. [#7528](https://github.com/vllm-project/vllm-ascend/pull/7528)
+- Fixed multi-instance serving OOM calculation on single-card deployments. [#7427](https://github.com/vllm-project/vllm-ascend/pull/7427)
+- Fixed DeepSeek-V3.1 C8 when combining MTP with full decode and full graph modes. [#7571](https://github.com/vllm-project/vllm-ascend/pull/7571)
+
 ## v0.17.0rc1 - 2026.03.15
 
 This is the first release candidate of v0.17.0 for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.
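
The FAQ change in this diff states the version-matching rule: a stable vllm-ascend release shares its version with vLLM, while a release candidate such as `v0.18.0rc1` pairs with the final vLLM release `v0.18.0`. A minimal sketch of that rule as a helper; the function name `matched_vllm_version` is hypothetical, for illustration only, and is not part of either project:

```python
import re

def matched_vllm_version(vllm_ascend_version: str) -> str:
    """Return the vLLM version a given vllm-ascend version pairs with.

    Per the versioning policy: a stable release (e.g. "0.13.0") pairs with
    the same vLLM version, while a release candidate (e.g. "0.18.0rc1")
    pairs with the corresponding final vLLM release ("0.18.0").
    """
    # Strip a trailing rcN suffix, if present; stable versions pass through.
    return re.sub(r"rc\d+$", "", vllm_ascend_version)

print(matched_vllm_version("0.18.0rc1"))  # 0.18.0
print(matched_vllm_version("0.13.0"))     # 0.13.0
```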