# Release note

## v0.10.0rc1 - 2025.08.07

This is the first release candidate of v0.10.0 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to get started. The V0 engine has been completely removed in this version.

### Highlights
* Disaggregated prefill now works with the V1 engine. You can try it with DeepSeek models [#950](https://github.com/vllm-project/vllm-ascend/pull/950), following this [tutorial](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/README.md).
* The W4A8 quantization method is now supported for dense and MoE models. [#2060](https://github.com/vllm-project/vllm-ascend/pull/2060) [#2172](https://github.com/vllm-project/vllm-ascend/pull/2172)
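As a sketch, serving a W4A8-quantized checkpoint could look like the following. The model path is a placeholder, and `--quantization ascend` is assumed to be the flag for Ascend-quantized weights here; check the quantization guide for your version before relying on it.

```shell
# Sketch: serve a checkpoint already quantized to W4A8 with the Ascend
# toolchain (placeholder path; verify flags against the quantization guide).
vllm serve /path/to/model-w4a8 \
    --quantization ascend \
    --tensor-parallel-size 2
```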

### Core
* The Ascend PyTorch adapter (torch_npu) has been upgraded to `2.7.1.dev20250724` [#1562](https://github.com/vllm-project/vllm-ascend/pull/1562), and CANN has been upgraded to `8.2.RC1` [#1653](https://github.com/vllm-project/vllm-ascend/pull/1653). Don't forget to update them in your environment, or use the latest images.
* vLLM Ascend now works on Atlas 800I A3, and images for A3 will be released starting from this version. [#1582](https://github.com/vllm-project/vllm-ascend/pull/1582)
* Kimi-K2 with W8A8 quantization, Qwen3-Coder, and GLM-4.5 are now supported in vLLM Ascend; please follow this [tutorial](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node_kimi.md.html) to try them. [#2162](https://github.com/vllm-project/vllm-ascend/pull/2162)
* Pipeline parallelism is now supported in V1. [#1800](https://github.com/vllm-project/vllm-ascend/pull/1800)
* The prefix cache feature now works with the Ascend Scheduler. [#1446](https://github.com/vllm-project/vllm-ascend/pull/1446)
* Torchair graph mode now works with tp > 4. [#1508](https://github.com/vllm-project/vllm-ascend/issues/1508)
* MTP now supports torchair graph mode. [#2145](https://github.com/vllm-project/vllm-ascend/pull/2145)
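Pipeline parallelism on V1 uses vLLM's standard engine flags; a minimal launch sketch (the model name and parallel sizes are illustrative, not from this release):

```shell
# Sketch: split the model into 2 pipeline stages, each stage sharded
# across 2 NPUs, so 4 devices are used in total.
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --pipeline-parallel-size 2 \
    --tensor-parallel-size 2
```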

### Other

* Bug fixes:
  * Fix a functional problem with multi-modal models like Qwen2-audio under Aclgraph. [#1803](https://github.com/vllm-project/vllm-ascend/pull/1803)
  * Fix a process group creation error in the external launch scenario. [#1681](https://github.com/vllm-project/vllm-ascend/pull/1681)
  * Fix a functional problem with guided decoding. [#2022](https://github.com/vllm-project/vllm-ascend/pull/2022)
  * Fix an accuracy issue with common MoE models in the DP scenario. [#1856](https://github.com/vllm-project/vllm-ascend/pull/1856)
* Performance improvements through a number of PRs:
  * Cache sin/cos instead of computing them in every layer. [#1890](https://github.com/vllm-project/vllm-ascend/pull/1890)
  * Improve shared expert multi-stream parallelism. [#1891](https://github.com/vllm-project/vllm-ascend/pull/1891)
  * Implement the fusion of allreduce and matmul in the prefill phase when TP is enabled. Enable this feature by setting `VLLM_ASCEND_ENABLE_MATMUL_ALLREDUCE` to `1`. [#1926](https://github.com/vllm-project/vllm-ascend/pull/1926)
  * Optimize quantized MoE performance by reducing All2All communication. [#2195](https://github.com/vllm-project/vllm-ascend/pull/2195)
  * Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance. [#1806](https://github.com/vllm-project/vllm-ascend/pull/1806)
  * Use multicast to avoid padding decode requests to prefill size. [#1555](https://github.com/vllm-project/vllm-ascend/pull/1555)
  * Improve LoRA performance. [#1884](https://github.com/vllm-project/vllm-ascend/pull/1884)
* A batch of refactoring PRs to enhance the code architecture:
  * Refactor the torchair model runner. [#2205](https://github.com/vllm-project/vllm-ascend/pull/2205)
  * Refactor forward_context and model_runner_v1. [#1979](https://github.com/vllm-project/vllm-ascend/pull/1979)
  * Refactor AscendMetaData comments. [#1967](https://github.com/vllm-project/vllm-ascend/pull/1967)
  * Refactor torchair utils. [#1892](https://github.com/vllm-project/vllm-ascend/pull/1892)
  * Refactor the torchair worker. [#1885](https://github.com/vllm-project/vllm-ascend/pull/1885)
  * Register activation custom ops instead of overwriting forward_oot. [#1841](https://github.com/vllm-project/vllm-ascend/pull/1841)
* Parameter changes:
  * `expert_tensor_parallel_size` in `additional_config` has been removed, and EP and TP are now aligned with vLLM. [#1681](https://github.com/vllm-project/vllm-ascend/pull/1681)
  * Added the `VLLM_ASCEND_MLA_PA` environment variable; set it to enable the MLA paged attention operator for DeepSeek MLA decode.
  * Added the `VLLM_ASCEND_ENABLE_MATMUL_ALLREDUCE` environment variable; set it to enable the `MatmulAllReduce` fusion kernel when tensor parallelism is enabled. This feature is supported on A2, and eager mode gets better performance from it.
  * Added the `VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ` environment variable; set it to enable MoE all2all seq, which provides a basic framework on top of alltoall for easy extension.
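These variables are plain `"0"`/`"1"` switches read at engine start, so they must be exported in the launching shell before the server comes up, for example:

```shell
# Sketch: turn on the optional features before launching the engine.
export VLLM_ASCEND_ENABLE_MATMUL_ALLREDUCE=1  # fuse matmul + allreduce under TP
export VLLM_ASCEND_MLA_PA=1                   # MLA paged attention for DeepSeek decode
# then launch as usual, e.g.: vllm serve /path/to/model
```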

* UT coverage reached 76.34% after a batch of PRs following this RFC: [#1298](https://github.com/vllm-project/vllm-ascend/issues/1298)
* Sequence parallelism now works for Qwen3 MoE. [#2209](https://github.com/vllm-project/vllm-ascend/issues/2209)
* Chinese online documentation has been added. [#1870](https://github.com/vllm-project/vllm-ascend/issues/1870)

### Known Issues
* Aclgraph currently does not work with DP + EP; the main gap is that the number of NPU streams Aclgraph needs to capture graphs is insufficient. [#2229](https://github.com/vllm-project/vllm-ascend/issues/2229)
* There is an accuracy issue with W8A8 dynamically quantized DeepSeek when multistream is enabled. This will be fixed in the next release. [#2232](https://github.com/vllm-project/vllm-ascend/issues/2232)
* In Qwen3 MoE, SP cannot be incorporated into Aclgraph. [#2246](https://github.com/vllm-project/vllm-ascend/issues/2246)
* MTP does not support the V1 scheduler currently; this will be fixed in Q3. [#2254](https://github.com/vllm-project/vllm-ascend/issues/2254)
* When running MTP with DP > 1, the metrics logger needs to be disabled due to an issue in vLLM. [#2254](https://github.com/vllm-project/vllm-ascend/issues/2254)

## v0.9.1rc2 - 2025.08.04

This is the second release candidate of v0.9.1 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/) to get started.