Commit b9f715d

[Docs] Sync main doc to v0.9.1-dev (#2227)

### What this PR does / why we need it?

Sync main doc to v0.9.1-dev, excluded:

- multi_npu_moge.md
- single_node_300i.md
- single_npu_audio.md
- single_npu_qwen3_embedding.md
- multi_npu_qwen3_moe.md (without ep enable)
- faqs.md (exclude 300I part)
- cn docs
- remove eplb_swift_balancer, we will add this back after main backport

### Does this PR introduce _any_ user-facing change?

No, doc only

### How was this patch tested?

Preview: https://vllm-ascend--2227.org.readthedocs.build/en/2227/

Signed-off-by: Yikun Jiang <[email protected]>

1 parent e3636c7 · commit b9f715d

52 files changed: +1950 −375 lines

Binary file changed (−58 KB): not shown

docs/source/assets/multi_node_dp.png (115 KB)

docs/source/community/contributors.md

Lines changed: 19 additions & 1 deletion

@@ -7,15 +7,33 @@
 | Xiyuan Wang| [@wangxiyuan](https://github.com/wangxiyuan) | 2025/01 |
 | Yikun Jiang| [@Yikun](https://github.com/Yikun) | 2025/02 |
 | Yi Gan| [@ganyi1996ppo](https://github.com/ganyi1996ppo) | 2025/02 |
+| Shoujian Zheng| [@jianzs](https://github.com/jianzs) | 2025/06 |
 
 ## Contributors
 
 Every vLLM Ascend release would not have been possible without the following contributors:
 
-Updated on 2025-06-10:
+Updated on 2025-07-07:
 
 | Number | Contributor | Date | Commit ID |
 |:------:|:-----------:|:-----:|:---------:|
+| 83 | [@ZhengWG](https://github.com/) | 2025/7/7 | [3a469de](https://github.com/vllm-project/vllm-ascend/commit/9c886d0a1f0fc011692090b0395d734c83a469de) |
+| 82 | [@wm901115nwpu](https://github.com/) | 2025/7/7 | [a2a47d4](https://github.com/vllm-project/vllm-ascend/commit/f08c4f15a27f0f27132f4ca7a0c226bf0a2a47d4) |
+| 81 | [@Agonixiaoxiao](https://github.com/) | 2025/7/2 | [6f84576](https://github.com/vllm-project/vllm-ascend/commit/7fc1a984890bd930f670deedcb2dda3a46f84576) |
+| 80 | [@zhanghw0354](https://github.com/zhanghw0354) | 2025/7/2 | [d3df9a5](https://github.com/vllm-project/vllm-ascend/commit/9fb3d558e5b57a3c97ee5e11b9f5dba6ad3df9a5) |
+| 79 | [@GDzhu01](https://github.com/GDzhu01) | 2025/6/28 | [de256ac](https://github.com/vllm-project/vllm-ascend/commit/b308a7a25897b88d4a23a9e3d583f4ec6de256ac) |
+| 78 | [@leo-pony](https://github.com/leo-pony) | 2025/6/26 | [3f2a5f2](https://github.com/vllm-project/vllm-ascend/commit/10253449120307e3b45f99d82218ba53e3f2a5f2) |
+| 77 | [@zeshengzong](https://github.com/zeshengzong) | 2025/6/26 | [3ee25aa](https://github.com/vllm-project/vllm-ascend/commit/192dbbcc6e244a8471d3c00033dc637233ee25aa) |
+| 76 | [@sharonyunyun](https://github.com/sharonyunyun) | 2025/6/25 | [2dd8666](https://github.com/vllm-project/vllm-ascend/commit/941269a6c5bbc79f6c1b6abd4680dc5802dd8666) |
+| 75 | [@Pr0Wh1teGivee](https://github.com/Pr0Wh1teGivee) | 2025/6/25 | [c65dd40](https://github.com/vllm-project/vllm-ascend/commit/2fda60464c287fe456b4a2f27e63996edc65dd40) |
+| 74 | [@xleoken](https://github.com/xleoken) | 2025/6/23 | [c604de0](https://github.com/vllm-project/vllm-ascend/commit/4447e53d7ad5edcda978ca6b0a3a26a73c604de0) |
+| 73 | [@lyj-jjj](https://github.com/lyj-jjj) | 2025/6/23 | [5cbd74e](https://github.com/vllm-project/vllm-ascend/commit/5177bef87a21331dcca11159d3d1438075cbd74e) |
+| 72 | [@farawayboat](https://github.com/farawayboat) | 2025/6/21 | [bc7d392](https://github.com/vllm-project/vllm-ascend/commit/097e7149f75c0806774bc68207f0f6270bc7d392) |
+| 71 | [@yuancaoyaoHW](https://github.com/yuancaoyaoHW) | 2025/6/20 | [7aa0b94](https://github.com/vllm-project/vllm-ascend/commit/00ae250f3ced68317bc91c93dc1f1a0977aa0b94) |
+| 70 | [@songshanhu07](https://github.com/songshanhu07) | 2025/6/18 | [5e1de1f](https://github.com/vllm-project/vllm-ascend/commit/2a70dbbdb8f55002de3313e17dfd595e1de1f) |
+| 69 | [@wangyanhui-cmss](https://github.com/wangyanhui-cmss) | 2025/6/12 | [40c9e88](https://github.com/vllm-project/vllm-ascend/commit/2a5fb4014b863cee6abc3009f5bc5340c9e88) |
+| 68 | [@chenwaner](https://github.com/chenwaner) | 2025/6/11 | [c696169](https://github.com/vllm-project/vllm-ascend/commit/e46dc142bf1180453c64226d76854fc1ec696169) |
+| 67 | [@yzim](https://github.com/yzim) | 2025/6/11 | [aaf701b](https://github.com/vllm-project/vllm-ascend/commit/4153a5091b698c2270d160409e7fee73baaf701b) |
 | 66 | [@Yuxiao-Xu](https://github.com/Yuxiao-Xu) | 2025/6/9 | [6b853f1](https://github.com/vllm-project/vllm-ascend/commit/6b853f15fe69ba335d2745ebcf14a164d0bcc505) |
 | 65 | [@ChenTaoyu-SJTU](https://github.com/ChenTaoyu-SJTU) | 2025/6/7 | [20dedba](https://github.com/vllm-project/vllm-ascend/commit/20dedba5d1fc84b7ae8b49f9ce3e3649389e2193) |
 | 64 | [@zxdukki](https://github.com/zxdukki) | 2025/6/7 | [87ebaef](https://github.com/vllm-project/vllm-ascend/commit/87ebaef4e4e519988f27a6aa378f614642202ecf) |

docs/source/community/governance.md

Lines changed: 2 additions & 2 deletions

@@ -1,7 +1,7 @@
 # Governance
 
 ## Mission
-As a vital component of vLLM, the vLLM Ascend project is dedicated to providing an easy, fast, and cheap LLM Serving for Everyone on Ascend NPU, and to actively contribute to the enrichment of vLLM.
+As a vital component of vLLM, the vLLM Ascend project is dedicated to providing easy, fast, and cheap LLM serving for everyone on Ascend NPU, and to actively contributing to the enrichment of vLLM.
 
 ## Principles
 vLLM Ascend follows the vLLM community's code of conduct: [vLLM - CODE OF CONDUCT](https://github.com/vllm-project/vllm/blob/main/CODE_OF_CONDUCT.md)
@@ -13,7 +13,7 @@ vLLM Ascend is an open-source project under the vLLM community, where the author
 
 **Responsibility:** Help new contributors with onboarding, handle and respond to community questions, review RFCs and code
 
-**Requirements:** Complete at least 1 contribution. Contributor is someone who consistently and actively participates in a project, included but not limited to issue/review/commits/community involvement.
+**Requirements:** Complete at least 1 contribution. A contributor is someone who consistently and actively participates in a project, including but not limited to issues/reviews/commits/community involvement.
 
 Contributors will be granted `Triage` permissions (`Can read and clone this repository. Can also manage issues and pull requests`) on the [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) GitHub repo to help community developers collaborate more efficiently.
Lines changed: 19 additions & 0 deletions

@@ -0,0 +1,19 @@
+# User Stories
+
+Read case studies on how users and developers solve real, everyday problems with vLLM Ascend:
+
+- [LLaMA-Factory](./llamafactory.md) is an easy-to-use and efficient platform for training and fine-tuning large language models. It has supported vLLM Ascend to speed up inference since [LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739), gaining a 2x performance enhancement at inference.
+
+- [Huggingface/trl](https://github.com/huggingface/trl) is a cutting-edge library designed for post-training foundation models using advanced techniques like SFT, PPO and DPO. It has used vLLM Ascend since [v0.17.0](https://github.com/huggingface/trl/releases/tag/v0.17.0) to support RLHF on Ascend NPU.
+
+- [MindIE Turbo](https://pypi.org/project/mindie-turbo) is an LLM inference engine acceleration plug-in library developed by Huawei for Ascend hardware, which includes self-developed large language model optimization algorithms and optimizations related to the inference engine framework. It has supported vLLM Ascend since [2.0rc1](https://www.hiascend.com/document/detail/zh/mindie/20RC1/AcceleratePlugin/turbodev/mindie-turbo-0001.html).
+
+- [GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU cluster manager for running AI models. It has supported vLLM Ascend since [v0.6.2](https://github.com/gpustack/gpustack/releases/tag/v0.6.2); see more GPUStack performance evaluation info at [this link](https://mp.weixin.qq.com/s/pkytJVjcH9_OnffnsFGaew).
+
+- [verl](https://github.com/volcengine/verl) is a flexible, efficient and production-ready RL training library for large language models (LLMs). It has used vLLM Ascend since [v0.4.0](https://github.com/volcengine/verl/releases/tag/v0.4.0); see more info in [verl x Ascend Quickstart](https://verl.readthedocs.io/en/latest/ascend_tutorial/ascend_quick_start.html).
+
+:::{toctree}
+:caption: More details
+:maxdepth: 1
+llamafactory
+:::

Lines changed: 19 additions & 0 deletions

@@ -0,0 +1,19 @@
+# LLaMA-Factory
+
+**About / Introduction**
+
+[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) is an easy-to-use and efficient platform for training and fine-tuning large language models. With LLaMA-Factory, you can fine-tune hundreds of pre-trained models locally without writing any code.
+
+LLaMA-Factory users need to evaluate and run inference on the model after fine-tuning it.
+
+**The Business Challenge**
+
+LLaMA-Factory used transformers to perform inference on Ascend NPU, but the speed was slow.
+
+**Solving Challenges and Benefits with vLLM Ascend**
+
+With the joint efforts of LLaMA-Factory and vLLM Ascend ([LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739)), the performance of LLaMA-Factory in the model inference stage has been significantly improved. According to the test results, the inference speed of LLaMA-Factory has been increased to 2x that of the transformers version.
+
+**Learn more**
+
+See more about LLaMA-Factory and how it uses vLLM Ascend for inference on Ascend NPU in the following documentation: [LLaMA-Factory Ascend NPU Inference](https://llamafactory.readthedocs.io/en/latest/advanced/npu_inference.html).

docs/source/developer_guide/versioning_policy.md renamed to docs/source/community/versioning_policy.md

Lines changed: 6 additions & 2 deletions

@@ -22,6 +22,8 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
 |-------------|--------------|------------------|-------------|--------------------|--------------|
+| v0.9.2rc1 | v0.9.2 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1.dev20250619 | |
+| v0.9.1rc1 | v0.9.1 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1.dev20250528 | |
 | v0.9.0rc2 | v0.9.0 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | |
 | v0.9.0rc1 | v0.9.0 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | |
 | v0.8.5rc1 | v0.8.5.post1 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | |
@@ -35,6 +37,8 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | Date | Event |
 |------------|-------------------------------------------|
+| 2025.07.11 | Release candidates, v0.9.2rc1 |
+| 2025.06.22 | Release candidates, v0.9.1rc1 |
 | 2025.06.10 | Release candidates, v0.9.0rc2 |
 | 2025.06.09 | Release candidates, v0.9.0rc1 |
 | 2025.05.29 | v0.7.x post release, v0.7.3.post1 |
@@ -72,8 +76,8 @@ Usually, each minor version of vLLM (such as 0.7) will correspond to a vLLM Asce
 
 | Branch | Status | Note |
 |------------|--------------|--------------------------------------|
-| main | Maintained | CI commitment for vLLM main branch and vLLM 0.9.x branch |
-| v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.0 and 0.9.1 version |
+| main | Maintained | CI commitment for vLLM main branch and vLLM 0.9.2 branch |
+| v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
 | v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version |
 | v0.7.1-dev | Unmaintained | Replaced by v0.7.3-dev |
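
For illustration, here is a minimal sketch of pinning a matching release pair from the compatibility matrix above. It assumes the PyPI package names `vllm` and `vllm-ascend`; treat the official installation guide as the authoritative source for the exact steps.

```bash
# Pick one row of the matrix and pin both sides to it, e.g. the v0.9.2rc1 row
# (vLLM Ascend v0.9.2rc1 pairs with vLLM v0.9.2 on Python >= 3.9, < 3.12).
pip install vllm==0.9.2
pip install vllm-ascend==0.9.2rc1
```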

docs/source/developer_guide/contributing.md renamed to docs/source/developer_guide/contribution/index.md

Lines changed: 44 additions & 46 deletions

@@ -4,82 +4,74 @@
 It's recommended to set up a local development environment to build and test
 before you submit a PR.
 
-### Prepare environment and build
+### Setup development environment
 
 Theoretically, the vllm-ascend build is only supported on Linux because
 the `vllm-ascend` dependency `torch_npu` only supports Linux.
 
 But you can still set up a dev env on Linux/Windows/macOS for linting and basic
 tests with the following commands:
 
+#### Run lint locally
+
 ```bash
 # Choose a base dir (~/vllm-project/) and set up venv
 cd ~/vllm-project/
 python3 -m venv .venv
 source ./.venv/bin/activate
 
-# Clone vllm code and install
-git clone https://github.com/vllm-project/vllm.git
-cd vllm
-pip install -r requirements/build.txt
-VLLM_TARGET_DEVICE="empty" pip install .
-cd ..
-
 # Clone vllm-ascend and install
 git clone https://github.com/vllm-project/vllm-ascend.git
 cd vllm-ascend
-# Install system requirements
-apt install -y gcc g++ cmake libnuma-dev
-# Install project requirements
-pip install -r requirements-dev.txt
-
-# Then you can run lint and mypy test
-bash format.sh
 
-# Build:
-# - only supported on Linux (torch_npu available)
-# pip install -e .
-# - build without deps for debugging in other OS
-# pip install -e . --no-deps
-# - build without custom ops
-# COMPILE_CUSTOM_KERNELS=0 pip install -e .
+# Install lint requirements and enable the pre-commit hook
+pip install -r requirements-lint.txt
 
-# Commit changed files using `-s`
-git commit -sm "your commit info"
+# Run lint (you need to install pre-commit deps via a proxy network the first time)
+bash format.sh
 ```
 
-### Testing
+#### Run CI locally
 
-Although vllm-ascend CI provides integration tests on [Ascend](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml), you can run them
-locally. The simplest way to run these integration tests locally is through a container:
-
-```bash
-# Under Ascend NPU environment
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
+After completing the "Run lint" setup, you can run CI locally:
 
-export IMAGE=vllm-ascend-dev-image
-export CONTAINER_NAME=vllm-ascend-dev
-export DEVICE=/dev/davinci1
+```{code-block} bash
+:substitutions:
 
-# The first build will take about 10 mins (10MB/s) to download the base image and packages
-docker build -t $IMAGE -f ./Dockerfile .
-# You can also specify the mirror repo via setting VLLM_REPO to speedup
-# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm
+cd ~/vllm-project/
 
-docker run --rm --name $CONTAINER_NAME --network host --device $DEVICE \
-  --device /dev/davinci_manager --device /dev/devmm_svm \
-  --device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-  -ti $IMAGE bash
+# Running CI requires vLLM to be installed
+git clone --branch |vllm_version| https://github.com/vllm-project/vllm.git
+cd vllm
+pip install -r requirements/build.txt
+VLLM_TARGET_DEVICE="empty" pip install .
+cd ..
 
+# Install requirements
 cd vllm-ascend
+# For Linux:
 pip install -r requirements-dev.txt
+# For non-Linux:
+cat requirements-dev.txt | grep -Ev '^#|^--|^$|^-r' | while read PACKAGE; do pip install "$PACKAGE"; done
+cat requirements.txt | grep -Ev '^#|^--|^$|^-r' | while read PACKAGE; do pip install "$PACKAGE"; done
+
+# Run CI:
+bash format.sh ci
+```
+
+#### Submit the commit
 
-pytest tests/
+```bash
+# Commit changed files using `-s`
+git commit -sm "your commit info"
 ```
 
+🎉 Congratulations! You have completed the development environment setup.
+
+### Test locally
+
+You can refer to the [Testing](./testing.md) doc to set up the testing environment and run tests locally.
+
 ## DCO and Signed-off-by
 
 When contributing changes to this project, you must agree to the DCO. Commits must include a `Signed-off-by:` header which certifies agreement with the terms of the DCO.
@@ -111,3 +103,9 @@ If the PR spans more than one category, please include all relevant prefixes.
 
 You may find more information about contributing to the vLLM Ascend backend plugin on [<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing/overview.html).
 If you find any problem when contributing, feel free to submit a PR to improve the doc to help other developers.
+
+:::{toctree}
+:caption: Index
+:maxdepth: 1
+testing
+:::
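
As a quick illustration of the DCO rule described in this guide, here is a minimal sketch of producing a signed-off commit; the name, email, and commit message are placeholders.

```bash
# Configure the identity that the Signed-off-by trailer is taken from
git config user.name "Your Name"
git config user.email "you@example.com"

# `git commit -s` appends "Signed-off-by: Your Name <you@example.com>",
# which is what the DCO check on the PR looks for
git commit -s -m "[Doc] fix typo in the contribution guide"
```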
