_posts/2025-05-12-hardware-plugin.md
Lines changed: 7 additions & 7 deletions
@@ -5,7 +5,7 @@ author: "The Ascend Team on vLLM"
image: /assets/logos/vllm-logo-only-light.png
---
-Since December 2024, through the joint efforts of the vLLM community and the Ascend team on vLLM, we have completed the [Hardware Pluggable RFC]((https://github.com/vllm-project/vllm/issues/11162)). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms.
+Since December 2024, through the joint efforts of the vLLM community and the Ascend team on vLLM, we have completed the [Hardware Pluggable RFC](https://github.com/vllm-project/vllm/issues/11162). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms.
---
@@ -34,17 +34,17 @@ Before introducing the vLLM Hardware Plugin, let's first look at two prerequisit
Based on these RFCs, we proposed [[RFC] Hardware Pluggable](https://github.com/vllm-project/vllm/issues/11162), which integrates the `Platform` module into vLLM as a plugin. Additionally, we refactored `Executor`, `Worker`, `ModelRunner`, `AttentionBackend`, and `Communicator` to support hardware plugins more flexibly.
-Currently, vLLM community has successfully implemented the Platform module introduced in the RFC. The functionality is validated through the [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) and [vllm-project/vllm-spyre](https://github.com/vllm-project/vllm-spyre) projects. Using this plugin mechanism, we successfully integrated vLLM with the Ascend NPU and IBM Spyre backends.
+Currently, the vLLM community has successfully implemented the Platform module introduced in the RFC. The functionality is validated through the [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) and [vllm-project/vllm-spyre](https://github.com/vllm-project/vllm-spyre) projects. Using this plugin mechanism, we successfully integrated vLLM with the Ascend NPU and IBM Spyre backends.
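
To make the plugin surface concrete, the sketch below shows what a minimal out-of-tree platform might look like; the class name, attribute values, and import path are illustrative assumptions rather than the actual vllm-ascend or vllm-spyre code.

```python
# Illustrative sketch only: names, attributes, and the import path are assumptions,
# not the actual vllm-ascend or vllm-spyre implementation.
from vllm.platforms.interface import Platform, PlatformEnum  # assumed import path


class MyPlatform(Platform):
    """A hypothetical out-of-tree platform describing a custom accelerator."""

    _enum = PlatformEnum.OOT        # out-of-tree platform (assumed enum value)
    device_name: str = "mydevice"   # hypothetical device identifier
    device_type: str = "mydevice"

    @classmethod
    def get_attn_backend_cls(cls, *args, **kwargs) -> str:
        # Point vLLM at this plugin's attention backend implementation.
        return "my_plugin.attention.MyAttentionBackend"
```

In the same spirit, a plugin package supplies its own `Worker`, `ModelRunner`, `AttentionBackend`, and `Communicator` implementations against the corresponding vLLM base classes mentioned above.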
---
## How to Integrate a New Backend via vLLM Hardware Plugin Mechanism
-This section will dive into integrating a New Backend via the Hardware Plugin in both developer and user perspective.
+This section will dive into integrating a new backend via the hardware plugin from both the developer and user perspectives.
### Developer Perspective
-To integrate a new backend into vLLM using the Hardware Plugin, follow these steps:
+To integrate a new backend into vLLM using the hardware plugin, follow these steps:
#### Step 1: Create a New Project and Initialize the Platform
@@ -67,7 +67,7 @@ Each of these classes has a corresponding base class in vLLM. Again, you can ref
#### Step 3: Register the Plugin
-Register the plugin in `setup.py` using entrypoint mechanism of python:
+Register the plugin in `setup.py` using Python's entrypoint mechanism:
```python
setup(
@@ -85,7 +85,7 @@ Refer to [`setup.py`](https://github.com/vllm-project/vllm-ascend/blob/72a43a61d
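
The blog's own `setup()` example is truncated by the hunk above. As a hedged sketch of the entry-point registration it describes, where the package, module, and `register` function names are hypothetical and the `vllm.platform_plugins` group is assumed from vLLM's plugin mechanism, a registration might look like:

```python
# Hypothetical setup.py sketch; not the actual vllm-ascend file linked above.
from setuptools import setup

setup(
    name="vllm-my-plugin",      # hypothetical package name
    packages=["my_plugin"],
    entry_points={
        # vLLM is expected to discover platform plugins via this entry-point group.
        "vllm.platform_plugins": [
            # register() should return the dotted path of the Platform subclass,
            # e.g. "my_plugin.platform.MyPlatform".
            "my_plugin = my_plugin:register",
        ],
    },
)
```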
### User Perspective
-Only need to install vllm and your plugin before running, taking [vllm-ascend](https://github.com/vllm-project/vllm-ascend) as an example:
+Users only need to install vllm and your plugin before running. Taking [vllm-ascend](https://github.com/vllm-project/vllm-ascend) as an example:
```bash
pip install vllm vllm-ascend
@@ -117,4 +117,4 @@ We encourage everyone to try out this new feature! If you have any questions, jo
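
Once both packages are installed, standard vLLM code is expected to run unchanged on the plugged-in backend. The sketch below is a minimal offline-inference example under that assumption; the model name is a placeholder and the snippet is not taken from the original post.

```python
# Minimal offline-inference sketch; the model name is a placeholder.
# With a platform plugin installed, vLLM should pick up the backend automatically.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # placeholder model
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, output.outputs[0].text)
```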
## Acknowledgements
-This flexible hardware backend plugin mechanism would not have been possible without the efforts contributed by a lot of vLLM contributors. Thus we are deeply grateful to the vLLM maintainers, including [Kaichao You](https://github.com/youkaichao), [Simon Mo](https://github.com/simon-mo), [Cyrus Leung](https://github.com/DarkLight1337), [Robert Shaw](https://github.com/robertgshaw2-redhat), [Michael Goin](https://github.com/mgoin) and [Jie Li](https://github.com/jeejeelee) for related refactor, deep discussion and quick review, [Xiyuan Wang](https://github.com/wangxiyuan), [Shanshan Shen](https://github.com/shen-shanshan), [Chenguang Li](https://github.com/noemotiovon) and [Mengqing Cao](https://github.com/MengqingCao) from the Ascend team on vLLM for mechanism design and implementation, [Joe Runde](https://github.com/joerunde) and [Yannick Schnider](https://github.com/yannicks1) from the Spyre team on vLLM for pluggable scheduler design and implementation, and other contributors, including [yancong](https://github.com/ice-tong) for extendable quantization method design and implementation, [Aviv Keshet](https://github.com/akeshet) for extendable `SamplingParams`.
+This flexible hardware backend plugin mechanism would not have been possible without the efforts of many vLLM contributors. Thus we are deeply grateful to the vLLM maintainers, including [Kaichao You](https://github.com/youkaichao), [Simon Mo](https://github.com/simon-mo), [Cyrus Leung](https://github.com/DarkLight1337), [Robert Shaw](https://github.com/robertgshaw2-redhat), [Michael Goin](https://github.com/mgoin) and [Jie Li](https://github.com/jeejeelee) for related refactor, deep discussion and quick review, [Xiyuan Wang](https://github.com/wangxiyuan), [Shanshan Shen](https://github.com/shen-shanshan), [Chenguang Li](https://github.com/noemotiovon) and [Mengqing Cao](https://github.com/MengqingCao) from the Ascend team on vLLM for mechanism design and implementation, [Joe Runde](https://github.com/joerunde) and [Yannick Schnider](https://github.com/yannicks1) from the Spyre team on vLLM for pluggable scheduler design and implementation, and other contributors, including [yancong](https://github.com/ice-tong) for extendable quantization method design and implementation, [Aviv Keshet](https://github.com/akeshet) for extendable `SamplingParams`.