
Commit 28e4a40

docs: refine readme to improve readability.

1 parent 21962a3 commit 28e4a40

30 files changed: +900 -1268 lines changed

README.md

Lines changed: 8 additions & 102 deletions
@@ -79,108 +79,13 @@ limitations under the License. -->
 
 ---
 
-## 3. Code Architecture
-```
-├── xllm/
-| : main source folder
-│ ├── api_service/ # code for api services
-│ ├── core/
-│ │ : xllm core features folder
-│ │ ├── common/
-│ │ ├── distributed_runtime/ # code for distributed and pd serving
-│ │ ├── framework/ # code for execution orchestration
-│ │ ├── kernels/ # adaption for npu kernels adaption
-│ │ ├── layers/ # model layers impl
-│ │ ├── platform/ # adaption for various platform
-│ │ ├── runtime/ # code for worker and executor
-│ │ ├── scheduler/ # code for batch and pd scheduler
-│ │ └── util/
-│ ├── function_call # code for tool call parser
-│ ├── models/ # models impl
-│ ├── processors/ # code for vlm pre-processing
-│ ├── proto/ # communication protocol
-│ ├── pybind/ # code for python bind
-| └── server/ # xLLM server
-├── examples/ # examples of calling xLLM
-├── tools/ # code for npu time generations
-└── xllm.cpp # entrypoint of xLLM
-```
-
-Supported models list:
-- DeepSeek-V3/R1
-- DeepSeek-R1-Distill-Qwen
-- Kimi-k2
-- Llama2/3
-- MiniCPM-V
-- MiMo-VL
-- Qwen2/2.5/QwQ
-- Qwen2.5-VL
-- Qwen3 / Qwen3-MoE
-- Qwen3-VL / Qwen3-VL-MoE
-- GLM4.5 / GLM4.6 / GLM-4.6V / GLM-4.7
-- VLM-R1
-
----
-
-## 4. Quick Start
-#### Installation
-First, download the image we provide:
-```bash
-# A2 x86
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull xllm/xllm-ai:xllm-dev-hc-rc2-arm
-# or
-# A2 x86
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hc-rc2-arm
-```
-Then create the corresponding container:
-```bash
-sudo docker run -it --ipc=host -u 0 --privileged --name mydocker --network=host --device=/dev/davinci0 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /var/queue_schedule:/var/queue_schedule -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /usr/local/sbin/:/usr/local/sbin/ -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf -v /var/log/npu/slog/:/var/log/npu/slog -v /export/home:/export/home -w /export/home -v ~/.ssh:/root/.ssh -v /var/log/npu/profiling/:/var/log/npu/profiling -v /var/log/npu/dump/:/var/log/npu/dump -v /home/:/home/ -v /runtime/:/runtime/ -v /etc/hccn.conf:/etc/hccn.conf xllm/xllm-ai:xllm-dev-hb-rc2-x86
-```
+## 3. Quick Start
 
-Install official repo and submodules:
-```bash
-git clone https://github.com/jd-opensource/xllm
-cd xllm
-git submodule init
-git submodule update
-```
-The compilation depends on [vcpkg](https://github.com/microsoft/vcpkg). The Docker image already includes VCPKG_ROOT preconfigured. If you want to manually set it up, you can:
-```bash
-git clone https://gitcode.com/xLLM-AI/vcpkg.git
-cd vcpkg && git checkout ffc42e97c866ce9692f5c441394832b86548422c
-export VCPKG_ROOT=/your/path/to/vcpkg
-```
-
-#### Compilation
-When compiling, generate executable files `build/xllm/core/server/xllm` under `build/`:
-```bash
-python setup.py build
-```
-Or, compile directly using the following command to generate the whl package under `dist/`:
-```bash
-python setup.py bdist_wheel
-```
-
-#### Launch
-Run the following command to start xLLM engine:
-```bash
-./build/xllm/core/server/xllm \ # launch xllm server
---model=/path/to/your/llm \ # model path(to replace with your own path)
---port=9977 \ # set service port to 9977
---max_memory_utilization 0.90 # set the maximal utilization of device memory
-```
+Please refer to [Quick Start](docs/en/getting_started/quick_start.md) for more details. Besides, please check the model support status at [Model Support List](docs/en/supported_models.md).
 
 ---
 
-## 5. Contributing
+## 4. Contributing
 There are several ways you can contribute to xLLM:
 
 1. Reporting Issues (Bugs & Errors)
@@ -200,14 +105,14 @@ If you have problems about development, please check our document: **[Document](
 
 ---
 
-## 6. Community & Support
+## 5. Community & Support
 If you encounter any issues along the way, you are welcomed to submit reproducible steps and log snippets in the project's Issues area, or contact the xLLM Core team directly via your internal Slack. In addition, we have established official WeChat groups. You can access the following QR code to join. Welcome to contact us!
 
 <div align="center">
 <img src="docs/assets/wechat_qrcode.jpg" alt="qrcode3" width="50%" />
 </div>
 
-## 7. Acknowledgment
+## 6. Acknowledgment
 
 This project was made possible thanks to the following open-source projects:
 - [ScaleLLM](https://github.com/vectorch-ai/ScaleLLM) - xLLM draws inspiration from ScaleLLM's graph construction method and references its runtime execution.
@@ -217,6 +122,7 @@ This project was made possible thanks to the following open-source projects:
 - [safetensors](https://github.com/huggingface/safetensors) - xLLM relies on the C binding safetensors capability.
 - [Partial JSON Parser](https://github.com/promplate/partial-json-parser) - Implement xLLM's C++ JSON parser with insights from Python and Go implementations.
 - [concurrentqueue](https://github.com/cameron314/concurrentqueue) - A fast multi-producer, multi-consumer lock-free concurrent queue for C++11.
+- [Flashinfer](https://github.com/flashinfer-ai/flashinfer) - High-performance NVIDIA GPU kernels.
 
 
 Thanks to the following collaborating university laboratories:
@@ -235,13 +141,13 @@ Thanks to all the following [developers](https://github.com/jd-opensource/xllm/g
 
 ---
 
-## 8. License
+## 7. License
 [Apache License](LICENSE)
 
 #### xLLM is provided by JD.com
 #### Thanks for your Contributions!
 
-## 9. Citation
+## 8. Citation
 
 If you think this repository is helpful to you, welcome to cite us:
 ```
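
For reference, the quick-start workflow that this commit moves out of README.md condenses to the sketch below, assembled from the deleted lines above; the model path is a placeholder, and `--max_memory_utilization 0.90` caps device-memory usage at 90% as the removed comments describe.

```bash
# Clone the repository together with its submodules (from the removed Installation steps).
git clone https://github.com/jd-opensource/xllm
cd xllm
git submodule init
git submodule update

# Build the server; the executable is generated at build/xllm/core/server/xllm.
python setup.py build

# Launch the xLLM engine (replace /path/to/your/llm with an actual model path).
./build/xllm/core/server/xllm \
  --model=/path/to/your/llm \
  --port=9977 \
  --max_memory_utilization 0.90
```

The full, maintained version of these steps now lives in [Quick Start](docs/en/getting_started/quick_start.md).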

README_zh.md

Lines changed: 8 additions & 104 deletions
@@ -74,112 +74,15 @@ xLLM provides powerful intelligent computing capability, combining hardware-level compute optimization with
 - Speculative inference optimization, using multi-core parallelism to improve efficiency;
 - Dynamic load balancing of MoE experts, enabling efficient adjustment of expert distribution.
 
-
 ---
 
-## 3. Code Structure
-```
-├── xllm/
-| : main source directory
-│ ├── api_service/ # API service implementation
-│ ├── core/
-│ │ : xllm core features directory
-│ │ ├── common/
-│ │ ├── distributed_runtime/ # distributed and PD serving implementation
-│ │ ├── framework/ # engine execution modules
-│ │ ├── kernels/ # kernel adaptation for domestic chips
-│ │ ├── layers/ # model layer implementations
-│ │ ├── platform/ # multi-platform compatibility layer
-│ │ ├── runtime/ # worker/executor role implementations
-│ │ ├── scheduler/ # batch scheduling and PD scheduling
-│ │ └── util/
-│ ├── function_call # function call implementation
-│ ├── models/ # model implementations
-│ ├── processors/ # pre-processing for multimodal models
-│ ├── proto/ # communication protocols
-│ ├── pybind/ # python bindings
-| └── server/ # xLLM server instance
-├── examples/ # examples of calling the service
-├── tools/ # NPU timeline generation tools
-└── xllm.cpp # xLLM entrypoint
-```
+## 3. Quick Start
 
-Currently supported models:
-- DeepSeek-V3/R1
-- DeepSeek-R1-Distill-Qwen
-- Kimi-k2
-- Llama2/3
-- MiniCPM-V
-- MiMo-VL
-- Qwen2/2.5/QwQ
-- Qwen2.5-VL
-- Qwen3 / Qwen3-MoE
-- Qwen3-VL / Qwen3-VL-MoE
-- GLM-4.5 / GLM-4.6 / GLM-4.6V / GLM-4.7
-- VLM-R1
-
----
-
-
-## 4. Quick Start
-#### Installation
-First, download the image we provide:
-```bash
-# A2 x86
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hc-rc2-arm
-# or
-# A2 x86
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull xllm/xllm-ai:xllm-dev-hc-rc2-arm
-```
-Then create the corresponding container:
-```bash
-sudo docker run -it --ipc=host -u 0 --privileged --name mydocker --network=host --device=/dev/davinci0 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /var/queue_schedule:/var/queue_schedule -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /usr/local/sbin/:/usr/local/sbin/ -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf -v /var/log/npu/slog/:/var/log/npu/slog -v /export/home:/export/home -w /export/home -v ~/.ssh:/root/.ssh -v /var/log/npu/profiling/:/var/log/npu/profiling -v /var/log/npu/dump/:/var/log/npu/dump -v /home/:/home/ -v /runtime/:/runtime/ -v /etc/hccn.conf:/etc/hccn.conf xllm/xllm-ai:xllm-dev-hb-rc2-x86
-```
-
-Clone the official repository and its submodule dependencies:
-```bash
-git clone https://github.com/jd-opensource/xllm
-cd xllm
-git submodule init
-git submodule update
-```
-Compilation depends on [vcpkg](https://github.com/microsoft/vcpkg), which is already preconfigured in the image. To set it up manually, run:
-```bash
-git clone https://gitcode.com/xLLM-AI/vcpkg.git
-cd vcpkg && git checkout ffc42e97c866ce9692f5c441394832b86548422c
-export VCPKG_ROOT=/your/path/to/vcpkg
-```
-
-#### Compilation
-Run the build to generate the executable `build/xllm/core/server/xllm` under `build/`:
-```bash
-python setup.py build
-```
-Or compile directly with the following command to generate a whl package under `dist/`:
-```bash
-python setup.py bdist_wheel
-```
-
-#### Launch
-Run a command such as the following to start the xllm engine:
-```bash
-./build/xllm/core/server/xllm \ # launch the xllm server
---model=/path/to/your/llm \ # model path (replace with your actual path)
---port=9977 \ # set the service port to 9977
---max_memory_utilization 0.90 # set maximum device-memory utilization to 90%
-```
+Please refer to the [Quick Start guide](docs/zh/getting_started/quick_start.md). In addition, check the model support status in the [Supported Models list](docs/zh/supported_models.md).
 
 ---
 
-## 5. Becoming a Contributor
+## 4. Becoming a Contributor
 You can contribute to xLLM in the following ways:
 
 1. Report issues (bugs and errors) in Issues
@@ -199,7 +102,7 @@ python setup.py bdist_wheel
 
 ---
 
-## 6. Community & Support
+## 5. Community & Support
 If you run into any problems while developing or using xLLM, feel free to submit reproducible steps or log snippets in the project's Issues area.
 If you have an internal corporate Slack, contact the xLLM Core team directly. We have also set up official WeChat groups; scan the QR code below to join. Feel free to reach out to us:
 
@@ -209,7 +112,7 @@ python setup.py bdist_wheel
 
 ---
 
-## 7. Acknowledgements
+## 6. Acknowledgements
 This project was made possible by the following open-source projects:
 
 - [ScaleLLM](https://github.com/vectorch-ai/ScaleLLM) - Adopts ScaleLLM's graph-construction approach and draws on its runtime execution.
@@ -219,6 +122,7 @@ python setup.py bdist_wheel
 - [safetensors](https://github.com/huggingface/safetensors) - Relies on its C-binding safetensors capability.
 - [Partial JSON Parser](https://github.com/promplate/partial-json-parser) - xLLM's C++ JSON parser draws on the design of the Python and Go implementations.
 - [concurrentqueue](https://github.com/cameron314/concurrentqueue) - A high-performance lock-free queue.
+- [Flashinfer](https://github.com/flashinfer-ai/flashinfer) - High-performance NVIDIA GPU kernels.
 
 Thanks to the following collaborating university laboratories:
 
@@ -236,14 +140,14 @@ python setup.py bdist_wheel
 
 ---
 
-## 8. License
+## 7. License
 
 [Apache License](LICENSE)
 
 #### xLLM is provided by JD.com
 #### Thank you for your interest in and contributions to xLLM!
 
-## 9. Citation
+## 8. Citation
 
 If you find this repository helpful, please cite us:
 ```

docs/en/dev_guide/code-arch.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# Code Architecture
+
+```
+├── xllm/
+| : main source folder
+│ ├── api_service/ # code for api services
+│ ├── core/
+│ │ : xllm core features folder
+│ │ ├── common/
+│ │ ├── distributed_runtime/ # code for distributed and pd serving
+│ │ ├── framework/ # code for execution orchestration
+│ │ ├── kernels/ # adaption for npu kernels adaption
+│ │ ├── layers/ # model layers impl
+│ │ ├── platform/ # adaption for various platform
+│ │ ├── runtime/ # code for worker and executor
+│ │ ├── scheduler/ # code for batch and pd scheduler
+│ │ └── util/
+│ ├── function_call # code for tool call parser
+│ ├── models/ # models impl
+│ ├── processors/ # code for vlm pre-processing
+│ ├── proto/ # communication protocol
+│ ├── pybind/ # code for python bind
+| └── server/ # xLLM server
+├── examples/ # examples of calling xLLM
+├── tools/ # code for npu time generations
+└── xllm.cpp # entrypoint of xLLM
+```

docs/en/features/disagg_pd.md

Lines changed: 4 additions & 4 deletions
@@ -38,8 +38,8 @@ ENABLE_DECODE_RESPONSE_TO_SERVICE=true ./xllm_master_serving --etcd_addr="127.0.
 3. Start xLLM
 - Taking Qwen2-7B as an example
 - Start Prefill Instance
-``` shell linenums="1" hl_lines="10"
-./xllm --model=Qwen2-7B-Instruct \
+```bash
+/path/to/xllm --model=Qwen2-7B-Instruct \
 --port=8010 \
 --devices="npu:0" \
 --master_node_addr="127.0.0.1:18888" \
@@ -54,8 +54,8 @@ ENABLE_DECODE_RESPONSE_TO_SERVICE=true ./xllm_master_serving --etcd_addr="127.0.
 --nnodes=1
 ```
 - Start Decode Instance
-```shell linenums="1" hl_lines="11"
-./xllm --model=Qwen2-7B-Instruct \
+```bash
+/path/to/xllm --model=Qwen2-7B-Instruct \
 --port=8020 \
 --devices="npu:1" \
 --master_node_addr="127.0.0.1:18898" \
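
Together, the two hunks illustrate the disaggregated prefill/decode pattern: one xllm process per role, each with its own port, NPU device, and master node address. Below is a minimal sketch of that pattern using only the flags visible in this excerpt; the diff elides further options between the lines shown, and the binary location and the `--nnodes=1` flag on the decode side are assumptions carried over from the prefill block.

```bash
# Prefill instance: handles the prompt-processing (prefill) phase on npu:0.
/path/to/xllm --model=Qwen2-7B-Instruct \
  --port=8010 \
  --devices="npu:0" \
  --master_node_addr="127.0.0.1:18888" \
  --nnodes=1

# Decode instance: handles the token-generation (decode) phase on npu:1.
/path/to/xllm --model=Qwen2-7B-Instruct \
  --port=8020 \
  --devices="npu:1" \
  --master_node_addr="127.0.0.1:18898" \
  --nnodes=1
```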
