Commit ca5fa46

docs: refine readme to improve readability.
1 parent 21962a3 commit ca5fa46

28 files changed (+832, −1189 lines)

README.md

Lines changed: 3 additions & 66 deletions
@@ -106,77 +106,13 @@ limitations under the License. -->
 └── xllm.cpp # entrypoint of xLLM
 ```

-Supported models list:
-- DeepSeek-V3/R1
-- DeepSeek-R1-Distill-Qwen
-- Kimi-k2
-- Llama2/3
-- MiniCPM-V
-- MiMo-VL
-- Qwen2/2.5/QwQ
-- Qwen2.5-VL
-- Qwen3 / Qwen3-MoE
-- Qwen3-VL / Qwen3-VL-MoE
-- GLM4.5 / GLM4.6 / GLM-4.6V / GLM-4.7
-- VLM-R1
+Please check the model support status at [Model Support List](docs/en/model_list.md).

 ---

 ## 4. Quick Start
-#### Installation
-First, download the image we provide:
-```bash
-# A2 x86
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull xllm/xllm-ai:xllm-dev-hc-rc2-arm
-# or
-# A2 x86
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hc-rc2-arm
-```
-Then create the corresponding container:
-```bash
-sudo docker run -it --ipc=host -u 0 --privileged --name mydocker --network=host --device=/dev/davinci0 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /var/queue_schedule:/var/queue_schedule -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /usr/local/sbin/:/usr/local/sbin/ -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf -v /var/log/npu/slog/:/var/log/npu/slog -v /export/home:/export/home -w /export/home -v ~/.ssh:/root/.ssh -v /var/log/npu/profiling/:/var/log/npu/profiling -v /var/log/npu/dump/:/var/log/npu/dump -v /home/:/home/ -v /runtime/:/runtime/ -v /etc/hccn.conf:/etc/hccn.conf xllm/xllm-ai:xllm-dev-hb-rc2-x86
-```
-
-Install official repo and submodules:
-```bash
-git clone https://github.com/jd-opensource/xllm
-cd xllm
-git submodule init
-git submodule update
-```
-The compilation depends on [vcpkg](https://github.com/microsoft/vcpkg). The Docker image already includes VCPKG_ROOT preconfigured. If you want to manually set it up, you can:
-```bash
-git clone https://gitcode.com/xLLM-AI/vcpkg.git
-cd vcpkg && git checkout ffc42e97c866ce9692f5c441394832b86548422c
-export VCPKG_ROOT=/your/path/to/vcpkg
-```

-#### Compilation
-When compiling, generate executable files `build/xllm/core/server/xllm` under `build/`:
-```bash
-python setup.py build
-```
-Or, compile directly using the following command to generate the whl package under `dist/`:
-```bash
-python setup.py bdist_wheel
-```
-
-#### Launch
-Run the following command to start xLLM engine:
-```bash
-./build/xllm/core/server/xllm \ # launch xllm server
-    --model=/path/to/your/llm \ # model path(to replace with your own path)
-    --port=9977 \ # set service port to 9977
-    --max_memory_utilization 0.90 # set the maximal utilization of device memory
-```
+Please refer to [Quick Start](docs/en/getting_started/quick_start.md) for more details.

 ---

@@ -217,6 +153,7 @@ This project was made possible thanks to the following open-source projects:
 - [safetensors](https://github.com/huggingface/safetensors) - xLLM relies on the C binding safetensors capability.
 - [Partial JSON Parser](https://github.com/promplate/partial-json-parser) - Implement xLLM's C++ JSON parser with insights from Python and Go implementations.
 - [concurrentqueue](https://github.com/cameron314/concurrentqueue) - A fast multi-producer, multi-consumer lock-free concurrent queue for C++11.
+- [Flashinfer](https://github.com/flashinfer-ai/flashinfer) - High-performance NVIDIA GPU kernels.


 Thanks to the following collaborating university laboratories:
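
Note: the Quick Start steps removed above now live in docs/en/getting_started/quick_start.md. For a quick sanity check of the relocated flow, here is a minimal end-to-end sketch built from the removed commands, run inside the provided container. The final smoke test assumes the server exposes an OpenAI-style `/v1/chat/completions` route; that endpoint and the placeholder model name are assumptions, not part of this diff.

```bash
# Minimal sketch of the Quick Start steps this commit moves into
# docs/en/getting_started/quick_start.md (run inside the provided container).
set -e

# Fetch sources and submodules (taken from the removed README section).
git clone https://github.com/jd-opensource/xllm
cd xllm
git submodule init
git submodule update

# Build the server binary under build/ (VCPKG_ROOT is preconfigured in the image).
python setup.py build

# Launch on port 9977, allowing 90% of device memory.
./build/xllm/core/server/xllm \
  --model=/path/to/your/llm \
  --port=9977 \
  --max_memory_utilization 0.90 &

# Give the engine time to load weights before probing it.
sleep 60

# Assumption: an OpenAI-compatible chat endpoint; adjust to the actual API if it differs.
curl -s http://127.0.0.1:9977/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-name", "messages": [{"role": "user", "content": "hello"}]}'
```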

README_zh.md

Lines changed: 3 additions & 67 deletions
@@ -104,78 +104,13 @@ xLLM provides powerful intelligent computing capabilities through compute optimization of the hardware system and
 └── xllm.cpp # xLLM entrypoint
 ```

-Currently supported models:
-- DeepSeek-V3/R1
-- DeepSeek-R1-Distill-Qwen
-- Kimi-k2
-- Llama2/3
-- MiniCPM-V
-- MiMo-VL
-- Qwen2/2.5/QwQ
-- Qwen2.5-VL
-- Qwen3 / Qwen3-MoE
-- Qwen3-VL / Qwen3-VL-MoE
-- GLM-4.5 / GLM-4.6 / GLM-4.6V / GLM-4.7
-- VLM-R1
+Please check the model support status in the [Model Support List](docs/zh/model_list.md).

 ---

-
 ## 4. Quick Start
-#### Installation
-First, download the image we provide:
-```bash
-# A2 x86
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hc-rc2-arm
-# or
-# A2 x86
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull xllm/xllm-ai:xllm-dev-hc-rc2-arm
-```
-Then create the corresponding container:
-```bash
-sudo docker run -it --ipc=host -u 0 --privileged --name mydocker --network=host --device=/dev/davinci0 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /var/queue_schedule:/var/queue_schedule -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /usr/local/sbin/:/usr/local/sbin/ -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf -v /var/log/npu/slog/:/var/log/npu/slog -v /export/home:/export/home -w /export/home -v ~/.ssh:/root/.ssh -v /var/log/npu/profiling/:/var/log/npu/profiling -v /var/log/npu/dump/:/var/log/npu/dump -v /home/:/home/ -v /runtime/:/runtime/ -v /etc/hccn.conf:/etc/hccn.conf xllm/xllm-ai:xllm-dev-hb-rc2-x86
-```
-
-Download the official repository and its submodule dependencies:
-```bash
-git clone https://github.com/jd-opensource/xllm
-cd xllm
-git submodule init
-git submodule update
-```
-Compilation depends on [vcpkg](https://github.com/microsoft/vcpkg), which is already configured in the image. To set it up manually, run the following commands:
-```bash
-git clone https://gitcode.com/xLLM-AI/vcpkg.git
-cd vcpkg && git checkout ffc42e97c866ce9692f5c441394832b86548422c
-export VCPKG_ROOT=/your/path/to/vcpkg
-```

-#### Compilation
-Run the build to generate the executable `build/xllm/core/server/xllm` under `build/`:
-```bash
-python setup.py build
-```
-Or compile directly with the following command to generate the whl package under `dist/`:
-```bash
-python setup.py bdist_wheel
-```
-
-#### Launch
-Run a command like the following to start the xLLM engine:
-```bash
-./build/xllm/core/server/xllm \ # launch the xllm server
-    --model=/path/to/your/llm \ # model path (replace with your actual path)
-    --port=9977 \ # set the service port to 9977
-    --max_memory_utilization 0.90 # set the maximum memory utilization to 90%
-```
+Please refer to the [Quick Start guide](docs/zh/getting_started/quick_start.md).

 ---

@@ -219,6 +154,7 @@ python setup.py bdist_wheel
 - [safetensors](https://github.com/huggingface/safetensors) - Relies on its C-binding safetensors capability.
 - [Partial JSON Parser](https://github.com/promplate/partial-json-parser) - xLLM's C++ JSON parser, designed with reference to the Python and Go implementations.
 - [concurrentqueue](https://github.com/cameron314/concurrentqueue) - A high-performance lock-free queue.
+- [Flashinfer](https://github.com/flashinfer-ai/flashinfer) - High-performance NVIDIA GPU kernels.

 Thanks to the following collaborating university laboratories:

docs/en/features/disagg_pd.md

Lines changed: 4 additions & 4 deletions
@@ -38,8 +38,8 @@ ENABLE_DECODE_RESPONSE_TO_SERVICE=true ./xllm_master_serving --etcd_addr="127.0.
 3. Start xLLM
 - Taking Qwen2-7B as an example
 - Start Prefill Instance
-``` shell linenums="1" hl_lines="10"
-./xllm --model=Qwen2-7B-Instruct \
+```bash
+/path/to/xllm --model=Qwen2-7B-Instruct \
   --port=8010 \
   --devices="npu:0" \
   --master_node_addr="127.0.0.1:18888" \
@@ -54,8 +54,8 @@ ENABLE_DECODE_RESPONSE_TO_SERVICE=true ./xllm_master_serving --etcd_addr="127.0.
   --nnodes=1
 ```
 - Start Decode Instance
-```shell linenums="1" hl_lines="11"
-./xllm --model=Qwen2-7B-Instruct \
+```bash
+/path/to/xllm --model=Qwen2-7B-Instruct \
   --port=8020 \
   --devices="npu:1" \
   --master_node_addr="127.0.0.1:18898" \

docs/en/getting_started/compile.md

Lines changed: 0 additions & 58 deletions
This file was deleted.

docs/en/getting_started/disagg_pd.md

Lines changed: 4 additions & 4 deletions
@@ -58,8 +58,8 @@ ENABLE_DECODE_RESPONSE_TO_SERVICE=true ./xllm_master_serving --etcd_addr="127.0.
 Taking Qwen2-7B as an example:

 - Start Prefill Instance
-``` shell linenums="1" hl_lines="3 9 10"
-./xllm --model=path/to/Qwen2-7B-Instruct \
+```bash
+/path/to/xllm --model=path/to/Qwen2-7B-Instruct \
   --port=8010 \
   --devices="npu:0" \
   --master_node_addr="127.0.0.1:18888" \
@@ -74,8 +74,8 @@ Taking Qwen2-7B as an example:
   --nnodes=1
 ```
 - Start Decode Instance
-``` shell linenums="1" hl_lines="3 9 10"
-./xllm --model=path/to/Qwen2-7B-Instruct \
+```bash
+/path/to/xllm --model=path/to/Qwen2-7B-Instruct \
   --port=8020 \
   --devices="npu:1" \
   --master_node_addr="127.0.0.1:18898" \
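
Note: both disaggregated-PD guides now reference the binary by an explicit `/path/to/xllm` instead of `./xllm`, and the fences become plain `bash`. A minimal sketch of the resulting prefill/decode launch pattern is below, using only the flags visible in the hunks above; the full commands in the docs carry additional flags that are truncated in this diff, and mirroring `--nnodes=1` on the decode instance is an assumption.

```bash
# Sketch: launch one prefill and one decode xLLM instance on two NPUs.
# Only flags shown in the hunks above are included; see
# docs/en/getting_started/disagg_pd.md for the complete commands.
XLLM_BIN=/path/to/xllm              # replace with the actual binary location
MODEL=path/to/Qwen2-7B-Instruct     # replace with the actual model path

# Prefill instance on npu:0
$XLLM_BIN --model=$MODEL \
  --port=8010 \
  --devices="npu:0" \
  --master_node_addr="127.0.0.1:18888" \
  --nnodes=1 &

# Decode instance on npu:1 (trailing flags assumed to mirror the prefill instance)
$XLLM_BIN --model=$MODEL \
  --port=8020 \
  --devices="npu:1" \
  --master_node_addr="127.0.0.1:18898" \
  --nnodes=1 &

wait
```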
