
Commit 2e76454

weizhehuang0827 authored and liutongxuan committed
docs: remove figure and update mkdoc config.
1 parent fc21707 · commit 2e76454

File tree

6 files changed: +9 -23 lines

docs/assets/DeepSeek-R1_performance.png
docs/assets/Qwen3_performance.png
docs/en/features/overview.md
docs/zh/features/overview.md
mkdocs_en.yml
mkdocs_zh.yml
docs/assets/DeepSeek-R1_performance.png

-383 KB
Binary file not shown.

docs/assets/Qwen3_performance.png

-742 KB
Binary file not shown.

docs/en/features/overview.md

Lines changed: 2 additions & 9 deletions
@@ -1,5 +1,7 @@
 # Overall Architecture
 
+## Background
+
 In recent years, with the groundbreaking progress of large language models (LLMs) ranging from tens of billions to trillions of parameters (such as GPT, Claude, DeepSeek, LLaMA, etc.) in the fields of natural language processing and multimodal interaction, the industry has an urgent need for efficient inference engines and serving systems. How to reduce cluster inference costs and improve inference efficiency has become a key challenge for achieving large-scale commercial deployment.
 
 Although a number of optimization engines for large model inference have emerged, several technical bottlenecks remain in practical deployment:
@@ -47,12 +49,3 @@ xLLM implements expert weight updates based on historical expert load statistics
 ### Multimodal Support
 
 xLLM provides comprehensive support for various multimodal models, including Qwen2-VL and MiniCPMV.
-
-## Performance Results
-
-![1](../../assets/DeepSeek-R1_performance.png)
-The figure above shows a comparison of throughput for the DeepSeek-R1-w8a8 model across different inference frameworks under benchmark conditions. Across different combinations of prompt length and output length ([2048,2048] and [2500,1500]) and TPOT settings (50ms and 100ms), xLLM consistently demonstrates the highest throughput. Specifically, under various experimental conditions, xLLM achieves a **throughput increase ranging from 5.6x to 15.7x** compared to vLLM.
-
-![2](../../assets/Qwen3_performance.png)
-
-The figure above shows a comparison of throughput for various versions of the Qwen3 model across different inference frameworks under benchmark conditions. The input and output lengths are both set to 2048, and TPOT is set to 50ms. The results indicate that xLLM consistently delivers the best throughput, both across different versions of the Qwen3 model and as the number of accelerator cards changes. Specifically, compared to vLLM, xLLM achieves an **average performance improvement ranging from 27% to 186%**.

docs/zh/features/overview.md

Lines changed: 2 additions & 9 deletions
@@ -1,5 +1,7 @@
 # Overall Architecture
 
+## Background
+
 In recent years, with the breakthrough progress of large language models at scales of tens of billions to trillions of parameters (such as GPT, Claude, DeepSeek, and LLaMA) in natural language processing and multimodal interaction, industry has an urgent need for efficient inference engines and serving systems. Reducing cluster inference costs and improving inference efficiency has become a key challenge for large-scale commercial deployment.
 
 Although a number of optimization engines for large model inference have emerged, many technical bottlenecks remain in practical deployment:
@@ -48,12 +50,3 @@ xLLM implements expert weight updates for MoE models based on historical expert load statistics
 ### Multimodal Support
 
 xLLM provides comprehensive support for a variety of multimodal models, including Qwen2-VL and MiniCPMV.
-
-## Performance Results
-
-![1](../../assets/DeepSeek-R1_performance.png)
-The figure above compares the throughput of different inference frameworks on the DeepSeek-R1-w8a8 model under benchmark conditions. Across different combinations of prompt and output lengths ([2048,2048] and [2500,1500]) and TPOT settings (50ms and 100ms), xLLM consistently shows the highest throughput. Specifically, across these experimental settings, xLLM achieves a **throughput increase of 5.6x to 15.7x** compared to vLLM.
-
-![2](../../assets/Qwen3_performance.png)
-
-The figure above compares the throughput of different inference frameworks on various versions of the Qwen3 model under benchmark conditions. Input and output lengths are both set to 2048, and TPOT is set to 50ms. The results show that xLLM consistently delivers the best throughput, both across different Qwen3 model versions and as the number of accelerator cards varies. Specifically, xLLM achieves an **average performance improvement of 27% to 186%** relative to vLLM.

mkdocs_en.yml

Lines changed: 3 additions & 3 deletions
@@ -5,8 +5,8 @@ site_url: !ENV READTHEDOCS_CANONICAL_URL
 repo_name: jd-opensource/xllm
 repo_url: https://github.com/jd-opensource/xllm
 
-edit_uri: edit/main/docs/
-
+edit_uri: edit/main/docs/en
+use_directory_urls: true
 # Copyright
 copyright: Copyright © 2025 xLLM Team
 # docs_dir: docs/en
@@ -99,7 +99,7 @@ extra_css:
 # Additional configuration
 extra:
   source:
-    path: https://github.com/jd-opensource/xllm/blob/main/docs/
+    path: https://github.com/jd-opensource/xllm/blob/main/docs/en/
   status:
     new: Recently added
     deprecated: Deprecated
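
For context on why the English and Chinese builds need distinct values here: MkDocs composes each page's "edit this page" link by joining repo_url, edit_uri, and the page's source path relative to its docs directory. A minimal sketch of that composition, assuming the values from mkdocs_en.yml above; the edit_link helper is illustrative only, not part of the MkDocs API:

```python
# Hypothetical sketch of how an MkDocs-style edit link is composed from the
# mkdocs_en.yml values in this commit; edit_link() is illustrative, not a
# real MkDocs API.
REPO_URL = "https://github.com/jd-opensource/xllm"
EDIT_URI = "edit/main/docs/en"  # after this commit; previously "edit/main/docs/"

def edit_link(page_src_path: str) -> str:
    """Join repo_url, edit_uri, and the page path relative to docs_dir."""
    parts = [REPO_URL.rstrip("/"), EDIT_URI.strip("/"), page_src_path.lstrip("/")]
    return "/".join(parts)

print(edit_link("features/overview.md"))
# https://github.com/jd-opensource/xllm/edit/main/docs/en/features/overview.md
```

With the old edit_uri the generated link omitted the en/ (or zh/) segment and pointed at docs/ directly; the zh config below applies the same fix with docs/zh.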

mkdocs_zh.yml

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ site_url: !ENV READTHEDOCS_CANONICAL_URL
 repo_name: jd-opensource/xllm
 repo_url: https://github.com/jd-opensource/xllm
 
-edit_uri: edit/main/docs/
+edit_uri: edit/main/docs/zh
 use_directory_urls: true
 # Copyright
 copyright: Copyright © 2025 xLLM Team
@@ -100,7 +100,7 @@ extra_css:
 # Additional configuration
 extra:
   source:
-    path: https://github.com/jd-opensource/xllm/blob/main/docs/
+    path: https://github.com/jd-opensource/xllm/blob/main/docs/zh/
 
   status:
     new: Recently added
