
Commit 2e76454

weizhehuang0827 authored and liutongxuan committed
docs: remove figure and update mkdoc config.
1 parent fc21707 · commit 2e76454

File tree

6 files changed: +9 -23 lines

docs/assets/DeepSeek-R1_performance.png
docs/assets/Qwen3_performance.png
docs/en/features/overview.md
docs/zh/features/overview.md
mkdocs_en.yml
mkdocs_zh.yml
docs/assets/DeepSeek-R1_performance.png

-383 KB
Binary file not shown.

docs/assets/Qwen3_performance.png

-742 KB
Binary file not shown.

docs/en/features/overview.md

Lines changed: 2 additions & 9 deletions
@@ -1,5 +1,7 @@
 # Overall Architecture
 
+## Background
+
 In recent years, with the groundbreaking progress of large language models (LLMs) ranging from tens of billions to trillions of parameters (such as GPT, Claude, DeepSeek, LLaMA, etc.) in the fields of natural language processing and multimodal interaction, the industry has an urgent need for efficient inference engines and serving systems. How to reduce cluster inference costs and improve inference efficiency has become a key challenge for achieving large-scale commercial deployment.
 
 Although a number of optimization engines for large model inference have emerged, several technical bottlenecks remain in practical deployment:
@@ -47,12 +49,3 @@ xLLM implements expert weight updates based on historical expert load statistics
 ### Multimodal Support
 
 xLLM provides comprehensive support for various multimodal models, including Qwen2-VL and MiniCPMV.
-
-## Performance Results
-
-![1](../../assets/DeepSeek-R1_performance.png)
-The figure above shows a comparison of throughput for the DeepSeek-R1-w8a8 model across different inference frameworks under benchmark conditions. Across different combinations of prompt length and output length ([2048,2048] and [2500,1500]) and TPOT settings (50ms and 100ms), xLLM consistently demonstrates the highest throughput. Specifically, under various experimental conditions, xLLM achieves a **throughput increase ranging from 5.6x to 15.7x** compared to vLLM.
-
-![2](../../assets/Qwen3_performance.png)
-
-The figure above shows a comparison of throughput for various versions of the Qwen3 model across different inference frameworks under benchmark conditions. The input and output lengths are both set to 2048, and TPOT is set to 50ms. The results indicate that xLLM consistently delivers the best throughput, both across different versions of the Qwen3 model and as the number of accelerator cards changes. Specifically, compared to vLLM, xLLM achieves an **average performance improvement ranging from 27% to 186%**.

docs/zh/features/overview.md

Lines changed: 2 additions & 9 deletions
@@ -1,5 +1,7 @@
 # Overall Architecture
 
+## Background
+
 In recent years, with the breakthrough progress of large language models at scales of tens of billions to trillions of parameters (such as GPT, Claude, DeepSeek, and LLaMA) in natural language processing and multimodal interaction, industry has an urgent need for efficient inference engines and serving systems. Reducing cluster inference costs and improving inference efficiency has become a key challenge for large-scale commercial deployment.
 
 Although a number of optimization engines for large model inference have emerged, many technical bottlenecks remain in practical deployment:
@@ -48,12 +50,3 @@ xLLM implements expert weight updates for MoE models based on historical expert load statistics
 ### Multimodal Support
 
 xLLM provides comprehensive support for a variety of multimodal models, including Qwen2-VL and MiniCPMV.
-
-## Performance Results
-
-![1](../../assets/DeepSeek-R1_performance.png)
-The figure above compares the throughput of different inference frameworks on the DeepSeek-R1-w8a8 model under benchmark conditions. Across different combinations of prompt and output lengths ([2048,2048] and [2500,1500]) and TPOT settings (50ms and 100ms), xLLM consistently shows the highest throughput. Specifically, across these experimental settings, xLLM achieves a **throughput increase of 5.6x to 15.7x** compared to vLLM.
-
-![2](../../assets/Qwen3_performance.png)
-
-The figure above compares the throughput of different inference frameworks on various versions of the Qwen3 model under benchmark conditions. Input and output lengths are both set to 2048, and TPOT is set to 50ms. The results show that xLLM consistently delivers the best throughput, both across different Qwen3 model versions and as the number of accelerator cards varies. Specifically, xLLM achieves an **average performance improvement of 27% to 186%** relative to vLLM.

mkdocs_en.yml

Lines changed: 3 additions & 3 deletions
@@ -5,8 +5,8 @@ site_url: !ENV READTHEDOCS_CANONICAL_URL
 repo_name: jd-opensource/xllm
 repo_url: https://github.com/jd-opensource/xllm
 
-edit_uri: edit/main/docs/
-
+edit_uri: edit/main/docs/en
+use_directory_urls: true
 # Copyright
 copyright: Copyright © 2025 xLLM Team
 # docs_dir: docs/en
@@ -99,7 +99,7 @@ extra_css:
 # Additional configuration
 extra:
   source:
-    path: https://github.com/jd-opensource/xllm/blob/main/docs/
+    path: https://github.com/jd-opensource/xllm/blob/main/docs/en/
   status:
     new: Recently added
     deprecated: Deprecated
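
For context on why the English and Chinese builds need distinct values here: MkDocs composes each page's "edit this page" link by joining repo_url, edit_uri, and the page's source path relative to its docs directory. A minimal sketch of that composition, assuming the values from mkdocs_en.yml above; the edit_link helper is illustrative only, not part of the MkDocs API:

```python
# Hypothetical sketch of how an MkDocs-style edit link is composed from the
# mkdocs_en.yml values in this commit; edit_link() is illustrative, not a
# real MkDocs API.
REPO_URL = "https://github.com/jd-opensource/xllm"
EDIT_URI = "edit/main/docs/en"  # after this commit; previously "edit/main/docs/"

def edit_link(page_src_path: str) -> str:
    """Join repo_url, edit_uri, and the page path relative to docs_dir."""
    parts = [REPO_URL.rstrip("/"), EDIT_URI.strip("/"), page_src_path.lstrip("/")]
    return "/".join(parts)

print(edit_link("features/overview.md"))
# https://github.com/jd-opensource/xllm/edit/main/docs/en/features/overview.md
```

With the old edit_uri the generated link omitted the en/ (or zh/) segment and pointed at docs/ directly; the zh config below applies the same fix with docs/zh.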

mkdocs_zh.yml

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ site_url: !ENV READTHEDOCS_CANONICAL_URL
 repo_name: jd-opensource/xllm
 repo_url: https://github.com/jd-opensource/xllm
 
-edit_uri: edit/main/docs/
+edit_uri: edit/main/docs/zh
 use_directory_urls: true
 # Copyright
 copyright: Copyright © 2025 xLLM Team
@@ -100,7 +100,7 @@ extra_css:
 # Additional configuration
 extra:
   source:
-    path: https://github.com/jd-opensource/xllm/blob/main/docs/
+    path: https://github.com/jd-opensource/xllm/blob/main/docs/zh/
 
   status:
     new: Recently added
