Commit ca5fa46

docs: refine readme to improve readability.
1 parent 21962a3 commit ca5fa46

28 files changed (+832, −1189 lines)

README.md

Lines changed: 3 additions & 66 deletions
@@ -106,77 +106,13 @@ limitations under the License. -->
 └── xllm.cpp # entrypoint of xLLM
 ```

-Supported models list:
-- DeepSeek-V3/R1
-- DeepSeek-R1-Distill-Qwen
-- Kimi-k2
-- Llama2/3
-- MiniCPM-V
-- MiMo-VL
-- Qwen2/2.5/QwQ
-- Qwen2.5-VL
-- Qwen3 / Qwen3-MoE
-- Qwen3-VL / Qwen3-VL-MoE
-- GLM4.5 / GLM4.6 / GLM-4.6V / GLM-4.7
-- VLM-R1
+Please check the model support status at [Model Support List](docs/en/model_list.md).

 ---

 ## 4. Quick Start
-#### Installation
-First, download the image we provide:
-```bash
-# A2 x86
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull xllm/xllm-ai:xllm-dev-hc-rc2-arm
-# or
-# A2 x86
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hc-rc2-arm
-```
-Then create the corresponding container:
-```bash
-sudo docker run -it --ipc=host -u 0 --privileged --name mydocker --network=host --device=/dev/davinci0 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /var/queue_schedule:/var/queue_schedule -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /usr/local/sbin/:/usr/local/sbin/ -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf -v /var/log/npu/slog/:/var/log/npu/slog -v /export/home:/export/home -w /export/home -v ~/.ssh:/root/.ssh -v /var/log/npu/profiling/:/var/log/npu/profiling -v /var/log/npu/dump/:/var/log/npu/dump -v /home/:/home/ -v /runtime/:/runtime/ -v /etc/hccn.conf:/etc/hccn.conf xllm/xllm-ai:xllm-dev-hb-rc2-x86
-```
-
-Install official repo and submodules:
-```bash
-git clone https://github.com/jd-opensource/xllm
-cd xllm
-git submodule init
-git submodule update
-```
-The compilation depends on [vcpkg](https://github.com/microsoft/vcpkg). The Docker image already includes VCPKG_ROOT preconfigured. If you want to manually set it up, you can:
-```bash
-git clone https://gitcode.com/xLLM-AI/vcpkg.git
-cd vcpkg && git checkout ffc42e97c866ce9692f5c441394832b86548422c
-export VCPKG_ROOT=/your/path/to/vcpkg
-```

-#### Compilation
-When compiling, generate executable files `build/xllm/core/server/xllm` under `build/`:
-```bash
-python setup.py build
-```
-Or, compile directly using the following command to generate the whl package under `dist/`:
-```bash
-python setup.py bdist_wheel
-```
-
-#### Launch
-Run the following command to start xLLM engine:
-```bash
-./build/xllm/core/server/xllm \ # launch xllm server
-    --model=/path/to/your/llm \ # model path(to replace with your own path)
-    --port=9977 \ # set service port to 9977
-    --max_memory_utilization 0.90 # set the maximal utilization of device memory
-```
+Please refer to [Quick Start](docs/en/getting_started/quick_start.md) for more details.

 ---

@@ -217,6 +153,7 @@ This project was made possible thanks to the following open-source projects:
 - [safetensors](https://github.com/huggingface/safetensors) - xLLM relies on the C binding safetensors capability.
 - [Partial JSON Parser](https://github.com/promplate/partial-json-parser) - Implement xLLM's C++ JSON parser with insights from Python and Go implementations.
 - [concurrentqueue](https://github.com/cameron314/concurrentqueue) - A fast multi-producer, multi-consumer lock-free concurrent queue for C++11.
+- [Flashinfer](https://github.com/flashinfer-ai/flashinfer) - High-performance NVIDIA GPU kernels.


 Thanks to the following collaborating university laboratories:
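
Note: the Quick Start steps removed above now live in docs/en/getting_started/quick_start.md. For a quick sanity check of the relocated flow, here is a minimal end-to-end sketch built from the removed commands, run inside the provided container. The final smoke test assumes the server exposes an OpenAI-style `/v1/chat/completions` route; that endpoint and the placeholder model name are assumptions, not part of this diff.

```bash
# Minimal sketch of the Quick Start steps this commit moves into
# docs/en/getting_started/quick_start.md (run inside the provided container).
set -e

# Fetch sources and submodules (taken from the removed README section).
git clone https://github.com/jd-opensource/xllm
cd xllm
git submodule init
git submodule update

# Build the server binary under build/ (VCPKG_ROOT is preconfigured in the image).
python setup.py build

# Launch on port 9977, allowing 90% of device memory.
./build/xllm/core/server/xllm \
  --model=/path/to/your/llm \
  --port=9977 \
  --max_memory_utilization 0.90 &

# Give the engine time to load weights before probing it.
sleep 60

# Assumption: an OpenAI-compatible chat endpoint; adjust to the actual API if it differs.
curl -s http://127.0.0.1:9977/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-name", "messages": [{"role": "user", "content": "hello"}]}'
```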

README_zh.md

Lines changed: 3 additions & 67 deletions
@@ -104,78 +104,13 @@ xLLM provides powerful intelligent computing capabilities through compute optimization of the hardware system and
 └── xllm.cpp # xLLM entrypoint
 ```

-Currently supported models:
-- DeepSeek-V3/R1
-- DeepSeek-R1-Distill-Qwen
-- Kimi-k2
-- Llama2/3
-- MiniCPM-V
-- MiMo-VL
-- Qwen2/2.5/QwQ
-- Qwen2.5-VL
-- Qwen3 / Qwen3-MoE
-- Qwen3-VL / Qwen3-VL-MoE
-- GLM-4.5 / GLM-4.6 / GLM-4.6V / GLM-4.7
-- VLM-R1
+Please check the model support status in the [Model Support List](docs/zh/model_list.md).

 ---

-
 ## 4. Quick Start
-#### Installation
-First, download the image we provide:
-```bash
-# A2 x86
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull quay.io/jd_xllm/xllm-ai:xllm-dev-hc-rc2-arm
-# or
-# A2 x86
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-x86
-# A2 arm
-docker pull xllm/xllm-ai:xllm-dev-hb-rc2-arm
-# A3 arm
-docker pull xllm/xllm-ai:xllm-dev-hc-rc2-arm
-```
-Then create the corresponding container:
-```bash
-sudo docker run -it --ipc=host -u 0 --privileged --name mydocker --network=host --device=/dev/davinci0 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /var/queue_schedule:/var/queue_schedule -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /usr/local/sbin/:/usr/local/sbin/ -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf -v /var/log/npu/slog/:/var/log/npu/slog -v /export/home:/export/home -w /export/home -v ~/.ssh:/root/.ssh -v /var/log/npu/profiling/:/var/log/npu/profiling -v /var/log/npu/dump/:/var/log/npu/dump -v /home/:/home/ -v /runtime/:/runtime/ -v /etc/hccn.conf:/etc/hccn.conf xllm/xllm-ai:xllm-dev-hb-rc2-x86
-```
-
-Download the official repository and its submodule dependencies:
-```bash
-git clone https://github.com/jd-opensource/xllm
-cd xllm
-git submodule init
-git submodule update
-```
-Compilation depends on [vcpkg](https://github.com/microsoft/vcpkg), which is already configured in the image. To set it up manually, run the following commands:
-```bash
-git clone https://gitcode.com/xLLM-AI/vcpkg.git
-cd vcpkg && git checkout ffc42e97c866ce9692f5c441394832b86548422c
-export VCPKG_ROOT=/your/path/to/vcpkg
-```

-#### Compilation
-Run the build to generate the executable `build/xllm/core/server/xllm` under `build/`:
-```bash
-python setup.py build
-```
-Or compile directly with the following command to generate the whl package under `dist/`:
-```bash
-python setup.py bdist_wheel
-```
-
-#### Launch
-Run a command like the following to start the xLLM engine:
-```bash
-./build/xllm/core/server/xllm \ # launch the xllm server
-    --model=/path/to/your/llm \ # model path (replace with your actual path)
-    --port=9977 \ # set the service port to 9977
-    --max_memory_utilization 0.90 # set the maximum memory utilization to 90%
-```
+Please refer to the [Quick Start guide](docs/zh/getting_started/quick_start.md).

 ---

@@ -219,6 +154,7 @@ python setup.py bdist_wheel
 - [safetensors](https://github.com/huggingface/safetensors) - Relies on its C-binding safetensors capability.
 - [Partial JSON Parser](https://github.com/promplate/partial-json-parser) - xLLM's C++ JSON parser, designed with reference to the Python and Go implementations.
 - [concurrentqueue](https://github.com/cameron314/concurrentqueue) - A high-performance lock-free queue.
+- [Flashinfer](https://github.com/flashinfer-ai/flashinfer) - High-performance NVIDIA GPU kernels.

 Thanks to the following collaborating university laboratories:

docs/en/features/disagg_pd.md

Lines changed: 4 additions & 4 deletions
@@ -38,8 +38,8 @@ ENABLE_DECODE_RESPONSE_TO_SERVICE=true ./xllm_master_serving --etcd_addr="127.0.
 3. Start xLLM
 - Taking Qwen2-7B as an example
 - Start Prefill Instance
-``` shell linenums="1" hl_lines="10"
-./xllm --model=Qwen2-7B-Instruct \
+```bash
+/path/to/xllm --model=Qwen2-7B-Instruct \
   --port=8010 \
   --devices="npu:0" \
   --master_node_addr="127.0.0.1:18888" \
@@ -54,8 +54,8 @@ ENABLE_DECODE_RESPONSE_TO_SERVICE=true ./xllm_master_serving --etcd_addr="127.0.
   --nnodes=1
 ```
 - Start Decode Instance
-```shell linenums="1" hl_lines="11"
-./xllm --model=Qwen2-7B-Instruct \
+```bash
+/path/to/xllm --model=Qwen2-7B-Instruct \
   --port=8020 \
   --devices="npu:1" \
   --master_node_addr="127.0.0.1:18898" \

docs/en/getting_started/compile.md

Lines changed: 0 additions & 58 deletions
This file was deleted.

docs/en/getting_started/disagg_pd.md

Lines changed: 4 additions & 4 deletions
@@ -58,8 +58,8 @@ ENABLE_DECODE_RESPONSE_TO_SERVICE=true ./xllm_master_serving --etcd_addr="127.0.
 Taking Qwen2-7B as an example:

 - Start Prefill Instance
-``` shell linenums="1" hl_lines="3 9 10"
-./xllm --model=path/to/Qwen2-7B-Instruct \
+```bash
+/path/to/xllm --model=path/to/Qwen2-7B-Instruct \
   --port=8010 \
   --devices="npu:0" \
   --master_node_addr="127.0.0.1:18888" \
@@ -74,8 +74,8 @@ Taking Qwen2-7B as an example:
   --nnodes=1
 ```
 - Start Decode Instance
-``` shell linenums="1" hl_lines="3 9 10"
-./xllm --model=path/to/Qwen2-7B-Instruct \
+```bash
+/path/to/xllm --model=path/to/Qwen2-7B-Instruct \
   --port=8020 \
   --devices="npu:1" \
   --master_node_addr="127.0.0.1:18898" \
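
Note: both disaggregated-PD guides now reference the binary by an explicit `/path/to/xllm` instead of `./xllm`, and the fences become plain `bash`. A minimal sketch of the resulting prefill/decode launch pattern is below, using only the flags visible in the hunks above; the full commands in the docs carry additional flags that are truncated in this diff, and mirroring `--nnodes=1` on the decode instance is an assumption.

```bash
# Sketch: launch one prefill and one decode xLLM instance on two NPUs.
# Only flags shown in the hunks above are included; see
# docs/en/getting_started/disagg_pd.md for the complete commands.
XLLM_BIN=/path/to/xllm              # replace with the actual binary location
MODEL=path/to/Qwen2-7B-Instruct     # replace with the actual model path

# Prefill instance on npu:0
$XLLM_BIN --model=$MODEL \
  --port=8010 \
  --devices="npu:0" \
  --master_node_addr="127.0.0.1:18888" \
  --nnodes=1 &

# Decode instance on npu:1 (trailing flags assumed to mirror the prefill instance)
$XLLM_BIN --model=$MODEL \
  --port=8020 \
  --devices="npu:1" \
  --master_node_addr="127.0.0.1:18898" \
  --nnodes=1 &

wait
```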
