Commit f6c5d64

Author: sangchengmeng
Commit message: Merge branch 'main' into add-qwen3-vl
Parents: 02486eb + aff4049

File tree: 109 files changed (+2018, −632 lines)

README.md (4 additions, 1 deletion)

@@ -20,6 +20,9 @@ LightLLM is a Python-based LLM (Large Language Model) inference and serving fram
 
 [English Docs](https://lightllm-en.readthedocs.io/en/latest/) | [中文文档](https://lightllm-cn.readthedocs.io/en/latest/) | [Blogs](https://modeltc.github.io/lightllm-blog/)
 
+## Tech Blogs
+- [2025/11] 🚀 Prefix KV Cache Transfer between DP rankers is now supported! Check out the technical deep dive in our [blog post](https://light-ai.top/lightllm-blog/2025/11/18/dp_kv_fetch.html).
+
 ## News
 - [2025/09] 🔥 LightLLM [v1.1.0](https://www.light-ai.top/lightllm-blog/2025/09/03/lightllm.html) release!
 - [2025/08] Pre $^3$ achieves the outstanding paper award of [ACL2025](https://2025.aclweb.org/program/awards/).

@@ -36,7 +39,7 @@ LightLLM is a Python-based LLM (Large Language Model) inference and serving fram
 
 ## Performance
 
-Learn more in the release blogs: [v1.0.0 blog](https://www.light-ai.top/lightllm-blog//by%20mtc%20team/2025/02/16/lightllm/).
+Learn more in the release blogs: [v1.1.0 blog](https://www.light-ai.top/lightllm-blog/2025/09/03/lightllm.html).
 
 ## FAQ
 

docker/cuda_version_12.8.0/Dockerfile (3 additions, 3 deletions)

@@ -33,11 +33,11 @@ RUN case ${TARGETPLATFORM} in \
 
 WORKDIR /root
 
+RUN pip install --no-cache-dir vllm==0.11.0 --pre --extra-index-url https://wheels.vllm.ai/nightly
+
 COPY ./requirements.txt /lightllm/requirements.txt
 RUN pip install -U pip
-RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu128
-
-RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
 
 # TODO: offline compile
 # RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
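Since these Dockerfiles use BuildKit's `RUN --mount=type=cache`, the images must be built with BuildKit enabled; the legacy builder rejects that syntax. A minimal sketch of the invocation, assuming the build runs from the repository root (the image tag is illustrative, not from the repo):

```shell
# BuildKit is required for `RUN --mount=type=cache,...`.
# On recent Docker versions `docker buildx build` enables it by default;
# DOCKER_BUILDKIT=1 forces it for plain `docker build`.
DOCKER_BUILDKIT=1 docker build \
  -f docker/cuda_version_12.8.0/Dockerfile \
  -t lightllm:cuda-12.8 \
  .
```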

docker/cuda_version_12.8.0/Dockerfile.deepep (3 additions, 4 deletions)

@@ -33,12 +33,11 @@ RUN case ${TARGETPLATFORM} in \
 
 WORKDIR /root
 
+RUN pip install --no-cache-dir vllm==0.11.0 --pre --extra-index-url https://wheels.vllm.ai/nightly
+
 COPY ./requirements.txt /lightllm/requirements.txt
 RUN pip install -U pip
-RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu128
-
-RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
 # TODO: offline compile
 # RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
 

docker/cuda_version_12.8.0/Dockerfile.nixl (5 additions, 4 deletions)

@@ -33,11 +33,12 @@ RUN case ${TARGETPLATFORM} in \
 
 WORKDIR /root
 
+RUN --mount=type=cache,target=/root/.cache/pip pip install vllm==0.11.0 --pre --extra-index-url https://wheels.vllm.ai/nightly
+
 COPY ./requirements.txt /lightllm/requirements.txt
-RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu128
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-
-RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
+RUN --mount=type=cache,target=/root/.cache/pip pip install --no-deps -v 'git+https://github.com/ModelTC/LightKernel.git@07f2f62af5deb41f10a22660f9f42dba9273361e#egg=lightllm_kernel'
 
 RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
 RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev

@@ -77,7 +78,7 @@ RUN apt-get update && apt-get install -y cmake automake autotools-dev libtool l
     make -j install-strip && \
     ldconfig;
 
-RUN apt-get update && apt-get install -y pkg-config tmux net-tools ; \
+RUN apt-get update && apt-get install -y pkg-config tmux net-tools libaio-dev ; \
     cd /usr/local/src; \
     pip install --upgrade meson pybind11 patchelf; \
     git clone https://github.com/ai-dynamo/nixl.git -b main && \

docker/cuda_version_12.8.0/Dockerfile.nixl.deepep (5 additions, 4 deletions)

@@ -35,11 +35,12 @@ RUN case ${TARGETPLATFORM} in \
 
 WORKDIR /root
 
+RUN --mount=type=cache,target=/root/.cache/pip pip install vllm==0.11.0 --pre --extra-index-url https://wheels.vllm.ai/nightly
+
 COPY ./requirements.txt /lightllm/requirements.txt
-RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu128
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-
-RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
+RUN --mount=type=cache,target=/root/.cache/pip pip install --no-deps -v 'git+https://github.com/ModelTC/LightKernel.git@07f2f62af5deb41f10a22660f9f42dba9273361e#egg=lightllm_kernel'
 
 RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
 RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev

@@ -104,7 +105,7 @@ RUN apt-get update && apt-get install -y cmake automake autotools-dev libtool l
     make -j install-strip && \
     ldconfig;
 
-RUN apt-get update && apt-get install -y pkg-config tmux net-tools ; \
+RUN apt-get update && apt-get install -y pkg-config tmux net-tools libaio-dev ; \
    cd /usr/local/src; \
    pip install --upgrade meson pybind11 patchelf; \
    git clone https://github.com/ai-dynamo/nixl.git -b main && \
New file (124 additions, 0 deletions)

@@ -0,0 +1,124 @@
ARG CUDA_VERSION=12.8.0
FROM nvidia/cuda:${CUDA_VERSION}-cudnn-devel-ubuntu22.04

ARG PYTHON_VERSION=3.10
ARG MAMBA_VERSION=24.7.1-0
ARG TARGETPLATFORM

ENV PATH=/opt/conda/bin:$PATH \
    CONDA_PREFIX=/opt/conda

RUN chmod 777 -R /tmp && apt-get update --allow-insecure-repositories && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates \
    libssl-dev \
    curl \
    g++ \
    make \
    git && \
    rm -rf /var/lib/apt/lists/*

RUN case ${TARGETPLATFORM} in \
    "linux/arm64") MAMBA_ARCH=aarch64 ;; \
    *) MAMBA_ARCH=x86_64 ;; \
    esac && \
    curl -fsSL -o ~/mambaforge.sh -v "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-${MAMBA_ARCH}.sh" && \
    bash ~/mambaforge.sh -b -p /opt/conda && \
    rm ~/mambaforge.sh

RUN case ${TARGETPLATFORM} in \
    "linux/arm64") exit 1 ;; \
    *) /opt/conda/bin/conda update -y conda && \
    /opt/conda/bin/conda install -y "python=${PYTHON_VERSION}" && \
    /opt/conda/bin/conda install -y boost ;; \
    esac && \
    /opt/conda/bin/conda clean -ya


WORKDIR /root

RUN --mount=type=cache,target=/root/.cache/pip pip install vllm==0.11.0 --pre --extra-index-url https://wheels.vllm.ai/nightly

COPY ./requirements.txt /lightllm/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128

RUN --mount=type=cache,target=/root/.cache/pip pip install --no-deps -v 'git+https://github.com/ModelTC/LightKernel.git@07f2f62af5deb41f10a22660f9f42dba9273361e#egg=lightllm_kernel'
RUN --mount=type=cache,target=/root/.cache/pip pip install --no-deps -v 'git+https://github.com/ModelTC/LightMem.git@5900baf92d85ef4dbda6124093506b0af906011a#egg=light_mem'

RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev

ENV CUDA_HOME=/usr/local/cuda \
    GDRCOPY_HOME=/usr/src/gdrdrv-2.4.4/

RUN mkdir -p /tmp/gdrcopy && cd /tmp \
    && git clone https://github.com/NVIDIA/gdrcopy.git -b v2.4.4 \
    && cd gdrcopy/packages \
    && CUDA=/usr/local/cuda ./build-deb-packages.sh \
    && dpkg -i gdrdrv-dkms_*.deb libgdrapi_*.deb gdrcopy-tests_*.deb gdrcopy_*.deb \
    && cd / && rm -rf /tmp/gdrcopy

# Fix DeepEP IBGDA symlink
RUN ln -sf /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so

RUN wget https://developer.download.nvidia.com/compute/redist/nvshmem/3.3.9/source/nvshmem_src_cuda12-all-all-3.3.9.tar.gz \
    && tar -xf nvshmem_src_cuda12-all-all-3.3.9.tar.gz && mv nvshmem_src nvshmem \
    && cd nvshmem \
    && rm -f /root/nvshmem_src_cuda12-all-all-3.3.9.tar.gz \
    && NVSHMEM_SHMEM_SUPPORT=0 \
    NVSHMEM_UCX_SUPPORT=0 \
    NVSHMEM_USE_NCCL=0 \
    NVSHMEM_MPI_SUPPORT=0 \
    NVSHMEM_IBGDA_SUPPORT=1 \
    NVSHMEM_PMIX_SUPPORT=0 \
    NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
    NVSHMEM_USE_GDRCOPY=1 \
    cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/root/nvshmem/install -DCMAKE_CUDA_ARCHITECTURES=90 \
    && cmake --build build --target install -j64

ARG DEEPEP_COMMIT=b6ce310bb0b75079682d09bc2ebc063a074fbd58
RUN git clone https://github.com/deepseek-ai/DeepEP.git && cd DeepEP && git checkout ${DEEPEP_COMMIT} && cd ..

WORKDIR /root/DeepEP
ENV NVSHMEM_DIR=/root/nvshmem/install
RUN NVSHMEM_DIR=/root/nvshmem/install python setup.py install

RUN apt-get update && apt-get install -y cmake automake autotools-dev libtool libz-dev && \
    DEBIAN_FRONTEND=noninteractive apt-get -y install --reinstall libibverbs-dev rdma-core ibverbs-utils libibumad-dev; \
    rm -rf /usr/lib/ucx && \
    rm -rf /opt/hpcx/ucx && \
    cd /usr/local/src && \
    git clone https://github.com/openucx/ucx.git && \
    cd ucx && \
    git checkout v1.19.x && \
    ./autogen.sh && ./configure \
    --enable-shared \
    --disable-static \
    --disable-doxygen-doc \
    --enable-optimizations \
    --enable-cma \
    --enable-devel-headers \
    --with-cuda=/usr/local/cuda \
    --with-verbs=yes \
    --with-dm \
    --with-gdrcopy=/usr/local \
    --with-efa \
    --enable-mt && \
    make -j && \
    make -j install-strip && \
    ldconfig;

RUN apt-get update && apt-get install -y pkg-config tmux net-tools libaio-dev ; \
    cd /usr/local/src; \
    pip install --upgrade meson pybind11 patchelf; \
    git clone https://github.com/ai-dynamo/nixl.git -b main && \
    cd nixl && \
    rm -rf build && \
    mkdir build && \
    meson setup build/ --prefix=/usr/local/nixl --buildtype=release && \
    cd build && \
    ninja && \
    ninja install && \
    cd .. && pip install . --no-deps;

COPY . /lightllm
RUN pip install -e /lightllm --no-cache-dir

docs/CN/source/getting_started/benchmark.rst (6 additions, 6 deletions)

@@ -4,7 +4,7 @@ Benchmark 测试指南
 LightLLM 提供了全面的性能测试工具,包括服务端性能测试和静态推理性能测试。本文档将详细介绍如何使用这些工具进行性能评估。
 
 服务端性能测试 (Service Benchmark)
----------------------------------
+-----------------------------------
 
 服务端性能测试主要用于评估 LightLLM 在真实服务场景下的性能表现,包括吞吐量、延迟等关键指标。
 

@@ -55,7 +55,7 @@ QPS (Queries Per Second) 测试是评估服务端性能的核心工具,支持
 - decode_token_time P{25,50,75,90,95,99,100}: 解码 token 延迟百分位数
 
 固定并发测试 (benchmark_client.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 用于评估不同客户端并发数下的性能表现。
 

@@ -73,7 +73,7 @@ QPS (Queries Per Second) 测试是评估服务端性能的核心工具,支持
     --server_api lightllm
 
 ShareGPT 数据集测试 (benchmark_sharegpt.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 使用 ShareGPT 真实对话数据进行性能测试。
 

@@ -131,7 +131,7 @@ Prompt Cache 测试
 - ``--num_users``: 用户数
 
 静态推理性能测试 (Static Inference Benchmark)
---------------------------------------------
+----------------------------------------------
 
 静态推理测试用于评估模型在固定输入条件下的推理性能, 主要评估算子的优劣
 模型推理测试 (model_infer.py)

@@ -178,7 +178,7 @@ Prompt Cache 测试
 - 各阶段延迟统计
 
 多结果预测性能测试 (model_infer_mtp.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 多结果预测静态性能测试,默认百分百接受率,用来评估多结果预测的极限性能。目前只支持DeepSeek 系列模型
 

@@ -203,7 +203,7 @@ Prompt Cache 测试
 - ``--mtp_draft_model_dir``: 草稿模型路径
 
 Vision Transformer 测试 (test_vit.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 用于测试 Vision Transformer 模型的性能。
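The underline changes in this file all lengthen reStructuredText section underlines: docutils measures titles by display width, and East Asian wide characters such as the CJK text in these headings count as two columns, so an underline matching the character count alone is "too short". A minimal sketch of computing a sufficient underline length (`display_width` and `underline_for` are illustrative helper names, not part of the repo):

```python
import unicodedata

def display_width(text: str) -> int:
    # East Asian Wide ("W") and Fullwidth ("F") characters occupy two columns.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def underline_for(title: str, ch: str = "-") -> str:
    # An underline at least as long as the title's display width satisfies docutils.
    return ch * display_width(title)

title = "服务端性能测试 (Service Benchmark)"
print(display_width(title))   # 7 wide chars at 2 columns each, plus 20 ASCII chars
print(underline_for(title))
```

Generating underlines this way avoids the off-by-a-few errors the diff is correcting by hand.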

docs/CN/source/index.rst (1 addition, 0 deletions)

@@ -49,6 +49,7 @@ Lightllm 整合了众多的开源方案的优点,包括但不限于 FasterTran
    :caption: 部署教程
 
    DeepSeek R1 部署 <tutorial/deepseek_deployment>
+   多级缓存部署 <tutorial/multi_level_cache_deployment>
    多模态部署 <tutorial/multimodal>
    奖励模型部署 <tutorial/reward_model>
    OpenAI 接口使用 <tutorial/openai>
