Commit f6c5d64

Author: sangchengmeng
Commit message: Merge branch 'main' into add-qwen3-vl
Parents: 02486eb + aff4049

File tree: 109 files changed (+2018, −632 lines)

README.md (4 additions, 1 deletion)

@@ -20,6 +20,9 @@ LightLLM is a Python-based LLM (Large Language Model) inference and serving fram
 
 [English Docs](https://lightllm-en.readthedocs.io/en/latest/) | [中文文档](https://lightllm-cn.readthedocs.io/en/latest/) | [Blogs](https://modeltc.github.io/lightllm-blog/)
 
+## Tech Blogs
+- [2025/11] 🚀 Prefix KV Cache Transfer between DP rankers is now supported! Check out the technical deep dive in our [blog post](https://light-ai.top/lightllm-blog/2025/11/18/dp_kv_fetch.html).
+
 ## News
 - [2025/09] 🔥 LightLLM [v1.1.0](https://www.light-ai.top/lightllm-blog/2025/09/03/lightllm.html) release!
 - [2025/08] Pre $^3$ achieves the outstanding paper award of [ACL2025](https://2025.aclweb.org/program/awards/).

@@ -36,7 +39,7 @@ LightLLM is a Python-based LLM (Large Language Model) inference and serving fram
 
 ## Performance
 
-Learn more in the release blogs: [v1.0.0 blog](https://www.light-ai.top/lightllm-blog//by%20mtc%20team/2025/02/16/lightllm/).
+Learn more in the release blogs: [v1.1.0 blog](https://www.light-ai.top/lightllm-blog/2025/09/03/lightllm.html).
 
 ## FAQ
 

docker/cuda_version_12.8.0/Dockerfile (3 additions, 3 deletions)

@@ -33,11 +33,11 @@ RUN case ${TARGETPLATFORM} in \
 
 WORKDIR /root
 
+RUN pip install --no-cache-dir vllm==0.11.0 --pre --extra-index-url https://wheels.vllm.ai/nightly
+
 COPY ./requirements.txt /lightllm/requirements.txt
 RUN pip install -U pip
-RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu128
-
-RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
 
 # TODO: offline compile
 # RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
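Since these Dockerfiles use BuildKit's `RUN --mount=type=cache`, the images must be built with BuildKit enabled; the legacy builder rejects that syntax. A minimal sketch of the invocation, assuming the build runs from the repository root (the image tag is illustrative, not from the repo):

```shell
# BuildKit is required for `RUN --mount=type=cache,...`.
# On recent Docker versions `docker buildx build` enables it by default;
# DOCKER_BUILDKIT=1 forces it for plain `docker build`.
DOCKER_BUILDKIT=1 docker build \
  -f docker/cuda_version_12.8.0/Dockerfile \
  -t lightllm:cuda-12.8 \
  .
```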

docker/cuda_version_12.8.0/Dockerfile.deepep (3 additions, 4 deletions)

@@ -33,12 +33,11 @@ RUN case ${TARGETPLATFORM} in \
 
 WORKDIR /root
 
+RUN pip install --no-cache-dir vllm==0.11.0 --pre --extra-index-url https://wheels.vllm.ai/nightly
+
 COPY ./requirements.txt /lightllm/requirements.txt
 RUN pip install -U pip
-RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu128
-
-RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
 # TODO: offline compile
 # RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
 

docker/cuda_version_12.8.0/Dockerfile.nixl (5 additions, 4 deletions)

@@ -33,11 +33,12 @@ RUN case ${TARGETPLATFORM} in \
 
 WORKDIR /root
 
+RUN --mount=type=cache,target=/root/.cache/pip pip install vllm==0.11.0 --pre --extra-index-url https://wheels.vllm.ai/nightly
+
 COPY ./requirements.txt /lightllm/requirements.txt
-RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu128
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-
-RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
+RUN --mount=type=cache,target=/root/.cache/pip pip install --no-deps -v 'git+https://github.com/ModelTC/LightKernel.git@07f2f62af5deb41f10a22660f9f42dba9273361e#egg=lightllm_kernel'
 
 RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
 RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev

@@ -77,7 +78,7 @@ RUN apt-get update && apt-get install -y cmake automake autotools-dev libtool l
     make -j install-strip && \
     ldconfig;
 
-RUN apt-get update && apt-get install -y pkg-config tmux net-tools ; \
+RUN apt-get update && apt-get install -y pkg-config tmux net-tools libaio-dev ; \
     cd /usr/local/src; \
     pip install --upgrade meson pybind11 patchelf; \
     git clone https://github.com/ai-dynamo/nixl.git -b main && \

docker/cuda_version_12.8.0/Dockerfile.nixl.deepep (5 additions, 4 deletions)

@@ -35,11 +35,12 @@ RUN case ${TARGETPLATFORM} in \
 
 WORKDIR /root
 
+RUN --mount=type=cache,target=/root/.cache/pip pip install vllm==0.11.0 --pre --extra-index-url https://wheels.vllm.ai/nightly
+
 COPY ./requirements.txt /lightllm/requirements.txt
-RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu128
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-
-RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
+RUN --mount=type=cache,target=/root/.cache/pip pip install --no-deps -v 'git+https://github.com/ModelTC/LightKernel.git@07f2f62af5deb41f10a22660f9f42dba9273361e#egg=lightllm_kernel'
 
 RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
 RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev

@@ -104,7 +105,7 @@ RUN apt-get update && apt-get install -y cmake automake autotools-dev libtool l
     make -j install-strip && \
     ldconfig;
 
-RUN apt-get update && apt-get install -y pkg-config tmux net-tools ; \
+RUN apt-get update && apt-get install -y pkg-config tmux net-tools libaio-dev ; \
    cd /usr/local/src; \
    pip install --upgrade meson pybind11 patchelf; \
    git clone https://github.com/ai-dynamo/nixl.git -b main && \
New file (124 additions, 0 deletions)

@@ -0,0 +1,124 @@
ARG CUDA_VERSION=12.8.0
FROM nvidia/cuda:${CUDA_VERSION}-cudnn-devel-ubuntu22.04

ARG PYTHON_VERSION=3.10
ARG MAMBA_VERSION=24.7.1-0
ARG TARGETPLATFORM

ENV PATH=/opt/conda/bin:$PATH \
    CONDA_PREFIX=/opt/conda

RUN chmod 777 -R /tmp && apt-get update --allow-insecure-repositories && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates \
    libssl-dev \
    curl \
    g++ \
    make \
    git && \
    rm -rf /var/lib/apt/lists/*

RUN case ${TARGETPLATFORM} in \
    "linux/arm64") MAMBA_ARCH=aarch64 ;; \
    *) MAMBA_ARCH=x86_64 ;; \
    esac && \
    curl -fsSL -o ~/mambaforge.sh -v "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-${MAMBA_ARCH}.sh" && \
    bash ~/mambaforge.sh -b -p /opt/conda && \
    rm ~/mambaforge.sh

RUN case ${TARGETPLATFORM} in \
    "linux/arm64") exit 1 ;; \
    *) /opt/conda/bin/conda update -y conda && \
    /opt/conda/bin/conda install -y "python=${PYTHON_VERSION}" && \
    /opt/conda/bin/conda install -y boost ;; \
    esac && \
    /opt/conda/bin/conda clean -ya


WORKDIR /root

RUN --mount=type=cache,target=/root/.cache/pip pip install vllm==0.11.0 --pre --extra-index-url https://wheels.vllm.ai/nightly

COPY ./requirements.txt /lightllm/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128

RUN --mount=type=cache,target=/root/.cache/pip pip install --no-deps -v 'git+https://github.com/ModelTC/LightKernel.git@07f2f62af5deb41f10a22660f9f42dba9273361e#egg=lightllm_kernel'
RUN --mount=type=cache,target=/root/.cache/pip pip install --no-deps -v 'git+https://github.com/ModelTC/LightMem.git@5900baf92d85ef4dbda6124093506b0af906011a#egg=light_mem'

RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev

ENV CUDA_HOME=/usr/local/cuda \
    GDRCOPY_HOME=/usr/src/gdrdrv-2.4.4/

RUN mkdir -p /tmp/gdrcopy && cd /tmp \
    && git clone https://github.com/NVIDIA/gdrcopy.git -b v2.4.4 \
    && cd gdrcopy/packages \
    && CUDA=/usr/local/cuda ./build-deb-packages.sh \
    && dpkg -i gdrdrv-dkms_*.deb libgdrapi_*.deb gdrcopy-tests_*.deb gdrcopy_*.deb \
    && cd / && rm -rf /tmp/gdrcopy

# Fix DeepEP IBGDA symlink
RUN ln -sf /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so

RUN wget https://developer.download.nvidia.com/compute/redist/nvshmem/3.3.9/source/nvshmem_src_cuda12-all-all-3.3.9.tar.gz \
    && tar -xf nvshmem_src_cuda12-all-all-3.3.9.tar.gz && mv nvshmem_src nvshmem \
    && cd nvshmem \
    && rm -f /root/nvshmem_src_cuda12-all-all-3.3.9.tar.gz \
    && NVSHMEM_SHMEM_SUPPORT=0 \
    NVSHMEM_UCX_SUPPORT=0 \
    NVSHMEM_USE_NCCL=0 \
    NVSHMEM_MPI_SUPPORT=0 \
    NVSHMEM_IBGDA_SUPPORT=1 \
    NVSHMEM_PMIX_SUPPORT=0 \
    NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
    NVSHMEM_USE_GDRCOPY=1 \
    cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/root/nvshmem/install -DCMAKE_CUDA_ARCHITECTURES=90 \
    && cmake --build build --target install -j64

ARG DEEPEP_COMMIT=b6ce310bb0b75079682d09bc2ebc063a074fbd58
RUN git clone https://github.com/deepseek-ai/DeepEP.git && cd DeepEP && git checkout ${DEEPEP_COMMIT} && cd ..

WORKDIR /root/DeepEP
ENV NVSHMEM_DIR=/root/nvshmem/install
RUN NVSHMEM_DIR=/root/nvshmem/install python setup.py install

RUN apt-get update && apt-get install -y cmake automake autotools-dev libtool libz-dev && \
    DEBIAN_FRONTEND=noninteractive apt-get -y install --reinstall libibverbs-dev rdma-core ibverbs-utils libibumad-dev; \
    rm -rf /usr/lib/ucx && \
    rm -rf /opt/hpcx/ucx && \
    cd /usr/local/src && \
    git clone https://github.com/openucx/ucx.git && \
    cd ucx && \
    git checkout v1.19.x && \
    ./autogen.sh && ./configure \
    --enable-shared \
    --disable-static \
    --disable-doxygen-doc \
    --enable-optimizations \
    --enable-cma \
    --enable-devel-headers \
    --with-cuda=/usr/local/cuda \
    --with-verbs=yes \
    --with-dm \
    --with-gdrcopy=/usr/local \
    --with-efa \
    --enable-mt && \
    make -j && \
    make -j install-strip && \
    ldconfig;

RUN apt-get update && apt-get install -y pkg-config tmux net-tools libaio-dev ; \
    cd /usr/local/src; \
    pip install --upgrade meson pybind11 patchelf; \
    git clone https://github.com/ai-dynamo/nixl.git -b main && \
    cd nixl && \
    rm -rf build && \
    mkdir build && \
    meson setup build/ --prefix=/usr/local/nixl --buildtype=release && \
    cd build && \
    ninja && \
    ninja install && \
    cd .. && pip install . --no-deps;

COPY . /lightllm
RUN pip install -e /lightllm --no-cache-dir

docs/CN/source/getting_started/benchmark.rst (6 additions, 6 deletions)

@@ -4,7 +4,7 @@ Benchmark 测试指南
 LightLLM 提供了全面的性能测试工具,包括服务端性能测试和静态推理性能测试。本文档将详细介绍如何使用这些工具进行性能评估。
 
 服务端性能测试 (Service Benchmark)
----------------------------------
+-----------------------------------
 
 服务端性能测试主要用于评估 LightLLM 在真实服务场景下的性能表现,包括吞吐量、延迟等关键指标。
 

@@ -55,7 +55,7 @@ QPS (Queries Per Second) 测试是评估服务端性能的核心工具,支持
 - decode_token_time P{25,50,75,90,95,99,100}: 解码 token 延迟百分位数
 
 固定并发测试 (benchmark_client.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 用于评估不同客户端并发数下的性能表现。
 

@@ -73,7 +73,7 @@ QPS (Queries Per Second) 测试是评估服务端性能的核心工具,支持
     --server_api lightllm
 
 ShareGPT 数据集测试 (benchmark_sharegpt.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 使用 ShareGPT 真实对话数据进行性能测试。
 

@@ -131,7 +131,7 @@ Prompt Cache 测试
 - ``--num_users``: 用户数
 
 静态推理性能测试 (Static Inference Benchmark)
---------------------------------------------
+----------------------------------------------
 
 静态推理测试用于评估模型在固定输入条件下的推理性能, 主要评估算子的优劣
 模型推理测试 (model_infer.py)

@@ -178,7 +178,7 @@ Prompt Cache 测试
 - 各阶段延迟统计
 
 多结果预测性能测试 (model_infer_mtp.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 多结果预测静态性能测试,默认百分百接受率,用来评估多结果预测的极限性能。目前只支持DeepSeek 系列模型
 

@@ -203,7 +203,7 @@ Prompt Cache 测试
 - ``--mtp_draft_model_dir``: 草稿模型路径
 
 Vision Transformer 测试 (test_vit.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 用于测试 Vision Transformer 模型的性能。
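The underline changes in this file all lengthen reStructuredText section underlines: docutils measures titles by display width, and East Asian wide characters such as the CJK text in these headings count as two columns, so an underline matching the character count alone is "too short". A minimal sketch of computing a sufficient underline length (`display_width` and `underline_for` are illustrative helper names, not part of the repo):

```python
import unicodedata

def display_width(text: str) -> int:
    # East Asian Wide ("W") and Fullwidth ("F") characters occupy two columns.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def underline_for(title: str, ch: str = "-") -> str:
    # An underline at least as long as the title's display width satisfies docutils.
    return ch * display_width(title)

title = "服务端性能测试 (Service Benchmark)"
print(display_width(title))   # 7 wide chars at 2 columns each, plus 20 ASCII chars
print(underline_for(title))
```

Generating underlines this way avoids the off-by-a-few errors the diff is correcting by hand.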

docs/CN/source/index.rst (1 addition, 0 deletions)

@@ -49,6 +49,7 @@ Lightllm 整合了众多的开源方案的优点,包括但不限于 FasterTran
    :caption: 部署教程
 
    DeepSeek R1 部署 <tutorial/deepseek_deployment>
+   多级缓存部署 <tutorial/multi_level_cache_deployment>
    多模态部署 <tutorial/multimodal>
    奖励模型部署 <tutorial/reward_model>
    OpenAI 接口使用 <tutorial/openai>
