update dockerfile #981
@@ -1,10 +1,8 @@
FROM nvcr.io/nvidia/tritonserver:24.04-py3-min as base
ARG PYTORCH_VERSION=2.6.0
ARG PYTHON_VERSION=3.9
ARG CUDA_VERSION=12.4
ARG MAMBA_VERSION=23.1.0-1
ARG CUDA_VERSION=12.6.1
FROM nvidia/cuda:${CUDA_VERSION}-cudnn-devel-ubuntu22.04
ARG PYTHON_VERSION=3.10
ARG MAMBA_VERSION=24.7.1-0
ARG TARGETPLATFORM

ENV PATH=/opt/conda/bin:$PATH \
CONDA_PREFIX=/opt/conda

@@ -21,7 +19,7 @@ RUN case ${TARGETPLATFORM} in \
"linux/arm64") MAMBA_ARCH=aarch64 ;; \
*) MAMBA_ARCH=x86_64 ;; \
esac && \
curl -fsSL -o ~/mambaforge.sh -v "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-${MAMBA_ARCH}.sh" && \
curl -fsSL -o ~/mambaforge.sh "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-${MAMBA_ARCH}.sh" && \
bash ~/mambaforge.sh -b -p /opt/conda && \
rm ~/mambaforge.sh

@@ -36,11 +34,14 @@ RUN case ${TARGETPLATFORM} in \
WORKDIR /root

COPY ./requirements.txt /lightllm/requirements.txt
RUN pip install -r /lightllm/requirements.txt --no-cache-dir --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu124
RUN pip install -U pip
RUN pip install -r /lightllm/requirements.txt --no-cache-dir

RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

RUN pip install --no-cache-dir https://github.com/ModelTC/flash-attn-3-build/releases/download/v2.7.4.post1/flash_attn-3.0.0b1-cp39-cp39-linux_x86_64.whl
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .

RUN pip install --no-cache-dir nvidia-nccl-cu12==2.25.1 # for allreduce hang issues in multinode H100
RUN apt-get update && apt-get install -y libnuma-dev # for sgl_kernel
Contributor comment: To reduce the Docker image size, it's recommended to clean up the apt cache after installing packages. This should be done in the same RUN layer.
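A minimal sketch of the pattern the reviewer is pointing at, with the package taken from the diff; the cleanup step is the usual Debian/Ubuntu idiom and is not something this PR already contains:

```dockerfile
# Install libnuma-dev (needed by sgl_kernel) and drop the apt cache in the same layer,
# so the package lists never persist into the final image.
RUN apt-get update && \
    apt-get install -y libnuma-dev && \
    rm -rf /var/lib/apt/lists/*
```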
COPY . /lightllm
RUN pip install -e /lightllm --no-cache-dir
@@ -1,10 +1,8 @@
FROM nvcr.io/nvidia/tritonserver:24.04-py3-min as base
ARG PYTORCH_VERSION=2.6.0
ARG PYTHON_VERSION=3.9
ARG CUDA_VERSION=12.4
ARG MAMBA_VERSION=23.1.0-1
ARG CUDA_VERSION=12.6.1
FROM nvidia/cuda:${CUDA_VERSION}-cudnn-devel-ubuntu22.04
ARG PYTHON_VERSION=3.10
ARG MAMBA_VERSION=24.7.1-0
ARG TARGETPLATFORM

ENV PATH=/opt/conda/bin:$PATH \
CONDA_PREFIX=/opt/conda

@@ -21,7 +19,7 @@ RUN case ${TARGETPLATFORM} in \
"linux/arm64") MAMBA_ARCH=aarch64 ;; \
*) MAMBA_ARCH=x86_64 ;; \
esac && \
curl -fsSL -o ~/mambaforge.sh -v "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-${MAMBA_ARCH}.sh" && \
curl -fsSL -o ~/mambaforge.sh "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-${MAMBA_ARCH}.sh" && \
bash ~/mambaforge.sh -b -p /opt/conda && \
rm ~/mambaforge.sh

@@ -36,39 +34,46 @@ RUN case ${TARGETPLATFORM} in \
WORKDIR /root

COPY ./requirements.txt /lightllm/requirements.txt
RUN pip install -r /lightllm/requirements.txt --no-cache-dir --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu124
RUN pip install -U pip
RUN pip install -r /lightllm/requirements.txt --no-cache-dir

RUN pip install --no-cache-dir https://github.com/ModelTC/flash-attn-3-build/releases/download/v2.7.4.post1/flash_attn-3.0.0b1-cp39-cp39-linux_x86_64.whl
RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

RUN pip install --no-cache-dir nvidia-nccl-cu12==2.25.1 # for allreduce hang issues in multinode H100
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
Contributor comment: The cloned LightKernel repository is not removed after installation. This increases the final Docker image size. It's a good practice to clean up build artifacts within the same RUN layer.
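One way to follow that advice is to clone, install, and delete the source tree inside a single RUN layer. This is only a sketch, assuming LightKernel needs nothing from its checkout after `pip install` completes:

```dockerfile
# Build LightKernel and remove the cloned sources in the same layer,
# so the checkout never ends up in the final image.
RUN git clone https://github.com/ModelTC/LightKernel.git && \
    cd LightKernel && \
    pip install --no-deps --no-cache-dir -v . && \
    cd .. && \
    rm -rf LightKernel
```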
RUN git clone --recursive https://github.com/deepseek-ai/DeepGEMM.git
RUN cd DeepGEMM && python setup.py install
RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev
Contributor comment on lines +44 to +45: These two apt-get install commands …
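A common way to tighten the two back-to-back apt-get layers above is to merge them into a single RUN and clean the apt lists in the same step. This is a sketch of that idea, not necessarily the reviewer's exact suggestion; the package list is copied from the diff:

```dockerfile
# Single layer for the build and RDMA packages, with the apt cache removed in the same step.
RUN apt-get update && \
    apt-get install -y \
        libnuma-dev wget devscripts debhelper dh-make build-essential dkms \
        ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev && \
    rm -rf /var/lib/apt/lists/*
```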
WORKDIR /root
RUN git clone https://github.com/deepseek-ai/DeepEP.git
ENV CUDA_HOME=/usr/local/cuda \
GDRCOPY_HOME=/usr/src/gdrdrv-2.4.4/

# NVSHMEM
RUN wget https://developer.download.nvidia.com/compute/redist/nvshmem/3.2.5/source/nvshmem_src_3.2.5-1.txz
RUN tar -xf nvshmem_src_3.2.5-1.txz \
&& mv nvshmem_src nvshmem
RUN mkdir -p /tmp/gdrcopy && cd /tmp \
&& git clone https://github.com/NVIDIA/gdrcopy.git -b v2.4.4 \
&& cd gdrcopy/packages \
&& CUDA=/usr/local/cuda ./build-deb-packages.sh \
&& dpkg -i gdrdrv-dkms_*.deb libgdrapi_*.deb gdrcopy-tests_*.deb gdrcopy_*.deb \
&& cd / && rm -rf /tmp/gdrcopy

WORKDIR /root/nvshmem
RUN git apply /root/DeepEP/third-party/nvshmem.patch
# Fix DeepEP IBGDA symlink
RUN ln -sf /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so

WORKDIR /root/nvshmem
ENV CUDA_HOME=/usr/local/cuda
RUN NVSHMEM_SHMEM_SUPPORT=0 \
RUN wget https://developer.download.nvidia.com/compute/redist/nvshmem/3.3.9/source/nvshmem_src_cuda12-all-all-3.3.9.tar.gz \
&& tar -xf nvshmem_src_cuda12-all-all-3.3.9.tar.gz && mv nvshmem_src nvshmem \
&& cd nvshmem \
&& rm -f /root/nvshmem_src_cuda12-all-all-3.3.9.tar.gz \
&& NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/root/nvshmem/install -DCMAKE_CUDA_ARCHITECTURES=90 -DMLX5_lib=/usr/lib/x86_64-linux-gnu/libmlx5.so.1 \
&& cd build \
&& make install -j64
cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/root/nvshmem/install -DCMAKE_CUDA_ARCHITECTURES=90 \
&& cmake --build build --target install -j64
ARG DEEPEP_COMMIT=b6ce310bb0b75079682d09bc2ebc063a074fbd58
RUN git clone https://github.com/deepseek-ai/DeepEP.git && cd DeepEP && git checkout ${DEEPEP_COMMIT} && cd ..
WORKDIR /root/DeepEP
ENV NVSHMEM_DIR=/root/nvshmem/install
@@ -152,7 +152,7 @@ def _flash_attention_triton_fwd(
_flash_attn_v3_available = False
try:
from flash_attn_interface import _flash_attn_forward
from sgl_kernel.flash_attn import flash_attn_varlen_func

_flash_attn_v3_available = True

@@ -166,36 +166,43 @@ def flash_attention_v3_fwd(
):
head_dim = q.shape[-1]
softmax_scale = head_dim ** -0.5
_flash_attn_forward(
window_size = (-1, -1)
torch.ops.sgl_kernel.fwd.default(
q,
k,
v,
None,
None, # k_new, v_new
None, # k_new
None, # v_new
None, # qv
o, # out
cu_seqlens,
cu_seqlens,
None, # cu_seqlens_q/k/k_new
None,
None, # seqused_q/k
max_seqlen,
max_seqlen, # max_seqlen_q/k
None,
None, # cu_seqlens_k_new
None,
None, # page_table, kv_batch_idx, leftpad_k,
None,
None, # rotary_cos/sin
max_seqlen,
max_seqlen,
None, # page_table,
None, # kv_batch_idx
None, # leftpad_k
None, # rotary cos
None, # rotary sin
None, # seqlens_rotary
None,
None,
None,
softmax_scale,
False, # causal
window_size=(-1, -1),
softcap=0.0,
False,
window_size[0],
window_size[1],
0.0,
is_rotary_interleaved=False,
scheduler_metadata=None,
num_splits=1,
pack_gqa=None,
sm_margin=0,
)

return

except ImportError:
@@ -205,10 +212,10 @@ def flash_attention_v3_fwd(
def flash_attention_fwd(q, k, v, o, cu_seqlens, max_seqlen):
"""
Unified Flash Attention interface. If _flash_attn_forward is available,
use flash_attention_v3_fwd; otherwise fall back to the Triton version.
Unified Flash Attention interface. If sgl_kernel is available,
use the sgl_kernel interface; otherwise fall back to the Triton version.
"""
if _flash_attn_v3_available and is_hopper():
if _flash_attn_v3_available and is_hopper() and False:
Contributor comment: The condition now ends with `and False`, which makes this branch unreachable, so the sgl_kernel path is never taken even when `_flash_attn_v3_available` and `is_hopper()` are both true. If disabling the v3 path is unintentional, the `and False` should be removed.
Suggested change: `if _flash_attn_v3_available and is_hopper():`
flash_attention_v3_fwd(q, k, v, o, cu_seqlens, max_seqlen)
else:
_flash_attention_triton_fwd(q, k, v, o, cu_seqlens, max_seqlen)