Conversation

@shihaobai
Collaborator

No description provided.

@gemini-code-assist (bot, Contributor) left a comment

Summary of Changes

Hello @shihaobai, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a significant upgrade and refinement of the project's Docker build environments and core dependencies. The changes aim to modernize the build process by updating base images, CUDA, Python, and Mamba versions, while also optimizing the integration of high-performance kernels like Flash Attention through a shift to vllm and LightKernel.

Highlights

  • Docker Environment Modernization: Upgraded the base CUDA image, CUDA version (12.4 to 12.6.1), Python version (3.9 to 3.10), and Mamba version (23.1.0-1 to 24.7.1-0) across both Dockerfile and Dockerfile.deepep.
  • Optimized Kernel Integration: Revised the installation strategy for optimized kernels, moving from a direct flash-attn wheel to installing vllm and building LightKernel from source within the Docker images (see the sketch after this list).
  • Core Dependency Updates: Updated key Python packages in requirements.txt, including packaging, setuptools, and torch, while streamlining dependencies by removing torchvision, triton, and sympy.
  • DeepEP Build Process Refinement: For Dockerfile.deepep, the build process for DeepEP was significantly enhanced, incorporating a more robust gdrcopy installation, updated NVSHMEM build procedures, and pinning a specific DeepEP commit.
  • Flash Attention Interface Adaptation: Modified flashattention_nopad.py to adapt to the new sgl_kernel interface for Flash Attention, although the new path is currently conditionally disabled.
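
To make the highlights above concrete, here is a minimal sketch of the pattern the new Dockerfiles follow. The base-image tag is an assumption inferred from the CUDA 12.6.1 / Python 3.10 upgrade, and the install commands are taken from the diff excerpts quoted in the review below; the actual Dockerfiles in this pull request remain authoritative.

# Sketch only -- the exact base-image tag and environment bootstrap are assumptions.
FROM nvidia/cuda:12.6.1-cudnn-devel-ubuntu22.04

# Python 3.10 environment managed by Mamba 24.7.1-0 (bootstrap steps omitted).

# Optimized kernels: vllm nightly wheel plus LightKernel built from source,
# with the clone removed in the same layer to keep the final image small.
RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
RUN git clone https://github.com/ModelTC/LightKernel.git \
    && cd LightKernel && pip install --no-deps -v . \
    && cd .. && rm -rf LightKernel
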
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature                Command               Description
Code Review            /gemini review        Performs a code review of the pull request in its current state.
Pull Request Summary   /gemini summary       Provides a summary of the pull request in its current state.
Comment                @gemini-code-assist   Responds in comments when explicitly tagged, in both issue comments and review comments.
Help                   /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
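
As a rough illustration, a repository might add something like the layout below; the file names are assumptions based on the linked documentation's conventions, so check that page for the authoritative schema.

.gemini/
├── config.yaml      # repository-level settings for Gemini Code Assist
└── styleguide.md    # custom review style guide, e.g. "follow PEP-8 for Python code"
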

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request updates the Dockerfiles and dependencies. My review focuses on Dockerfile best practices for reducing image size and build time, and I've identified a critical issue in the flash attention code that appears to be a leftover debugging flag. Addressing these points will improve the quality and efficiency of the build process and ensure the code is production-ready.

then use the interface from sgl_kernel; otherwise, use the Triton version.
"""
if _flash_attn_v3_available and is_hopper():
if _flash_attn_v3_available and is_hopper() and False:

critical

The condition ... and False permanently disables this code path, forcing the function to always use the Triton kernel implementation. This looks like a temporary debugging change that should not be merged. If the flash_attention_v3_fwd path is not ready, it should be removed or properly guarded. If it is intended to be used, this and False must be removed.

Suggested change
if _flash_attn_v3_available and is_hopper() and False:
if _flash_attn_v3_available and is_hopper():
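
If the flash_attention_v3_fwd path is meant to stay in the tree but default to off, one hedged alternative to a hardcoded and False is an explicit opt-in flag. The environment variable name LIGHTLLM_ENABLE_FA3 below is hypothetical, not an existing setting; it only illustrates the "properly guarded" option mentioned above.

import os

# Hypothetical opt-in switch; enable by exporting LIGHTLLM_ENABLE_FA3=1.
_fa3_enabled = os.getenv("LIGHTLLM_ENABLE_FA3", "0") == "1"

if _flash_attn_v3_available and is_hopper() and _fa3_enabled:
    # sgl_kernel flash_attention_v3_fwd path
    ...
else:
    # Triton fallback path
    ...
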

Comment on lines +44 to +45
RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev

high

These two RUN instructions for apt-get should be combined into a single layer to optimize the Docker build process and reduce image size. Also, it's a best practice to:

  • Use --no-install-recommends to avoid installing unnecessary packages.
  • Clean up the apt cache with rm -rf /var/lib/apt/lists/* in the same RUN command to prevent caching old package lists and to reduce image size.
RUN apt-get update && apt-get install -y --no-install-recommends \
    libnuma-dev wget devscripts debhelper dh-make build-essential dkms \
    ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

RUN pip install --no-cache-dir https://github.com/ModelTC/flash-attn-3-build/releases/download/v2.7.4.post1/flash_attn-3.0.0b1-cp39-cp39-linux_x86_64.whl
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .

medium

The cloned LightKernel repository is not removed after installation. This increases the final Docker image size. It's a good practice to clean up build artifacts within the same RUN layer.

RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v . && cd .. && rm -rf LightKernel

RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .

RUN pip install --no-cache-dir nvidia-nccl-cu12==2.25.1 # for allreduce hang issues in multinode H100
RUN apt-get update && apt-get install -y libnuma-dev # for sgl_kernel

medium

To reduce the Docker image size, it's recommended to clean up the apt cache after installing packages. This should be done in the same RUN instruction to be effective. Using --no-install-recommends is also a good practice to avoid pulling in unnecessary packages.

RUN apt-get update && apt-get install -y --no-install-recommends libnuma-dev && rm -rf /var/lib/apt/lists/* # for sgl_kernel

RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

RUN pip install --no-cache-dir nvidia-nccl-cu12==2.25.1 # for allreduce hang issues in multinode H100
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .

medium

The cloned LightKernel repository is not removed after installation. This increases the final Docker image size. It's a good practice to clean up build artifacts within the same RUN layer.

RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v . && cd .. && rm -rf LightKernel

&& cd build \
&& make install -j64
cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/root/nvshmem/install -DCMAKE_CUDA_ARCHITECTURES=90 \
&& cmake --build build --target install -j64

medium

The nvshmem source directory is kept after installation, which increases the image size. It can be removed once the build and install are complete; the removal must happen in the same RUN layer to be effective.

 && cmake --build build --target install -j64 && cd /root && rm -rf nvshmem

&& cmake --build build --target install -j64

ARG DEEPEP_COMMIT=b6ce310bb0b75079682d09bc2ebc063a074fbd58
RUN git clone https://github.com/deepseek-ai/DeepEP.git && cd DeepEP && git checkout ${DEEPEP_COMMIT} && cd ..

medium

The cloned DeepEP repository is not removed after installation, which increases the final Docker image size. Consider removing the repository after it's installed in the subsequent steps. This should be done in the same RUN layer as the installation to be effective.
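
A minimal sketch of one way to do that, assuming DeepEP is installed with pip inside the same layer (the install command is illustrative; the actual build steps and environment variables live elsewhere in Dockerfile.deepep):

ARG DEEPEP_COMMIT=b6ce310bb0b75079682d09bc2ebc063a074fbd58
# Clone, pin, install, and delete the source tree in a single layer so the
# checkout never persists in the final image. The pip install line is an assumption.
RUN git clone https://github.com/deepseek-ai/DeepEP.git \
    && cd DeepEP && git checkout ${DEEPEP_COMMIT} \
    && pip install --no-deps -v . \
    && cd .. && rm -rf DeepEP
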

@shihaobai merged commit ff9e387 into main Jul 23, 2025
1 check passed
@shihaobai deleted the new_docker branch July 23, 2025 13:00