update dockerfile #981
Summary of Changes
Hello @shihaobai, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on a significant upgrade and refinement of the project's Docker build environments and core dependencies. The changes aim to modernize the build process by updating base images, CUDA, Python, and Mamba versions, while also optimizing the integration of high-performance kernels like Flash Attention through a shift to vllm and LightKernel.
Highlights
- Docker Environment Modernization: Upgraded the base CUDA image, CUDA version (12.4 to 12.6.1), Python version (3.9 to 3.10), and Mamba version (23.1.0-1 to 24.7.1-0) across both `Dockerfile` and `Dockerfile.deepep`.
- Optimized Kernel Integration: Revised the installation strategy for optimized kernels, moving from a direct `flash-attn` wheel to installing `vllm` and building `LightKernel` from source within the Docker images.
- Core Dependency Updates: Updated key Python packages in `requirements.txt`, including `packaging`, `setuptools`, and `torch`, while streamlining dependencies by removing `torchvision`, `triton`, and `sympy`.
- DeepEP Build Process Refinement: For `Dockerfile.deepep`, the build process for DeepEP was significantly enhanced, incorporating a more robust `gdrcopy` installation, updated NVSHMEM build procedures, and pinning a specific DeepEP commit.
- Flash Attention Interface Adaptation: Modified `flashattention_nopad.py` to adapt to the new `sgl_kernel` interface for Flash Attention, although the new path is currently conditionally disabled.
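The last highlight describes an availability-gated dispatch between the new `sgl_kernel` path and the existing Triton fallback. A minimal sketch of that pattern follows; all names here are illustrative placeholders and do not mirror `flashattention_nopad.py` exactly:

```python
# Illustrative sketch of hardware/availability-gated kernel dispatch.
# All names are placeholders, not the project's actual symbols.

_flash_attn_v3_available = False  # in real code, set by a guarded import


def is_hopper() -> bool:
    """Stub: the real check would inspect the GPU compute capability (sm_90)."""
    return False


def _sgl_kernel_attention(q):
    return ("sgl_kernel", q)


def _triton_attention(q):
    return ("triton", q)


def attention_fwd(q):
    # Prefer the sgl_kernel FA3 path on Hopper when available,
    # otherwise fall back to the Triton implementation.
    if _flash_attn_v3_available and is_hopper():
        return _sgl_kernel_attention(q)
    return _triton_attention(q)
```

With the stubs above, the dispatch always resolves to the Triton fallback, which mirrors the state of the PR while the FA3 path is disabled.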
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
This pull request updates the Dockerfiles and dependencies. My review focuses on Dockerfile best practices for reducing image size and build time, and I've identified a critical issue in the flash attention code that appears to be a leftover debugging flag. Addressing these points will improve the quality and efficiency of the build process and ensure the code is production-ready.
```diff
         then use the interface in sgl_kernel; otherwise use the Triton version.
     """
-    if _flash_attn_v3_available and is_hopper():
+    if _flash_attn_v3_available and is_hopper() and False:
```
The condition `... and False` permanently disables this code path, forcing the function to always use the Triton kernel implementation. This looks like a temporary debugging change that should not be merged. If the `flash_attention_v3_fwd` path is not ready, it should be removed or properly guarded. If it is intended to be used, the `and False` must be removed.
```diff
-    if _flash_attn_v3_available and is_hopper() and False:
+    if _flash_attn_v3_available and is_hopper():
```
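If the FA3 path is meant to stay off while it is validated, an explicit opt-in flag makes that intent visible instead of a hardcoded `and False`. A possible sketch, assuming a hypothetical environment variable that the project does not currently define:

```python
import os

# Hypothetical opt-in switch: the variable name LIGHTLLM_ENABLE_FA3 is an
# assumption for illustration, not a setting the project defines.
_ENABLE_FA3 = os.environ.get("LIGHTLLM_ENABLE_FA3", "0") == "1"


def use_fa3_path(fa3_available: bool, on_hopper: bool) -> bool:
    # Off by default (like `and False`), but testers can enable the path
    # at runtime without editing source.
    return fa3_available and on_hopper and _ENABLE_FA3
```

This keeps the default behavior identical to the current diff while documenting why the path is disabled.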
```dockerfile
RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev
```
These two `RUN` instructions for `apt-get` should be combined into a single layer to optimize the Docker build process and reduce image size. Also, it's a best practice to:
- Use `--no-install-recommends` to avoid installing unnecessary packages.
- Clean up the apt cache with `rm -rf /var/lib/apt/lists/*` in the same `RUN` command to prevent caching old package lists and to reduce image size.
```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends \
    libnuma-dev wget devscripts debhelper dh-make build-essential dkms \
    ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev \
    && rm -rf /var/lib/apt/lists/*
```
```dockerfile
RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

RUN pip install --no-cache-dir https://github.com/ModelTC/flash-attn-3-build/releases/download/v2.7.4.post1/flash_attn-3.0.0b1-cp39-cp39-linux_x86_64.whl
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
```
The cloned `LightKernel` repository is not removed after installation. This increases the final Docker image size. It's a good practice to clean up build artifacts within the same `RUN` layer.
```dockerfile
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v . && cd .. && rm -rf LightKernel
```
```dockerfile
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .

RUN pip install --no-cache-dir nvidia-nccl-cu12==2.25.1 # for allreduce hang issues in multinode H100
RUN apt-get update && apt-get install -y libnuma-dev # for sgl_kernel
```
To reduce the Docker image size, it's recommended to clean up the apt cache after installing packages. This should be done in the same `RUN` instruction to be effective. Using `--no-install-recommends` is also a good practice to avoid pulling in unnecessary packages.
```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends libnuma-dev && rm -rf /var/lib/apt/lists/* # for sgl_kernel
```
```dockerfile
RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

RUN pip install --no-cache-dir nvidia-nccl-cu12==2.25.1 # for allreduce hang issues in multinode H100
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
```
The cloned `LightKernel` repository is not removed after installation. This increases the final Docker image size. It's a good practice to clean up build artifacts within the same `RUN` layer.
```dockerfile
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v . && cd .. && rm -rf LightKernel
```
```diff
-    && cd build \
-    && make install -j64
+    cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/root/nvshmem/install -DCMAKE_CUDA_ARCHITECTURES=90 \
+    && cmake --build build --target install -j64
```
```dockerfile
    && cmake --build build --target install -j64

ARG DEEPEP_COMMIT=b6ce310bb0b75079682d09bc2ebc063a074fbd58
RUN git clone https://github.com/deepseek-ai/DeepEP.git && cd DeepEP && git checkout ${DEEPEP_COMMIT} && cd ..
```
No description provided.