refactor(docker): consolidate Dockerfiles into one multi-stage file#2806
dierksen wants to merge 6 commits into flashinfer-ai:main from
Conversation
…terized files
Replace Dockerfile.cu{126,128,129,130}[.dev] with a single docker/Dockerfile
and docker/Dockerfile.dev, parameterized via CUDA_BASE_IMAGE, CUDA_VERSION,
NVIDIA_LIB_PATH, and INSTALL_TILELANG ARGs. Per-version values move into the
CI workflow matrix, making it the single source of truth — adding a new CUDA
version now requires adding one block to the matrix instead of copying a file.
Also adds cu131 support, fixes the cu130.dev base image (13.0.0 → 13.0.1),
and backfills TRITON_PTXAS_PATH for cu126 which was previously missing it.
AI-assisted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ge build

Add a shared `base` stage and named `prod`/`dev` targets in a single Dockerfile, replacing the two-file approach. CI uses --target prod; devcontainers use --target dev. ARGs are re-declared per stage per Docker scoping rules.

AI-assisted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The images are for CI testing and local dev, not prod/staging environments.

AI-assisted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Resolved release-ci-docker.yml conflict in favour of the matrix.include approach. Removed docker/Dockerfile.cu131 and docker/Dockerfile.cu131.dev added by flashinfer-ai#2774, superseded by the parameterized Dockerfile.

AI-assisted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
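Taken together, the commits above describe one ARG-parameterized, multi-stage file. A minimal sketch of that shape (stage and ARG names come from the commit messages; the default base image, package list, and library-path handling below are illustrative assumptions, not the PR's actual contents):

```dockerfile
# Sketch only: defaults and package choices here are assumptions.
ARG CUDA_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu24.04

FROM ${CUDA_BASE_IMAGE} AS base
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl git wget \
    && rm -rf /var/lib/apt/lists/*

# ARG values do not cross stage boundaries, so each stage re-declares
# the ones it consumes (the "Docker scoping rules" the commit mentions).
FROM base AS prod
ARG CUDA_VERSION
ARG NVIDIA_LIB_PATH
ARG INSTALL_TILELANG=false
ENV LD_LIBRARY_PATH="${NVIDIA_LIB_PATH}:${LD_LIBRARY_PATH}"
# ... python env setup, package install, optional tilelang ...

FROM base AS dev
ARG CUDA_VERSION
# ... non-root user, zsh, clangd, pre-commit ...
```

CI would then build with `--target prod` and devcontainers with `--target dev`, passing the per-version values from the workflow matrix as `--build-arg`s.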
📝 Walkthrough

Consolidates per-CUDA Dockerfiles into a single ARG-driven multi-stage Dockerfile.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant Docker as Docker Build
    participant Registry as CUDA Base Image (registry)
    participant Image as Multi-stage Dockerfile
    GH->>Docker: trigger build (matrix entry + build args)
    Docker->>Registry: pull ${CUDA_BASE_IMAGE}
    Docker->>Image: run multi-stage build (ARGs: CUDA_BASE_IMAGE, CUDA_VERSION, PYTORCH_INDEX, NVIDIA_LIB_PATH, INSTALL_TILELANG)
    Image->>Image: install python/conda, set LD_LIBRARY_PATH & TRITON_PTXAS_PATH, install packages, optional tilelang
    Docker->>GH: produce image artifact (target=test) / push
```
Summary of Changes

This pull request significantly refactors the Docker build infrastructure by consolidating multiple version-specific Dockerfiles into a single, highly configurable multi-stage Dockerfile.
Code Review
The pull request refactors the docker/Dockerfile into a multi-stage build (base, test, dev) to improve modularity and reduce image size. It introduces arguments for CUDA_BASE_IMAGE, NVIDIA_LIB_PATH, CUDA_VERSION, and INSTALL_TILELANG to make the Dockerfile more flexible and configurable. The test stage now handles Python environment setup, package installation (including conditional tilelang installation), and pip user-site configuration, while the dev stage installs development tools. A review comment suggests an improvement to further reduce image size by cleaning the apt cache after package installations in the base stage.
```dockerfile
    curl \
    git \
    wget
```
To keep the Docker image size minimal, it's a good practice to clean up the apt cache after installing packages. You can do this by chaining && rm -rf /var/lib/apt/lists/* to your apt-get install command. This will reduce the size of this base layer and all subsequent images built from it.
```dockerfile
    wget && rm -rf /var/lib/apt/lists/*
```
⚠️ Outside diff range comments (1)
docker/Dockerfile (1)
50-96: ⚠️ Potential issue | 🟠 Major

Keep the `dev` target on the same CUDA library path as `test`.

Lines 25-26 prepend the pip-installed NVIDIA libs in `test`, but `dev` never declares `NVIDIA_LIB_PATH` or exports the equivalent `LD_LIBRARY_PATH`. After this refactor, the local dev image can load a different cuBLAS/cuDNN stack than CI for the same `CUDA_VERSION`.

🔧 Suggested fix
```diff
 ARG USERNAME=devuser
 ARG USER_UID=1003
 ARG USER_GID=$USER_UID
+ARG NVIDIA_LIB_PATH
 ENV PATH="/home/$USERNAME/conda/bin:$PATH"
 ENV PATH="/home/$USERNAME/conda/envs/py312/bin:$PATH"
+ENV LD_LIBRARY_PATH="/home/${USERNAME}/conda/envs/py312/lib/python3.12/site-packages/${NVIDIA_LIB_PATH}/:$LD_LIBRARY_PATH"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/Dockerfile` around lines 50 - 96, The dev Dockerfile target is missing the NVIDIA library path environment setup, causing dev to use a different CUDA libs stack than test; update the dev stage (target "dev") to accept ARG CUDA_VERSION, compute or set NVIDIA_LIB_PATH the same way the test stage does, and export LD_LIBRARY_PATH to include $NVIDIA_LIB_PATH (and preserve existing PATH) so pip-installed NVIDIA libs are prepended consistently between dev and test—apply these changes near where ENV PATH is set and where CUDA_VERSION/INSTALL_TILELANG args are handled.
🧹 Nitpick comments (1)
.github/workflows/release-ci-docker.yml (1)
39-59: The CUDA version list is still duplicated in this workflow.

`create-manifests` (Lines 101-123) and `update-docker-tags` (Lines 135-157) still hardcode cu126-cu131, so adding the next CUDA block here will still require touching multiple sections. Consider driving those consumers from the same source data so this matrix actually becomes the single source of truth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/release-ci-docker.yml around lines 39 - 59, The CUDA matrix is duplicated across the workflow (the matrix include block shown here and the hardcoded cu126–cu131 lists used by the create-manifests and update-docker-tags jobs), so centralize the CUDA entries into a single source and reference it from both jobs; update the matrix include under strategy.matrix (the block with entries containing cuda, base_image, nvidia_lib_path, install_tilelang) to be the canonical list and modify create-manifests and update-docker-tags to consume that same list (e.g., via a workflow-level matrix, YAML anchors/aliases, or a reusable job/step that reads the centralized list) so adding a new CUDA entry only requires changing the single matrix definition rather than multiple hardcoded sections.
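Centralizing the list as the comment suggests could be done with a setup job that emits the matrix as JSON, which every consumer then reads via `fromJSON`. A hypothetical sketch (job, step, and output names are illustrative, not from this workflow):

```yaml
# Hypothetical sketch: one job owns the CUDA list, the others consume it.
jobs:
  cuda-matrix:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.list.outputs.matrix }}
    steps:
      - id: list
        run: >-
          echo 'matrix=[{"cuda":"cu126"},{"cuda":"cu128"},{"cuda":"cu129"},{"cuda":"cu130"},{"cuda":"cu131"}]'
          >> "$GITHUB_OUTPUT"

  build:
    needs: cuda-matrix
    strategy:
      matrix:
        include: ${{ fromJSON(needs.cuda-matrix.outputs.matrix) }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "building ${{ matrix.cuda }}"
```

Under this shape, `create-manifests` and `update-docker-tags` would take the same `needs` dependency and iterate over `fromJSON(needs.cuda-matrix.outputs.matrix)` instead of their hardcoded lists.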
📒 Files selected for processing (11)
- .github/workflows/release-ci-docker.yml
- docker/Dockerfile
- docker/Dockerfile.cu126
- docker/Dockerfile.cu126.dev
- docker/Dockerfile.cu128
- docker/Dockerfile.cu128.dev
- docker/Dockerfile.cu129
- docker/Dockerfile.cu129.dev
- docker/Dockerfile.cu130
- docker/Dockerfile.cu130.dev
- docker/Dockerfile.cu131
💤 Files with no reviewable changes (9)
- docker/Dockerfile.cu129
- docker/Dockerfile.cu128
- docker/Dockerfile.cu131
- docker/Dockerfile.cu130
- docker/Dockerfile.cu126.dev
- docker/Dockerfile.cu130.dev
- docker/Dockerfile.cu129.dev
- docker/Dockerfile.cu128.dev
- docker/Dockerfile.cu126
…om CUDA version

PyTorch doesn't publish cu131 wheels yet, so cu131 images fall back to the cu130 index. A TODO comment marks where to update when cu131 wheels ship. PYTORCH_INDEX defaults to CUDA_VERSION so existing versions are unaffected.

AI-assisted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
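The fallback described in this commit maps naturally onto Docker's ARG defaulting, where a later ARG's default can reference an earlier one in the same stage. A hedged sketch (the pip invocation and wheel-index URL pattern are assumptions about how the image installs PyTorch):

```dockerfile
ARG CUDA_VERSION
# Defaults to the CUDA version; CI overrides this to cu130 for cu131 builds.
# TODO: drop the override once PyTorch publishes cu131 wheels.
ARG PYTORCH_INDEX=${CUDA_VERSION}
RUN pip install torch --index-url "https://download.pytorch.org/whl/${PYTORCH_INDEX}"
```

With this shape, existing versions pass no `PYTORCH_INDEX` at all and keep their current behaviour.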
🧹 Nitpick comments (2)
docker/Dockerfile (2)
6-9: Consider adding `--no-install-recommends` to reduce image size.

As flagged by static analysis, adding `--no-install-recommends` prevents installation of recommended but unnecessary packages, reducing image size.

🔧 Proposed fix
```diff
 RUN apt-get update && apt-get install -y \
+    --no-install-recommends \
     curl \
     git \
     wget
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/Dockerfile` around lines 6 - 9, The apt-get install command in the Dockerfile (the RUN layer that installs curl, git, wget) should include --no-install-recommends to avoid pulling recommended packages and reduce image size; update the RUN command that currently runs "apt-get update && apt-get install -y \ curl \ git \ wget" to add --no-install-recommends (and combine with apt-get clean / rm -rf /var/lib/apt/lists if desired) so the image only installs required packages.
50-58: Consider adding `--no-install-recommends` here as well.

Same recommendation as the base stage.
🔧 Proposed fix
```diff
 RUN apt-get update && apt-get install -y \
+    --no-install-recommends \
     clang-format \
     clangd-19 \
     vim \
     zsh \
     && rm -rf /var/lib/apt/lists/*
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/Dockerfile` around lines 50 - 58, In the dev stage (the Dockerfile block beginning with "FROM base AS dev") add the apt option "--no-install-recommends" to the "apt-get install -y" command so the RUN line that currently reads "RUN apt-get update && apt-get install -y clang-format clangd-19 vim zsh && rm -rf /var/lib/apt/lists/*" becomes the non-recommends variant; this mirrors the base stage change and reduces unnecessary recommended packages during image build.
📒 Files selected for processing (2)
- .github/workflows/release-ci-docker.yml
- docker/Dockerfile
… Dockerfile

Point all devcontainer configs at docker/Dockerfile with target=dev and per-version build args (CUDA_BASE_IMAGE, CUDA_VERSION, PYTORCH_INDEX), replacing the deleted per-version Dockerfile.cu*.dev references.

AI-assisted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
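One devcontainer config under this scheme might look like the following sketch (the file path, name, and values are illustrative; `build.target` and `build.args` are standard dev container spec properties):

```jsonc
// Hypothetical .devcontainer/cu130/devcontainer.json
{
  "name": "flashinfer cu130",
  "build": {
    "context": "../..",
    "dockerfile": "../../docker/Dockerfile",
    "target": "dev",
    "args": {
      "CUDA_BASE_IMAGE": "nvidia/cuda:13.0.1-devel-ubuntu24.04",
      "CUDA_VERSION": "cu130",
      "PYTORCH_INDEX": "cu130"
    }
  }
}
```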
Summary
- Replace per-version Dockerfiles (`Dockerfile.cu{126,128,129,130,131}[.dev]`) with a single `docker/Dockerfile` using multi-stage builds and Docker `ARG` parameterization
- Stages: `base` (common apt deps), `test` (CI runner image), `dev` (local development with non-root user, zsh, clangd, pre-commit)
- Per-version values (`base_image`, `nvidia_lib_path`, `install_tilelang`) move into the CI workflow `matrix.include`; adding a new CUDA version now requires one new block there instead of copying files

Fixes along the way

- `Dockerfile.cu130.dev` had the wrong base image `13.0.0`; fixed to `13.0.1` (consistent with the test image)
- `Dockerfile.cu126` was missing `TRITON_PTXAS_PATH`; backfilled for all versions

Test plan
```shell
docker buildx build --target dev \
  --build-arg CUDA_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu24.04 \
  --build-arg CUDA_VERSION=cu130 \
  -f docker/Dockerfile .
```

🤖 Generated with Claude Code