Description
Summary
Add image size and bloat detection to the existing env-doctor dockerfile command. Flag oversized CUDA base images, suggest slimmer alternatives, and recommend multi-stage builds when compilation layers aren't needed at runtime.
Motivation
CUDA Docker images are notoriously bloated: nvidia/cuda:12.4.0-devel-ubuntu22.04 is ~4.5GB, while nvidia/cuda:12.4.0-runtime-ubuntu22.04 is ~1.2GB. Many users unknowingly ship devel images to production, or use full CUDA images when they only need the runtime. The current env-doctor dockerfile command validates correctness but doesn't flag optimization opportunities.
Current State
The existing Dockerfile validator (src/env_doctor/validators/dockerfile_validator.py) checks:
- Base image CUDA version compatibility
- Library version mismatches
- Runtime vs devel mismatch (only when compilation packages need devel)
- Driver installation errors
- Deprecated packages
It does NOT check:
- Base image size or bloat
- Whether devel is used unnecessarily (without compilation)
- Multi-stage build opportunities
- Redundant CUDA component installation on top of CUDA base images
- Slimmer alternative base images
Proposed Implementation
New Data: cuda_image_sizes.json
- Estimated sizes for common base image variants:
    {
      "nvidia/cuda": {
        "12.4.0-devel-ubuntu22.04": { "size_mb": 4500, "variant": "devel" },
        "12.4.0-runtime-ubuntu22.04": { "size_mb": 1200, "variant": "runtime" },
        "12.4.0-base-ubuntu22.04": { "size_mb": 250, "variant": "base" }
      }
    }
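As a rough illustration of how this table could drive "suggest a slimmer alternative" messages, here is a minimal sketch. The helper names (`split_image`, `slimmer_alternatives`) are hypothetical and not part of env-doctor, and the sizes are the estimates from the example above, inlined as a string so the snippet runs on its own.

```python
import json

# Estimated sizes copied from the example table above (inlined here instead of
# reading cuda_image_sizes.json so the sketch is self-contained).
SIZES_JSON = """
{
  "nvidia/cuda": {
    "12.4.0-devel-ubuntu22.04": {"size_mb": 4500, "variant": "devel"},
    "12.4.0-runtime-ubuntu22.04": {"size_mb": 1200, "variant": "runtime"},
    "12.4.0-base-ubuntu22.04": {"size_mb": 250, "variant": "base"}
  }
}
"""


def split_image(image):
    """Split 'repo:tag' into (repo, tag); tag is None when absent.

    Simplification: registries with a port (host:5000/repo:tag) are not handled.
    """
    repo, sep, tag = image.partition(":")
    return repo, (tag if sep else None)


def slimmer_alternatives(image, sizes):
    """Return [(tag, saved_mb)] for smaller variants of the same CUDA version and OS."""
    repo, tag = split_image(image)
    entry = sizes.get(repo, {}).get(tag)
    if entry is None:
        return []  # unknown image: no estimate, so no suggestion
    marker = f"-{entry['variant']}-"
    found = []
    for other_tag, other in sizes[repo].items():
        # A candidate is the same tag with only the variant segment swapped.
        swapped = tag.replace(marker, f"-{other['variant']}-", 1)
        if other_tag == swapped and other["size_mb"] < entry["size_mb"]:
            found.append((other_tag, entry["size_mb"] - other["size_mb"]))
    return sorted(found, key=lambda pair: pair[1], reverse=True)


sizes = json.loads(SIZES_JSON)
for alt_tag, saved in slimmer_alternatives("nvidia/cuda:12.4.0-devel-ubuntu22.04", sizes):
    print(f"nvidia/cuda:{alt_tag} would save ~{saved} MB")
```

Matching on the variant segment of the tag keeps suggestions within the same CUDA version and OS, so a 12.4.0 image is never pointed at a different release.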
New Validation Methods in DockerfileValidator
- _validate_image_bloat()
  - If using devel image without any compilation commands (nvcc, gcc, make, pip install from source), warn and suggest runtime or base variant
  - Estimate base image size from lookup table, flag if >2GB without justification
- _validate_redundant_cuda_installs()
  - Detect apt-get install cuda-* or conda install cudatoolkit on top of nvidia/cuda base images
  - These are redundant and add gigabytes unnecessarily
- _suggest_multi_stage_build()
  - If devel image IS needed for compilation, suggest a multi-stage build pattern:
      # Build stage: compile with devel
      FROM nvidia/cuda:12.4.0-devel-ubuntu22.04 AS builder
      RUN pip install flash-attn --no-build-isolation
      # Runtime stage: slim image
      FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
      COPY --from=builder /usr/local/lib/python3.11/...
  - Only trigger when both compilation packages AND devel image are detected
- _validate_unnecessary_layers()
  - Flag apt-get install without --no-install-recommends (common bloat source)
  - Flag missing rm -rf /var/lib/apt/lists/* after apt operations
  - Flag pip install without --no-cache-dir
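The checks above could be prototyped with simple pattern matching over the Dockerfile's logical lines. The sketch below is a standalone approximation, not the actual DockerfileValidator code: the function names, finding codes, and regexes are all hypothetical, and treating `--no-build-isolation` / `--no-binary` as "pip install from source" signals is an assumption.

```python
import re

# Heuristic compilation markers: tool names come from the list above; the pip
# flags are an assumed proxy for "pip install from source".
COMPILE_HINTS = re.compile(r"\b(?:nvcc|gcc|make)\b|--no-build-isolation|--no-binary")
APT_INSTALL = re.compile(r"apt-get\s+(?:-\S+\s+)*install\b")
REDUNDANT_CUDA = re.compile(
    r"apt-get\s+(?:-\S+\s+)*install\b.*\bcuda-|conda\s+install\b.*\bcudatoolkit\b"
)
PIP_INSTALL = re.compile(r"\bpip3?\s+install\b")


def logical_lines(text):
    """Join backslash-continued lines; drop blank lines and comments."""
    joined = re.sub(r"\\\s*\n", " ", text)
    stripped = (ln.strip() for ln in joined.splitlines())
    return [ln for ln in stripped if ln and not ln.startswith("#")]


def from_image(line):
    """Return the image of a FROM line, skipping flags such as --platform=..."""
    tokens = [t for t in line.split()[1:] if not t.startswith("--")]
    return tokens[0] if tokens else None


def check_bloat(dockerfile_text):
    """Return a list of (level, code) findings for the bloat patterns above."""
    lines = logical_lines(dockerfile_text)
    images = [from_image(ln) for ln in lines if ln.upper().startswith("FROM ")]
    images = [img for img in images if img]
    runs = [ln for ln in lines if ln.upper().startswith("RUN ")]

    on_cuda_base = any(img.startswith("nvidia/cuda:") for img in images)
    uses_devel = any(img.startswith("nvidia/cuda:") and "-devel" in img for img in images)
    compiles = any(COMPILE_HINTS.search(run) for run in runs)

    findings = []
    if uses_devel and not compiles:
        findings.append(("WARNING", "devel-without-compilation"))
    if uses_devel and compiles and len(images) == 1:
        # Compilation justifies devel, but a single stage ships it to runtime.
        findings.append(("INFO", "consider-multi-stage-build"))
    for run in runs:
        if on_cuda_base and REDUNDANT_CUDA.search(run):
            findings.append(("WARNING", "redundant-cuda-install"))
        if APT_INSTALL.search(run):
            if "--no-install-recommends" not in run:
                findings.append(("INFO", "apt-missing-no-install-recommends"))
            if "rm -rf /var/lib/apt/lists" not in run:
                findings.append(("INFO", "apt-lists-not-removed"))
        if PIP_INSTALL.search(run) and "--no-cache-dir" not in run:
            findings.append(("INFO", "pip-missing-no-cache-dir"))
    return findings


example = """\
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04
RUN apt-get update && apt-get install -y cuda-toolkit-12-4
RUN pip install torch
"""
for level, code in check_bloat(example):
    print(level, code)
```

Line-level regexes will miss some cases (shell variables, build args in FROM, installs split across scripts), so a real implementation would likely reuse whatever parsing the existing validator already does. The sketch only shows that the devel check must look at compilation signals first, which is what keeps a devel image with compilation commands from being flagged as bloated.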
Integration
- Add new checks to existing DockerfileValidator.validate() pipeline
- Results appear as WARNING or INFO level alongside existing checks
- Include estimated size savings in recommendations
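One possible shape for surfacing these results, sketched under the assumption that findings carry a level, a message, and an optional size estimate. The `Finding` class and `render` function are hypothetical; the existing validator's result type may look quite different.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ordering so that warnings print before informational notes.
LEVEL_ORDER = {"WARNING": 0, "INFO": 1}


@dataclass
class Finding:
    """Hypothetical result record; not the validator's actual result type."""
    level: str
    message: str
    saved_mb: Optional[int] = None  # estimated savings, when known


def render(finding):
    """Format one finding, appending the estimated savings if present."""
    text = f"[{finding.level}] {finding.message}"
    if finding.saved_mb:
        text += f" (estimated savings: ~{finding.saved_mb / 1000:.1f} GB)"
    return text


findings = [
    Finding("INFO", "pip install without --no-cache-dir"),
    Finding(
        "WARNING",
        "devel image used without compilation; consider nvidia/cuda:12.4.0-runtime-ubuntu22.04",
        saved_mb=3300,
    ),
]
for finding in sorted(findings, key=lambda f: LEVEL_ORDER[f.level]):
    print(render(finding))
```

Keeping the savings as a separate numeric field, rather than baking it into the message, would let the CLI total or sort by it later.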
Acceptance Criteria
- Detects unnecessary devel image usage and suggests runtime/base alternatives
- Flags redundant CUDA package installations on CUDA base images
- Suggests multi-stage builds when compilation is needed but runtime doesn't need devel
- Flags common Dockerfile bloat patterns (missing --no-cache-dir, --no-install-recommends)
- Estimated size savings shown in recommendations
- Integrates cleanly into existing env-doctor dockerfile output
- No false positives: devel image with compilation commands is NOT flagged as bloated