🚀 The feature, motivation and pitch
Description
Currently, every build instance pulls the base Docker images (nvidia/cuda, Python base, PyTorch, etc.) from registries on first use. These base images are large (~8-10GB) and rarely change. By pre-pulling and caching these images in the EC2 AMI using Packer, instances start with a warm cache, eliminating the initial pull time and reducing build times, especially for new/cold instances.
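The pre-pull step can be sketched as follows. This is a minimal sketch that assumes Docker is installed on the Packer build instance and that a Packer `shell` provisioner runs the script with the chosen image references as arguments. The script and its `prepull` / `missing_images` helpers are hypothetical and not part of any existing tooling. The script pulls each image, then reads `docker image ls` to confirm that every reference is in the local cache. If any image is missing, the script exits non-zero so the AMI build fails.

```python
import subprocess
import sys


def missing_images(required, local_listing):
    """Return the refs in `required` that do not appear in `local_listing`.

    `local_listing` is the text printed by
    `docker image ls --format '{{.Repository}}:{{.Tag}}'`, one ref per line.
    """
    present = {line.strip() for line in local_listing.splitlines() if line.strip()}
    return [ref for ref in required if ref not in present]


def prepull(images):
    """Pull each image into the local Docker cache, then report any still missing."""
    for ref in images:
        # check=True raises CalledProcessError, which fails the provisioner step.
        subprocess.run(["docker", "pull", ref], check=True)
    listing = subprocess.run(
        ["docker", "image", "ls", "--format", "{{.Repository}}:{{.Tag}}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return missing_images(images, listing)


# Only act when image refs are passed on the command line (hypothetical interface).
if __name__ == "__main__" and len(sys.argv) > 1:
    missing = prepull(sys.argv[1:])
    if missing:
        print("not cached after pull: " + ", ".join(missing), file=sys.stderr)
        sys.exit(1)
    print(f"cached {len(sys.argv) - 1} base image(s)")
```

For example, `python3 prepull.py nvidia/cuda:12.9.1-devel-ubuntu20.04 nvidia/cuda:12.9.1-runtime-ubuntu22.04` fails if either image is absent after the pull. The check compares `repository:tag` strings exactly as `docker image ls` prints them. References should therefore be written in that short form, for example without a `docker.io/library/` prefix, and pinned by tag rather than by digest.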
What You'll Do
- Analyze which base images are most frequently used:
  - `nvidia/cuda:12.9.1-devel-ubuntu20.04`
  - `nvidia/cuda:12.9.1-runtime-ubuntu22.04`
  - PyTorch images from `download.pytorch.org`
  - Python base images
- Modify Packer configuration to:
  - Pull base images during AMI build
  - Store them in `/var/lib/docker`
  - Verify images are properly saved in Docker cache
- Calculate optimal set of images to bake in (balance AMI size vs. benefit)
- Set up AMI update automation:
  - Rebuild AMI monthly or when base images update
  - Version AMI appropriately
  - Update Terraform to use new AMI
- Test that instances launch with pre-cached images
- Document AMI maintenance procedures
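The "calculate optimal set of images to bake in" task can be framed as a 0/1 knapsack problem. Each candidate image has an on-disk size and an estimated benefit, and the AMI has a size budget. This sketch assumes that sizes are measured in MB, for example from `docker image inspect --format '{{.Size}}'`, which reports bytes. It also assumes that `saved_seconds` is a hypothetical benefit score, such as monthly cold pulls multiplied by the average pull time taken from build logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    ref: str            # image reference, e.g. "nvidia/cuda:<tag>"
    size_mb: int        # measured on-disk size in MB
    saved_seconds: int  # hypothetical benefit score (estimated pull time saved)


def choose_images(candidates, budget_mb):
    """0/1 knapsack: maximize total saved_seconds with total size_mb <= budget_mb.

    Returns (best_saving, chosen_refs). Runs in O(len(candidates) * budget_mb).
    """
    # best[c] = best saving achievable with capacity c using the items seen so far
    best = [0] * (budget_mb + 1)
    # keep[i][c] is True if item i improved best[c] when it was considered
    keep = []
    for item in candidates:
        taken = [False] * (budget_mb + 1)
        # iterate capacities downward so each image is used at most once
        for c in range(budget_mb, item.size_mb - 1, -1):
            with_item = best[c - item.size_mb] + item.saved_seconds
            if with_item > best[c]:
                best[c] = with_item
                taken[c] = True
        keep.append(taken)
    # walk the decisions backwards to recover which images were chosen
    chosen, c = [], budget_mb
    for i in range(len(candidates) - 1, -1, -1):
        if keep[i][c]:
            chosen.append(candidates[i].ref)
            c -= candidates[i].size_mb
    chosen.reverse()
    return best[budget_mb], chosen
```

Images can share layers, so summing individual sizes overstates the real disk cost of a set, and the budget acts as a conservative bound. It is still worth measuring the combined `/var/lib/docker` footprint of the chosen set afterwards for the before/after AMI size comparison.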
Deliverables
- Analysis document: which base images to pre-cache and why
- Modified Packer configuration with image pre-caching
- Script to identify and pull latest base image versions
- AMI size comparison (before/after)
- Build time comparison (cold start with/without pre-cached images)
- Automation for monthly AMI rebuilds
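A scheduled CI job could apply the "rebuild monthly or when base images update" rule as a small check before invoking Packer. This sketch makes several assumptions:

- Digests of the baked images are recorded at AMI build time.
- Current digests are fetched from the registry, for example with `docker buildx imagetools inspect`.
- "Monthly" is approximated as 30 days.
- The date-stamped naming scheme and the `ci-base` prefix are hypothetical.

```python
from datetime import date


def needs_rebuild(last_build, today, baked_digests, current_digests, max_age_days=30):
    """Decide whether the cached-image AMI should be rebuilt.

    last_build / today: datetime.date values.
    baked_digests: {image_ref: digest} recorded when the current AMI was built.
    current_digests: {image_ref: digest} as currently published by the registry.
    """
    # Time-based cadence: rebuild once the AMI reaches max_age_days.
    if (today - last_build).days >= max_age_days:
        return True
    # Event-driven: rebuild if any tracked base image moved to a new digest
    # (or a newly tracked image has no baked digest yet).
    return any(
        baked_digests.get(ref) != digest for ref, digest in current_digests.items()
    )


def ami_name(prefix, build_date, build_number):
    """Date-stamped AMI name that sorts by name (hypothetical naming scheme)."""
    return f"{prefix}-{build_date:%Y.%m.%d}-{build_number:03d}"
```

For example, `ami_name("ci-base", date(2025, 1, 15), 3)` yields `ci-base-2025.01.15-003`. On the Terraform side, an `aws_ami` data source that filters on the name prefix with `most_recent = true` would pick up each new AMI without hand-editing an AMI ID.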
Alternatives
No response
Additional context
No response