[SLURM] Docker build fails due to insufficient disk space #913

@maekawataiki

Problem Description
Docker builds on the Slurm controller node fail with a "no space left on device" error. By default, the root volume has only 50 GB of capacity available, which is not enough to build Docker images.
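
To confirm the condition before starting a build, a quick free-space check can help. The following is a minimal sketch using only the Python standard library; the helper names `free_gib` and `has_room` are hypothetical, and the 20 GiB threshold is an illustrative assumption, not a measured requirement for any of the affected images.

```python
import shutil

GIB = 1024 ** 3


def free_gib(path: str = "/") -> float:
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room(path: str, required_gib: float) -> bool:
    """Return True if the filesystem containing `path` has at least `required_gib` free."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    # Compare the root volume with the candidate target from this issue.
    for path in ("/", "/opt/sagemaker"):
        try:
            print(f"{path}: {free_gib(path):.1f} GiB free")
        except OSError:
            print(f"{path}: not available on this node")

    # 20 GiB is an assumed threshold for illustration only.
    if not has_room("/", 20):
        print("Root volume is probably too small for a large image build.")
```

Running this on the controller node before and after a failed build shows whether the root volume is the filesystem that fills up.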

Impact
This issue affects multiple workshops and documentation that require Docker image builds, causing build failures and blocking users from completing the tutorials.

Affected Resources
Workshop: Picotron Docker Setup
Repository: NCCL Tests

Proposed Solutions
Move the containerd root to a directory on an EBS volume, such as /opt/sagemaker.
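
One possible way to apply this, sketched under assumptions: if the builds go through Docker Engine, its storage location is controlled by the `data-root` key in `/etc/docker/daemon.json` (containerd's own equivalent is the `root` setting in its `config.toml`). The helper names below and the `/opt/sagemaker/docker` subdirectory are hypothetical, and the sketch only renders the new configuration text rather than writing it to disk.

```python
import json


def with_data_root(config: dict, data_root: str) -> dict:
    """Return a copy of a daemon.json mapping with `data-root` set to `data_root`."""
    updated = dict(config)
    updated["data-root"] = data_root
    return updated


def render_daemon_json(existing_text: str, data_root: str) -> str:
    """Merge `data-root` into existing daemon.json text, keeping other keys intact."""
    config = json.loads(existing_text) if existing_text.strip() else {}
    return json.dumps(with_data_root(config, data_root), indent=2) + "\n"


if __name__ == "__main__":
    # Sample input standing in for the current /etc/docker/daemon.json contents.
    current = '{"log-driver": "json-file"}'
    # /opt/sagemaker/docker is an assumed subdirectory on the EBS volume.
    print(render_daemon_json(current, "/opt/sagemaker/docker"))
```

The rendered text would need to be written to the daemon configuration with root privileges and the Docker service restarted for the change to take effect. Images already stored under the old location are not moved automatically.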
