@@ -56,6 +56,8 @@ sudo usermod -aG docker ubuntu
# See: https://github.com/aws-samples/awsome-distributed-training/issues/127
#
# Docker workdir doesn't like Lustre. Tried with storage driver overlay2, fuse-overlayfs, & vfs.
# Also, containerd ships with a commented root in its default config; we need to ensure an
# uncommented root that points to the fast local volume.
if [[ $(mount | grep /opt/sagemaker) ]]; then
cat <<EOL >> /etc/docker/daemon.json
{
@@ -66,6 +68,14 @@ EOL
sed -i \
's|^\[Service\]$|[Service]\nEnvironment="DOCKER_TMPDIR=/opt/sagemaker/docker/tmp"|' \
/usr/lib/systemd/system/docker.service

# Ensure containerd config exists and point its root to /opt/sagemaker
if [[ ! -f /etc/containerd/config.toml ]]; then
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
fi
sudo sed -i \
-e 's|^#\\?root *=.*|root = "/opt/sagemaker/docker/containerd"|' \

I tested these changes and they did not work, but the command below did:
sudo sed -i -e 's|^#\?root *=.*|root = "/opt/sagemaker/docker/containerd"|'
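
A minimal sketch of why the two spellings likely behave differently, run against a scratch file instead of the real `/etc/containerd/config.toml`. Inside single quotes the shell passes `\\?` to sed unchanged, and in a basic regular expression `\\` is a literal backslash followed by an ordinary `?`, so a line like `#root = ...` never matches. `\?` is the GNU sed spelling of "zero or one". The scratch file contents and the `new_root` variable are assumptions for illustration, and the `-E` form is offered only as a portable alternative, not as what the PR uses.

```shell
#!/usr/bin/env bash
# Sketch: compare the sed expressions on a scratch file (assumed contents).
set -euo pipefail

tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'version = 2' '#root = "/var/lib/containerd"' 'state = "/run/containerd"' > "$tmp"

new_root='/opt/sagemaker/docker/containerd'

# As written in the diff: \\ is a literal backslash and ? is a literal
# question mark, so the pattern only matches lines starting with '#\?root'.
escaped="$(sed -e 's|^#\\?root *=.*|root = "'"$new_root"'"|' "$tmp")"

# As suggested in the review: \? makes the leading '#' optional (GNU sed).
optional="$(sed -e 's|^#\?root *=.*|root = "'"$new_root"'"|' "$tmp")"

# Alternative, assuming sed supports -E (extended regex), where a bare ?
# already means "zero or one".
extended="$(sed -E 's|^#?root *=.*|root = "'"$new_root"'"|' "$tmp")"

echo "--- with \\\\? (diff):";   echo "$escaped"
echo "--- with \\? (review):";   echo "$optional"
echo "--- with -E and ?:";       echo "$extended"
```

If the first output still shows the commented `#root` line while the other two show the rewritten `root` line, that is consistent with the behavior reported in this comment.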

Collaborator

Could you use /opt/sagemaker/containerd/data-root instead of /opt/sagemaker/docker/containerd, for consistency with the HyperPod EKS side?

See: https://github.com/aws-samples/awsome-distributed-training/blob/main/1.architectures/7.sagemaker-hyperpod-eks/LifecycleScripts/base-config/on_create_main.sh#L70
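
One way the path could be kept in a single place, so that switching between the two locations discussed here is a one-line change. This is only a sketch: `set_containerd_root` is a hypothetical helper, not something in the script, and it assumes GNU sed (`-i` without a suffix, `-E` for extended regex). The demo runs on a scratch file with assumed contents.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rewrite the top-level root line (commented or not)
# in a containerd config file to point at the given directory.
set_containerd_root() {
  local new_root="$1" config="$2"
  sed -i -E "s|^#?root *=.*|root = \"${new_root}\"|" "$config"
}

# Demo on a scratch file, not on /etc/containerd/config.toml.
scratch="$(mktemp)"
trap 'rm -f "$scratch"' EXIT
printf '%s\n' '#root = "/var/lib/containerd"' 'state = "/run/containerd"' > "$scratch"

set_containerd_root /opt/sagemaker/containerd/data-root "$scratch"
cat "$scratch"
```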

/etc/containerd/config.toml
elif [[ $(mount | grep /opt/dlami/nvme) ]]; then
cat <<EOL >> /etc/docker/daemon.json
{
Expand All @@ -76,7 +86,16 @@ EOL
sed -i \
's|^\[Service\]$|[Service]\nEnvironment="DOCKER_TMPDIR=/opt/dlami/nvme/docker/tmp"|' \
/usr/lib/systemd/system/docker.service

# Ensure containerd config exists and point its root to /opt/dlami/nvme
if [[ ! -f /etc/containerd/config.toml ]]; then
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
fi
sudo sed -i \
-e 's|^#\\?root *=.*|root = "/opt/dlami/nvme/docker/containerd"|' \

I tested these changes and they did not work, but the command below did:
sudo sed -i -e 's|^#\?root *=.*|root = "/opt/sagemaker/docker/containerd"|'

/etc/containerd/config.toml
fi

systemctl daemon-reload
systemctl restart docker
systemctl restart containerd
systemctl restart docker
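
After the restarts, it may be worth confirming that the config really contains an uncommented top-level `root` line, since a silent non-match in sed leaves the file untouched. The sketch below uses a hypothetical helper, `containerd_root_of`, that prints the value of the first unindented `root = "..."` line and returns non-zero when there is none. It assumes the setting sits at column 0, which is how the sed expressions in this diff address it.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the value of the first uncommented, unindented
# `root = "..."` line in a containerd config file. Returns non-zero if no
# such line exists (grep . fails on empty input).
containerd_root_of() {
  local config="$1"
  sed -n 's|^root *= *"\(.*\)"$|\1|p' "$config" | head -n 1 | grep .
}

# Usage: only runs when the real config is present on this machine.
if [[ -f /etc/containerd/config.toml ]]; then
  containerd_root_of /etc/containerd/config.toml \
    || echo "no uncommented root in /etc/containerd/config.toml" >&2
fi
```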