slurm: set containerd root to EBS #914
Thanks @maekawataiki for reporting the issue and creating this PR. The suggested fix did not take effect as expected because the original
$ bash easy-ssh.sh -r us-east-1 hyperpod-after-20241216
=================================================
==== 🚀 HyperPod Cluster Easy SSH Script! 🚀 ====
=================================================
Cluster id: jhroxiiv5v3e
Instance id: i-05060251d0a782283
Node Group: controller-machine
SSH User: ubuntu
1. Detected hyperpod-after-20241216 in ~/.ssh/config. Skipping adding...
2. Detected SSH public key ~/.ssh/id_rsa.pub for user ubuntu on the cluster. Skipping adding...
Now you can run:
$ ssh hyperpod-after-20241216
Starting session with SessionId: i-0f5934b931601f25a-epjgelyqlb4aq6l44epdkeuo4q
$ srun cat /etc/containerd/config.toml
# Copyright 2018-2022 Docker Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
disabled_plugins = ["cri"]
#root = "/opt/dlami/nvme/docker/containerd" # Here
#state = "/run/containerd"
#subreaper = true
#oom_score = 0
#[grpc]
# address = "/run/containerd/containerd.sock"
# uid = 0
# gid = 0
#[debug]
# address = "/run/containerd/debug.sock"
# uid = 0
# gid = 0
# level = "info"
$
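In the dump above, the `root` line still starts with `#`, so it is a comment rather than an active setting. A minimal sketch of a check for that, run against a scratch copy of the relevant lines rather than the real `/etc/containerd/config.toml`; the helper name `active_root` is hypothetical and GNU sed is assumed:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the value of an uncommented top-level `root` key,
# or nothing if the key is absent or commented out.
active_root() {
  sed -n 's|^root *= *"\([^"]*\)".*|\1|p' "$1" | head -n 1
}

# Scratch file mirroring the relevant lines of the dump above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
disabled_plugins = ["cri"]
#root = "/opt/dlami/nvme/docker/containerd"
#state = "/run/containerd"
EOF

# The line is commented, so no active root is reported.
commented_result="$(active_root "$sample")"

# Uncomment the line and check again.
sed -i 's|^#root|root|' "$sample"
uncommented_result="$(active_root "$sample")"

printf 'commented:   [%s]\n' "$commented_result"
printf 'uncommented: [%s]\n' "$uncommented_result"
rm -f "$sample"
```

On a cluster node, the same function could be pointed at the real config file through `srun`.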
For reference, here's the original
$ bash easy-ssh.sh -r us-east-1 hyperpod-before-20241216
=================================================
==== 🚀 HyperPod Cluster Easy SSH Script! 🚀 ====
=================================================
Cluster id: 69q9l3vgs5iv
Instance id: i-0105da2ccc9eae353
Node Group: controller-machine
SSH User: ubuntu
1. Detected hyperpod-before-20241216 in ~/.ssh/config. Skipping adding...
2. Detected SSH public key ~/.ssh/id_rsa.pub for user ubuntu on the cluster. Skipping adding...
Now you can run:
$ ssh hyperpod-before-20241216
Starting session with SessionId: i-0f5934b931601f25a-dab2bng46eqgfk9a3vyx8pesdq
$ srun cat /etc/containerd/config.toml
# Copyright 2018-2022 Docker Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
disabled_plugins = ["cri"]
#root = "/var/lib/containerd"
#state = "/run/containerd"
#subreaper = true
#oom_score = 0
#[grpc]
# address = "/run/containerd/containerd.sock"
# uid = 0
# gid = 0
#[debug]
# address = "/run/containerd/debug.sock"
# uid = 0
# gid = 0
# level = "info"
$
In the original version,
Workaround on existing clusters:
Removed state configuration from containerd setup for both paths.
  containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
fi
sudo sed -i \
  -e 's|^#\\?root *=.*|root = "/opt/dlami/nvme/docker/containerd"|' \
I tested these changes and they didn't work, but the following did:
sudo sed -i -e 's|^#\?root *=.*|root = "/opt/sagemaker/docker/containerd"|'
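One likely reason for the difference: inside single quotes the shell passes `\\?` to sed unchanged, and in a basic regular expression that means a literal backslash followed by a literal `?`, so `#root` never matches. With a single backslash, `\?` is the GNU sed form of "zero or one". The sketch below compares both patterns on a scratch file; it assumes GNU sed, and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch copy of the relevant lines; the real file is /etc/containerd/config.toml.
cfg="$(mktemp)"
printf '%s\n' 'disabled_plugins = ["cri"]' '#root = "/var/lib/containerd"' > "$cfg"

new_root='/opt/sagemaker/docker/containerd'

# Pattern as written in the diff: `\\` is a literal backslash and `?` a literal
# question mark, so the commented root line is left untouched.
escaped="$(sed -e 's|^#\\?root *=.*|root = "'"$new_root"'"|' "$cfg" | grep 'root')"

# Pattern from the review comment: `\?` makes the leading `#` optional.
fixed="$(sed -e 's|^#\?root *=.*|root = "'"$new_root"'"|' "$cfg" | grep 'root')"

printf 'double backslash: %s\n' "$escaped"
printf 'single backslash: %s\n' "$fixed"
rm -f "$cfg"
```

The double-backslash form would only be appropriate if another layer of unescaping ran before sed saw the expression, for example an unquoted heredoc.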
  containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
fi
sudo sed -i \
  -e 's|^#\\?root *=.*|root = "/opt/sagemaker/docker/containerd"|' \
I tested these changes and they didn't work, but the following did:
sudo sed -i -e 's|^#\?root *=.*|root = "/opt/sagemaker/docker/containerd"|'
  containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
fi
sudo sed -i \
  -e 's|^#\\?root *=.*|root = "/opt/sagemaker/docker/containerd"|' \
Could you use /opt/sagemaker/containerd/data-root instead of /opt/sagemaker/docker/containerd, for consistency with the HyperPod EKS side?
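If the path is changed as suggested, the substitution itself stays the same. A small sketch, assuming GNU sed and using a hypothetical helper named `set_containerd_root`, that takes the path as a parameter and shows the edit is safe to run more than once, because the pattern matches the `root` line whether or not it is commented:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: point the top-level `root` key of a containerd config
# file at the given directory, whether the key is currently commented or not.
set_containerd_root() {
  local file="$1" new_root="$2"
  sed -i -e 's|^#\?root *=.*|root = "'"$new_root"'"|' "$file"
}

# Scratch file standing in for /etc/containerd/config.toml.
cfg="$(mktemp)"
printf '%s\n' \
  'disabled_plugins = ["cri"]' \
  '#root = "/var/lib/containerd"' \
  '#state = "/run/containerd"' > "$cfg"

# Apply the edit twice; the second run should change nothing.
set_containerd_root "$cfg" /opt/sagemaker/containerd/data-root
first_pass="$(cat "$cfg")"
set_containerd_root "$cfg" /opt/sagemaker/containerd/data-root
second_pass="$(cat "$cfg")"

root_lines="$(grep -c '^root = ' "$cfg")"
root_value="$(grep '^root = ' "$cfg")"
printf '%s\n' "$root_value"
rm -f "$cfg"
```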
Issue #, if available:
#913 (related #127)
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.