
Login and worker pods fail because the mounted jail root volume has no mountpoints for virtual filesystems #299

@lgprobert

Description

I am trying soperator 1.16.1. I had to build the populate_jail and worker_slurmd images myself because image pulls always failed due to the image size. I use NFS as the shared storage for this test.

After applying the slurm-cluster helm chart, the slurm1-populate-jail pod finishes running and exits after some time. I guess it prepares the jail root for the login and worker nodes.

But the login and worker pods all fail with a CrashLoopBackOff error. Looking into the pod logs gives the same messages as below:

Starting slurmd entrypoint script
cgroup v2 detected, creating cgroup for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9fa86a0e_4917_4b4a_a37d_afb81545c892.slice/cri-containerd-d66f3229577110ba455bf50d4cc12422dbef9dd7fb829dec0a6d70ca559b8934.scope
Link users from jail
Link home from jail because slurmd uses it
Bind-mount slurm configs from K8S config map
Make ulimits as big as possible
Apply sysctl limits from /etc/sysctl.conf
vm.max_map_count = 655300
Update linker cache
Complement jail rootfs
+ set -e
+ getopts j:u:wh flag
+ case "${flag}" in
+ jaildir=/mnt/jail
+ getopts j:u:wh flag
+ case "${flag}" in
+ upperdir=/mnt/jail.upper
+ getopts j:u:wh flag
+ case "${flag}" in
+ worker=1
+ getopts j:u:wh flag
+ '[' -z /mnt/jail ']'
+ '[' -z /mnt/jail.upper ']'
+ pushd /mnt/jail
+ echo 'Bind-mount virtual filesystems'
/mnt/jail /
Bind-mount virtual filesystems
+ mount -t proc /proc proc/

In case it helps, here is the file list of the shared jail volume directory:

$ ls /srv/nfs/kubedata/jail/
assoc_mgr_state      fed_mgr_state      jwt_hs256.key     node_state.old               qos_usage       trigger_state.old
assoc_mgr_state.old  fed_mgr_state.old  last_config_lite  oci-layout                   qos_usage.old
assoc_usage          heartbeat          last_tres         part_state                   repositories
assoc_usage.old      index.json         last_tres.old     part_state.old               resv_state
blobs                job_state          manifest.json     priority_last_decay_ran      resv_state.old
clustername          job_state.old      node_state        priority_last_decay_ran.old  trigger_state
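The listing above looks like slurmctld state files plus OCI image-layout files (blobs, index.json, oci-layout) rather than a full Linux rootfs. A quick sketch to check whether a jail root contains the directories that complement_jail.sh later bind-mounts into (proc/, sys/, dev/, run/) — the NFS-side path is an assumption taken from my listing above:

```shell
#!/bin/sh
# Sketch: check that a jail root contains the mountpoint directories
# complement_jail.sh expects, relative to the jail directory.
check_jail() {
  jail="$1"
  missing=0
  for d in proc sys dev run; do
    # mount(8) refuses to mount onto a nonexistent target directory
    [ -d "$jail/$d" ] || { echo "missing: $jail/$d"; missing=1; }
  done
  return $missing
}

# Assumed path; adjust to where the jail volume lives on your NFS server.
check_jail /srv/nfs/kubedata/jail || echo "jail rootfs is incomplete"
```

In my case all four directories are absent, which matches the mount failure in the log.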

I think the jail volume is mounted at /mnt/jail, then /opt/bin/slurm/complement_jail.sh -j /mnt/jail -u /mnt/jail.upper is triggered by the container entrypoint script. The script changes the working directory to /mnt/jail and then tries to mount the virtual filesystems, but the mountpoints are obviously not present:

    mount -t proc /proc proc/
    mount -t sysfs /sys sys/
    mount --rbind /dev dev/
    mount --rbind /run run/

How can this be worked around, or is something wrong with my setup?
