docs/software/container-engine/run.md (+27 lines: 27 additions & 0 deletions)
@@ -24,6 +24,33 @@ There are three ways to do so:
!!! note "Shared container at the node level"
    For memory efficiency reasons, all Slurm tasks on an individual compute node share the same container, including its filesystem. As a consequence, any write operation to the container filesystem by one task will eventually become visible to all other tasks on the same node.
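    For instance, a file created by one task can be listed by any other task on the same node. A minimal sketch of this behavior (the environment definition name `my-env` is a placeholder, and the marker path assumes the container filesystem is writable at `/`):

    ```console
    $ srun -N1 -n2 --environment=my-env bash -c 'touch /marker-task-$SLURM_PROCID; sleep 5; ls /marker-task-*'
    ```

    Each task creates its own marker file, yet both tasks should list both files, because they share the same container filesystem.
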
!!! warning "Container start failure with `id: cannot find name for user ID`"
    If your Slurm job using a container fails to start with an error message similar to:
    ```console
    slurmstepd: error: pyxis: container start failed with error code: 1
    slurmstepd: error: pyxis: container exited too soon
    slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
    slurmstepd: error: Failed to invoke spank plugin stack
    srun: error: nid001234: task 0: Exited with exit code 1
    srun: Terminating StepId=12345.0
    ```
    it does not indicate an issue with your container, but instead means that the user database on one or more of the compute nodes is not fully synchronized.
    If the problematic node is not automatically drained, please [let us know][ref-get-in-touch] so that we can ensure the node is in a good state.
    You can check the state of a node using `sinfo --nodes=<node>`, e.g.:
    ```console
    $ sinfo --nodes=nid006886
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    debug        up    1:30:00      0    n/a
    normal*      up   12:00:00      1 drain$ nid006886
    xfer         up 1-00:00:00      0    n/a
    ```

### Use from batch scripts

Use `--environment` with the Slurm command (e.g., `srun` or `salloc`):
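For example, in a batch script (a minimal sketch; the environment definition `my-env` and the `#SBATCH` options are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=ce-example
#SBATCH --nodes=1
#SBATCH --time=00:10:00

# Run the job step inside the container described by the "my-env" environment definition
srun --environment=my-env cat /etc/os-release
```

The script is submitted with `sbatch` as usual; any `srun` step that passes `--environment` runs inside the corresponding container.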