Commit 1cbb3ca

gwangmu and bcumming authored
Add a warning message about "--environment in SBATCH" (#237)
Resolves: [VCUE-1014](https://jira.cscs.ch/browse/VCUE-1014)

As specifying "--environment" as an "#SBATCH" option has caused many nonsensical problems, the container team decided to add an explicit warning (alongside some mitigations to known errors). To avoid distraction, the warning message was kept short in the main usage page ("Using container engine") and linked to the main warning message ("Known issues").

---------

Co-authored-by: Gwangmu Lee <[email protected]>
Co-authored-by: Ben Cumming <[email protected]>
1 parent 785f8ce commit 1cbb3ca

2 files changed, +31 -5 lines changed


docs/software/container-engine/known-issue.md

Lines changed: 13 additions & 0 deletions
@@ -52,3 +52,16 @@ Mounting individual home directories (usually located on the `/users` filesystem
5252

5353
It is generally NOT recommended to mount home folders inside containers, due to the risk of exposing personal data to programs inside the container.
5454
Defining a mount related to `/users` in the EDF should only be done when there is a specific reason to do so, and the container image being deployed is trusted.
55+
56+
[](){#ref-ce-why-no-sbatch-env}
57+
## Why `--environment` as `#SBATCH` is discouraged
58+
59+
Using `--environment` as an `#SBATCH` option is known to cause **unexpected behavior** and is reserved for highly customized workflows. This is because `--environment` as `#SBATCH` runs the entire batch script inside the container defined by the EDF file. The following are a few known associated issues.
60+
61+
- **Slurm availability in a container**: Either Slurm components are not completely injected inside a container, or injected Slurm components do not function properly.
62+
63+
- **Non-host execution context**: Since the SBATCH script runs inside a container, most host resources are inaccessible by default unless the EDF explicitly exposes them. Affected resources include filesystems, devices, system resources, container hooks, etc.
64+
65+
- **Nested use of `--environment`**: Running `srun --environment` in `#SBATCH --environment` results in double-entering EDF containers, causing unexpected errors in the underlying container runtime.
66+
67+
To avoid confusion, users are advised **not** to use `--environment` as `#SBATCH`. If a problem occurs while using it, move `--environment` from `#SBATCH` to each `srun` and check whether the problem disappears.
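
As a rough aid for the advice above, the sketch below flags batch scripts that pass `--environment` as an `#SBATCH` option. The function name `has_sbatch_environment` is hypothetical and not part of Slurm or the container engine; it only inspects the script text with `grep` and makes no claim about how Slurm itself parses `#SBATCH` lines.

```bash
#!/bin/bash
# Hypothetical helper (not part of Slurm or the container engine):
# succeeds if a batch script passes --environment as an #SBATCH option.
has_sbatch_environment() {
    local script="$1"
    # Match lines such as "#SBATCH --environment=ubuntu", allowing other
    # options to appear on the same line before --environment.
    grep -Eq '^[[:space:]]*#SBATCH[[:space:]]+(.*[[:space:]])?--environment([=[:space:]]|$)' "$script"
}
```

A possible use is `has_sbatch_environment job.sh && echo "move --environment to each srun"`, where `job.sh` is a placeholder for the batch script being checked. Lines such as `srun --environment=ubuntu ...` are deliberately not matched, since that is the recommended form.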

docs/software/container-engine/run.md

Lines changed: 18 additions & 5 deletions
@@ -34,17 +34,30 @@ Use `--environment` with the Slurm command (e.g., `srun` or `salloc`):
3434
#SBATCH --job-name=edf-example
3535
#SBATCH --time=00:01:00
3636
...
37-
38-
# Run job step
3937
srun --environment=ubuntu cat /etc/os-release
4038
```
4139

40+
Multiple Slurm commands may use different EDF environments; this is useful when a single environment is not feasible due to compatibility issues, or to keep EDF files modular.
41+
42+
!!! example "`srun`s with different EDFs"
43+
```bash
44+
#!/bin/bash
45+
#SBATCH --job-name=edf-example
46+
#SBATCH --time=00:01:00
47+
...
48+
srun --environment=env1 ... # (1)!
49+
...
50+
srun --environment=env2 ... # (2)!
51+
```
52+
53+
1. Assuming `env1.toml` is at `EDF_PATH`. See [EDF search path][ref-ce-edf-search-path] below.
54+
2. Assuming `env2.toml` is at `EDF_PATH`. See [EDF search path][ref-ce-edf-search-path] below.
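
Since both annotations assume the EDF files can be found via `EDF_PATH`, a pre-flight check along the following lines may help before submitting. This is only a sketch: it assumes `EDF_PATH` is a colon-separated list of directories and that `$HOME/.edf` is the fallback when it is unset, neither of which is stated in this diff, and `find_edf` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: print the first "<name>.toml" found on the EDF search path.
# Assumptions: EDF_PATH is a colon-separated list of directories, and
# $HOME/.edf is used when EDF_PATH is unset.
find_edf() {
    local name="$1" dir
    local search_path="${EDF_PATH:-$HOME/.edf}"
    local IFS=':'   # split the search path on colons within this function only
    for dir in $search_path; do
        if [ -f "$dir/$name.toml" ]; then
            printf '%s\n' "$dir/$name.toml"
            return 0
        fi
    done
    return 1
}
```

For the example above, `find_edf env1 && find_edf env2` would succeed only if both files are found, which can catch a missing EDF before the job is queued.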
55+
4256
Specifying `--environment` as an `#SBATCH` option is **experimental**.
4357
Such usage is discouraged as it may result in unexpected behaviors.
4458

45-
!!! note
46-
Specifying `--environment` with `#SBATCH` will put the entire batch script inside the containerized environment, requiring the Slurm hook to use any Slurm commands within the batch script (e.g., `srun` or `scontrol`).
47-
The hook is controlled by the `ENROOT_SLURM_HOOK` environment variable and activated by default on most vClusters.
59+
!!! warning
60+
The use of `--environment` as an `#SBATCH` option is reserved for highly customized workflows, and it may result in several **counterintuitive, hard-to-diagnose failures**. See [Why `--environment` as `#SBATCH` is discouraged][ref-ce-why-no-sbatch-env] for details.
4861

4962
[](){#ref-ce-edf-search-path}
5063
### EDF search path
