
Commit 92805a0: Fix merge (update Jupyter refs)
Parent: 0626ccd
File tree: 2 files changed (+5 / -5 lines)

docs/access/jupyterlab.md

Lines changed: 3 additions & 3 deletions
@@ -83,7 +83,7 @@ If the default base images do not meet your requirements, you can specify a cust
     ```

     1. Avoid mounting all of `$HOME` to avoid subtle issues with cached files, but mount Jupyter kernels
-    2. Enable SLURM commands (together with two subsequent mounts)
+    2. Enable Slurm commands (together with two subsequent mounts)
     3. Currently only required on Daint and Santis, not on Clariden
     4. Set working directory of Jupyter session (file browser root directory)
     5. Use environment settings for optimized communication
@@ -215,7 +215,7 @@ A popular approach to run multi-GPU ML workloads is with `accelerate` and `torch
 !!! note "Notebook structure"
     In none of these scenarios any significant memory allocations or background computations are performed on the main Jupyter process. Instead, the resources are kept available for the processes launched by `accelerate` or `torchrun`, respectively.

-    Alternatively to using these launchers, it is also possible to use SLURM to obtain more control over resource mappings, e.g. by launching an overlapping SLURM step onto the same node used by the Jupyter process. An example with the container engine looks like this:
+    Alternatively to using these launchers, it is also possible to use Slurm to obtain more control over resource mappings, e.g. by launching an overlapping Slurm step onto the same node used by the Jupyter process. An example with the container engine looks like this:

     ```bash
     !srun --overlap -ul --environment /path/to/edf.toml \
@@ -226,7 +226,7 @@ Alternatively to using these launchers, it is also possible to use SLURM to obta
         python train.py ..."
     ```

-    where `/path/to/edf.toml` should be replaced by the TOML file and `train.py` is a script using `torch.distributed` for distributed training. This can be further customized with extra SLURM options.
+    where `/path/to/edf.toml` should be replaced by the TOML file and `train.py` is a script using `torch.distributed` for distributed training. This can be further customized with extra Slurm options.

 !!! warning "Concurrent usage of resources"
     Subtle bugs can occur when running multiple Jupyter notebooks concurrently that each assume access to the full node. Also, some notebooks may hold on to resources such as spawned child processes or allocated memory despite having completed. In this case, resources such as a GPU may still be busy, blocking another notebook from using it. Therefore, it is good practice to only keep one such notebook running that occupies the full node and restarting a kernel once a notebook has completed. If in doubt, system monitoring with `htop` and [nvdashboard](https://github.com/rapidsai/jupyterlab-nvdashboard) can be helpful for debugging.
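
For context, the `--environment /path/to/edf.toml` argument in the diff above refers to a Container Engine environment definition file (EDF). The following is a minimal hedged sketch, not taken from this commit: the field names follow the Container Engine's TOML format, while the image reference and mount paths are illustrative placeholders.

```toml
# Hypothetical EDF sketch; image and paths are placeholders, not from the commit.
image = "nvcr.io#nvidia/pytorch:24.01-py3"        # container image to pull and run
mounts = ["/capstor/scratch:/capstor/scratch"]    # host paths made visible inside the container
workdir = "/capstor/scratch"                      # initial working directory of the session
```

With such a file in place, the `srun --overlap` invocation shown in the diff would start a Slurm step inside this container environment on the node already occupied by the Jupyter process.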

docs/clusters/eiger.md

Lines changed: 2 additions & 2 deletions
@@ -29,7 +29,7 @@ Eiger is an Alps cluster that provides compute nodes and file systems designed t
 ### Unimplemented features

 !!! under-construction "Jupyter is not yet available"
-    [Jupyter][ref-jlab] has not yet been configured on `Eiger.Alps`.
+    [Jupyter][ref-jupyter] has not yet been configured on `Eiger.Alps`.

     **It will be deployed as soon as possible and this documentation will be updated accordingly**


@@ -161,7 +161,7 @@ See the Slurm documentation for instructions on how to run jobs on the [AMD CPU
 ### Jupyter and FirecREST

 !!! under-construction "FirecREST is not yet available"
-    [Jupyter][ref-jlab] has not yet been configured on `Eiger.Alps`.
+    [Jupyter][ref-jupyter] has not yet been configured on `Eiger.Alps`.

     **It will be deployed as soon as possible and this documentation will be updated accordingly**
