Commit be06ec2

Split EDF and execution console result
1 parent a1c7e8d commit be06ec2

1 file changed: +16 -12 lines changed
docs/software/container-engine/resource-hook.md

Lines changed: 16 additions & 12 deletions
@@ -230,14 +230,15 @@ By default, the server started by the SSH hook listens to port 15263, but this s
 
 !!! example "Logging into a sleeping container via SSH"
 * On the cluster
-```console
-$ cat ubuntu-ssh.toml
+```toml title="EDF: ${EDF_PATH}/ubuntu-ssh.toml"
 image = "ubuntu:latest"
 
 [annotations]
 com.hooks.ssh.enabled = "true"
 com.hooks.ssh.authorize_ssh_key = "<public-key>"
+```
 
+```console
 $ srun --environment=./ubuntu-ssh.toml --pty sleep 30
 ```
 
@@ -267,42 +268,45 @@ The hook can be activated by setting the `com.hooks.nvidia_cuda_mps.enabled` to
 The container must be **writable** (default) to use the CUDA MPS hook.
 
 !!! example "Using the CUDA MPS hook"
-```console
-$ cat vectoradd-cuda-mps.toml
+```toml title="EDF: ${EDF_PATH}/vectoradd-cuda-mps.toml"
 image = "nvcr.io#nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
 
 [annotations]
 com.hooks.nvidia_cuda_mps.enabled = "true"
+```
 
+```console
 $ srun -t2 -N1 -n8 --environment=./vectoradd-cuda-mps.toml /cuda-samples/vectorAdd | grep "Test PASSED" | wc -l
 8
 ```
 
 ??? example "Available GPUs and oversubscription error"
+```toml title="EDF: ${HOME}/.edf/vectoradd-cuda.toml"
+image = "nvcr.io#nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04" # (1)
+```
+
+1. This EDF uses the CUDA vector addition sample from NVIDIA's NGC catalog.
+
 ```console
 $ nvidia-smi -L
 GPU 0: GH200 120GB (UUID: GPU-...)
 GPU 1: GH200 120GB (UUID: GPU-...)
 GPU 2: GH200 120GB (UUID: GPU-...)
 GPU 3: GH200 120GB (UUID: GPU-...)
 
-$ cat ${HOME}/.edf/vectoradd-cuda.toml # (1)
-image = "nvcr.io#nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
-
-$ srun -t2 -N1 -n4 --environment=vectoradd-cuda /cuda-samples/vectorAdd | grep "Test PASSED" # (2)
+$ srun -t2 -N1 -n4 --environment=vectoradd-cuda /cuda-samples/vectorAdd | grep "Test PASSED" # (1)
 Test PASSED
 Test PASSED
 Test PASSED
 Test PASSED
 
-$ srun -t2 -N1 -n5 --environment=vectoradd-cuda /cuda-samples/vectorAdd | grep "Test PASSED" # (3)
+$ srun -t2 -N1 -n5 --environment=vectoradd-cuda /cuda-samples/vectorAdd | grep "Test PASSED" # (2)
 Failed to allocate device vector A (error code CUDA-capable device(s) is/are busy or unavailable)!
 srun: error: ...
 ```
 
-1. This EDF uses the CUDA vector addition sample from NVIDIA's NGC catalog.
-2. 4 processes run successfully.
-3. More than 4 concurrent processes result in oversubscription errors.
+1. 4 processes run successfully.
+2. More than 4 concurrent processes result in oversubscription errors.
 
 ## Accessing NVIDIA GPUs
 