
Commit 4a21427

triton/: GPU and GRES minor updates
1 parent 25f98c3 commit 4a21427

4 files changed (+43 -32 lines)


triton/ref/gpu.rst

Lines changed: 17 additions & 13 deletions
@@ -1,24 +1,28 @@
 .. csv-table::
    :delim: |
    :header-rows: 1
+   :class: scicomp-table-dense

-   GPU brand name | GPU name in Slurm (``--gpus=NAME:n``) | Amount of VRAM | CUDA compute capability | total amount | nodes | GPUs per node | Compute threads per GPU | Slurm partition (``--partition=``) |
-   NVIDIA H200(*) | ``h200`` | 141GB (``--gres=gpu-vram:141g``) | 9.0 (``--gres=min-cuda-cc=90``) | 112 | gpu[50-63] | 8 | 16896 | ``gpu-h200-141g-ellis``, ``gpu-h200-141g-short`` |
-   NVIDIA H200(**) | ``h200_2g.35gb`` | 35GB (``--gres=gpu-vram:35g``) | 9.0 (``--gres=min-cuda-cc=90``) | 24 | gpu[49] | 24 | 4224 | ``gpu-h200-35g-ia-ellis``, ``gpu-h200-35g-ia`` |
-   NVIDIA H100 | ``h100`` | 80GB (``--gres=gpu-vram:80g``) | 9.0 (``--gres=min-cuda-cc=90``) | 16 | gpu[45-48] | 4 | 16896 | ``gpu-h100-80g`` |
-   NVIDIA A100 | ``a100`` | 80GB (``--gres=gpu-vram:80g``) | 8.0 (``--gres=min-cuda-cc=80``) | 56 | gpu[11-17,38-44] | 4 | 7936 | ``gpu-a100-80g`` |
-   NVIDIA V100 | ``v100`` | 32GB (``--gres=gpu-vram:32g``) | 7.0 (``--gres=min-cuda-cc=70``) | 40 | gpu[28-37] | 4 | 5120 | ``gpu-v100-32g`` |
-   NVIDIA V100 | ``v100`` | 32GB (``--gres=gpu-vram:32g``) | 7.0 (``--gres=min-cuda-cc=70``) | 40 | gpu[1-10] | 4 | 5120 | ``gpu-v100-32g`` |
-   NVIDIA V100 | ``v100`` | 32GB (``--gres=gpu-vram:32g``) | 7.0 (``--gres=min-cuda-cc=70``) | 32 | dgx[3,5-7] | 8 | 5120 | ``gpu-v100-32g`` |
-   NVIDIA V100 | ``v100`` | 16GB (``--gres=gpu-vram:16g``) | 7.0 (``--gres=min-cuda-cc=70``) | 176 | dgx[1-2,8-27] | 8 | 5120 | ``gpu-v100-16g`` |
-   AMD MI210 | ``mi210`` with ``-p gpu-amd`` | 32GB | | 2 | gpuamd[1] | 2 | 7680 | ``gpu-amd`` |
-   AMD MI100 | ``mi100`` with ``-p gpu-amd`` | 64GB | | 1 | gpuamd[1] | 1 | 6656 | ``gpu-amd`` |
+   GPU brand name | GPU name in Slurm (``--gpus=NAME:n``) | VRAM GB (``--gres=gpu-vram:NNg``) | CUDA compute capability (``--gres=min-cuda-cc=NN``) | total amount | nodes | GPUs per node | Compute threads per GPU | Slurm partition (``--partition=``) |
+   NVIDIA H200(*) | ``h200`` | ``141`` | 9.0 (``90``) | 112 | gpu[50-63] | 8 | 16896 | ``gpu-h200-141g-ellis``, ``gpu-h200-141g-short`` |
+   NVIDIA H200(**) | ``h200_2g.35gb`` | ``35`` | 9.0 (``90``) | 24 | gpu[49] | 24 | 4224 | ``gpu-h200-35g-ia-ellis``, ``gpu-h200-35g-ia`` |
+   NVIDIA H100 | ``h100`` | ``80`` | 9.0 (``90``) | 16 | gpu[45-48] | 4 | 16896 | ``gpu-h100-80g`` |
+   NVIDIA A100 | ``a100`` | ``80`` | 8.0 (``80``) | 56 | gpu[11-17,38-44] | 4 | 7936 | ``gpu-a100-80g`` |
+   NVIDIA V100 | ``v100`` | ``32`` | 7.0 (``70``) | 40 | gpu[28-37] | 4 | 5120 | ``gpu-v100-32g`` |
+   NVIDIA V100 | ``v100`` | ``32`` | 7.0 (``70``) | 40 | gpu[1-10] | 4 | 5120 | ``gpu-v100-32g`` |
+   NVIDIA V100 | ``v100`` | ``32`` | 7.0 (``70``) | 32 | dgx[3,5-7] | 8 | 5120 | ``gpu-v100-32g`` |
+   NVIDIA V100 | ``v100`` | ``16`` | 7.0 (``70``) | 176 | dgx[1-2,8-27] | 8 | 5120 | ``gpu-v100-16g`` |
+   AMD MI210 | ``mi210`` with ``-p gpu-amd`` | ``32`` | | 2 | gpuamd[1] | 2 | 7680 | ``gpu-amd`` |
+   AMD MI100 | ``mi100`` with ``-p gpu-amd`` | ``64`` | | 1 | gpuamd[1] | 1 | 6656 | ``gpu-amd`` |

-To request multiple gres, e.g. both 32GB of memory and compute capability 8.0, use a comma separated list: ``--gres=gpu-vram:32g,min-cuda-cc=80``.
+Since 2025, the main way to request certain types of GPUs is with
+``--gres``, for example ``--gpus=1 --gres=min-vram:32``. Only one
+``--gres`` option is allowed, so to combine gres, use a comma
+separated list: ``--gres=gpu-vram:32g,min-cuda-cc=80``.

 (*) These GPUs have a priority queue for the Ellis project, since they were
 procured for this project. Any job submitted to the short queue might be
 preempted if a job requiring the resources comes in from the Ellis queue.

-(**) These GPUs are split from a single GPU with NVIDIA's
+(**) These GPUs are split from a single GPU with NVIDIA's
 `Multi-Instance GPU <https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html>`__-feature.
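
For instance, following the combined-GRES syntax documented above, a
request for one GPU with at least 32 GB of VRAM and CUDA compute
capability 8.0 or newer might look like the sketch below (the script
name ``job.sh`` and the time limit are illustrative, not part of this
commit):

    # one GPU, at least 32 GB VRAM, CUDA compute capability >= 8.0
    sbatch --gpus=1 --gres=gpu-vram:32g,min-cuda-cc=80 --time=01:00:00 job.sh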

triton/ref/hardware.rst

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 .. csv-table::
    :delim: |
    :header-rows: 1
+   :class: scicomp-table-dense

    Node name | Number of nodes | Node type | Year | Arch (``--constraint``) | CPU type | Memory Configuration | Infiniband | GPUs | Disks
    pe[1-48,65-81] | 65 | Dell PowerEdge C4130 | 2016 | hsw avx2 | 2x12 core `Xeon E5 2680 v3 <https://ark.intel.com/products/81908/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2_50-GHz>`__ 2.50GHz | 128GB DDR4-2133 | FDR | | 900GB HDD
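
As a quick illustration of the Arch column, a job can be pinned to a
feature from the table, for example Haswell (a sketch; run ``slurm
features`` on the cluster for the authoritative feature list):

    # request a node carrying the hsw feature listed above
    srun --constraint=hsw hostname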

triton/ref/slurm.rst

Lines changed: 20 additions & 18 deletions
@@ -17,24 +17,26 @@
    :header-rows: 1
    :delim: !

-   Command ! Option ! Description
-   ``sbatch``/``srun``/etc ! ``-t``, ``--time=``\ *HH:MM:SS* ! **time limit**
-   ! ``-t, --time=``\ *DD-HH* ! **time limit, days-hours**
-   ! ``-p, --partition=``\ *PARTITION*! **job partition. Usually leave off and things are auto-detected.**
-   ! ``--mem-per-cpu=``\ *N* ! **request n MB of memory per core**
-   ! ``--mem=``\ *N* ! **request n MB memory per node**
-   ! ``-c``, ``--cpus-per-task=``\ *N* ! **Allocate *n* CPU's for each task. For multithreaded jobs. (compare ``--ntasks``: ``-c`` means the number of cores for each process started.)**
-   ! ``-N``, ``--nodes=``\ *N-M* ! allocate minimum of n, maximum of m nodes.
-   ! ``-n``, ``--ntasks=``\ *N* ! allocate resources for and start *n* tasks (one task=one process started, it is up to you to make them communicate. However the main script runs only on first node, the sub-processes run with "srun" are run this many times.)
-   ! ``-J``, ``--job-name=``\ *NAME* ! short job name
-   ! ``-o`` *OUTPUTFILE* ! print output into file *output*
-   ! ``-e`` *ERRORFILE* ! print errors into file *error*
+   Command ! Option ! Description
+   ``sbatch``/``srun``/etc ! ``-t``, ``--time=HH:MM:SS`` ! **time limit**
+   ! ``-t``, ``--time=DD-HH`` ! **time limit, days-hours**
+   ! ``-p PARTITION``, ``--partition=PARTITION`` ! **job partition. Usually leave off and things are auto-detected.**
+   ! ``--mem-per-cpu=N`` ! **request N MB of memory per core**
+   ! ``--mem=N`` ! **request N MB memory per node**
+   ! ``-c``, ``--cpus-per-task=N`` ! **Allocate *n* CPU's for each task. For multithreaded jobs. (compare ``--ntasks``: ``-c`` means the number of cores for each process started.)**
+   ! ``-N``, ``--nodes=N-M`` ! allocate minimum of N, maximum of M nodes.
+   ! ``-n``, ``--ntasks=N`` ! allocate resources for and start *n* tasks (one task=one process started, it is up to you to make them communicate. However the main script runs only on first node, the sub-processes run with "srun" are run this many times.)
+   ! ``--gpus=1`` ! request a GPU, or ``--gpus=N`` for multiple
+   ! ``--gres=min-vram:NNg`` ! request GPUs with at least ``NN`` GB of VRAM. To combine with other ``--gres`` options, use ``--gres=min-vram:NNg,min-cuda-cc=NN``.
+   ! ``--gres=min-cuda-cc:NN`` ! request GPUs with CUDA compute capability of at least N.N. See above for combining with other GRES.
+   ! ``-J``, ``--job-name=NAME`` ! short job name
+   ! ``-o OUTPUTFILE`` ! print output into file *output*
+   ! ``-e ERRORFILE`` ! print errors into file *error*
    ! ``--exclusive`` ! allocate exclusive access to nodes. For large parallel jobs.
-   ! ``--constraint=``\ *FEATURE* ! request *feature* (see ``slurm features`` for the current list of configured features, or Arch under the :ref:`hardware list <hardware-list>`). Multiple with ``--constraint="hsw|skl"``.
+   ! ``--constraint=FEATURE`` ! request *feature* (see ``slurm features`` for the current list of configured features, or Arch under the :ref:`hardware list <hardware-list>`). Multiple with ``--constraint="hsw|skl"``.
    ! ``--constraint=localdisk`` ! request nodes that have local disks
    ! ``--tmp=nnnG`` ! Request ``nnn`` GB of :doc:`local disk storage space </triton/usage/localstorage>`
-   ! ``--array=``\ *0-5,7,10-15* ! Run job multiple times, use variable ``$SLURM_ARRAY_TASK_ID`` to adjust parameters.
-   ! ``--gpus=1`` ! request a GPU, or ``--gpus=N`` for multiple
-   ! ``--mail-type=``\ *TYPE* ! notify of events: ``BEGIN``, ``END``, ``FAIL``, ``ALL``, ``REQUEUE`` (not on triton) or ``ALL.`` MUST BE used with ``--mail-user=`` only
-   ! ``--mail-user=``\ *first.last@aalto.fi* ! Aalto email to send the notification about the job. External email addresses doesn't work.
-   ``srun`` ! ``-N`` *N_NODES* hostname ! Print allocated nodes (from within script)
+   ! ``--array=0-5,7,10-15`` ! Run job multiple times, use variable ``$SLURM_ARRAY_TASK_ID`` to adjust parameters.
+   ! ``--mail-type=TYPE`` ! notify of events: ``BEGIN``, ``END``, ``FAIL``, ``ALL``, ``REQUEUE`` (not on triton) or ``ALL.`` MUST BE used with ``--mail-user=`` only
+   ! ``--mail-user=first.last@aalto.fi`` ! Aalto email to send the notification about the job. External email addresses doesn't work.
+   ``srun`` ! ``-N N_NODES hostname`` ! Print allocated nodes (from within script)
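
Putting several of the options above together, a minimal batch script
might look like the following sketch (all resource values and the
program name ``my_program`` are illustrative):

    #!/bin/bash
    #SBATCH --time=04:00:00        # time limit, HH:MM:SS
    #SBATCH --mem-per-cpu=2000     # 2000 MB of memory per core
    #SBATCH --cpus-per-task=4      # four cores for one multithreaded task
    #SBATCH --job-name=myjob       # short job name
    #SBATCH --output=myjob.out     # output is written here (long form of -o)

    srun my_program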

triton/tut/gpu.rst

Lines changed: 5 additions & 1 deletion
@@ -13,7 +13,8 @@ GPU computing
 * Select a GPU with certain CUDA compute capability with e.g.
   ``--gpus=1 --gres=min-cuda-cc:80``.
 * See :ref:`the quick reference <available-gpus>` for available GPU names,
-  memory capacities and compute capabilities.
+  memory capacities, and compute capabilities, and how to combine
+  ``--gres`` options.
 * Monitor GPU performance with ``seff JOBID``.
 * You can test out small jobs of 30 minutes or less in the
   ``gpu-debug``-partition (``--partition=gpu-debug``).
@@ -150,6 +151,9 @@ with ``--gpus=N`` as well.
 For example, specifying ``--gpus=1`` and ``--gres=min-cuda-cc:80`` would give
 you a single GPU with minimum compute capabilty support of 8.0.

+Only one ``--gres`` option can be given, so combine them with a comma
+like ``--gres=min-vram:40g,min-cuda-cc:80``.
+
 See the :ref:`available GPUs reference <available-gpus>` for more information on
 available GPUs.
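
For instance, the combined form added here could be used as in the
sketch below (``nvidia-smi`` stands in for a real workload):

    # one GPU with at least 40 GB VRAM and compute capability >= 8.0
    srun --gpus=1 --gres=min-vram:40g,min-cuda-cc:80 nvidia-smi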
