Commit f116de2

Support automatic GRES configuration for NVIDIA GPUs (#820)
* wip - bump openhpc role for testing
* remove GresTypes from MIG docs
* enable nvml autoconfiguration for CaaS
* fix linter problems
* bump openhpc to release w/ auto gres support
1 parent 8722f35 commit f116de2

3 files changed: 4 additions, 5 deletions

docs/mig.md

Lines changed: 0 additions & 4 deletions
@@ -214,10 +214,6 @@ openhpc_nodegroups:
       - conf: "gpu:nvidia_h100_80gb_hbm3:2"
       - conf: "gpu:nvidia_h100_80gb_hbm3_4g.40gb:2"
       - conf: "gpu:nvidia_h100_80gb_hbm3_1g.10gb:6"
-
-openhpc_config:
-  GresTypes:
-    - gpu
 ```
 
 Making sure the types (the identifier after `gpu:`) match those collected with `slurmd -G`. Substrings
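
Not part of this commit: the type check described in the last context line of the hunk above can be sketched in Python. This is a minimal illustration that assumes each `conf` string has the form `name:type:count`, as in the entries shown. The helper names `gres_type` and `unmatched_types` are hypothetical, and the `detected` set stands in for type names read by hand from `slurmd -G` output, which the sketch does not parse. It tests exact matches only.

```python
def gres_type(conf: str) -> str:
    """Return the type field of a GRES conf string such as 'gpu:TYPE:COUNT'."""
    parts = conf.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 'name:type:count', got {conf!r}")
    return parts[1]


def unmatched_types(confs, detected_types):
    """List configured types that are absent from the detected types."""
    detected = set(detected_types)
    return [t for t in map(gres_type, confs) if t not in detected]


# conf strings taken from the docs/mig.md hunk above.
confs = [
    "gpu:nvidia_h100_80gb_hbm3:2",
    "gpu:nvidia_h100_80gb_hbm3_4g.40gb:2",
    "gpu:nvidia_h100_80gb_hbm3_1g.10gb:6",
]

# Hypothetical input: type names copied by hand from `slurmd -G` on the node.
detected = {"nvidia_h100_80gb_hbm3", "nvidia_h100_80gb_hbm3_4g.40gb"}

print(unmatched_types(confs, detected))  # -> ['nvidia_h100_80gb_hbm3_1g.10gb']
```

An empty list would mean every configured type was also reported on the node.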

environments/.caas/inventory/group_vars/all/openhpc.yml

Lines changed: 3 additions & 0 deletions
@@ -4,3 +4,6 @@ openhpc_cluster_name: "{{ cluster_name }}"
 # Provision a single "standard" compute nodegroup using the supplied
 # node count and flavor
 openhpc_nodegroups: "{{ hostvars[groups['openstack'][0]]['openhpc_nodegroups'] }}"
+
+# Enable autoconfiguration of NVIDIA GPUs, if using a suitable (`cuda`) image:
+openhpc_gres_autodetect: nvml

requirements.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ roles:
     version: v25.3.2
     name: stackhpc.nfs
   - src: https://github.com/stackhpc/ansible-role-openhpc.git
-    version: v1.4.1
+    version: v1.5.0
     name: stackhpc.openhpc
   - src: https://github.com/stackhpc/ansible-node-exporter.git
     version: stackhpc
