`openhpc_nodegroups`: List of mappings, each defining a unique set of homogeneous nodes:
* `ram_mb`: Optional. The physical RAM available in each node of this group, in MiB. If not
  set, this is determined from Ansible facts, equivalent to
  `free --mebi` total * `openhpc_ram_multiplier`.
* `ram_multiplier`: Optional. An override for the top-level definition
  `openhpc_ram_multiplier`. Has no effect if `ram_mb` is set.
* `gres_autodetect`: Optional. The [autodetection mechanism](https://slurm.schedmd.com/gres.conf.html#OPT_AutoDetect) to use for generic resources. Note that you must still define the `gres` dictionary (see below), but only its `conf` key is required. See the [GRES autodetection](#gres-autodetection) section below.
* `gres`: Optional. List of dicts defining [generic resources](https://slurm.schedmd.com/gres.html), as in the sketch below. Each dict should define:
  - `conf`: A string with the [resource specification](https://slurm.schedmd.com/slurm.conf.html#OPT_Gres_1), but requiring the format `<name>:<type>:<number>`, e.g. `gpu:A100:2`. Note the `type` is an arbitrary string.
  - `file`: Omit if `gres_autodetect` is set. A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.

  Note [GresTypes](https://slurm.schedmd.com/slurm.conf.html#OPT_GresTypes) must be set in `openhpc_config` if this is used.
* `features`: Optional. List of [Features](https://slurm.schedmd.com/slurm.conf.html#OPT_Features) strings.
* `node_params`: Optional. Mapping of additional parameters and values for
  node-level configuration in [slurm.conf](https://slurm.schedmd.com/slurm.conf.html).

Each nodegroup is mapped to an Ansible inventory group named
`{{ openhpc_cluster_name }}_{{ name }}`, where `name` is the nodegroup name.

Note that:
- Each host may only appear in one nodegroup.
- Hosts in a nodegroup are assumed to be homogeneous in terms of processor and memory.
- Hosts may have arbitrary hostnames, but these should be lowercase to avoid a
  mismatch between the inventory and the actual hostname.

This is used to set `Sockets`, `CoresPerSocket`, `ThreadsPerCore` and
optionally `RealMemory` for the nodegroup.
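
For illustration only, the nodegroup options above might be combined as in the sketch below; the cluster name, nodegroup name, feature string and GRES values are examples rather than defaults, and an inventory group `hpc_gpu` is assumed to exist:

```yaml
openhpc_cluster_name: hpc
openhpc_nodegroups:
  - name: gpu                    # hosts are taken from the inventory group hpc_gpu
    ram_multiplier: 0.9          # leave some RAM headroom below the detected total
    features:
      - a100                     # arbitrary feature string for job constraints
    gres:
      - conf: gpu:A100:2         # <name>:<type>:<number>
        file: /dev/nvidia[0-1]   # omit if gres_autodetect is set
```

As noted above, using `gres` also requires `GresTypes` to be set in `openhpc_config`.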

`openhpc_partitions`: Optional. List of mappings, each defining a
partition. Each partition mapping may contain:
* `name`: Required. Name of partition.
* `nodegroups`: Optional. List of nodegroup names. If omitted, the nodegroup
  with the same name as the partition is used.
* `default`: Optional. A flag denoting whether this partition is the default. Valid settings are `YES` and `NO`.
* `maxtime`: Optional. A partition-specific time limit overriding `openhpc_job_maxtime`.
* `partition_params`: Optional. Mapping of additional parameters and values for
  partition-level configuration in [slurm.conf](https://slurm.schedmd.com/slurm.conf.html).
  **NB:** Parameters which can be set via the keys above must not be included here.

If this variable is not set, one partition per nodegroup is created, with default
partition configuration for each.
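
As an illustration only (the partition names, nodegroup names and time limit below are examples, and the referenced nodegroups are assumed to be defined in `openhpc_nodegroups`), two partitions might be sketched as:

```yaml
openhpc_partitions:
  - name: short                # becomes the default partition
    nodegroups:
      - general
    default: 'YES'
    maxtime: '4:0:0'           # 4 hours; quoted to avoid Ansible conversions
  - name: general              # with no "nodegroups" key, uses the nodegroup named "general"
    default: 'NO'
```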

`openhpc_job_maxtime`: Maximum job time limit, default `'60-0'` (60 days), see
the [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime` for the format. The value should be quoted to avoid Ansible conversions.
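
For example, a three-day limit (an illustrative value, not the default) in Slurm's `days-hours` format would be:

```yaml
openhpc_job_maxtime: '3-0'   # 3 days; quoted so Ansible does not convert the value
```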

The following creates a cluster with a single partition `compute`
containing two nodes:

```ini
# inventory/hosts:
[hpc_login]
cluster-login-0

[hpc_compute]
cluster-compute-0
cluster-compute-1

[hpc_control]
cluster-control
```

```yaml
# playbook.yml
---
- hosts: all
  become: yes
  tasks:
    - import_role:
        name: stackhpc.openhpc
      vars:
        openhpc_cluster_name: hpc
        openhpc_enable:
          control: "{{ inventory_hostname in groups['hpc_control'] }}"
          batch: "{{ inventory_hostname in groups['hpc_compute'] }}"
          runtime: true
        openhpc_slurm_control_host: "{{ groups['hpc_control'] | first }}"
        openhpc_nodegroups:
          - name: compute
        openhpc_partitions:
          - name: compute
```

### Multiple nodegroups

This example shows how partitions can span multiple types of compute node.

This example inventory describes three types of compute node (login and
control nodes are omitted for brevity):

```ini
# inventory/hosts:
...
[hpc_general]
# standard compute nodes
cluster-general-0
cluster-general-1

[hpc_large]
# large memory nodes
cluster-largemem-0
cluster-largemem-1

[hpc_gpu]
# GPU nodes
cluster-a100-0
cluster-a100-1
...
```

Firstly, `openhpc_nodegroups` is set to capture these inventory groups and
apply any node-level parameters - in this case the `large` nodegroup has two
cores per node reserved for system use (`CoreSpecCount: 2`), and GRES is configured for the GPU nodes:

```yaml
openhpc_cluster_name: hpc
openhpc_nodegroups:
  - name: general
  - name: large
    node_params:
      CoreSpecCount: 2
  - name: gpu
    gres:
      - conf: gpu:A100:2
        file: /dev/nvidia[0-1]
```

Alternatively, if using the NVML `gres_autodetect` mechanism (note this requires recompiling the Slurm binaries to link against the [NVIDIA Management Library](#gres-autodetection)):

```yaml
openhpc_cluster_name: hpc
openhpc_nodegroups:
  - name: general
  - name: large
    node_params:
      CoreSpecCount: 2
  - name: gpu
    gres_autodetect: nvml
    gres:
      - conf: gpu:A100:2
```

Now two partitions can be configured - a default one with a short time limit and
no large memory nodes, for testing jobs, and another with all hardware and a longer
job runtime for "production" jobs:

```yaml
openhpc_partitions:
  - name: test
    nodegroups:
      - general
      - gpu
    maxtime: '1:0:0' # 1 hour
    default: 'YES'
  - name: general
    nodegroups:
      - general
      - large
      - gpu
    maxtime: '2-0' # 2 days
    default: 'NO'
```

Users will select the partition using the `--partition` argument and request nodes
with appropriate memory or GPUs using the `--mem` and `--gres` or `--gpus*`
options for `sbatch` or `srun`.

Finally, some additional configuration must be provided for GRES:

```yaml
openhpc_config:
  GresTypes:
    - gpu
```

## GRES autodetection

Some autodetection mechanisms require recompilation of the Slurm packages to
link against external libraries. Examples are shown in the sections below.

### Recompiling Slurm binaries against the [NVIDIA Management Library](https://developer.nvidia.com/management-library-nvml)

This will allow you to use `gres_autodetect: nvml` in your nodegroup
definitions.

First, [install the complete CUDA toolkit from NVIDIA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).
You can then recompile the Slurm packages from the source RPMs as follows:

```sh
dnf download --source slurm-slurmd-ohpc

rpm -i slurm-ohpc-*.src.rpm

cd /root/rpmbuild/SPECS

dnf builddep slurm.spec

rpmbuild -bb -D "_with_nvml --with-nvml=/usr/local/cuda-12.8/targets/x86_64-linux/" slurm.spec | tee /tmp/build.txt
```

NOTE: This will need to be adapted for the version of CUDA installed (12.8 is used in the example).

The RPMs will be created in `/root/rpmbuild/RPMS/x86_64/`. The method used to distribute these RPMs to
each compute node is out of scope for this document. You can either use a custom package repository
or simply install them manually on each node with Ansible.
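
For the Ansible route, a minimal sketch is given below; it is not part of this role, the `hpc_compute` group and the `/var/lib/slurm-rpms` control-host directory are hypothetical, and the paths would need adjusting to match your build:

```yaml
# install-rebuilt-slurm.yml (illustrative sketch only)
- hosts: hpc_compute
  become: yes
  tasks:
    - name: Copy the rebuilt Slurm RPMs to the node
      ansible.builtin.copy:
        src: /var/lib/slurm-rpms/   # hypothetical directory on the Ansible control host
        dest: /tmp/slurm-rpms/

    - name: Find the copied RPMs
      ansible.builtin.find:
        paths: /tmp/slurm-rpms
        patterns: '*.rpm'
      register: _slurm_rpms

    - name: Install the rebuilt Slurm RPMs
      ansible.builtin.dnf:
        name: "{{ _slurm_rpms.files | map(attribute='path') | list }}"
        disable_gpg_check: true   # locally built RPMs are not signed by a repo key
        state: present
```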

#### Configuration example

A configuration snippet is shown below:

```yaml
openhpc_cluster_name: hpc
openhpc_nodegroups:
  - name: general
  - name: large
    node_params:
      CoreSpecCount: 2
  - name: gpu
    gres_autodetect: nvml
    gres:
      - conf: gpu:A100:2
```

For additional context refer to the GPU example in [Multiple nodegroups](#multiple-nodegroups).

<b id="slurm_ver_footnote">1</b> Slurm 20.11 removed `accounting_storage/filetxt` as an option. This version of Slurm was introduced in OpenHPC v2.1 but the OpenHPC repos are common to all OpenHPC v2.x releases. [↩](#accounting_storage)