To deploy, create a playbook which looks like this:

```yaml
---
- hosts:
  - cluster_login
  - cluster_control
  - cluster_batch
  become: yes
  roles:
    - role: openhpc
      openhpc_enable:
        control: "{{ inventory_hostname in groups['cluster_control'] }}"
        batch: "{{ inventory_hostname in groups['cluster_batch'] }}"
        runtime: true
      openhpc_slurm_service_enabled: true
      openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}"
      openhpc_slurm_partitions:
        - name: "compute"
      openhpc_cluster_name: openhpc
      openhpc_packages: []
...
```
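
The group names referenced above must exist in the inventory. As a minimal sketch (the hostnames here are illustrative, not prescribed by the role):

```ini
# inventory/hosts - illustrative layout; hostnames are examples
[cluster_login]
login-0

[cluster_control]
control-0

[cluster_batch]
batch-0
batch-1
```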
## Example

### Simple

The following creates a cluster with a single partition `compute`
containing two nodes:

```ini
# inventory/hosts:
[hpc_login]
cluster-login-0

[hpc_compute]
cluster-compute-0
cluster-compute-1

[hpc_control]
cluster-control
```

```yaml
#playbook.yml
---
- hosts: all
  become: yes
  tasks:
    - import_role:
        name: stackhpc.openhpc
      vars:
        openhpc_cluster_name: hpc
        openhpc_enable:
          control: "{{ inventory_hostname in groups['hpc_control'] }}"
          batch: "{{ inventory_hostname in groups['hpc_compute'] }}"
          runtime: true
        openhpc_slurm_control_host: "{{ groups['hpc_control'] | first }}"
        openhpc_nodegroups:
          - name: compute
        openhpc_partitions:
          - name: compute
```
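
As a rough guide to what this produces, and assuming role defaults for anything not set above, the rendered Slurm configuration should contain node and partition entries along these lines (illustrative only, not the role's exact template output):

```ini
# slurm.conf extract, illustrative only
NodeName=cluster-compute-[0-1] State=UNKNOWN
PartitionName=compute Nodes=cluster-compute-[0-1] State=UP
```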

### Multiple nodegroups

This example shows how partitions can span multiple types of compute node.

The inventory below describes three types of compute node (login and
control nodes are omitted for brevity):

```ini
# inventory/hosts:
...
[hpc_general]
# standard compute nodes
cluster-general-0
cluster-general-1

[hpc_large]
# large memory nodes
cluster-largemem-0
cluster-largemem-1

[hpc_gpu]
# GPU nodes
cluster-a100-0
cluster-a100-1
...
```

First, `openhpc_nodegroups` is set to capture these inventory groups and
apply any node-level parameters - in this case the `large` nodes have two
cores per node reserved for system use (`CoreSpecCount`), and GRES is
configured for the GPU nodes:

```yaml
openhpc_cluster_name: hpc
openhpc_nodegroups:
  - name: general
  - name: large
    node_params:
      CoreSpecCount: 2
  - name: gpu
    gres:
      - conf: gpu:A100:2
        file: /dev/nvidia[0-1]
```
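
For reference, a nodegroup definition like this corresponds roughly to Slurm node and GRES configuration such as the following (illustrative only; the role templates these files itself):

```ini
# slurm.conf extract, illustrative only
NodeName=cluster-largemem-[0-1] CoreSpecCount=2 State=UNKNOWN
NodeName=cluster-a100-[0-1] Gres=gpu:A100:2 State=UNKNOWN

# gres.conf, illustrative only
NodeName=cluster-a100-[0-1] Name=gpu Type=A100 File=/dev/nvidia[0-1]
```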

Now two partitions can be configured - a default one with a short time limit and
no large memory nodes for testing jobs, and another with all hardware and longer
job runtime for "production" jobs:

```yaml
openhpc_partitions:
  - name: test
    groups:
      - general
      - gpu
    maxtime: '1:0:0' # 1 hour
    default: 'YES'
  - name: general
    groups:
      - general
      - large
      - gpu
    maxtime: '2-0' # 2 days
    default: 'NO'
```
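
Again for reference, the resulting partition definitions would look roughly like this (illustrative only), with the `test` partition omitting the large memory nodes:

```ini
# slurm.conf extract, illustrative only
PartitionName=test Nodes=cluster-general-[0-1],cluster-a100-[0-1] Default=YES MaxTime=1:0:0
PartitionName=general Nodes=cluster-general-[0-1],cluster-largemem-[0-1],cluster-a100-[0-1] Default=NO MaxTime=2-0
```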

Users select the partition using the `--partition` argument and request nodes
with appropriate memory or GPUs using the `--mem` and `--gres` or `--gpus*`
options for `sbatch` or `srun`.

Finally, some additional configuration must be provided for GRES:

```yaml
openhpc_config:
  GresTypes:
    - gpu
```
<b id="slurm_ver_footnote">1</b> Slurm 20.11 removed `accounting_storage/filetxt` as an option. This version of Slurm was introduced in OpenHPC v2.1 but the OpenHPC repos are common to all OpenHPC v2.x releases. [↩](#accounting_storage)