@@ -107,42 +107,17 @@ The configuration of this is complex and involves:
defined in the `compute` or `login` variables, to override the default
image for specific node groups.

- 5. Modify `openhpc_slurm_partitions` to add a new partition covering rebuildable
- nodes to use for for rebuild jobs. If using the default OpenTofu
- configurations, this variable is contained in an OpenTofu-templated file
- `environments/$ENV/group_vars/all/partitions.yml` which must be overriden
- by copying it to e.g. a `z_partitions.yml` file in the same directory.
- However production sites will probably be overriding this file anyway to
- customise it.
-
- An example partition definition, given the two node groups "general" and
- "gpu" shown in Step 2, is:
-
- ```yaml
- openhpc_slurm_partitions:
-   ...
-   - name: rebuild
-     groups:
-       - name: general
-       - name: gpu
-     default: NO
-     maxtime: 30
-     partition_params:
-       PriorityJobFactor: 65533
-       Hidden: YES
-       RootOnly: YES
-       DisableRootJobs: NO
-       PreemptMode: 'OFF'
-       OverSubscribe: EXCLUSIVE
- ```
-
- Which has parameters as follows:
+ 5. Ensure `openhpc_partitions` contains a partition covering the nodes that will run
+ rebuild jobs. The default definition in `environments/common/inventory/group_vars/all/openhpc.yml`
+ includes such a partition automatically, via `openhpc_rebuild_partition` (also
+ defined in that file). If modifying this, note that the important parameters are:
+
- `name`: Partition name matching the `rebuild` role variable `rebuild_partitions`,
default `rebuild`.
- - `groups`: A list of node group names, matching keys in the OpenTofu
- `compute` variable (see example in step 2 above). Normally every compute
- node group should be listed here, unless Slurm-controlled rebuild is not
- required for certain node groups.
+ - `groups`: A list of node group names, matching `openhpc_nodegroup` and
+ keys in the OpenTofu `compute` variable (see example in step 2 above).
+ Normally every compute node group should be listed here, unless
+ Slurm-controlled rebuild is not required for certain node groups.
- `default`: Must be set to `NO` so that it is not the default partition.
- `maxtime`: Maximum time to allow for rebuild jobs, in
[slurm.conf format](https://slurm.schedmd.com/slurm.conf.html#OPT_MaxTime).
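
For sites that override `openhpc_partitions`, the following is a minimal sketch of what a rebuild partition entry might look like. It assumes the entry keeps the same fields as the `openhpc_slurm_partitions` example that appears as removed above. It also reuses the example node groups `general` and `gpu` from Step 2. The actual contents of `openhpc_rebuild_partition` in `environments/common/inventory/group_vars/all/openhpc.yml` may differ, so check that file before copying this.

```yaml
# Sketch only: fields copied from the earlier rebuild partition example;
# verify against the real openhpc_rebuild_partition default before use.
openhpc_partitions:
  # ... other (user-facing) partitions ...
  - name: rebuild        # must match the rebuild role's rebuild_partitions
    groups:              # example node groups "general" and "gpu" from Step 2
      - name: general
      - name: gpu
    default: NO          # never the default partition
    maxtime: 30          # slurm.conf MaxTime format; a bare number is minutes
    partition_params:
      PriorityJobFactor: 65533
      Hidden: YES
      RootOnly: YES
      DisableRootJobs: NO
      PreemptMode: 'OFF'
      OverSubscribe: EXCLUSIVE
```

The `partition_params` values carry over from that earlier example. `Hidden` and `RootOnly` keep the partition out of normal users' view and restrict allocations to root. `DisableRootJobs: NO` lets root actually submit there. `OverSubscribe: EXCLUSIVE` gives each rebuild job whole nodes.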