@@ -107,42 +107,17 @@ The configuration of this is complex and involves:
107107 defined in the `compute` or `login` variables, to override the default
108108 image for specific node groups.
109109
110- 5. Modify `openhpc_slurm_partitions` to add a new partition covering rebuildable
111- nodes to use for rebuild jobs. If using the default OpenTofu
112- configurations, this variable is contained in an OpenTofu-templated file
113- `environments/$ENV/group_vars/all/partitions.yml` which must be overridden
114- by copying it to e.g. a `z_partitions.yml` file in the same directory.
115- However production sites will probably be overriding this file anyway to
116- customise it.
117-
118- An example partition definition, given the two node groups "general" and
119- "gpu" shown in Step 2, is:
120-
121- ```yaml
122- openhpc_slurm_partitions:
123- ...
124- - name: rebuild
125- groups:
126- - name: general
127- - name: gpu
128- default: NO
129- maxtime: 30
130- partition_params:
131- PriorityJobFactor: 65533
132- Hidden: YES
133- RootOnly: YES
134- DisableRootJobs: NO
135- PreemptMode: 'OFF'
136- OverSubscribe: EXCLUSIVE
137- ```
138-
139- Which has parameters as follows:
110+ 5. Ensure `openhpc_partitions` contains a partition covering the nodes on which
111+ rebuild jobs will run. The default definition in `environments/common/inventory/group_vars/all/openhpc.yml`
112+ automatically includes this via `openhpc_rebuild_partition`, defined in the same
113+ file. If modifying this, note that the important parameters are:
114+
140115 - `name`: Partition name matching `rebuild` role variable `rebuild_partitions`,
141116 default `rebuild`.
142- - `groups`: A list of node group names, matching keys in the OpenTofu
143- `compute` variable (see example in step 2 above). Normally every compute
144- node group should be listed here, unless Slurm-controlled rebuild is not
145- required for certain node groups.
117+ - `groups`: A list of nodegroup names, matching `openhpc_nodegroup` and
118+ keys in the OpenTofu `compute` variable (see example in step 2 above).
119+ Normally every compute node group should be listed here, unless
120+ Slurm-controlled rebuild is not required for certain node groups.
146121 - `default`: Must be set to `NO` so that it is not the default partition.
147122 - `maxtime`: Maximum time to allow for rebuild jobs, in
148123 [slurm.conf format](https://slurm.schedmd.com/slurm.conf.html#OPT_MaxTime).
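    As a sketch only, a custom rebuild partition definition using these
    parameters might look like the following. This is based on the previous
    example for `openhpc_slurm_partitions`; the actual default in
    `environments/common/inventory/group_vars/all/openhpc.yml` is
    authoritative, and the "general" and "gpu" group names are illustrative:

    ```yaml
    # Hypothetical sketch - check openhpc_rebuild_partition in
    # environments/common/inventory/group_vars/all/openhpc.yml for the
    # real default. Group names below are examples from step 2.
    openhpc_rebuild_partition:
      name: rebuild
      groups:
        - name: general
        - name: gpu
      default: NO
      maxtime: 30
      partition_params:
        PriorityJobFactor: 65533
        Hidden: YES
        RootOnly: YES
        DisableRootJobs: NO
        PreemptMode: 'OFF'
        OverSubscribe: EXCLUSIVE
    ```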