
 # stackhpc.openhpc

-This Ansible role installs packages and performs configuration to provide an OpenHPC Slurm cluster. It can also be used to drain and resume nodes.
+This Ansible role installs packages and performs configuration to provide an OpenHPC v2.x Slurm cluster.

 As a role it must be used from a playbook, for which a simple example is given below. This approach means it is totally modular, with no assumptions about available networks or any cluster features except for some hostname conventions. Any desired cluster filesystem or other required functionality may be freely integrated using additional Ansible roles or other approaches.

-The minimal image for nodes is a CentOS 7 or RockyLinux 8 GenericCloud image. These use OpenHPC v1 and v2 respectively. Centos8/OpenHPCv2 is generally preferred as it provides additional functionality for Slurm, compilers, MPI and transport libraries.
+The minimal image for nodes is a RockyLinux 8 GenericCloud image.

 ## Role Variables

-`openhpc_version`: Optional. OpenHPC version to install. Defaults provide `1.3` for Centos 7 and `2` for RockyLinux/CentOS 8.
-
 `openhpc_extra_repos`: Optional list. Extra Yum repository definitions to configure, following the format of the Ansible
 [yum_repository](https://docs.ansible.com/ansible/2.9/modules/yum_repository_module.html) module. Respected keys for
 each list element:
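The list of respected keys is elided by the hunk boundary above. Purely as a hedged sketch, assuming the usual `yum_repository` fields (`name`, `description`, `baseurl`, `gpgcheck`, `gpgkey`) are among those respected, an entry might look like this:

```yaml
# Hypothetical example only -- the exact set of respected keys is not shown in
# this hunk; these fields follow the Ansible yum_repository module's format.
openhpc_extra_repos:
  - name: my_extra_repo
    description: "Site-local package repository"
    baseurl: "https://repo.example.com/rocky/8/$basearch"
    gpgcheck: false
```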
@@ -39,12 +37,10 @@ each list element:
 * `database`: whether to enable slurmdbd
 * `batch`: whether to enable compute nodes
 * `runtime`: whether to enable OpenHPC runtime
-* `drain`: whether to drain compute nodes
-* `resume`: whether to resume compute nodes

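With the `drain` and `resume` flags removed, only the service-enablement flags remain. A minimal sketch of how these flags might be set in group vars follows; the inventory group names `cluster_control` and `cluster_batch` are taken from examples elsewhere in this README and may differ in your inventory:

```yaml
# Illustrative sketch only -- group names are assumptions, not prescribed by the role.
openhpc_enable:
  database: "{{ inventory_hostname in groups['cluster_control'] }}"
  batch: "{{ inventory_hostname in groups['cluster_batch'] }}"
  runtime: true
```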
 `openhpc_slurmdbd_host`: Optional. Where to deploy slurmdbd if you are using this role to deploy slurmdbd, otherwise where an existing slurmdbd is running. This should be the name of a host in your inventory. Set this to `none` to prevent the role from managing slurmdbd. Defaults to `openhpc_slurm_control_host`.

-`openhpc_slurm_configless`: Optional, default false. If true then slurm's ["configless" mode](https://slurm.schedmd.com/configless_slurm.html) is used. **NB: Requires Centos8/OpenHPC v2.**
+`openhpc_slurm_configless`: Optional, default false. If true then slurm's ["configless" mode](https://slurm.schedmd.com/configless_slurm.html) is used.

 `openhpc_munge_key`: Optional. Define a munge key to use. If not provided, one is generated, but the `openhpc_slurm_control_host` must be in the play.

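Taken together, a minimal sketch of these control-plane settings in group vars might look like the following; the values and the vault lookup are illustrative assumptions, not defaults from the role:

```yaml
# Illustrative sketch only: host/group names and the vault variable are assumptions.
openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}"
openhpc_slurmdbd_host: "{{ openhpc_slurm_control_host }}"   # or `none` to leave slurmdbd unmanaged
openhpc_slurm_configless: true
# If omitted, a munge key is generated, but the control host must then be in the play.
openhpc_munge_key: "{{ vault_openhpc_munge_key }}"
```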
@@ -184,54 +180,6 @@ To deploy, create a playbook which looks like this:
           openhpc_packages: []
     ...

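Most of the deploy example is elided by this hunk; only its tail (`openhpc_packages: []` and the closing `...`) is visible above. Purely as a sketch of the shape such a playbook takes, reusing the `openhpc_enable` flags sketched earlier, it might look like this (group names and flag values are assumptions, not a copy of the elided example, and any flags not shown in this hunk are omitted):

```yaml
---
# Hypothetical sketch only; real deployments will need whatever additional
# openhpc_enable flags and variables the full example (elided above) sets.
- hosts:
    - cluster_control
    - cluster_batch
  become: yes
  roles:
    - role: stackhpc.openhpc
      openhpc_enable:
        database: "{{ inventory_hostname in groups['cluster_control'] }}"
        batch: "{{ inventory_hostname in groups['cluster_batch'] }}"
        runtime: true
      openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}"
      openhpc_cluster_name: openhpc
      openhpc_packages: []
...
```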
-To drain nodes, for example, before scaling down the cluster to 6 nodes:
-
-    ---
-    - hosts: openstack
-      gather_facts: false
-      vars:
-        partition: "{{ cluster_group.output_value | selectattr('group', 'equalto', item.name) | list }}"
-        openhpc_slurm_partitions:
-        - name: "compute"
-          flavor: "compute-A"
-          image: "CentOS7.5-OpenHPC"
-          num_nodes: 6
-          user: "centos"
-        openhpc_cluster_name: openhpc
-      roles:
-        # Our stackhpc.cluster-infra role can be invoked in `query` mode which
-        # looks up the state of the cluster by querying the Heat API.
-        - role: stackhpc.cluster-infra
-          cluster_name: "{{ cluster_name }}"
-          cluster_state: query
-          cluster_params:
-            cluster_groups: "{{ cluster_groups }}"
-      tasks:
-        # Given that the original cluster that was created had 8 nodes and the
-        # cluster we want to create has 6 nodes, the computed desired_state
-        # variable stores the list of instances to leave untouched.
-        - name: Count the number of compute nodes per slurm partition
-          set_fact:
-            desired_state: "{{ (( partition | first).nodes | map(attribute='name') | list )[:item.num_nodes] + desired_state | default([]) }}"
-          when: partition | length > 0
-          with_items: "{{ openhpc_slurm_partitions }}"
-        - debug: var=desired_state
-
-    - hosts: cluster_batch
-      become: yes
-      vars:
-        desired_state: "{{ hostvars['localhost']['desired_state'] | default([]) }}"
-      roles:
-        # Now, the stackhpc.openhpc role is invoked in drain/resume modes where
-        # the instances in desired_state are resumed if in a drained state and
-        # drained if in a resumed state.
-        - role: stackhpc.openhpc
-          openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}"
-          openhpc_enable:
-            drain: "{{ inventory_hostname not in desired_state }}"
-            resume: "{{ inventory_hostname in desired_state }}"
-    ...
-
 ---

 <b id="slurm_ver_footnote">1</b> Slurm 20.11 removed `accounting_storage/filetxt` as an option. This version of Slurm was introduced in OpenHPC v2.1 but the OpenHPC repos are common to all OpenHPC v2.x releases. [↩](#accounting_storage)