
Commit ac31fd3

Drop support for CentOS 7 (OpenHPC v1.3) (#152)
* remove drain and resume functionality
* allow install and runtime taskbooks to be used directly
* fix linter complaints
* fix slurmctld state
* move common tasks to pre.yml
* remove unused openhpc_slurm_service
* fix ini_file use for some community.general versions
* fix var precedence in molecule test13
* fix var precedence in all molecule tests
* fix slurmd always starting on control node
* remove unused ohpc_slurm_services var
* remove support for CentOS7 / OpenHPC
* remove post-configure, not needed as of slurm v20.02
* remove unused openhpc_version
1 parent 2eae901 commit ac31fd3

File tree: 28 files changed (+196, -418 lines)


.github/workflows/ci.yml

Lines changed: 1 addition & 24 deletions
@@ -24,7 +24,6 @@ jobs:
       fail-fast: false
       matrix:
         image:
-          - 'centos:7'
           - 'rockylinux:8.8'
         scenario:
           - test1
@@ -44,29 +43,7 @@ jobs:
           - test13
           - test14
 
-        exclude:
-          - image: 'centos:7'
-            scenario: test5
-          - image: 'centos:7'
-            scenario: test6
-          - image: 'centos:7'
-            scenario: test7
-          - image: 'centos:7'
-            scenario: test8
-          - image: 'centos:7'
-            scenario: test9
-          - image: 'centos:7'
-            scenario: test10
-          - image: 'centos:7'
-            scenario: test11
-          - image: 'centos:7'
-            scenario: test12
-          - image: 'centos:7'
-            scenario: test13
-          - image: 'centos:7'
-            scenario: test14
-          - image: 'centos:7'
-            scenario: test15
+        exclude: []
 
     steps:
       - name: Check out the codebase.
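
The net effect is that the CI matrix now runs every scenario against a single image with no exclusions. A rough sketch of the resulting matrix follows; scenarios between `test1` and `test13` are not shown in the diff, so they are elided here as well:

    strategy:
      fail-fast: false
      matrix:
        image:
          - 'rockylinux:8.8'
        scenario:
          - test1
          # ...scenarios between test1 and test13 are unchanged and not shown in the diff...
          - test13
          - test14
        exclude: []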

README.md

Lines changed: 3 additions & 55 deletions
@@ -2,16 +2,14 @@
 
 # stackhpc.openhpc
 
-This Ansible role installs packages and performs configuration to provide an OpenHPC Slurm cluster. It can also be used to drain and resume nodes.
+This Ansible role installs packages and performs configuration to provide an OpenHPC v2.x Slurm cluster.
 
 As a role it must be used from a playbook, for which a simple example is given below. This approach means it is totally modular with no assumptions about available networks or any cluster features except for some hostname conventions. Any desired cluster fileystem or other required functionality may be freely integrated using additional Ansible roles or other approaches.
 
-The minimal image for nodes is a CentOS 7 or RockyLinux 8 GenericCloud image. These use OpenHPC v1 and v2 respectively. Centos8/OpenHPCv2 is generally preferred as it provides additional functionality for Slurm, compilers, MPI and transport libraries.
+The minimal image for nodes is a RockyLinux 8 GenericCloud image.
 
 ## Role Variables
 
-`openhpc_version`: Optional. OpenHPC version to install. Defaults provide `1.3` for Centos 7 and `2` for RockyLinux/CentOS 8.
-
 `openhpc_extra_repos`: Optional list. Extra Yum repository definitions to configure, following the format of the Ansible
 [yum_repository](https://docs.ansible.com/ansible/2.9/modules/yum_repository_module.html) module. Respected keys for
 each list element:
@@ -39,12 +37,10 @@ each list element:
 * `database`: whether to enable slurmdbd
 * `batch`: whether to enable compute nodes
 * `runtime`: whether to enable OpenHPC runtime
-* `drain`: whether to drain compute nodes
-* `resume`: whether to resume compute nodes
 
 `openhpc_slurmdbd_host`: Optional. Where to deploy slurmdbd if are using this role to deploy slurmdbd, otherwise where an existing slurmdbd is running. This should be the name of a host in your inventory. Set this to `none` to prevent the role from managing slurmdbd. Defaults to `openhpc_slurm_control_host`.
 
-`openhpc_slurm_configless`: Optional, default false. If true then slurm's ["configless" mode](https://slurm.schedmd.com/configless_slurm.html) is used. **NB: Requires Centos8/OpenHPC v2.**
+`openhpc_slurm_configless`: Optional, default false. If true then slurm's ["configless" mode](https://slurm.schedmd.com/configless_slurm.html) is used.
 
 `openhpc_munge_key`: Optional. Define a munge key to use. If not provided then one is generated but the `openhpc_slurm_control_host` must be in the play.
 
@@ -184,54 +180,6 @@ To deploy, create a playbook which looks like this:
         openhpc_packages: []
     ...
 
-To drain nodes, for example, before scaling down the cluster to 6 nodes:
-
-    ---
-    - hosts: openstack
-      gather_facts: false
-      vars:
-        partition: "{{ cluster_group.output_value | selectattr('group', 'equalto', item.name) | list }}"
-        openhpc_slurm_partitions:
-          - name: "compute"
-            flavor: "compute-A"
-            image: "CentOS7.5-OpenHPC"
-            num_nodes: 6
-            user: "centos"
-        openhpc_cluster_name: openhpc
-      roles:
-        # Our stackhpc.cluster-infra role can be invoked in `query` mode which
-        # looks up the state of the cluster by querying the Heat API.
-        - role: stackhpc.cluster-infra
-          cluster_name: "{{ cluster_name }}"
-          cluster_state: query
-          cluster_params:
-            cluster_groups: "{{ cluster_groups }}"
-      tasks:
-        # Given that the original cluster that was created had 8 nodes and the
-        # cluster we want to create has 6 nodes, the computed desired_state
-        # variable stores the list of instances to leave untouched.
-        - name: Count the number of compute nodes per slurm partition
-          set_fact:
-            desired_state: "{{ (( partition | first).nodes | map(attribute='name') | list )[:item.num_nodes] + desired_state | default([]) }}"
-          when: partition | length > 0
-          with_items: "{{ openhpc_slurm_partitions }}"
-        - debug: var=desired_state
-
-    - hosts: cluster_batch
-      become: yes
-      vars:
-        desired_state: "{{ hostvars['localhost']['desired_state'] | default([]) }}"
-      roles:
-        # Now, the stackhpc.openhpc role is invoked in drain/resume modes where
-        # the instances in desired_state are resumed if in a drained state and
-        # drained if in a resumed state.
-        - role: stackhpc.openhpc
-          openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}"
-          openhpc_enable:
-            drain: "{{ inventory_hostname not in desired_state }}"
-            resume: "{{ inventory_hostname in desired_state }}"
-    ...
-
 ---
 
 <b id="slurm_ver_footnote">1</b> Slurm 20.11 removed `accounting_storage/filetxt` as an option. This version of Slurm was introduced in OpenHPC v2.1 but the OpenHPC repos are common to all OpenHPC v2.x releases. [](#accounting_storage)
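
For orientation, a minimal playbook using the role after this change might look like the sketch below. It only uses variables documented in the README and exercised by the molecule tests in this commit; the inventory group names (`cluster_control`, `cluster_compute`), the `openhpc` host pattern and the play layout are illustrative assumptions, not part of the role:

    ---
    - hosts: openhpc
      become: yes
      vars:
        # assumed inventory groups 'cluster_control' and 'cluster_compute'
        openhpc_enable:
          control: "{{ inventory_hostname in groups['cluster_control'] }}"
          batch: "{{ inventory_hostname in groups['cluster_compute'] }}"
          runtime: true
        openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}"
        openhpc_slurm_partitions:
          - name: "compute"
        openhpc_cluster_name: openhpc
        openhpc_slurm_configless: true
      roles:
        - role: stackhpc.openhpc
    ...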

defaults/main.yml

Lines changed: 0 additions & 27 deletions
@@ -1,5 +1,4 @@
 ---
-openhpc_version: "{{ '1.3' if ansible_distribution_major_version == '7' else '2' }}"
 openhpc_slurm_service_enabled: true
 openhpc_slurm_service_started: "{{ openhpc_slurm_service_enabled }}"
 openhpc_slurm_service:
@@ -9,7 +8,6 @@ openhpc_slurm_partitions: []
 openhpc_cluster_name:
 openhpc_packages:
   - slurm-libpmi-ohpc
-openhpc_drain_timeout: 86400
 openhpc_resume_timeout: 300
 openhpc_retry_delay: 10
 openhpc_job_maxtime: '60-0' # quote this to avoid ansible converting some formats to seconds, which is interpreted as minutes by Slurm
@@ -46,29 +44,11 @@ openhpc_enable:
   batch: false
   database: false
   runtime: false
-  drain: false
-  resume: false
-ohpc_slurm_services:
-  control: slurmctld
-  batch: slurmd
 
 # Repository configuration
 openhpc_extra_repos: []
 
 ohpc_openhpc_repos:
-  "7":
-    - name: OpenHPC
-      file: OpenHPC
-      description: "OpenHPC-1.3 - Base"
-      baseurl: "http://build.openhpc.community/OpenHPC:/1.3/CentOS_7"
-      gpgcheck: true
-      gpgkey: https://raw.githubusercontent.com/openhpc/ohpc/v1.3.5.GA/components/admin/ohpc-release/SOURCES/RPM-GPG-KEY-OpenHPC-1
-    - name: OpenHPC-updates
-      file: OpenHPC
-      description: "OpenHPC-1.3 - Updates"
-      baseurl: "http://build.openhpc.community/OpenHPC:/1.3/updates/CentOS_7"
-      gpgcheck: true
-      gpgkey: https://raw.githubusercontent.com/openhpc/ohpc/v1.3.5.GA/components/admin/ohpc-release/SOURCES/RPM-GPG-KEY-OpenHPC-1
   "8":
     - name: OpenHPC
       file: OpenHPC
@@ -84,13 +64,6 @@ ohpc_openhpc_repos:
       gpgkey: https://raw.githubusercontent.com/openhpc/ohpc/v2.6.1.GA/components/admin/ohpc-release/SOURCES/RPM-GPG-KEY-OpenHPC-2
 
 ohpc_default_extra_repos:
-  "7":
-    - name: epel
-      file: epel
-      description: "Extra Packages for Enterprise Linux 7 - $basearch"
-      metalink: "https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch&infra=$infra&content=$contentdir"
-      gpgcheck: true
-      gpgkey: "https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-7"
   "8":
     - name: epel
      file: epel
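
As the README hunk above notes, additional repositories can be supplied via `openhpc_extra_repos` in the same `yum_repository`-style format as these defaults. A hypothetical sketch, where the repository name and URL are placeholders rather than anything defined by the role:

    openhpc_extra_repos:
      - name: site-packages                               # placeholder repository id
        file: site-packages
        description: "Site-local packages for EL8"
        baseurl: "https://repo.example.com/el8/$basearch"  # placeholder URL
        gpgcheck: false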

handlers/main.yml

Lines changed: 2 additions & 1 deletion
@@ -60,4 +60,5 @@
     state: restarted
   when:
     - openhpc_slurm_service_started | bool
-    - openhpc_slurm_service == 'slurmd'
+    - openhpc_enable.batch | default(false) | bool
+    # 2nd condition required as notification happens on controller, which isn't necessarily a compute note
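
For context, the handler this hunk modifies is, in rough outline, a service restart gated on those two conditions. This is a sketch only: the handler name and the `service`/`name: slurmd` details are assumptions, and just the `state` and `when` lines are taken from the diff above:

    - name: Restart slurmd service        # assumed handler name
      service:
        name: slurmd                      # assumed; only state/when appear in the diff
        state: restarted
      when:
        - openhpc_slurm_service_started | bool
        - openhpc_enable.batch | default(false) | bool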

molecule/README.md

Lines changed: 1 addition & 2 deletions
@@ -42,8 +42,7 @@ Local installation on a RockyLinux 8.x machine looks like:
 Then to run tests, e.g.::
 
     cd ansible-role-openhpc/
-    MOLECULE_IMAGE=centos:7 molecule test --all # NB some won't work as require OpenHPC v2.x (-> CentOS 8.x) features - see `.github/workflows/ci.yml`
-    MOLECULE_IMAGE=rockylinux:8.6 molecule test --all
+    MOLECULE_IMAGE=rockylinux:8.8 molecule test --all
 
 During development you may want to:
 
molecule/test1/converge.yml

Lines changed: 9 additions & 10 deletions
@@ -1,17 +1,16 @@
 ---
 - name: Converge
   hosts: all
+  vars:
+    openhpc_enable:
+      control: "{{ inventory_hostname in groups['testohpc_login'] }}"
+      batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
+      runtime: true
+    openhpc_slurm_control_host: "{{ groups['testohpc_login'] | first }}"
+    openhpc_slurm_partitions:
+      - name: "compute"
+    openhpc_cluster_name: testohpc
   tasks:
     - name: "Include ansible-role-openhpc"
       include_role:
         name: "{{ lookup('env', 'MOLECULE_PROJECT_DIRECTORY') | basename }}"
-      vars:
-        openhpc_enable:
-          control: "{{ inventory_hostname in groups['testohpc_login'] }}"
-          batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
-          runtime: true
-        openhpc_slurm_control_host: "{{ groups['testohpc_login'] | first }}"
-        openhpc_slurm_partitions:
-          - name: "compute"
-        openhpc_cluster_name: testohpc
-
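
The same restructuring is applied to every molecule converge playbook in this commit: variables previously supplied via `vars:` on the `include_role` task (which Ansible treats as task vars, sitting above play vars and inventory host/group vars in precedence) are moved up to play-level `vars`. Presumably this is the "fix var precedence" item in the commit message. A minimal sketch of the two forms, using a hypothetical `example_var` and role name:

    # Sketch only - 'example_var' and 'my_role' are illustrative, not part of this repo.
    - hosts: all
      vars:
        example_var: play-level        # new style: play vars (lower precedence, visible to all tasks)
      tasks:
        - include_role:
            name: my_role
          # old style: task vars on include_role - these override play vars
          # and inventory host/group vars
          # vars:
          #   example_var: task-level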

molecule/test10/converge.yml

Lines changed: 10 additions & 10 deletions
@@ -1,17 +1,17 @@
 ---
 - name: Create initial cluster
   hosts: initial
+  vars:
+    openhpc_enable:
+      control: "{{ inventory_hostname in groups['testohpc_login'] }}"
+      batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
+      runtime: true
+    openhpc_slurm_control_host: "{{ groups['testohpc_login'] | first }}"
+    openhpc_slurm_partitions:
+      - name: "compute"
+    openhpc_cluster_name: testohpc
+    openhpc_slurm_configless: true
   tasks:
     - name: "Include ansible-role-openhpc"
       include_role:
         name: "{{ lookup('env', 'MOLECULE_PROJECT_DIRECTORY') | basename }}"
-      vars:
-        openhpc_enable:
-          control: "{{ inventory_hostname in groups['testohpc_login'] }}"
-          batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
-          runtime: true
-        openhpc_slurm_control_host: "{{ groups['testohpc_login'] | first }}"
-        openhpc_slurm_partitions:
-          - name: "compute"
-        openhpc_cluster_name: testohpc
-        openhpc_slurm_configless: true

molecule/test13/converge.yml

Lines changed: 14 additions & 14 deletions
@@ -1,21 +1,21 @@
 ---
 - name: Converge
   hosts: all
+  vars:
+    openhpc_enable:
+      control: "{{ inventory_hostname in groups['testohpc_control'] }}"
+      batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
+      runtime: true
+    openhpc_slurm_control_host: "{{ groups['testohpc_control'] | first }}"
+    openhpc_slurm_partitions:
+      - name: "compute"
+    openhpc_cluster_name: testohpc
+    openhpc_slurm_configless: true
+    openhpc_login_only_nodes: 'testohpc_login'
+    openhpc_config:
+      FirstJobId: 13
+      SlurmctldSyslogDebug: error
   tasks:
     - name: "Include ansible-role-openhpc"
       include_role:
         name: "{{ lookup('env', 'MOLECULE_PROJECT_DIRECTORY') | basename }}"
-      vars:
-        openhpc_enable:
-          control: "{{ inventory_hostname in groups['testohpc_control'] }}"
-          batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
-          runtime: true
-        openhpc_slurm_control_host: "{{ groups['testohpc_control'] | first }}"
-        openhpc_slurm_partitions:
-          - name: "compute"
-        openhpc_cluster_name: testohpc
-        openhpc_slurm_configless: true
-        openhpc_login_only_nodes: 'testohpc_login'
-        openhpc_config:
-          FirstJobId: 13
-          SlurmctldSyslogDebug: error

molecule/test14/converge.yml

Lines changed: 22 additions & 23 deletions
@@ -1,30 +1,29 @@
 ---
 - name: Converge
   hosts: all
+  vars:
+    openhpc_enable:
+      control: "{{ inventory_hostname in groups['testohpc_login'] }}"
+      batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
+      runtime: true
+    openhpc_slurm_control_host: "{{ groups['testohpc_login'] | first }}"
+    openhpc_slurm_partitions:
+      - name: "compute"
+        extra_nodes:
+          # Need to specify IPs for the non-existent State=DOWN nodes, because otherwise even in this state slurmctld will exclude a node with no lookup information from the config.
+          # We use invalid IPs here (i.e. starting 0.) to flag the fact the nodes shouldn't exist.
+          # Note this has to be done via slurm config rather than /etc/hosts due to Docker limitations on modifying the latter.
+          - NodeName: fake-x,fake-y
+            NodeAddr: 0.42.42.0,0.42.42.1
+            State: DOWN
+            CPUs: 1
+          - NodeName: fake-2cpu-[3,7-9]
+            NodeAddr: 0.42.42.3,0.42.42.7,0.42.42.8,0.42.42.9
+            State: DOWN
+            CPUs: 2
+    openhpc_cluster_name: testohpc
+    openhpc_slurm_configless: true
   tasks:
     - name: "Include ansible-role-openhpc"
       include_role:
         name: "{{ lookup('env', 'MOLECULE_PROJECT_DIRECTORY') | basename }}"
-      vars:
-        openhpc_enable:
-          control: "{{ inventory_hostname in groups['testohpc_login'] }}"
-          batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
-          runtime: true
-        openhpc_slurm_control_host: "{{ groups['testohpc_login'] | first }}"
-        openhpc_slurm_partitions:
-          - name: "compute"
-            extra_nodes:
-              # Need to specify IPs for the non-existent State=DOWN nodes, because otherwise even in this state slurmctld will exclude a node with no lookup information from the config.
-              # We use invalid IPs here (i.e. starting 0.) to flag the fact the nodes shouldn't exist.
-              # Note this has to be done via slurm config rather than /etc/hosts due to Docker limitations on modifying the latter.
-              - NodeName: fake-x,fake-y
-                NodeAddr: 0.42.42.0,0.42.42.1
-                State: DOWN
-                CPUs: 1
-              - NodeName: fake-2cpu-[3,7-9]
-                NodeAddr: 0.42.42.3,0.42.42.7,0.42.42.8,0.42.42.9
-                State: DOWN
-                CPUs: 2
-        openhpc_cluster_name: testohpc
-        openhpc_slurm_configless: true
-

molecule/test1b/converge.yml

Lines changed: 9 additions & 10 deletions
@@ -1,17 +1,16 @@
 ---
 - name: Converge
   hosts: all
+  vars:
+    openhpc_enable:
+      control: "{{ inventory_hostname in groups['testohpc_login'] }}"
+      batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
+      runtime: true
+    openhpc_slurm_control_host: "{{ groups['testohpc_login'] | first }}"
+    openhpc_slurm_partitions:
+      - name: "compute"
+    openhpc_cluster_name: testohpc
   tasks:
     - name: "Include ansible-role-openhpc"
      include_role:
         name: "{{ lookup('env', 'MOLECULE_PROJECT_DIRECTORY') | basename }}"
-      vars:
-        openhpc_enable:
-          control: "{{ inventory_hostname in groups['testohpc_login'] }}"
-          batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
-          runtime: true
-        openhpc_slurm_control_host: "{{ groups['testohpc_login'] | first }}"
-        openhpc_slurm_partitions:
-          - name: "compute"
-        openhpc_cluster_name: testohpc
-
