Commit 3608f48

Merge remote-tracking branch 'origin/feat/nodegroups-v2' into HEAD
2 parents eacee19 + 5ceb9e1 commit 3608f48

7 files changed: +141 -14 lines changed

README.md

Lines changed: 31 additions & 4 deletions
@@ -70,7 +70,8 @@ unique set of homogenous nodes:
 **NB:** Parameters which can be set via the keys above must not be included here.
 
 Each nodegroup will contain hosts from an Ansible inventory group named
-`{{ openhpc_cluster_name }}_{{ group_name}}`. Note that:
+`{{ openhpc_cluster_name }}_{{ name }}`, where `name` is the nodegroup name.
+Note that:
 - Each host may only appear in one nodegroup.
 - Hosts in a nodegroup are assumed to be homogenous in terms of processor and memory.
 - Hosts may have arbitrary hostnames, but these should be lowercase to avoid a
@@ -143,10 +144,12 @@ accounting data such as start and end times. By default no job accounting is con
 `openhpc_slurm_job_comp_loc`: Location to store the job accounting records. Depends on value of
 `openhpc_slurm_job_comp_type`, e.g for `jobcomp/filetxt` represents a path on disk.
 
-### slurmdbd.conf
+### slurmdbd
 
-The following options affect `slurmdbd.conf`. Please see the slurm [documentation](https://slurm.schedmd.com/slurmdbd.conf.html) for more details.
-You will need to configure these variables if you have set `openhpc_enable.database` to `true`.
+When the slurm database daemon (`slurmdbd`) is enabled by setting
+`openhpc_enable.database` to `true` the following options must be configured.
+See documentation for [slurmdbd.conf](https://slurm.schedmd.com/slurmdbd.conf.html)
+for more details.
 
 `openhpc_slurmdbd_port`: Port for slurmdb to listen on, defaults to `6819`.
 
@@ -158,6 +161,30 @@ You will need to configure these variables if you have set `openhpc_enable.datab
 
 `openhpc_slurmdbd_mysql_username`: Username for authenticating with the database, defaults to `slurm`.
 
+Before starting `slurmdbd`, the role will check if a database upgrade is
+required due to a Slurm major version upgrade and carry it out if so.
+Slurm versions before 24.11 do not support this check and so no upgrade will
+occur. The following variables control behaviour during this upgrade:
+
+`openhpc_slurm_accounting_storage_client_package`: Optional. String giving the
+name of the database client package to install, e.g. `mariadb`. Default `mysql`.
+
+`openhpc_slurm_accounting_storage_backup_cmd`: Optional. String (possibly
+multi-line) giving a command for `ansible.builtin.shell` to run a backup of the
+Slurm database before performing the database upgrade. Default is the empty
+string which performs no backup.
+
+`openhpc_slurm_accounting_storage_backup_host`: Optional. Inventory hostname
+defining host to run the backup command. Default is `openhpc_slurm_accounting_storage_host`.
+
+`openhpc_slurm_accounting_storage_backup_become`: Optional. Whether to run the
+backup command as root. Default `true`.
+
+`openhpc_slurm_accounting_storage_service`: Optional. Name of systemd service
+for the accounting storage database, e.g. `mysql`. If this is defined this
+service is stopped before the backup and restarted after, to allow for physical
+backups. Default is the empty string, which does not stop/restart any service.
+
 ## Facts
 
 This role creates local facts from the live Slurm configuration, which can be
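
Reviewer note: the backup variables documented in this README change might be set along the following lines. This is a minimal sketch, not something taken from the role. It assumes `openhpc_enable.database` is already `true`, that `mysqldump` is available on the backup host, and that root there can authenticate to the database without a password. The paths under `/var/backups` and `/var/lib/mysql` are hypothetical.

```yaml
# Hypothetical group_vars sketch; the paths and the mysqldump invocation are
# illustrative assumptions, not defaults of the role.

# Install the MariaDB client rather than the default `mysql` package.
openhpc_slurm_accounting_storage_client_package: mariadb

# Logical backup: the database server must keep running for mysqldump, so
# openhpc_slurm_accounting_storage_service stays at its default ('').
openhpc_slurm_accounting_storage_backup_cmd: |
  mysqldump --single-transaction --databases {{ openhpc_slurmdbd_mysql_database }} \
    > /var/backups/slurm_acct_db.sql
openhpc_slurm_accounting_storage_backup_become: true

# Physical backup alternative: name the service so it is stopped before the
# backup and started again afterwards, then archive the data directory.
# openhpc_slurm_accounting_storage_service: mariadb
# openhpc_slurm_accounting_storage_backup_cmd: tar -czf /var/backups/slurm_db.tgz /var/lib/mysql
```

The two variants are kept apart deliberately: a dump tool needs the server up, whereas copying the data directory is only safe once the service has been stopped, which is what `openhpc_slurm_accounting_storage_service` is described as enabling.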

defaults/main.yml

Lines changed: 7 additions & 0 deletions
@@ -102,3 +102,10 @@ openhpc_module_system_install: true
 
 # Auto detection
 openhpc_ram_multiplier: 0.95
+
+# Database upgrade
+openhpc_slurm_accounting_storage_service: ''
+openhpc_slurm_accounting_storage_backup_cmd: ''
+openhpc_slurm_accounting_storage_backup_host: "{{ openhpc_slurm_accounting_storage_host }}"
+openhpc_slurm_accounting_storage_backup_become: true
+openhpc_slurm_accounting_storage_client_package: mysql

handlers/main.yml

Lines changed: 0 additions & 6 deletions
@@ -1,10 +1,4 @@
 ---
-# NOTE: We need this running before slurmdbd
-- name: Restart Munge service
-  service:
-    name: "munge"
-    state: restarted
-  when: openhpc_slurm_service_started | bool
 
 # NOTE: we need this running before slurmctld start
 - name: Issue slurmdbd restart command

molecule/test4/converge.yml

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@
     openhpc_nodegroups:
       - name: "compute"
     openhpc_cluster_name: testohpc
+    openhpc_slurm_accounting_storage_client_package: mariadb
   tasks:
     - name: "Include ansible-role-openhpc"
       include_role:

tasks/install.yml

Lines changed: 2 additions & 2 deletions
@@ -49,8 +49,8 @@
     install_weak_deps: false # avoids getting recommended packages
   when: openhpc_slurm_pkglist | default(false, true)
 
-- name: Install packages from openhpc_packages variable
+- name: Install other packages
   yum:
-    name: "{{ openhpc_packages }}"
+    name: "{{ openhpc_packages + [openhpc_slurm_accounting_storage_client_package] }}"
 
 ...

tasks/runtime.yml

Lines changed: 19 additions & 2 deletions
@@ -47,8 +47,7 @@
     owner: munge
     group: munge
     mode: 0400
-  notify:
-    - Restart Munge service
+  register: _openhpc_munge_key_copy
 
 - name: Ensure JobComp logfile exists
   file:
@@ -150,6 +149,24 @@
   changed_when: false # so molecule doesn't fail
   become: no
 
+- name: Ensure Munge service is running
+  service:
+    name: munge
+    state: "{{ 'restarted' if _openhpc_munge_key_copy.changed else 'started' }}"
+  when: openhpc_slurm_service_started | bool
+
+- name: Check slurmdbd state
+  command: systemctl is-active slurmdbd # noqa: command-instead-of-module
+  changed_when: false
+  failed_when: false # rc = 0 when active
+  register: _openhpc_slurmdbd_state
+
+- name: Ensure slurm database is upgraded if slurmdbd inactive
+  import_tasks: upgrade.yml # need import for conditional support
+  when:
+    - "_openhpc_slurmdbd_state.stdout == 'inactive'"
+    - openhpc_enable.database | default(false)
+
 - name: Notify handler for slurmd restart
   debug:
     msg: "notifying handlers" # meta: noop doesn't support 'when'

tasks/upgrade.yml

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+- name: Check if slurm database has been initialised
+  # DB is initialised on the first slurmdbd startup (without -u option).
+  # If it is not initialised, `slurmdbd -u` errors with something like
+  # > Slurm Database is somehow higher than expected '4294967294' but I only
+  # > know as high as '16'. Conversion needed.
+  community.mysql.mysql_query:
+    login_db: "{{ openhpc_slurmdbd_mysql_database }}"
+    login_user: "{{ openhpc_slurmdbd_mysql_username }}"
+    login_password: "{{ openhpc_slurmdbd_mysql_password }}"
+    login_host: "{{ openhpc_slurmdbd_host }}"
+    query: SHOW TABLES
+    config_file: ''
+  register: _openhpc_slurmdb_tables
+
+- name: Check if slurm database requires an upgrade
+  ansible.builtin.command: slurmdbd -u
+  register: _openhpc_slurmdbd_check
+  changed_when: false
+  failed_when: >-
+    _openhpc_slurmdbd_check.rc > 1 or
+    'Slurm Database is somehow higher than expected' in _openhpc_slurmdbd_check.stdout
+  # from https://github.com/SchedMD/slurm/blob/master/src/plugins/accounting_storage/mysql/as_mysql_convert.c
+  when: _openhpc_slurmdb_tables.query_result | flatten | length > 0 # i.e. when db is initialised
+
+- name: Set fact for slurm database upgrade
+  # Explanation of ifs below:
+  # - `slurmdbd -u` rc == 0 then no conversion required (from manpage)
+  # - default of 0 on rc skips upgrade steps if check was skipped because
+  #   db is not initialised
+  # - Usage message (and rc == 1) if -u option doesn't exist, in which case
+  #   it can't be a major upgrade due to existing openhpc versions
+  set_fact:
+    _openhpc_slurmdb_upgrade: >-
+      {{ false
+      if (
+      ( _openhpc_slurmdbd_check.rc | default(0) == 0)
+      or
+      ( 'Usage: slurmdbd' in _openhpc_slurmdbd_check.stderr )
+      ) else
+      true
+      }}
+
+- name: Ensure Slurm database service stopped
+  ansible.builtin.systemd:
+    name: "{{ openhpc_slurm_accounting_storage_service }}"
+    state: stopped
+  register: _openhpc_slurmdb_state
+  when:
+    - _openhpc_slurmdb_upgrade
+    - openhpc_slurm_accounting_storage_service != ''
+
+- name: Backup Slurm database
+  ansible.builtin.shell: # noqa: command-instead-of-shell
+    cmd: "{{ openhpc_slurm_accounting_storage_backup_cmd }}"
+  delegate_to: "{{ openhpc_slurm_accounting_storage_backup_host }}"
+  become: "{{ openhpc_slurm_accounting_storage_backup_become }}"
+  changed_when: true
+  run_once: true
+  when:
+    - _openhpc_slurmdb_upgrade
+    - openhpc_slurm_accounting_storage_backup_cmd != ''
+
+- name: Ensure Slurm database service started
+  ansible.builtin.systemd:
+    name: "{{ openhpc_slurm_accounting_storage_service }}"
+    state: started
+  when:
+    - openhpc_slurm_accounting_storage_service != ''
+    - _openhpc_slurmdb_state.changed | default(false)
+
+- name: Run slurmdbd in foreground for upgrade
+  ansible.builtin.expect:
+    command: /usr/sbin/slurmdbd -D -vvv
+    responses:
+      (?i)Everything rolled up:
+        # See https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrade-slurmdbd
+        # and
+        # https://github.com/SchedMD/slurm/blob/0ce058c5adcf63001ec2ad211c65e67b0e7682a8/src/plugins/accounting_storage/mysql/as_mysql_usage.c#L1042
+  become: true
+  become_user: slurm
+  when: _openhpc_slurmdb_upgrade
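
Reviewer note: the decision made by the "Set fact for slurm database upgrade" task can be checked in isolation with a small standalone playbook. The sketch below is not part of the commit. The sample `check` results, including the usage text, are hypothetical stand-ins for the registered output of `slurmdbd -u`, and a `default('')` is added on `stderr` so the skipped case evaluates without relying on short-circuiting.

```yaml
# Hypothetical standalone playbook; run with e.g. `ansible-playbook sketch.yml`.
- name: Exercise the upgrade decision with sample check results
  hosts: localhost
  gather_facts: false
  vars:
    # Sample stand-ins for the registered `_openhpc_slurmdbd_check` result.
    cases:
      - check: {rc: 0, stderr: ""}                           # no conversion required
        expected: false
      - check: {rc: 1, stderr: "Usage: slurmdbd [OPTIONS]"}  # -u option not supported
        expected: false
      - check: {rc: 1, stderr: ""}                           # conversion required
        expected: true
      - check: {skipped: true}                               # check skipped, db not initialised
        expected: false
  tasks:
    - name: Assert the expression yields the expected decision
      ansible.builtin.assert:
        that:
          - >-
            (not (
              (item.check.rc | default(0) == 0)
              or ('Usage: slurmdbd' in (item.check.stderr | default('')))
            )) == item.expected
      loop: "{{ cases }}"
```

Only the third case should lead to the stop, backup, start and foreground `slurmdbd` steps; the other three leave `_openhpc_slurmdb_upgrade` false so those tasks are skipped.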
