
Commit e51919d

Stop slurmd during slurmctld restart
Some Slurm configuration changes can cause compute nodes to go into an invalid state if slurmctld is restarted while slurmd services are still running. Stop slurmd services while slurmctld is being restarted. This has been tested not to affect running jobs. Closes #199
1 parent 34d3996 commit e51919d

File tree

1 file changed (+17, -0 lines)

tasks/runtime.yml

Lines changed: 17 additions & 0 deletions
@@ -127,6 +127,23 @@
     - "_openhpc_slurmdbd_state.stdout == 'inactive'"
     - openhpc_enable.database | default(false)
 
+- name: Stop slurmd if configuration has changed
+  service:
+    name: "slurmd"
+    state: stopped
+  retries: 5
+  register: slurmd_stop
+  until: slurmd_stop is success
+  delay: 30
+  when:
+    - openhpc_slurm_service_started | bool
+    - openhpc_enable.batch | default(false) | bool
+    - openhpc_slurm_control_host in ansible_play_hosts
+    - hostvars[openhpc_slurm_control_host].ohpc_slurm_conf.changed or hostvars[openhpc_slurm_control_host].ohpc_gres_conf.changed # noqa no-handler
+
+- name: Flush handler
+  meta: flush_handlers # This will restart slurmctld while slurmd services are stopped, if needed
+
 - name: Notify handler for slurmd restart
   debug:
     msg: "notifying handlers" # meta: noop doesn't support 'when'

0 commit comments