Commit 4f81b89

fix nfs and make openhpc fully-capable in compute-init
1 parent 79f52f9 commit 4f81b89

5 files changed (+26, -22 lines)

ansible/roles/compute_init/README.md

Lines changed: 12 additions & 10 deletions
@@ -35,9 +35,7 @@ The following roles are currently fully functional:
 - `resolv_conf`: all functionality
 - `etc_hosts`: all functionality
 - `nfs`: client functionality only
-- `stackhpc.openhpc`: all functionality, except that the control server name
-must be the control node's `inventory_hostname`; `openhpc_slurm_control_host`
-and `openhpc_slurm_control_host_address` are ignored.
+- `stackhpc.openhpc`: all functionality

 # Development/debugging

@@ -96,8 +94,8 @@ as in step 3.
 support certain subsets of role functionality or variables
 Examples: resolv_conf, stackhpc.openhpc

-- Some hostvars are tempalted from hostvars from other nodes, which aren't
-available in the current approach:
+- Some variables are defined using hostvars from other nodes, which aren't
+available in the current approach:

 ```
 [root@rl9-compute-0 rocky]# grep hostvars /mnt/cluster/hostvars/rl9-compute-0/hostvars.yml
@@ -116,8 +114,12 @@ as in step 3.
 More generally, there is nothing to stop any group var depending on a
 "{{ hostvars[] }}" interpolation ...

-Currently, this has been worked around for the following cases:
-- The inventory hostname for the control node, indirected via `.api_address`
-in the above hostvars. This is needed for the default nfs configuration
-and the slurmctld namne. For compute-init this has been Defined using
-"{{ groups['control'] | first }}" as the hostvars do include the groups.
+Only `nfs_server_default` and `openhpc_slurm_control_host` are of concern
+for compute nodes - both of these indirect via `api_address` to
+`inventory_hostname`. This has been worked around by replacing this with
+"{{ groups['control'] | first }}" which does result in the control node
+inventory hostname when templating.
+
+Note that although `groups` is defined in the templated hostvars, when
+the hostvars are loaded using `include_vars:` it is ignored as it is a
+"magic variable" determined by ansible itself and cannot be set.
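
The README change above swaps an expression that needs another node's hostvars for one that needs only group membership. The following is a minimal Python model of the two expressions, not Ansible's templating engine. The control hostname `rl9-control` and the dict contents are hypothetical, and `api_address` is modelled as resolving to `inventory_hostname`, as the README describes.

```python
"""Minimal model (not Ansible's templating) of two ways to name the control host."""


def control_host_via_hostvars(groups, hostvars):
    """Old form: "{{ hostvars[groups['control'].0].api_address }}".

    Needs the control node's own hostvars entry; a KeyError stands in for
    the lookup failing on a compute node that only has its own hostvars.
    """
    control = groups["control"][0]
    return hostvars[control]["api_address"]


def control_host_via_groups(groups):
    """New form: "{{ groups['control'] | first }}"; only group membership is needed."""
    return groups["control"][0]


if __name__ == "__main__":
    # Hypothetical inventory: one control node, two compute nodes.
    groups = {"control": ["rl9-control"], "compute": ["rl9-compute-0", "rl9-compute-1"]}
    # Hypothetical: on rl9-compute-0 only its own hostvars are present;
    # api_address is modelled as resolving to inventory_hostname.
    local_hostvars = {"rl9-compute-0": {"api_address": "rl9-compute-0"}}

    print(control_host_via_groups(groups))  # rl9-control
    try:
        control_host_via_hostvars(groups, local_hostvars)
    except KeyError as missing:
        print("no hostvars for", missing)
```

In this model the group-based form succeeds with only the inventory's group lists, while the hostvars-based form fails as soon as the control node's entry is absent.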

ansible/roles/compute_init/files/compute-init.yml

Lines changed: 9 additions & 7 deletions
@@ -9,9 +9,9 @@
 enable_compute: "{{ os_metadata.meta.enable_compute | default(false) | bool }}"
 enable_resolv_conf: "{{ os_metadata.meta.enable_resolv_conf | default(false) | bool }}"
 enable_etc_hosts: "{{ os_metadata.meta.enable_etc_hosts | default(false) | bool }}"
+enable_nfs: "{{ os_metadata.meta.enable_nfs | default(false) | bool }}"

 # TODO: "= role defaults" - could be moved to a vars_file: on play with similar precedence effects
-# this is a good example: common environment actually defines this (non-functional w/o compute groups), but role default is empty
 resolv_conf_nameservers: []

 nfs_client_mnt_point: "/mnt"
@@ -20,9 +20,8 @@
 nfs_configurations:
 nfs_enable:
 clients: false
-nfs_enable:
-server: false
-clients: false
+
+# openhpc: no defaults required

 tasks:
 - block:
@@ -106,8 +105,10 @@

 # TODO: - name: NFS client mount
 - name: If nfs-clients is present
-include_tasks: nfs-clients.yml
-when: nfs_enable.clients | bool or ('nfs_enable' in item and item.nfs_enable.clients | bool)
+include_tasks: ../tasks/nfs-clients.yml
+when:
+- enable_nfs
+- nfs_enable.clients | bool or ('nfs_enable' in item and item.nfs_enable.clients | bool)
 loop: "{{ nfs_configurations }}"

 # TODO: - name: Manila mount
@@ -130,7 +131,7 @@
 - name: Set slurmctld location for configless operation
 lineinfile:
 path: /etc/sysconfig/slurmd
-line: "SLURMD_OPTIONS='--conf-server {{ groups['control'] | first }}'"
+line: "SLURMD_OPTIONS='--conf-server {{ openhpc_slurm_control_host_address | default(openhpc_slurm_control_host) }}'"
 regexp: "^SLURMD_OPTIONS="
 create: yes
 owner: root
@@ -152,3 +153,4 @@
 - name: Ensure node is resumed
 # TODO: consider if this is always safe for all job states?
 command: scontrol update state=resume nodename={{ ansible_hostname }}
+# TODO: make safe for repeated runs
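
Two of the changed templates in `compute-init.yml` can be modelled in plain Python as below. This is a sketch under stated assumptions, not the role's code: a list under `when:` is treated as a logical AND of its items (standard Ansible behaviour), Jinja2's `default()` substitutes only when the variable is undefined (modelled as an absent key), all flags are assumed to be real booleans rather than strings such as `"yes"`, and the hostname and address used are hypothetical.

```python
"""Sketch modelling two templates from compute-init.yml (not Ansible's engine)."""


def should_mount(vars_, item):
    """The NFS client task's two-item `when:` list; list items are ANDed.

    Flags are assumed to be real booleans, so `| bool` is modelled by bool().
    """
    per_item = "nfs_enable" in item and bool(item["nfs_enable"]["clients"])
    return bool(vars_["enable_nfs"]) and (bool(vars_["nfs_enable"]["clients"]) or per_item)


def slurmd_options(vars_):
    """The lineinfile `line:` written to /etc/sysconfig/slurmd.

    Jinja2's default() only substitutes when the variable is undefined,
    modelled here as the key being absent from vars_.
    """
    if "openhpc_slurm_control_host_address" in vars_:
        server = vars_["openhpc_slurm_control_host_address"]
    else:
        server = vars_["openhpc_slurm_control_host"]
    return f"SLURMD_OPTIONS='--conf-server {server}'"


if __name__ == "__main__":
    base = {"enable_nfs": True, "nfs_enable": {"clients": False}}
    print(should_mount(base, {"nfs_enable": {"clients": True}}))  # True
    print(should_mount({**base, "enable_nfs": False}, {"nfs_enable": {"clients": True}}))  # False

    print(slurmd_options({"openhpc_slurm_control_host": "rl9-control"}))
    # SLURMD_OPTIONS='--conf-server rl9-control'
    print(slurmd_options({"openhpc_slurm_control_host": "rl9-control",
                          "openhpc_slurm_control_host_address": "10.0.0.10"}))
    # SLURMD_OPTIONS='--conf-server 10.0.0.10'
```

With `enable_nfs` false the mount task is skipped regardless of the per-item setting, and the conf-server falls back to `openhpc_slurm_control_host` only when no address is defined.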

ansible/roles/compute_init/tasks/install.yml

Lines changed: 3 additions & 3 deletions
@@ -12,7 +12,7 @@
 - files
 - library
 - filter_plugins
-- playbooks
+- tasks

 - name: Inject files from roles
 copy:
@@ -35,7 +35,7 @@
 - src: ../../basic_users/filter_plugins/filter_keys.py
 dest: filter_plugins/filter_keys.py
 - src: ../../stackhpc.nfs/tasks/nfs-clients.yml
-dest: playbooks/nfs-clients.yml
+dest: tasks/nfs-clients.yml

 - name: Add filter_plugins to ansible.cfg
 lineinfile:
@@ -52,4 +52,4 @@
 dest: /etc/ansible-init/playbooks/1-compute-init.yml
 owner: root
 group: root
-mode: 0644
+mode: 0644
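
Moving the copied `nfs-clients.yml` from `playbooks/` to `tasks/` goes together with the include path in `compute-init.yml` becoming `../tasks/nfs-clients.yml`. The sketch below resolves that path against the directory of the installed playbook, `/etc/ansible-init/playbooks/1-compute-init.yml`. It assumes the directories listed in `install.yml` are created under `/etc/ansible-init/` alongside `playbooks/` (the diff does not show their parent path), and it covers only playbook-relative lookup; Ansible also searches other locations.

```python
"""Sketch of playbook-relative include resolution (simplified, not Ansible's search)."""
import posixpath

# Installed playbook path, as shown in install.yml.
PLAYBOOK = "/etc/ansible-init/playbooks/1-compute-init.yml"


def resolve_relative_include(playbook_path, include_path):
    """Join an include path onto the including playbook's directory and normalise it."""
    base = posixpath.dirname(playbook_path)
    return posixpath.normpath(posixpath.join(base, include_path))


if __name__ == "__main__":
    # New layout: tasks/ is assumed to be a sibling of playbooks/.
    print(resolve_relative_include(PLAYBOOK, "../tasks/nfs-clients.yml"))
    # /etc/ansible-init/tasks/nfs-clients.yml
```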

environments/common/inventory/group_vars/all/nfs.yml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 # See: https://github.com/stackhpc/ansible-role-cluster-nfs
 # for variable definitions

-nfs_server_default: "{{ groups['control'] | first }}" # avoid using hostvars so
+nfs_server_default: "{{ groups['control'] | first }}" # avoid using hostvars for compute-init

 nfs_configurations:
 - comment: Export /exports/home from Slurm control node as /home

environments/common/inventory/group_vars/all/openhpc.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ openhpc_slurm_accounting_storage_type: 'accounting_storage/slurmdbd'
 openhpc_slurmdbd_mysql_database: slurm_acct_db
 openhpc_slurmdbd_mysql_password: "{{ vault_mysql_slurm_password }}"
 openhpc_slurmdbd_mysql_username: slurm
-openhpc_slurm_control_host: "{{ hostvars[groups['control'].0].api_address }}"
+openhpc_slurm_control_host: "{{ groups['control'] | first }}" # avoid using hostvars for compute-init
 openhpc_slurmdbd_host: "{{ openhpc_slurm_control_host }}"
 openhpc_slurm_partitions:
 - name: "compute"
