# compute-init

- TODO: describe current status.
+ The following roles are currently functional:
+ - resolv_conf
+ - etc_hosts
+ - stackhpc.openhpc

# Development

To develop/debug this without actually having to build an image:

1. Deploy a cluster using tofu and ansible/site.yml as normal. This will
- additionally configure the control node to export compute hosts over NFS.
+ additionally configure the control node to export compute hostvars over NFS.
Check the cluster is up.

2. Reimage the compute nodes:
@@ -22,6 +25,10 @@ To develop/debug this without actually having to build an image:

ansible-playbook ansible/fatimage.yml --tags compute_init

+ NB: This will also re-export the compute hostvars, as the nodes are not
+ in the builder group, which conveniently means any changes made to that
+ play also get picked up.
+
5. Fake a reimage of compute to run ansible-init and the compute-init playbook:

On compute node where metadata was added:
@@ -31,8 +38,9 @@ To develop/debug this without actually having to build an image:

Use `systemctl status ansible-init` to view stdout/stderr from Ansible.

- Steps 4/5 can be repeated with changes to the compute script. If desirable
- reimage the compute node(s) first as in step 3.
+ Steps 4/5 can be repeated with changes to the compute script. If required,
+ reimage the compute node(s) first as in step 2 and/or add additional metadata
+ as in step 3.

# Results/progress
@@ -144,3 +152,40 @@ This commit - shows that hostvars have loaded:
Dec 13 21:06:20 rl9-compute-0.rl9.invalid ansible-init[27585]: [INFO] ansible-init completed successfully
Dec 13 21:06:20 rl9-compute-0.rl9.invalid systemd[1]: Finished ansible-init.service.

+ # Design notes
+
+ - In general, we don't want to rely on the NFS export, so the compute-init
+   script should e.g. copy files from this mount as early as possible. This
+   is still a TODO.
+ - There are a few possible approaches:
+
+   1. Control node copies files resulting from role into cluster exports,
+      compute-init copies to local disk. Only works if files are not host-specific.
+      Examples: etc_hosts, eessi config?
+
+   2. Re-implement the role. Works if the role vars are not too complicated
+      (else they all need to be duplicated in compute-init). Could also only
+      support certain subsets of role functionality or variables.
+      Examples: resolv_conf, stackhpc.openhpc
+
+
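As a minimal sketch of the compute-init side of approach 1 above, the shell function below copies a single exported file from the NFS mount to local disk, so that later steps do not depend on the mount staying available. The function name `copy_from_export` and the example paths in the usage note are assumptions for illustration; the actual compute-init playbook may well do this with Ansible tasks instead.

```shell
# Hypothetical helper (not part of compute-init): copy one exported file
# from the NFS mount to local disk, creating the target directory if needed.
copy_from_export() {
    local src="$1" dest="$2"
    # Fail early with a clear message if the export is missing.
    if [ ! -f "$src" ]; then
        echo "export not found: $src" >&2
        return 1
    fi
    # Preserve mode and timestamps so the local copy matches the export.
    mkdir -p "$(dirname "$dest")" && cp -p "$src" "$dest"
}
```

For example, `copy_from_export /mnt/cluster/hosts /etc/hosts` would take a local copy of a shared hosts file. The `/mnt/cluster/hosts` path is assumed here; only the `/mnt/cluster` mount point is taken from this document. As noted in approach 1, this only makes sense for files which are not host-specific.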
+ # Problems with templated hostvars
+
+ Here are all the variables whose templates actually rely on hostvars from
+ other nodes, which aren't available on the compute node:
+
+ ```
+ [root@rl9-compute-0 rocky]# grep hostvars /mnt/cluster/hostvars/rl9-compute-0/hostvars.yml
+ "grafana_address": "{{ hostvars[groups['grafana'].0].api_address }}",
+ "grafana_api_address": "{{ hostvars[groups['grafana'].0].internal_address }}",
+ "mysql_host": "{{ hostvars[groups['mysql'] | first].api_address }}",
+ "nfs_server_default": "{{ hostvars[groups['control'] | first ].internal_address }}",
+ "openhpc_slurm_control_host": "{{ hostvars[groups['control'].0].api_address }}",
+ "openondemand_address": "{{ hostvars[groups['openondemand'].0].api_address if groups['openondemand'] | count > 0 else '' }}",
+ "openondemand_node_proxy_directives": "{{ _opeonondemand_unset_auth if (openondemand_auth == 'basic_pam' and 'openondemand_host_regex' and groups['grafana'] | length > 0 and hostvars[ groups['grafana'] | first]._grafana_auth_is_anonymous) else '' }}",
+ "openondemand_servername": "{{ hostvars[ groups['openondemand'] | first].ansible_host }}",
+ "prometheus_address": "{{ hostvars[groups['prometheus'].0].api_address }}",
+ "{{ hostvars[groups['freeipa_server'].0].ansible_host }}"
+ ```
+
+ More generally, there is nothing to stop any group var depending on a
+ "{{ hostvars[ ] }}" interpolation ...
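
The grep output above mixes variable names with their templates. As a rough sketch, the shell function below prints only the names of variables whose value contains a `hostvars[` lookup. It assumes the one `"key": "value"` pair per line layout seen in that output, and the name `list_cross_host_vars` is hypothetical. Lines without a leading key, such as the final `freeipa_server` line in the output, are skipped.

```shell
# Hypothetical helper: list variables in an exported hostvars file whose
# templates look up hostvars of other nodes.
list_cross_host_vars() {
    # For lines containing "hostvars[", print just the quoted key that
    # starts the line; lines which do not start with a key print nothing.
    sed -nE '/hostvars\[/ s/^[[:space:]]*"([^"]+)":.*/\1/p' "$1"
}
```

On a compute node this could be run as `list_cross_host_vars /mnt/cluster/hostvars/rl9-compute-0/hostvars.yml`, giving a list of names to check against whatever compute-init actually needs.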