# EXPERIMENTAL: compute-init

Experimental / in-progress functionality to allow compute nodes to rejoin the
cluster after a reboot.

To enable this, add compute nodes (or a subset of them) to the `compute_init`
group.

This works as follows:
1. During image build, an ansible-init playbook and supporting files
(e.g. templates, filters, etc.) are installed.
2. Cluster instances are created as usual; the above compute-init playbook does
not run.
3. The `site.yml` playbook is run as usual to configure all the instances into
a cluster. In addition, with `compute-init` enabled, a `/exports/cluster` NFS
share is created on the control node containing:
    - an /etc/hosts file for the cluster
    - hostvars for each compute node
4. On reboot of a compute node, ansible-init runs the compute-init playbook
which:
    a. Checks whether the `enable_compute` metadata flag is set, and exits if
    not.
    b. Tries to mount the above `/exports/cluster` NFS share from the control
    node, and exits if it cannot.
    c. Configures itself using the exported hostvars, depending on the
    `enable_*` flags set in metadata.
    d. Issues an `scontrol` command to resume the node (because Slurm will
    otherwise consider it as "unexpectedly rebooted").

The check in 4b above is what prevents the compute-init script from trying
to configure the node before the services on the control node are available
(which requires running the site.yml playbook).
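
For illustration, the reboot-time sequence in step 4 is roughly equivalent to
the following shell commands (a sketch only, not the actual playbook; the exact
metadata check and mount point are assumptions based on the description above):

```
# 4a. Exit unless the enable_compute metadata flag is set (queried from the
#     OpenStack metadata service).
curl -s http://169.254.169.254/openstack/latest/meta_data.json \
  | jq -e '.meta.enable_compute == "true"' > /dev/null || exit 0

# 4b. Try to mount the cluster export from the control node; exit if it is not
#     (yet) available. CONTROL_ADDRESS is a placeholder for the control node IP
#     (e.g. taken from instance metadata).
mkdir -p /mnt/cluster
mount -t nfs "${CONTROL_ADDRESS}":/exports/cluster /mnt/cluster || exit 0

# 4c. Configure the node from the exported hostvars (elided here), then
# 4d. tell Slurm the node is available again.
scontrol update nodename="$(hostname -s)" state=resume
```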

The following roles are currently fully functional:
- `resolv_conf`
- `etc_hosts`
- `stackhpc.openhpc`

# Development/debugging

To develop/debug this without actually having to build an image:

1. Deploy a cluster using tofu and ansible/site.yml as normal. This will
   additionally configure the control node to export compute hostvars over NFS.
   Check the cluster is up.
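
   For example (illustrative; the exact invocation depends on the environment
   in use):

    tofu apply
    ansible-playbook ansible/site.yml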

2. Reimage the compute nodes:

    ansible-playbook --limit compute ansible/adhoc/rebuild

3. Add metadata to a compute node, e.g. via Horizon, to turn on compute-init
   playbook functionality.
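
   For example, this could also be done with the OpenStack CLI (the property
   name matches the `enable_compute` flag described above; other `enable_*`
   flags can be set in the same way):

    openstack server set --property enable_compute=true rl9-compute-0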

4. Fake an image build to deploy the compute-init playbook:

    ansible-playbook ansible/fatimage.yml --tags compute_init

    NB: This will also re-export the compute hostvars, as the nodes are not
    in the builder group, which conveniently means any changes made to that
    play also get picked up.

5. Fake a reimage of compute to run ansible-init and the compute-init playbook:

    On the compute node where metadata was added:

    [root@rl9-compute-0 rocky]# rm -f /var/lib/ansible-init.done && systemctl restart ansible-init
    [root@rl9-compute-0 rocky]# systemctl status ansible-init

    Use `systemctl status ansible-init` to view stdout/stderr from Ansible.

Steps 4/5 can be repeated with changes to the compute-init script. If required,
reimage the compute node(s) first as in step 2 and/or add additional metadata
as in step 3.

# Design notes
- Duplicating code from roles into the `compute-init` script is unfortunate, but
  does allow developing this functionality without wider changes to the
  appliance.

- In general, we don't want to rely on the NFS export, so the compute-init
  script should e.g. copy files from this mount as soon as possible. TODO:

- There are a couple of approaches to supporting existing roles using `compute-init`:

  1. The control node copies the files resulting from the role into the cluster
  export, and compute-init copies them to local disk. This only works if the
  files are not host-specific.
  Examples: etc_hosts, EESSI config?

  2. Re-implement the role. This works if the role vars are not too complicated
  (else they all need to be duplicated in compute-init). Could also support
  only certain subsets of role functionality or variables.
  Examples: resolv_conf, stackhpc.openhpc

- Some hostvars are templated from the hostvars of other nodes, which aren't
  available in the current approach:

  ```
  [root@rl9-compute-0 rocky]# grep hostvars /mnt/cluster/hostvars/rl9-compute-0/hostvars.yml
    "grafana_address": "{{ hostvars[groups['grafana'].0].api_address }}",
    "grafana_api_address": "{{ hostvars[groups['grafana'].0].internal_address }}",
    "mysql_host": "{{ hostvars[groups['mysql'] | first].api_address }}",
    "nfs_server_default": "{{ hostvars[groups['control'] | first ].internal_address }}",
    "openhpc_slurm_control_host": "{{ hostvars[groups['control'].0].api_address }}",
    "openondemand_address": "{{ hostvars[groups['openondemand'].0].api_address if groups['openondemand'] | count > 0 else '' }}",
    "openondemand_node_proxy_directives": "{{ _opeonondemand_unset_auth if (openondemand_auth == 'basic_pam' and 'openondemand_host_regex' and groups['grafana'] | length > 0 and hostvars[ groups['grafana'] | first]._grafana_auth_is_anonymous) else '' }}",
    "openondemand_servername": "{{ hostvars[ groups['openondemand'] | first].ansible_host }}",
    "prometheus_address": "{{ hostvars[groups['prometheus'].0].api_address }}",
    "{{ hostvars[groups['freeipa_server'].0].ansible_host }}"
  ```

  More generally, there is nothing to stop any group var depending on a
  "{{ hostvars[] }}" interpolation ...

  Currently, the only functionality this has been problematic for is setting
  the control node address for the slurmd node; this has instead been done
  using the (metadata-provided) IP, given that this is needed to do the NFS
  mount anyway in the absence of working internal DNS.
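
  As a sketch, such a metadata-provided address could be read on the compute
  node like this (the `control_address` key name is hypothetical, for
  illustration only):

  ```
  curl -s http://169.254.169.254/openstack/latest/meta_data.json \
    | jq -r '.meta.control_address'
  ```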