Merged
3 changes: 1 addition & 2 deletions .github/workflows/stackhpc.yml
@@ -182,9 +182,8 @@ jobs:
run: |
. venv/bin/activate
. environments/.stackhpc/activate
ansible-playbook -v --limit compute ansible/adhoc/rebuild.yml
ansible-playbook -v ansible/ci/check_slurm.yml
ansible-playbook -v ansible/adhoc/reboot_via_slurm.yml
ansible-playbook -v ansible/ci/check_slurm.yml

- name: Check sacct state survived reimage
run: |
9 changes: 6 additions & 3 deletions ansible/roles/compute_init/README.md
@@ -3,10 +3,13 @@
Experimental functionality to allow compute nodes to rejoin the cluster after
a reboot without running the `ansible/site.yml` playbook.

**CAUTION:** The approach used here of exporting cluster secrets over NFS
is considered to be a security risk due to the potential for cluster users to
mount the share on a user-controlled machine by tunnelling through a login
node. This feature should not be enabled on production clusters at this time.

To enable this:
1. Add the `compute` group (or a subset) into the `compute_init` group. This is
the default when using cookiecutter to create an environment, via the
"everything" template.
1. Add the `compute` group (or a subset) into the `compute_init` group.
2. Build an image which includes the `compute_init` group. This is the case
for StackHPC-built release images.
3. Enable the required functionalities during boot, by setting the
3 changes: 3 additions & 0 deletions environments/.stackhpc/inventory/extra_groups
@@ -41,3 +41,6 @@ control

[cacerts:children]
cluster

[compute_init:children]
compute
3 changes: 1 addition & 2 deletions environments/common/layouts/everything
@@ -93,9 +93,8 @@ cluster
[sshd]
# Hosts where the OpenSSH server daemon should be configured

[compute_init:children]
[compute_init]
# EXPERIMENTAL: Compute hosts to enable joining cluster on boot on
compute

[k3s:children]
# Hosts to run k3s server/agent