README.md (18 additions, 10 deletions)
@@ -31,8 +31,7 @@ It requires an OpenStack cloud, and an Ansible "deploy host" with access to that
 Before starting ensure that:
 - You have root access on the deploy host.
-- You can create instances using a Rocky 9 GenericCloud image (or an image based on that).
-  **NB**: In general it is recommended to use the [latest released image](https://github.com/stackhpc/ansible-slurm-appliance/releases) which already contains the required packages. This is built and tested in StackHPC's CI.
+- You can create instances from the [latest Slurm appliance image](https://github.com/stackhpc/ansible-slurm-appliance/releases), which already contains the required packages. This is built and tested in StackHPC's CI. Although you can use a Rocky Linux 9 GenericCloud instead, it is not recommended.
 - You have an SSH keypair defined in OpenStack, with the private key available on the deploy host.
 - Created instances have access to internet (note proxies can be setup through the appliance if necessary).
 - Created instances have accurate/synchronised time (for VM instances this is usually provided by the hypervisor; if not or for bare metal instances it may be necessary to configure a time service via the appliance).
@@ -82,30 +81,39 @@ And generate secrets for it:
 Create an OpenTofu variables file to define the required infrastructure, e.g.:

-    # environments/$ENV/terraform/terraform.tfvars:
+    # environments/$ENV/tofu/terraform.tfvars:

     cluster_name = "mycluster"
-    cluster_net = "some_network" # *
-    cluster_subnet = "some_subnet" # *
+    cluster_networks = [
+      {
+        network = "some_network" # *
+        subnet = "some_subnet" # *
+      }
+    ]
     key_pair = "my_key" # *
     control_node_flavor = "some_flavor_name"
-    login_nodes = {
-        login-0: "login_flavor_name"
+    login = {
+        # Arbitrary group name for these login nodes
+        interactive = {
+            nodes: ["login-0"]
+            flavor: "login_flavor_name" # *
+        }
     }
     cluster_image_id = "rocky_linux_9_image_uuid"
     compute = {
+        # Group name used for compute node partition definition
         general = {
             nodes: ["compute-0", "compute-1"]
-            flavor: "compute_flavor_name"
+            flavor: "compute_flavor_name" # *
         }
     }

-Variables marked `*` refer to OpenStack resources which must already exist. The above is a minimal configuration - for all variables and descriptions see `environments/$ENV/terraform/terraform.tfvars`.
+Variables marked `*` refer to OpenStack resources which must already exist. The above is a minimal configuration - for all variables and descriptions see `environments/$ENV/tofu/variables.tf`.

 To deploy this infrastructure, ensure the venv and the environment are [activated](#create-a-new-environment) and run:
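The deploy commands themselves are not part of this hunk; as a hedged illustration only (not the README's verbatim text), deployment typically amounts to running OpenTofu from the environment's `tofu/` directory, where the cloud name `openstack` below is an assumption about the local `clouds.yaml`:

    export OS_CLOUD=openstack   # assumed clouds.yaml entry / application credential name
    cd environments/$ENV/tofu/
    tofu init                   # fetch the required providers
    tofu apply                  # create the OpenStack resources defined in terraform.tfvars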
ansible/roles/basic_users/README.md (12 additions, 3 deletions)
@@ -2,16 +2,19 @@
 basic_users
 ===========

-Setup users on cluster nodes using `/etc/passwd` and manipulating `$HOME`, i.e. without requiring LDAP etc. Features:
+Setup users on cluster nodes using `/etc/passwd` and manipulating `$HOME`, i.e.
+without requiring LDAP etc. Features:
 - UID/GID is consistent across cluster (and explicitly defined).
 - SSH key generated and propagated to all nodes to allow login between cluster nodes.
 - An "external" SSH key can be added to allow login from elsewhere.
-- Login to the control node is prevented.
+- Login to the control node is prevented (by default)
 - When deleting users, systemd user sessions are terminated first.

 Requirements
 ------------
-- $HOME (for normal users, i.e. not `centos`) is assumed to be on a shared filesystem.
+- `$HOME` (for normal users, i.e. not `rocky`) is assumed to be on a shared
+  filesystem. Actions affecting that shared filesystem are run on a single host,
+  see `basic_users_manage_homedir` below.

 Role Variables
 --------------
@@ -22,9 +25,15 @@ Role Variables
 - `shell` if *not* set will be `/sbin/nologin` on the `control` node and the default shell on other nodes. Explicitly setting this defines the shell for all nodes.
 - An additional key `public_key` may optionally be specified to define a key to log into the cluster.
 - An additional key `sudo` may optionally be specified giving a string (possibly multiline) defining sudo rules to be templated.
+- `ssh_key_type` defaults to `ed25519` instead of the `ansible.builtin.user` default of `rsa`.
 - Any other keys may be present for other purposes (i.e. not used by this role).
 - `basic_users_groups`: Optional, default empty list. A list of mappings defining information for each group. Mapping keys/values are passed through as parameters to [ansible.builtin.group](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/group_module.html) and default values are as given there.
 - `basic_users_override_sssd`: Optional bool, default false. Whether to disable `sssd` when ensuring users/groups exist with this role. Permits creating local users/groups even if they clash with users provided via sssd (e.g. from LDAP). Ignored if host is not in group `sssd` as well. Note with this option active `sssd` will be stopped and restarted each time this role is run.
+- `basic_users_manage_homedir`: Optional bool, must be true on a single host to
+  determine which host runs tasks affecting the shared filesystem. The default
+  is to use the first play host which is not the control node, because the
+  default NFS configuration does not have the shared `/home` directory mounted
+  on the control node.
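A minimal sketch of how these variables might be set in an environment's inventory (illustrative only: the file path and user/group names are made up, and `name`, `uid` and `groups` are assumed to be ordinary `ansible.builtin.user` parameters passed through by the role):

    # e.g. environments/$ENV/inventory/group_vars/basic_users/overrides.yml (hypothetical path)
    basic_users_groups:
      - name: researchers          # passed through to ansible.builtin.group
        gid: 2000
    basic_users_users:
      - name: alice
        uid: 2001
        groups: [researchers]
        public_key: "ssh-ed25519 AAAA... alice@laptop"   # extra key allowed to log in from outside
        sudo: "alice ALL=(ALL) NOPASSWD:ALL"             # templated into a sudoers drop-in
      - name: bob
        uid: 2002
        shell: /bin/bash           # explicitly set, so applies on every node including the control node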
0 commit comments