111 changes: 82 additions & 29 deletions ansible/roles/basic_users/README.md
@@ -5,64 +5,117 @@ basic_users
Setup users on cluster nodes using `/etc/passwd` and manipulating `$HOME`, i.e.
without requiring LDAP etc. Features:
- UID/GID is consistent across cluster (and explicitly defined).
- SSH key generated and propagated to all nodes to allow login between cluster
nodes.
- An "external" SSH key can be added to allow login from elsewhere.
- Login to the control node is prevented (by default).
- When deleting users, systemd user sessions are terminated first.

Requirements
------------
> [!IMPORTANT]
> This role assumes that `$HOME` for users managed by this role (i.e. not
> `rocky` and other system users) is on a shared filesystem. The export of
> this shared filesystem may be root squashed if its server is in the
> `basic_users` group - see the configuration examples below.

Role Variables
--------------

- `basic_users_users`: Optional, default empty list. A list of mappings defining information for each user. In general, mapping keys/values are passed through as parameters to [ansible.builtin.user](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/user_module.html) and default values are as given there. However:
  - `create_home` and `generate_ssh_key`: Normally set automatically. Can be
    set `false` if necessary to disable home directory creation/cluster ssh
    key creation. Should not be set `true`, to avoid trying to modify home
    directories from multiple nodes simultaneously.
  - `ssh_key_comment`: Default is the user name.
  - `home`: Set automatically based on the user name and
    `basic_users_homedir_host_path`. Can be overridden if required, e.g. for
    users with non-standard home directory paths.
  - `uid`: Should be set, so that the UID/GID is consistent across the cluster
    (which Slurm requires).
  - `shell`: If *not* set, will be `/sbin/nologin` on the `control` node to
    prevent users logging in to this node, and the default shell on other
    nodes. Explicitly setting this defines the shell for all nodes, and if the
    shared home directories are mounted on the control node it will allow the
    user to log in to the control node.
  - An additional key `public_key` may optionally be specified to define a key to log into the cluster.
  - An additional key `sudo` may optionally be specified giving a string (possibly multiline) defining sudo rules to be templated.
  - `ssh_key_type` defaults to `ed25519` instead of the `ansible.builtin.user` default of `rsa`.
  - Any other keys may be present for other purposes (i.e. not used by this role).
- `basic_users_groups`: Optional, default empty list. A list of mappings defining information for each group. Mapping keys/values are passed through as parameters to [ansible.builtin.group](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/group_module.html) and default values are as given there.
- `basic_users_override_sssd`: Optional bool, default false. Whether to disable `sssd` when ensuring users/groups exist with this role. Permits creating local users/groups even if they clash with users provided via sssd (e.g. from LDAP). Ignored if host is not in group `sssd` as well. Note with this option active `sssd` will be stopped and restarted each time this role is run.
- `basic_users_homedir_host`: Optional inventory hostname defining the host
to use to create home directories. If the home directory export is root
squashed, this host *must* be the home directory server. Default is the
`control` node which is appropriate for the default appliance configuration.
Not relevant if `create_home` is false for all users.
- `basic_users_homedir_host_path`: Optional path prefix for home directories on
the `basic_users_homedir_host`, i.e. on the "server side". Default is
`/exports/home` which is appropriate for the default appliance configuration.
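
The group-related variables above can be combined as in the following sketch.
This is illustrative only: the `projectx` group, GID `4000` and the user shown
are hypothetical, and `basic_users_override_sssd` is only needed if local
users/groups may clash with ones provided via sssd:

```yaml
basic_users_groups:
  - name: projectx # hypothetical project group
    gid: 4000      # explicit GID so it is consistent across the cluster
basic_users_override_sssd: true # allow local definitions even if sssd/LDAP also provides them
basic_users_users:
  - comment: Alice Aardvark
    name: alice
    uid: 2005
    groups:
      - projectx
    public_key: ssh-ed25519 ...
```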

Dependencies
------------

None.

Example Configurations
----------------------

```yaml
- hosts: basic_users
  become: yes
  gather_facts: yes
  tasks:
    - import_role:
        name: basic_users
```

With default appliance NFS configuration, create user `alice` with access
to all nodes except the control node, and delete user `bob`:

```yaml
basic_users_users:
  - comment: Alice Aardvark
    name: alice
    uid: 2005
    public_key: ssh-ed25519 ...
  - comment: Bob Badger
    name: bob
    uid: 2006
    public_key: ssh-ed25519 ...
    state: absent
```

Using an external share which:
- does not root squash (so this role can create directories on it)
- is mounted to all nodes including the control node (so this role can set
  authorized keys there)

create user `carol`:

```yaml
basic_users_homedir_host: "{{ ansible_play_hosts | first }}" # doesn't matter which host is used
basic_users_homedir_host_path: /home # homedir_host is client not server
basic_users_users:
  - comment: Carol Crane
    name: carol
    uid: 2007
    public_key: ssh-ed25519 ...
```

Using an external share which *does* root squash, so home directories cannot be
created by this role and must already exist, create user `dan`:

```yaml
basic_users_homedir_host: "{{ ansible_play_hosts | first }}"
basic_users_homedir_host_path: /home
basic_users_users:
  - comment: Dan Deer
    create_home: false
    name: dan
    uid: 2008
    public_key: ssh-ed25519 ...
```

Using NFS exported from the control node, but mounted to all nodes (so that
authorized keys apply to all nodes), create user `erin` with passwordless sudo:

```yaml
basic_users_users:
  - comment: Erin Eagle
    name: erin
    uid: 2009
    shell: /bin/bash # override default nologin on control
    groups:
      - adm # enables ssh to compute nodes even without a job running
    sudo: erin ALL=(ALL) NOPASSWD:ALL
    public_key: ssh-ed25519 ...
```
9 changes: 5 additions & 4 deletions ansible/roles/basic_users/defaults/main.yml
@@ -1,8 +1,9 @@
basic_users_homedir_host: "{{ groups['control'] | first }}" # no way, generally, to find the nfs_server
basic_users_homedir_host_path: /exports/home
# _basic_users_manage_homedir: "{{ ansible_hostname == basic_users_homedir_host }}"
basic_users_userdefaults:
  state: present # need this here so don't have to add default() everywhere
  generate_ssh_key: true
  ssh_key_comment: "{{ item.name }}"
  ssh_key_type: ed25519
  shell: "{{ '/sbin/nologin' if 'control' in group_names else omit }}"
133 changes: 98 additions & 35 deletions ansible/roles/basic_users/tasks/main.yml
@@ -21,53 +21,116 @@
  ansible.builtin.group: "{{ item }}"
  loop: "{{ basic_users_groups }}"

- name: Create users
  user: "{{ basic_users_userdefaults | combine(item) | filter_user_params() | combine(_disable_homedir) }}"
  loop: "{{ basic_users_users }}"
  loop_control:
    label: "{{ item.name }}"
  vars:
    _disable_homedir: # ensure this task doesn't touch $HOME
      create_home: false
      generate_ssh_key: false

- name: Write sudo rules
  blockinfile:
    path: /etc/sudoers.d/80-{{ item.name }}-user
    block: "{{ item.sudo }}"
    create: true
  loop: "{{ basic_users_users }}"
  loop_control:
    label: "{{ item.name }}"
  when:
    - item.state | default('present') == 'present'
    - "'sudo' in item"

- name: Restart sssd if required
  systemd:
    name: sssd
    state: started
  when: _stop_sssd is changed

# This task runs (only) on the home directory server, if in the group, so it can
# handle root squashed exports
- name: Create home directories
  # doesn't delete with state=absent, same as ansible.builtin.user
  ansible.builtin.copy:
    remote_src: true
    src: "{{ item.skeleton | default('/etc/skel/') }}"
    dest: "{{ item.home | default( basic_users_homedir_host_path + '/' + item.name ) }}"
    owner: "{{ item.name }}"
    group: "{{ item.name }}"
    mode: u=rwX,go=
  delegate_to: "{{ basic_users_homedir_host }}"
  run_once: true
  loop: "{{ basic_users_users }}"
  loop_control:
    label: "{{ item.name }}"
  when:
    - item.state | default('present') == 'present'
    - item.create_home | default(true) | bool

# The following tasks deliberately run on a (single) *client* node, so that
# home directory paths are easily constructed, becoming each user so that root
# squash doesn't matter
- delegate_to: "{{ groups['basic_users'] | difference([basic_users_homedir_host]) | first }}"
  run_once: true
  block:
    - name: Create ~/.ssh directories
      file:
        state: directory
        path: ~/.ssh/
        owner: "{{ item.name }}"
        group: "{{ item.name }}"
        mode: u=rwX,go=
      become_user: "{{ item.name }}"
      loop: "{{ basic_users_users }}"
      loop_control:
        label: "{{ item.name }}"
      when:
        - item.state | default('present') == 'present'

    - name: Generate cluster ssh key
      community.crypto.openssh_keypair:
        path: "{{ item.ssh_key_file | default('~/.ssh/id_' + _ssh_key_type) }}" # NB: ssh_key_file is from ansible.builtin.user
        type: "{{ _ssh_key_type }}"
        comment: "{{ item.ssh_key_comment | default(item.name) }}"
      vars:
        _ssh_key_type: "{{ item.ssh_key_type | default('ed25519') }}"
      become_user: "{{ item.name }}"
      loop: "{{ basic_users_users }}"
      loop_control:
        label: "{{ item.name }}"
      when:
        - item.state | default('present') == 'present'
        - item.generate_ssh_key | default(true) | bool
      register: _cluster_ssh_keypair

    - name: Write generated cluster ssh key to authorized_keys
      ansible.posix.authorized_key:
        user: "{{ item.item.name }}"
        state: present
        manage_dir: false
        key: "{{ item.public_key }}"
        path: ~/.ssh/authorized_keys
      become_user: "{{ item.item.name }}"
      loop: "{{ _cluster_ssh_keypair.results }}"
      loop_control:
        label: "{{ item.item.name }}"
      when:
        - item.item.state | default('present') == 'present'
        - "'public_key' in item"

    - name: Write supplied public key to authorized_keys
      ansible.posix.authorized_key:
        user: "{{ item.name }}"
        state: present
        manage_dir: false
        key: "{{ item.public_key }}"
        path: ~/.ssh/authorized_keys
      become_user: "{{ item.name }}"
      loop: "{{ basic_users_users }}"
      loop_control:
        label: "{{ item.name }}"
      when:
        - item.state | default('present') == 'present'
        - item.public_key is defined
6 changes: 4 additions & 2 deletions ansible/roles/hpctests/README.md
@@ -22,8 +22,10 @@ Requirements

Role Variables
--------------

- `hpctests_user`: Optional. User to run jobs as. Default is `ansible_user`.
- `hpctests_rootdir`: Optional. Path to root of test directory tree. This must
be a r/w filesystem shared to all cluster nodes under test. Default is
`/home/{{ hpctests_user }}/hpctests`. **NB:** Do not use `~` in this path.
- `hpctests_partition`: Optional. Name of partition to use, otherwise default partition is used.
- `hpctests_nodes`: Optional. A Slurm node expression, e.g. `'compute-[0-15,19]'` defining the nodes to use. If not set all nodes in the selected partition are used.
- `hpctests_ucx_net_devices`: Optional. Control which network device/interface to use, e.g. `mlx5_1:0`. The default of `all` (as per UCX) may not be appropriate for multi-rail nodes with different bandwidths on each device. See [here](https://openucx.readthedocs.io/en/master/faq.html#what-is-the-default-behavior-in-a-multi-rail-environment) and [here](https://github.com/openucx/ucx/wiki/UCX-environment-parameters#setting-the-devices-to-use). Alternatively a mapping of partition name (as `hpctests_partition`) to device/interface can be used. For partitions not defined in the mapping the default of `all` is used.
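
The variables above can be set together as in this sketch; the partition name,
node expression and device name are hypothetical values for illustration:

```yaml
hpctests_user: rocky
hpctests_rootdir: /home/rocky/hpctests # must be a shared r/w filesystem; do not use '~'
hpctests_partition: small
hpctests_nodes: 'compute-[0-3]'
hpctests_ucx_net_devices:
  small: mlx5_0:1 # per-partition mapping; partitions not listed use 'all'
```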
3 changes: 2 additions & 1 deletion ansible/roles/hpctests/defaults/main.yml
@@ -1,5 +1,6 @@
---
hpctests_user: "{{ ansible_user }}"
hpctests_rootdir: "/home/{{ hpctests_user }}/hpctests"
hpctests_pre_cmd: ''
hpctests_pingmatrix_modules: [gnu12 openmpi4]
hpctests_pingpong_modules: [gnu12 openmpi4 imb]
6 changes: 3 additions & 3 deletions ansible/roles/hpctests/library/plot_nxnlatbw.py
@@ -5,7 +5,7 @@
# Apache 2 License

from ansible.module_utils.basic import AnsibleModule
import json, os

ANSIBLE_METADATA = {
"metadata_version": "0.1",
@@ -109,8 +109,8 @@ def run_module():
    module = AnsibleModule(argument_spec=module_args, supports_check_mode=True)
    result = {"changed": False}

    src = os.path.expanduser(module.params["src"])
    dest = os.path.expanduser(module.params["dest"])
    nodes = module.params["nodes"]
    if nodes is not None:
        nodes = nodes.split(',')
3 changes: 1 addition & 2 deletions ansible/roles/hpctests/tasks/build-hpl.yml
@@ -52,7 +52,6 @@

- name: Build HPL executable
  shell:
    cmd: "bash -l -c 'sbatch --wait hpl-build-{{ hpctests_hpl_arch }}.sh'" # need login shell for module command
    chdir: "{{ hpctests_hpl_srcdir }}"
    creates: "bin/{{ hpctests_hpl_arch }}/xhpl"
  become: no
3 changes: 1 addition & 2 deletions ansible/roles/hpctests/tasks/hpl-solo.yml
@@ -80,8 +80,7 @@
    cmd: "rm -f {{ hpctests_rootdir }}/hpl-solo/hpl-solo.sh.*.out"

- name: Run hpl-solo
  shell: bash -l -c 'sbatch --wait hpl-solo.sh' # need login shell for module command
  args:
    chdir: "{{ hpctests_rootdir }}/hpl-solo"
  async: "{{ 20 * 60 }}" # wait for up to 20 minutes