diff --git a/ansible/roles/basic_users/README.md b/ansible/roles/basic_users/README.md index e4dfa7790..7267b271b 100644 --- a/ansible/roles/basic_users/README.md +++ b/ansible/roles/basic_users/README.md @@ -5,64 +5,117 @@ basic_users Setup users on cluster nodes using `/etc/passwd` and manipulating `$HOME`, i.e. without requiring LDAP etc. Features: - UID/GID is consistent across cluster (and explicitly defined). -- SSH key generated and propagated to all nodes to allow login between cluster nodes. +- SSH key generated and propagated to all nodes to allow login between cluster + nodes. - An "external" SSH key can be added to allow login from elsewhere. -- Login to the control node is prevented (by default) +- Login to the control node is prevented (by default). - When deleting users, systemd user sessions are terminated first. -Requirements ------------- -- `$HOME` (for normal users, i.e. not `rocky`) is assumed to be on a shared -  filesystem. Actions affecting that shared filesystem are run on a single host, -  see `basic_users_manage_homedir` below. +> [!IMPORTANT] This role assumes that `$HOME` for users managed by this role +(i.e. not `rocky` and other system users) is on a shared filesystem. The export +of this shared filesystem may be root squashed if its server is in the +`basic_users` group - see configuration examples below. Role Variables -------------- - `basic_users_users`: Optional, default empty list. A list of mappings defining information for each user. In general, mapping keys/values are passed through as parameters to [ansible.builtin.user](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/user_module.html) and default values are as given there. However: - - `create_home`, `generate_ssh_key` and `ssh_key_comment` are set automatically; this assumes home directories are on a cluster-shared filesystem. - - `uid` should be set, so that the UID/GID is consistent across the cluster (which Slurm requires).
- - `shell` if *not* set will be `/sbin/nologin` on the `control` node and the default shell on other users. Explicitly setting this defines the shell for all nodes. + - `create_home` and `generate_ssh_key`: Normally set automatically. Can be + set `false` if necessary to disable home directory creation/cluster ssh + key creation. Should not be set `true`, as this would cause home + directories to be modified from multiple nodes simultaneously. + - `ssh_key_comment`: Default is user name. + - `home`: Set automatically based on the user name and + `basic_users_homedir_host_path`. Can be overridden if required, e.g. for + users with non-standard home directory paths. + - `uid`: Should be set, so that the UID/GID is consistent across the cluster + (which Slurm requires). + - `shell`: If *not* set, this will be `/sbin/nologin` on the `control` node to + prevent users from logging in to this node, and the default shell on other + nodes. Explicitly setting this defines the shell for all nodes and, if the + shared home directories are mounted on the control node, allows the + user to log in to the control node. - An additional key `public_key` may optionally be specified to define a key to log into the cluster. - An additional key `sudo` may optionally be specified giving a string (possibly multiline) defining sudo rules to be templated. - `ssh_key_type` defaults to `ed25519` instead of the `ansible.builtin.user` default of `rsa`. - Any other keys may present for other purposes (i.e. not used by this role). - `basic_users_groups`: Optional, default empty list. A list of mappings defining information for each group. Mapping keys/values are passed through as parameters to [ansible.builtin.group](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/group_module.html) and default values are as given there. - `basic_users_override_sssd`: Optional bool, default false. Whether to disable `sssd` when ensuring users/groups exist with this role.
Permits creating local users/groups even if they clash with users provided via sssd (e.g. from LDAP). Ignored if host is not in group `sssd` as well. Note with this option active `sssd` will be stopped and restarted each time this role is run. -- `basic_users_manage_homedir`: Optional bool, must be true on a single host to - determine which host runs tasks affecting the shared filesystem. The default - is to use the first play host which is not the control node, because the - default NFS configuration does not have the shared `/home` directory mounted - on the control node. +- `basic_users_homedir_host`: Optional inventory hostname defining the host + to use to create home directories. If the home directory export is root + squashed, this host *must* be the home directory server. Default is the + `control` node which is appropriate for the default appliance configuration. + Not relevant if `create_home` is false for all users. +- `basic_users_homedir_host_path`: Optional path prefix for home directories on + the `basic_users_homedir_host`, i.e. on the "server side". Default is + `/exports/home` which is appropriate for the default appliance configuration. Dependencies ------------ None. -Example Playbook ----------------- +Example Configurations +---------------------- -```yaml -- hosts: basic_users - become: yes - gather_facts: yes - tasks: - - import_role: - name: basic_users -``` - -Example variables, to create user `alice` and delete user `bob`: +With default appliance NFS configuration, create user `alice` with access +to all nodes except the control node, and delete user `bob`: ```yaml basic_users_users: - comment: Alice Aardvark name: alice uid: 2005 - public_key: ssh-rsa ... + public_key: ssh-ed25519 ... - comment: Bob Badger name: bob uid: 2006 - public_key: ssh-rsa ... + public_key: ssh-ed25519 ... 
 state: absent ``` + +Using an external share which: + - does not root squash (so this role can create directories on it) + - is mounted to all nodes including the control node (so this role can set + authorized keys there) + +Create user `carol`: + +```yaml +basic_users_homedir_host: "{{ ansible_play_hosts | first }}" # doesn't matter which host is used +basic_users_homedir_host_path: /home # homedir_host is client not server +basic_users_users: + - comment: Carol Crane + name: carol + uid: 2007 + public_key: ssh-ed25519 ... +``` + +Using an external share which *does* root squash, so home directories cannot be +created by this role and must already exist, create user `dan`: + +```yaml +basic_users_homedir_host: "{{ ansible_play_hosts | first }}" +basic_users_homedir_host_path: /home +basic_users_users: + - comment: Dan Deer + create_home: false + name: dan + uid: 2008 + public_key: ssh-ed25519 ... +``` + +Using NFS exported from the control node, but mounted to all nodes (so that +authorized keys apply to all nodes), create user `erin` with passwordless sudo: + +```yaml +basic_users_users: + - comment: Erin Eagle + name: erin + uid: 2009 + shell: /bin/bash # override default nologin on control + groups: + - adm # enables ssh to compute nodes even without a job running + sudo: erin ALL=(ALL) NOPASSWD:ALL + public_key: ssh-ed25519 ...
+``` diff --git a/ansible/roles/basic_users/defaults/main.yml b/ansible/roles/basic_users/defaults/main.yml index 7a79640e9..3de8dd6a4 100644 --- a/ansible/roles/basic_users/defaults/main.yml +++ b/ansible/roles/basic_users/defaults/main.yml @@ -1,8 +1,9 @@ -basic_users_manage_homedir: "{{ ansible_hostname == (ansible_play_hosts | difference(groups['control']) | first) }}" +basic_users_homedir_host: "{{ groups['control'] | first }}" # no way, generally, to find the nfs_server +basic_users_homedir_host_path: /exports/home +# _basic_users_manage_homedir: "{{ ansible_hostname == basic_users_homedir_host }}" basic_users_userdefaults: - state: present - create_home: "{{ basic_users_manage_homedir }}" - generate_ssh_key: "{{ basic_users_manage_homedir }}" + state: present # need this here so don't have to add default() everywhere + generate_ssh_key: true ssh_key_comment: "{{ item.name }}" ssh_key_type: ed25519 shell: "{{'/sbin/nologin' if 'control' in group_names else omit }}" diff --git a/ansible/roles/basic_users/tasks/main.yml b/ansible/roles/basic_users/tasks/main.yml index 0e6d4e0b0..6d3d9825a 100644 --- a/ansible/roles/basic_users/tasks/main.yml +++ b/ansible/roles/basic_users/tasks/main.yml @@ -21,12 +21,27 @@ ansible.builtin.group: "{{ item }}" loop: "{{ basic_users_groups }}" -- name: Create users and generate public keys - user: "{{ basic_users_userdefaults | combine(item) | filter_user_params() }}" +- name: Create users + user: "{{ basic_users_userdefaults | combine(item) | filter_user_params() | combine(_disable_homedir) }}" loop: "{{ basic_users_users }}" loop_control: - label: "{{ item.name }} [{{ item.state | default('present') }}]" - register: basic_users_info + label: "{{ item.name }}" + vars: + _disable_homedir: # ensure this task doesn't touch $HOME + create_home: false + generate_ssh_key: false + +- name: Write sudo rules + blockinfile: + path: /etc/sudoers.d/80-{{ item.name }}-user + block: "{{ item.sudo }}" + create: true + loop: "{{ 
basic_users_users }}" + loop_control: + label: "{{ item.name }}" + when: + - item.state | default('present') == 'present' + - "'sudo' in item" - name: Restart sssd if required systemd: @@ -34,40 +49,88 @@ state: started when: _stop_sssd is changed -- name: Write supplied public key as authorized for SSH access - authorized_key: - user: "{{ item.name }}" - state: present - key: "{{ item.public_key }}" +# This task runs (only) on the home directory server, if in the group, so it can +# handle root squashed exports +- name: Create home directories + # doesn't delete with state=absent, same as ansible.builtin.user + ansible.builtin.copy: + remote_src: true + src: "{{ item.skeleton | default('/etc/skel/') }}" + dest: "{{ item.home | default( basic_users_homedir_host_path + '/' + item.name ) }}" + owner: "{{ item.name }}" + group: "{{ item.name }}" + mode: u=rwX,go= + delegate_to: "{{ basic_users_homedir_host }}" + run_once: true loop: "{{ basic_users_users }}" loop_control: - label: "{{ item.name }} [{{ item.state | default('present') }}]" + label: "{{ item.name }}" when: - item.state | default('present') == 'present' - - item.public_key is defined - - basic_users_manage_homedir + - item.create_home | default(true) | bool -- name: Write generated public key as authorized for SSH access - # this only runs on the basic_users_manage_homedir so has registered var - # from that host too - authorized_key: - user: "{{ item.name }}" - state: present - manage_dir: no - key: "{{ item.ssh_public_key }}" - loop: "{{ basic_users_info.results }}" - loop_control: - label: "{{ item.name }}" - when: - - item.ssh_public_key is defined - - basic_users_manage_homedir +# The following tasks deliberately run on a (single) *client* node, so that +# home directory paths are easily constructed, becoming each user so that root +# squash doesn't matter +- delegate_to: "{{ groups['basic_users'] | difference([basic_users_homedir_host]) | first }}" + run_once: true + block: + - name: Create ~/.ssh 
directories + file: + state: directory + path: ~/.ssh/ + owner: "{{ item.name }}" + group: "{{ item.name }}" + mode: u=rwX,go= + become_user: "{{ item.name }}" + loop: "{{ basic_users_users }}" + loop_control: + label: "{{ item.name }}" + when: + - item.state | default('present') == 'present' -- name: Write sudo rules - blockinfile: - path: /etc/sudoers.d/80-{{ item.name}}-user - block: "{{ item.sudo }}" - create: true - loop: "{{ basic_users_users }}" - loop_control: - label: "{{ item.name }}" - when: "'sudo' in item" + - name: Generate cluster ssh key + community.crypto.openssh_keypair: + path: "{{ item.ssh_key_file | default('~/.ssh/id_' + _ssh_key_type )}}" # NB: ssh_key_file is from ansible.builtin.user + type: "{{ _ssh_key_type }}" + comment: "{{ item.ssh_key_comment | default(item.name) }}" + vars: + _ssh_key_type: "{{ item.ssh_key_type | default('ed25519') }}" + become_user: "{{ item.name }}" + loop: "{{ basic_users_users }}" + loop_control: + label: "{{ item.name }}" + when: + - item.state | default('present') == 'present' + - item.generate_ssh_key | default(true) | bool + register: _cluster_ssh_keypair + + - name: Write generated cluster ssh key to authorized_keys + ansible.posix.authorized_key: + user: "{{ item.item.name }}" + state: present + manage_dir: false + key: "{{ item.public_key }}" + path: ~/.ssh/authorized_keys + become_user: "{{ item.item.name }}" + loop: "{{ _cluster_ssh_keypair.results }}" + loop_control: + label: "{{ item.item.name }}" + when: + - item.item.state | default('present') == 'present' + - "'public_key' in item" + + - name: Write supplied public key to authorized_keys + ansible.posix.authorized_key: + user: "{{ item.name }}" + state: present + manage_dir: false + key: "{{ item.public_key }}" + path: ~/.ssh/authorized_keys + become_user: "{{ item.name }}" + loop: "{{ basic_users_users }}" + loop_control: + label: "{{ item.name }}" + when: + - item.state | default('present') == 'present' + - item.public_key is defined diff --git 
a/ansible/roles/hpctests/README.md b/ansible/roles/hpctests/README.md index ab32c9de8..2cb9b7663 100644 --- a/ansible/roles/hpctests/README.md +++ b/ansible/roles/hpctests/README.md @@ -22,8 +22,10 @@ Requirements Role Variables -------------- - -- `hpctests_rootdir`: Required. Path to root of test directory tree, which must be on a r/w filesystem shared to all cluster nodes under test. The last directory component will be created. +- `hpctests_user`: Optional. User to run jobs as. Default is `ansible_user`. +- `hpctests_rootdir`: Optional. Path to root of test directory tree. This must + be on a r/w filesystem shared to all cluster nodes under test. Default is + `/home/{{ hpctests_user }}/hpctests`. **NB:** Do not use `~` in this path. - `hpctests_partition`: Optional. Name of partition to use, otherwise default partition is used. - `hpctests_nodes`: Optional. A Slurm node expression, e.g. `'compute-[0-15,19]'` defining the nodes to use. If not set all nodes in the selected partition are used. - `hpctests_ucx_net_devices`: Optional. Control which network device/interface to use, e.g. `mlx5_1:0`. The default of `all` (as per UCX) may not be appropriate for multi-rail nodes with different bandwidths on each device. See [here](https://openucx.readthedocs.io/en/master/faq.html#what-is-the-default-behavior-in-a-multi-rail-environment) and [here](https://github.com/openucx/ucx/wiki/UCX-environment-parameters#setting-the-devices-to-use). Alternatively a mapping of partition name (as `hpctests_partition`) to device/interface can be used. For partitions not defined in the mapping the default of `all` is used.
diff --git a/ansible/roles/hpctests/defaults/main.yml b/ansible/roles/hpctests/defaults/main.yml index eb1864229..3ea0a0218 100644 --- a/ansible/roles/hpctests/defaults/main.yml +++ b/ansible/roles/hpctests/defaults/main.yml @@ -1,5 +1,6 @@ --- -hpctests_rootdir: +hpctests_user: "{{ ansible_user }}" +hpctests_rootdir: "/home/{{ hpctests_user }}/hpctests" hpctests_pre_cmd: '' hpctests_pingmatrix_modules: [gnu12 openmpi4] hpctests_pingpong_modules: [gnu12 openmpi4 imb] diff --git a/ansible/roles/hpctests/library/plot_nxnlatbw.py b/ansible/roles/hpctests/library/plot_nxnlatbw.py index 05df2ef83..ade7d3ddf 100644 --- a/ansible/roles/hpctests/library/plot_nxnlatbw.py +++ b/ansible/roles/hpctests/library/plot_nxnlatbw.py @@ -5,7 +5,7 @@ # Apache 2 License from ansible.module_utils.basic import AnsibleModule -import json +import json, os ANSIBLE_METADATA = { "metadata_version": "0.1", @@ -109,8 +109,8 @@ def run_module(): module = AnsibleModule(argument_spec=module_args, supports_check_mode=True) result = {"changed": False} - src = module.params["src"] - dest = module.params["dest"] + src = os.path.expanduser(module.params["src"]) + dest = os.path.expanduser(module.params["dest"]) nodes = module.params["nodes"] if nodes is not None: nodes = nodes.split(',') diff --git a/ansible/roles/hpctests/tasks/build-hpl.yml b/ansible/roles/hpctests/tasks/build-hpl.yml index 3f5351650..4fec6b75e 100644 --- a/ansible/roles/hpctests/tasks/build-hpl.yml +++ b/ansible/roles/hpctests/tasks/build-hpl.yml @@ -52,7 +52,6 @@ - name: Build HPL executable shell: - cmd: "sbatch --wait hpl-build-{{ hpctests_hpl_arch }}.sh" + cmd: "bash -l -c 'sbatch --wait hpl-build-{{ hpctests_hpl_arch }}.sh'" # need login shell for module command chdir: "{{ hpctests_hpl_srcdir }}" creates: "bin/{{ hpctests_hpl_arch }}/xhpl" - become: no diff --git a/ansible/roles/hpctests/tasks/hpl-solo.yml b/ansible/roles/hpctests/tasks/hpl-solo.yml index adf8e1823..4c495315b 100644 --- 
a/ansible/roles/hpctests/tasks/hpl-solo.yml +++ b/ansible/roles/hpctests/tasks/hpl-solo.yml @@ -80,8 +80,7 @@ cmd: "rm -f {{ hpctests_rootdir }}/hpl-solo/hpl-solo.sh.*.out" - name: Run hpl-solo - shell: sbatch --wait hpl-solo.sh - become: no + shell: bash -l -c 'sbatch --wait hpl-solo.sh' # need login shell for module command args: chdir: "{{ hpctests_rootdir }}/hpl-solo" async: "{{ 20 * 60 }}" # wait for up to 20 minutes diff --git a/ansible/roles/hpctests/tasks/main.yml b/ansible/roles/hpctests/tasks/main.yml index aa7a82008..f0f0817a6 100644 --- a/ansible/roles/hpctests/tasks/main.yml +++ b/ansible/roles/hpctests/tasks/main.yml @@ -1,28 +1,38 @@ - name: setup block: - - include: setup.yml + - include_tasks: setup.yml + become: true + become_user: "{{ hpctests_user }}" tags: always - name: pingpong block: - - include: pingpong.yml + - include_tasks: pingpong.yml when: hpctests_computes.stdout_lines | length > 1 + become: true + become_user: "{{ hpctests_user }}" tags: pingpong - name: pingmatrix block: - - include: pingmatrix.yml + - include_tasks: pingmatrix.yml when: hpctests_computes.stdout_lines | length > 1 + become: true + become_user: "{{ hpctests_user }}" tags: pingmatrix - name: build HPL block: - - include: build-hpl.yml + - include_tasks: build-hpl.yml + become: true + become_user: "{{ hpctests_user }}" tags: - hpl-solo - name: run HPL on individual nodes block: - - include: hpl-solo.yml + - include_tasks: hpl-solo.yml + become: true + become_user: "{{ hpctests_user }}" tags: - hpl-solo diff --git a/ansible/roles/hpctests/tasks/pingmatrix.yml b/ansible/roles/hpctests/tasks/pingmatrix.yml index 4d32ffcd7..3d20b784b 100644 --- a/ansible/roles/hpctests/tasks/pingmatrix.yml +++ b/ansible/roles/hpctests/tasks/pingmatrix.yml @@ -5,14 +5,6 @@ path: "{{ hpctests_rootdir }}/pingmatrix" state: directory -- name: Precreate files to workaround selinux context issues on NFS mounts - file: - path: "{{ hpctests_rootdir }}/pingmatrix/{{ item }}" - state: touch - loop: 
- - mpi_nxnlatbw.c - - pingmatrix.sh - - name: Copy source copy: src: mpi_nxnlatbw.c @@ -24,7 +16,7 @@ dest: "{{ hpctests_rootdir }}/pingmatrix/pingmatrix.sh" - name: Run ping matrix - shell: sbatch --wait pingmatrix.sh + shell: bash -l -c 'sbatch --wait pingmatrix.sh' # need login shell for module command args: chdir: "{{ hpctests_rootdir }}/pingmatrix" register: hpctests_pingmatrix_sbatch diff --git a/ansible/roles/hpctests/tasks/pingpong.yml b/ansible/roles/hpctests/tasks/pingpong.yml index 0d1d58d25..3cde8c22b 100644 --- a/ansible/roles/hpctests/tasks/pingpong.yml +++ b/ansible/roles/hpctests/tasks/pingpong.yml @@ -5,13 +5,6 @@ path: "{{ hpctests_rootdir }}/pingpong" state: directory -- name: Precreate files to workaround selinux context issues on NFS mounts - file: - path: "{{ hpctests_rootdir }}/pingpong/{{ item }}" - state: touch - loop: - - pingpong.sh - - name: Create sbatch script template: src: pingpong.sh.j2 @@ -20,8 +13,7 @@ - name: Run pingpong block: - name: Submit jobscript - shell: sbatch --wait pingpong.sh - become: no + shell: bash -l -c 'sbatch --wait pingpong.sh' # need login shell for module command args: chdir: "{{ hpctests_rootdir }}/pingpong" register: hpctests_pingpong_sbatch @@ -54,11 +46,13 @@ path: "{{ _pingpong_local_output }}" register: hpctests_pingpong_out delegate_to: localhost + become: false - name: Read nodes used shell: "grep 'SLURM_JOB_NODELIST:' {{ _pingpong_local_output }}" register: hpctests_pingpong_run_nodes delegate_to: localhost + become: false - name: Plot image shell: @@ -66,6 +60,7 @@ creates: "{{ _pingpong_local_output | dirname }}/latency.png" register: _pingpong_plot delegate_to: localhost + become: false when: hpctests_pingpong_plot | bool - debug: diff --git a/ansible/roles/hpctests/tasks/setup.yml b/ansible/roles/hpctests/tasks/setup.yml index fc1103271..248ee29fd 100644 --- a/ansible/roles/hpctests/tasks/setup.yml +++ b/ansible/roles/hpctests/tasks/setup.yml @@ -25,9 +25,8 @@ file: path: "{{ hpctests_rootdir 
}}" state: directory - owner: "{{ ansible_user }}" - group: "{{ ansible_user }}" - become: true + owner: "{{ hpctests_user }}" + group: "{{ hpctests_user }}" - name: Set fact for UCX_NET_DEVICES set_fact: diff --git a/environments/.stackhpc/inventory/group_vars/all/hpctests.yml b/environments/.stackhpc/inventory/group_vars/all/hpctests.yml new file mode 100644 index 000000000..e8cfcea5f --- /dev/null +++ b/environments/.stackhpc/inventory/group_vars/all/hpctests.yml @@ -0,0 +1 @@ +hpctests_user: demo_user diff --git a/environments/common/inventory/group_vars/all/basic_users.yml b/environments/common/inventory/group_vars/all/basic_users.yml index 0cecf4b78..a7b9359b7 100644 --- a/environments/common/inventory/group_vars/all/basic_users.yml +++ b/environments/common/inventory/group_vars/all/basic_users.yml @@ -1,7 +1,5 @@ --- -# See: ansible/roles/basic_users/README.md -# for variable definitions. +# See ansible/roles/basic_users/README.md for variable definitions. -basic_users_homedir: /home basic_users_users: [] diff --git a/environments/common/inventory/group_vars/all/hpctests.yml b/environments/common/inventory/group_vars/all/hpctests.yml index 746f419d1..65c62ee09 100644 --- a/environments/common/inventory/group_vars/all/hpctests.yml +++ b/environments/common/inventory/group_vars/all/hpctests.yml @@ -2,4 +2,4 @@ # See: ansible/roles/hpctests/README.md # for variable definitions. 
-hpctests_rootdir: "/home/hpctests" # Can't use centos's $HOME as that's not on /home and may not have another user +# hpctests_user: diff --git a/environments/common/inventory/group_vars/all/nfs.yml b/environments/common/inventory/group_vars/all/nfs.yml index f277170b6..511dd0da3 100644 --- a/environments/common/inventory/group_vars/all/nfs.yml +++ b/environments/common/inventory/group_vars/all/nfs.yml @@ -15,7 +15,7 @@ nfs_configurations: nfs_server: "{{ nfs_server_default }}" nfs_export: "/exports/home" # assumes skeleton TF is being used nfs_client_mnt_point: "/home" - -# Set 'secure' to prevent tunneling nfs mounts -# Cannot set 'root_squash' due to home directory creation -nfs_export_options: 'rw,secure,no_root_squash' + # prevent tunnelling and setuid binaries: + # NB: these are the stackhpc.nfs role defaults, but are set here so they are + # not accidentally overridden via default options + nfs_export_options: 'rw,secure,root_squash' diff --git a/environments/skeleton/{{cookiecutter.environment}}/inventory/group_vars/all/hpctests.yml b/environments/skeleton/{{cookiecutter.environment}}/inventory/group_vars/all/hpctests.yml new file mode 100644 index 000000000..e8cfcea5f --- /dev/null +++ b/environments/skeleton/{{cookiecutter.environment}}/inventory/group_vars/all/hpctests.yml @@ -0,0 +1 @@ +hpctests_user: demo_user
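The reworked `basic_users` tasks split the work between two hosts. Home directories are copied from `/etc/skel/` on `basic_users_homedir_host` (server-side paths under `basic_users_homedir_host_path`). The `~/.ssh`, keypair and `authorized_keys` tasks run once on the first `basic_users` host that is not the home directory host, becoming each user. The following is a minimal Python sketch that models, as an assumption, the Jinja expressions in `defaults/main.yml` and `tasks/main.yml` that make these choices. It is not code from the role, and the function names and sample inventory are hypothetical.

```python
"""Hypothetical model of the host/path selection in the basic_users role.

Mirrors (by assumption) the Jinja expressions used by the role; not part of it.
"""
from typing import Dict, List


def default_homedir_host(groups: Dict[str, List[str]]) -> str:
    # defaults/main.yml: "{{ groups['control'] | first }}"
    return groups["control"][0]


def client_host(groups: Dict[str, List[str]], homedir_host: str) -> str:
    # tasks/main.yml:
    # "{{ groups['basic_users'] | difference([basic_users_homedir_host]) | first }}"
    # Assumes inventory order is preserved and at least one such host exists.
    candidates = [h for h in groups["basic_users"] if h != homedir_host]
    return candidates[0]


def homedirs_to_create(users: List[dict], homedir_host_path: str) -> List[str]:
    """Server-side paths the 'Create home directories' task would populate."""
    paths = []
    for user in users:
        # item.state | default('present') == 'present'
        if user.get("state", "present") != "present":
            continue  # absent users are skipped; existing homes are not deleted
        # item.create_home | default(true) | bool
        # (assumes create_home is a real boolean, not a string like "false")
        if not user.get("create_home", True):
            continue  # e.g. a root-squashed share with pre-existing homes
        # item.home | default(basic_users_homedir_host_path + '/' + item.name)
        paths.append(user.get("home", homedir_host_path + "/" + user["name"]))
    return paths


if __name__ == "__main__":
    # Hypothetical inventory: control node not in basic_users, as in the defaults.
    groups = {"control": ["ctl"], "basic_users": ["login-0", "compute-0"]}
    homedir_host = default_homedir_host(groups)
    print(homedir_host)                       # ctl
    print(client_host(groups, homedir_host))  # login-0
    users = [
        {"name": "alice", "uid": 2005},
        {"name": "bob", "uid": 2006, "state": "absent"},
        {"name": "dan", "uid": 2008, "create_home": False},
    ]
    print(homedirs_to_create(users, "/exports/home"))  # ['/exports/home/alice']
```

With the default appliance configuration, this means home directories are written as root on the control node (the NFS server, so `root_squash` does not apply), while everything inside `$HOME` is written by the user on a client, which is why the export can now be root squashed.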