Commit 4f45701

committed
merge conflicts
2 parents 21b7081 + 64a1e90 commit 4f45701

17 files changed, with 406 additions and 100 deletions.

.github/workflows/fatimage.yml

Lines changed: 1 addition & 1 deletion
@@ -117,4 +117,4 @@ jobs:
           path: |
             ./image-id.txt
             ./image-name.txt
-          overwrite: true
+          overwrite: true

README.md

Lines changed: 23 additions & 8 deletions
@@ -55,21 +55,28 @@ You will also need to install [OpenTofu](https://opentofu.org/docs/intro/install
 
 ### Create a new environment
 
-Use the `cookiecutter` template to create a new environment to hold your configuration. In the repository root run:
+Run the following from the repository root to activate the venv:
 
     . venv/bin/activate
+
+Use the `cookiecutter` template to create a new environment to hold your configuration:
+
     cd environments
     cookiecutter skeleton
 
 and follow the prompts to complete the environment name and description.
 
 **NB:** In subsequent sections this new environment is referred to as `$ENV`.
 
-Now generate secrets for this environment:
+Activate the new environment:
+
+    . environments/$ENV/activate
+
+And generate secrets for it:
 
     ansible-playbook ansible/adhoc/generate-passwords.yml
 
-### Define infrastructure configuration
+### Define and deploy infrastructure
 
 Create an OpenTofu variables file to define the required infrastructure, e.g.:
 
@@ -91,20 +98,28 @@ Create an OpenTofu variables file to define the required infrastructure, e.g.:
         }
     }
 
-Variables marked `*` refer to OpenStack resources which must already exist. The above is a minimal configuration - for all variables
-and descriptions see `environments/$ENV/terraform/terraform.tfvars`.
+Variables marked `*` refer to OpenStack resources which must already exist. The above is a minimal configuration - for all variables and descriptions see `environments/$ENV/terraform/terraform.tfvars`.
+
+To deploy this infrastructure, ensure the venv and the environment are [activated](#create-a-new-environment) and run:
 
-### Deploy appliance
+    export OS_CLOUD=openstack
+    cd environments/$ENV/terraform/
+    tofu apply
+
+and follow the prompts. Note the OS_CLOUD environment variable assumes that OpenStack credentials are defined using a [clouds.yaml](https://docs.openstack.org/python-openstackclient/latest/configuration/index.html#clouds-yaml) file in a default location with the default cloud name of `openstack`.
+
+### Configure appliance
+
+To configure the appliance, ensure the venv and the environment are [activated](#create-a-new-environment) and run:
 
     ansible-playbook ansible/site.yml
 
-You can now log in to the cluster using:
+Once it completes you can log in to the cluster using:
 
     ssh rocky@$login_ip
 
 where the IP of the login node is given in `environments/$ENV/inventory/hosts.yml`
 
-
 ## Overview of directory structure
 
 - `environments/`: See [docs/environments.md](docs/environments.md).
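
For reference, the `export OS_CLOUD=openstack` step in the README change above relies on a `clouds.yaml` entry named `openstack`. A minimal sketch of such a file follows, assuming application-credential authentication; the URL, region, ID and secret are placeholders, not values from this repository:

```yaml
# Hypothetical ~/.config/openstack/clouds.yaml (one of the default search locations).
# All values below are placeholders for illustration only.
clouds:
  openstack:                       # must match the value of OS_CLOUD
    auth:
      auth_url: https://keystone.example.org:5000
      application_credential_id: "<application-credential-id>"
      application_credential_secret: "<application-credential-secret>"
    region_name: RegionOne
    interface: public
    identity_api_version: 3
    auth_type: v3applicationcredential
```

If the cloud entry has a different name, set `OS_CLOUD` to that name instead.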

ansible/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -62,3 +62,5 @@ roles/*
 !roles/k3s/**
 !roles/k9s/
 !roles/k9s/**
+!roles/lustre/
+!roles/lustre/**

ansible/fatimage.yml

Lines changed: 11 additions & 2 deletions
@@ -25,7 +25,7 @@
 
 - hosts: builder
   become: yes
-  gather_facts: no
+  gather_facts: yes
   tasks:
     # - import_playbook: iam.yml
     - name: Install FreeIPA client
@@ -44,6 +44,11 @@
         name: stackhpc.os-manila-mount
         tasks_from: install.yml
       when: "'manila' in group_names"
+    - name: Install Lustre packages
+      include_role:
+        name: lustre
+        tasks_from: install.yml
+      when: "'lustre' in group_names"
 
 - import_playbook: extras.yml
 
@@ -57,6 +62,7 @@
         name: mysql
         tasks_from: install.yml
       when: "'mysql' in group_names"
+
     - name: OpenHPC
       import_role:
         name: stackhpc.openhpc
@@ -83,18 +89,21 @@
       import_role:
         name: openondemand
         tasks_from: vnc_compute.yml
+
       when: "'openondemand_desktop' in group_names"
+
     - name: Open Ondemand jupyter node
       import_role:
         name: openondemand
         tasks_from: jupyter_compute.yml
-      when: "'openondemand' in group_names"
+      when: "'openondemand_jupyter' in group_names"
 
     # - import_playbook: monitoring.yml:
     - import_role:
         name: opensearch
         tasks_from: install.yml
       when: "'opensearch' in group_names"
+
     # slurm_stats - nothing to do
     - import_role:
         name: filebeat

ansible/filesystems.yml

Lines changed: 10 additions & 0 deletions
@@ -24,3 +24,13 @@
   tasks:
     - include_role:
         name: stackhpc.os-manila-mount
+
+- name: Setup Lustre clients
+  hosts: lustre
+  become: true
+  tags: lustre
+  tasks:
+    - include_role:
+        name: lustre
+        # NB install is ONLY run in builder
+        tasks_from: configure.yml

ansible/roles/lustre/README.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# lustre
+
+Install and configure a Lustre client. This builds RPM packages from source.
+
+**NB:** The `install.yml` playbook in this role should only be run during image build and is not idempotent. This will install the `kernel-devel` package; if not already installed (e.g. from an `ofed` installation), this may require enabling update of DNF packages during build using `update_enable=true`, which will upgrade the kernel as well.
+
+**NB:** Currently this only supports RockyLinux 9.
+
+## Role Variables
+
+- `lustre_version`: Optional str. Version of lustre to build, default `2.15.5` which is the first version with EL9 support.
+- `lustre_lnet_label`: Optional str. The "lnet label" part of the host's NID, e.g. `tcp0`. Only the `tcp` protocol type is currently supported. Default `tcp`.
+- `lustre_mgs_nid`: Required str. The NID(s) for the MGS, e.g. `192.168.227.11@tcp1` (separate multiple MGS NIDs using `:`).
+- `lustre_mounts`: Required list. Define Lustre filesystems and mountpoints as a list of dicts with keys:
+    - `fs_name`: Required str. The name of the filesystem to mount.
+    - `mount_point`: Required str. Path to mount filesystem at.
+    - `mount_state`: Optional mount state, as for [ansible.posix.mount](https://docs.ansible.com/ansible/latest/collections/ansible/posix/mount_module.html#parameter-state). Default is `lustre_mount_state`.
+    - `mount_options`: Optional mount options. Default is `lustre_mount_options`.
+- `lustre_mount_state`: Optional default mount state for all mounts, as for [ansible.posix.mount](https://docs.ansible.com/ansible/latest/collections/ansible/posix/mount_module.html#parameter-state). Default is `mounted`.
+- `lustre_mount_options`: Optional default mount options. Default values are systemd defaults from [Lustre client docs](http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes).
+
+The following variables control the package build and install and should not generally be required:
+- `lustre_build_packages`: Optional list. Prerequisite packages required to build Lustre. See `defaults/main.yml`.
+- `lustre_build_dir`: Optional str. Path to build lustre at, default `/tmp/lustre-release`.
+- `lustre_configure_opts`: Optional list. Options to `./configure` command. Default builds client rpms supporting Mellanox OFED, without support for GSS keys.
+- `lustre_rpm_globs`: Optional list. Shell glob patterns for rpms to install. Note order is important as the built RPMs are not in a yum repo. Default is just the `kmod-lustre-client` and `lustre-client` packages.
+- `lustre_build_cleanup`: Optional bool. Whether to uninstall prerequisite packages and delete the build directories etc. Default `true`.
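
As an illustration of the role variables documented above, a site might set something like the following for hosts in the `lustre` group. This is only a sketch: the file path, filesystem names and mount points are hypothetical, and the LNet label is assumed to need to match the network named in the MGS NID.

```yaml
# Hypothetical location: environments/$ENV/inventory/group_vars/all/lustre.yml
lustre_lnet_label: tcp1                 # assumption: matches the "@tcp1" network of the MGS NID
lustre_mgs_nid: "192.168.227.11@tcp1"   # example NID taken from the variable description
lustre_mounts:
  - fs_name: projects                   # hypothetical filesystem name
    mount_point: /mnt/projects          # hypothetical path; uses lustre_mount_state (mounted)
  - fs_name: scratch                    # hypothetical filesystem name
    mount_point: /mnt/scratch
    mount_state: absent                 # per-mount override of lustre_mount_state
```

Entries that omit `mount_state` or `mount_options` fall back to `lustre_mount_state` and `lustre_mount_options` respectively.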
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+lustre_version: '2.15.5' # https://www.lustre.org/lustre-2-15-5-released/
+lustre_lnet_label: tcp
+#lustre_mgs_nid:
+lustre_mounts: []
+lustre_mount_state: mounted
+lustre_mount_options: 'defaults,_netdev,noauto,x-systemd.automount,x-systemd.requires=lnet.service'
+
+# below variables are for build and should not generally require changes
+lustre_build_packages:
+  - "kernel-devel-{{ ansible_kernel }}"
+  - git
+  - gcc
+  - libtool
+  - python3
+  - python3-devel
+  - openmpi
+  - elfutils-libelf-devel
+  - libmount-devel
+  - libnl3-devel
+  - libyaml-devel
+  - rpm-build
+  - kernel-abi-stablelists
+  - libaio
+  - libaio-devel
+lustre_build_dir: /tmp/lustre-release
+lustre_configure_opts:
+  - --disable-server
+  - --with-linux=/usr/src/kernels/*
+  - --with-o2ib=/usr/src/ofa_kernel/default
+  - --disable-maintainer-mode
+  - --disable-gss-keyring
+  - --enable-mpitests=no
+lustre_rpm_globs: # NB: order is important here, as not installing from a repo
+  - "kmod-lustre-client-{{ lustre_version | split('.') | first }}*" # only take part of the version as -RC versions produce _RC rpms
+  - "lustre-client-{{ lustre_version | split('.') | first }}*"
+lustre_build_cleanup: true
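
To see what the `lustre_rpm_globs` defaults above expand to, a throwaway playbook like the following can be run locally. It is a sketch and not part of the role; the playbook itself is hypothetical and simply copies the two default patterns.

```yaml
# Hypothetical ad-hoc check, e.g. saved as check-globs.yml and run with ansible-playbook.
- name: Show the RPM glob patterns derived from lustre_version
  hosts: localhost
  gather_facts: false
  vars:
    lustre_version: '2.15.5'
    lustre_rpm_globs:
      - "kmod-lustre-client-{{ lustre_version | split('.') | first }}*"
      - "lustre-client-{{ lustre_version | split('.') | first }}*"
  tasks:
    - name: Print the rendered glob patterns
      ansible.builtin.debug:
        var: lustre_rpm_globs
```

With the default version only the major number is kept, so the patterns render as `kmod-lustre-client-2*` and `lustre-client-2*`.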
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+- name: Gather Lustre interface info
+  shell:
+    cmd: |
+      ip r get {{ _lustre_mgs_ip }}
+  changed_when: false
+  register: _lustre_ip_r_mgs
+  vars:
+    _lustre_mgs_ip: "{{ lustre_mgs_nid | split('@') | first }}"
+
+- name: Set facts for Lustre interface
+  set_fact:
+    _lustre_interface: "{{ _lustre_ip_r_mgs_info[4] }}"
+    _lustre_ip: "{{ _lustre_ip_r_mgs_info[6] }}"
+  vars:
+    _lustre_ip_r_mgs_info: "{{ _lustre_ip_r_mgs.stdout_lines.0 | split }}"
+    # first line e.g. "10.167.128.1 via 10.179.0.2 dev eth0 src 10.179.3.149 uid 1000"
+
+- name: Write LNet configuration file
+  template:
+    src: lnet.conf.j2
+    dest: /etc/lnet.conf # exists from package install, expected by lnet service
+    owner: root
+    group: root
+    mode: u=rw,go=r # from package install
+  register: _lnet_conf
+
+- name: Ensure lnet service state
+  systemd:
+    name: lnet
+    state: "{{ 'restarted' if _lnet_conf.changed else 'started' }}"
+
+- name: Ensure mount points exist
+  ansible.builtin.file:
+    path: "{{ item.mount_point }}"
+    state: directory
+  loop: "{{ lustre_mounts }}"
+  when: "(item.mount_state | default(lustre_mount_state)) != 'absent'"
+
+- name: Mount lustre filesystem
+  ansible.posix.mount:
+    fstype: lustre
+    src: "{{ lustre_mgs_nid }}:/{{ item.fs_name }}"
+    path: "{{ item.mount_point }}"
+    state: "{{ (item.mount_state | default(lustre_mount_state)) }}"
+    opts: "{{ item.mount_options | default(lustre_mount_options) }}"
+  loop: "{{ lustre_mounts }}"
+
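
One observation on the "Set facts for Lustre interface" task above: the fixed indices 4 and 6 fit the commented example, where the route goes `via` a gateway. When the MGS is on a directly attached network, `ip r get` normally omits the `via <gateway>` pair, which would shift the fields by two. A possible variant that looks up the `dev` and `src` keywords instead is sketched below; it is a suggestion, not what this commit does.

```yaml
# Sketch only: locate the values by keyword rather than by fixed position.
- name: Set facts for Lustre interface (keyword-based variant)
  ansible.builtin.set_fact:
    _lustre_interface: "{{ _lustre_ip_r_mgs_info[_lustre_ip_r_mgs_info.index('dev') + 1] }}"
    _lustre_ip: "{{ _lustre_ip_r_mgs_info[_lustre_ip_r_mgs_info.index('src') + 1] }}"
  vars:
    _lustre_ip_r_mgs_info: "{{ _lustre_ip_r_mgs.stdout_lines.0 | split }}"
    # handles both "A via B dev eth0 src C ..." and "A dev eth0 src C ..."
```

This relies on calling the Python list method `index()` inside the Jinja expression, which fails the task if either keyword is missing from the output.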
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+- name: Install lustre build prerequisites
+  ansible.builtin.dnf:
+    name: "{{ lustre_build_packages }}"
+  register: _lustre_dnf_build_packages
+
+- name: Clone lustre git repo
+  # https://git.whamcloud.com/?p=fs/lustre-release.git;a=summary
+  ansible.builtin.git:
+    repo: git://git.whamcloud.com/fs/lustre-release.git
+    dest: "{{ lustre_build_dir }}"
+    version: "{{ lustre_version }}"
+
+- name: Prepare for lustre configuration
+  ansible.builtin.command:
+    cmd: sh ./autogen.sh
+    chdir: "{{ lustre_build_dir }}"
+
+- name: Configure lustre build
+  ansible.builtin.command:
+    cmd: "./configure {{ lustre_configure_opts | join(' ') }}"
+    chdir: "{{ lustre_build_dir }}"
+
+- name: Build lustre
+  ansible.builtin.command:
+    cmd: make rpms
+    chdir: "{{ lustre_build_dir }}"
+
+- name: Find rpms
+  ansible.builtin.find:
+    paths: "{{ lustre_build_dir }}"
+    patterns: "{{ lustre_rpm_globs }}"
+    use_regex: false
+  register: _lustre_find_rpms
+
+- name: Check rpms found
+  assert:
+    that: _lustre_find_rpms.files | length > 0
+    fail_msg: "No lustre rpms found with lustre_rpm_globs = {{ lustre_rpm_globs }}"
+
+- name: Install lustre rpms
+  ansible.builtin.dnf:
+    name: "{{ _lustre_find_rpms.files | map(attribute='path') }}"
+    disable_gpg_check: yes
+
+- block:
+    - name: Remove lustre build prerequisites
+      # NB Only remove ones this role installed which weren't upgrades
+      ansible.builtin.dnf:
+        name: "{{ _new_pkgs }}"
+        state: absent
+      vars:
+        _installed_pkgs: |
+          {{
+            _lustre_dnf_build_packages.results |
+            select('match', 'Installed:') |
+            map('regex_replace', '^Installed: (.+?)-[0-9].*$', '\1')
+          }}
+        _removed_pkgs: |
+          {{
+            _lustre_dnf_build_packages.results |
+            select('match', 'Removed:') |
+            map('regex_replace', '^Removed: (.+?)-[0-9].*$', '\1')
+          }}
+        _new_pkgs: "{{ _installed_pkgs | difference(_removed_pkgs) }}"
+
+    - name: Delete lustre build dir
+      file:
+        path: "{{ lustre_build_dir }}"
+        state: absent
+  when: lustre_build_cleanup | bool
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+- name: Assert using RockyLinux 9
+  assert:
+    that: ansible_distribution_major_version | int == 9
+    fail_msg: The 'lustre' role requires RockyLinux 9
+
+- name: Check kernel-devel package is installed
+  command: "dnf list --installed kernel-devel-{{ ansible_kernel }}"
+  changed_when: false
+  # NB: we don't check here the kernel will remain the same after reboot etc, see ofed/install.yml
+
+- name: Ensure SELinux in permissive mode
+  assert:
+    that: selinux_state in ['permissive', 'disabled']
+    fail_msg: "SELinux must be permissive for Lustre not '{{ selinux_state }}'; see variable selinux_state"
+
+- name: Ensure lustre_mgs_nid is defined
+  assert:
+    that: lustre_mgs_nid is defined
+    fail_msg: Variable lustre_mgs_nid must be defined
+
+- name: Ensure lustre_mounts entries define filesystem name and mount point
+  assert:
+    that:
+      - item.fs_name is defined
+      - item.mount_point is defined
+    fail_msg: All lustre_mounts entries must specify fs_name and mount_point
+  loop: "{{ lustre_mounts }}"
