Support lustre client #447
Merged
Changes from 24 commits
35 commits
e1bdb58
WIP: add lustre role
sjpb 410e0ed
allow definition of multiple lustre_mounts
sjpb 9f47321
fix lustre build for 2.15.5 release candidate
sjpb 63840ee
simplify lustre defaults
sjpb 02b9ae1
allow lustre install during build to get kernel version
sjpb 8df4a61
allow extending fat images with site-specific groups
sjpb 3840baa
fix packer build so only roles for defined groups run
sjpb ed8f79b
enable control of 'extra' build image name
sjpb 9f04b48
bump to release lustre
sjpb 28a8297
add lnet configuration
sjpb 6d5da54
simplify lustre mount logic
sjpb 22be72c
provide lnet config
sjpb e59b2be
autodetermine lustre interface
sjpb 32e3bda
WIP: validation needs fixing for lustre_mounts removal
sjpb 37d727a
add working lnet.conf template
sjpb 5506876
refactor lustre role for multiple mounts, selectable lnet label
sjpb 2becb3a
remove unneeded comments from lustre taskfiles
sjpb 2819fc3
fix lustre net type
sjpb 75b20fa
fixup opensearch install permissions
sjpb 023c030
add docs for extra builds
sjpb 325889b
Merge branch 'main' into upstream-lustre
sjpb 98d6cab
fix packer volume size definition
sjpb 6589cb4
Merge branch 'upstream-lustre' of github.com:stackhpc/ansible-slurm-a…
sjpb 6df790b
fix missing image name for cuda build
sjpb 0cb2113
Merge branch 'main' into upstream-lustre
sjpb a62d148
bump CI image
sjpb 300fbfa
Merge branch 'main' into upstream-lustre
sjpb ed695f0
update packer README for modified image vars
sjpb 965e24a
move packer docs into docs/
sjpb 3934ecb
make packer extra build directly configurable
sjpb f54d37d
tidy packer docs
sjpb e24997e
fix build error 'Error: Unset variable extra_build_volume_size'
sjpb 177083b
fix error with null default during volume size lookup
sjpb 676d7e8
note lnet protocol limitation
sjpb d8e161b
bump CI image to test
sjpb
@@ -58,4 +58,5 @@ roles/*
!roles/squid/**
!roles/tuned/
!roles/tuned/**
!roles/lustre/
!roles/lustre/**
@@ -0,0 +1,27 @@
# lustre

Install and configure a Lustre client. This builds RPM packages from source.

**NB:** The `install.yml` playbook in this role should only be run during image build and is not idempotent. This will install the `kernel-devel` package; if not already installed (e.g. from an `ofed` installation), this may require enabling update of DNF packages during build using `update_enable=true`, which will upgrade the kernel as well.

**NB:** Currently this only supports RockyLinux 9.

## Role Variables

- `lustre_version`: Optional str. Version of lustre to build, default `2.15.5` which is the first version with EL9 support.
- `lustre_lnet_label`: Optional str. The "lnet label" part of the host's NID, e.g. `tcp0` or `o2ib1`. Default `tcp`.
- `lustre_mgs_nid`: Required str. The NID(s) for the MGS, e.g. `192.168.227.11@tcp1` (separate multiple MGS NIDs using `:`).
- `lustre_mounts`: Required list. Define Lustre filesystems and mountpoints as a list of dicts with keys:
    - `fs_name`: Required str. The name of the filesystem to mount.
    - `mount_point`: Required str. Path to mount filesystem at.
    - `mount_state`: Optional mount state, as for [ansible.posix.mount](https://docs.ansible.com/ansible/latest/collections/ansible/posix/mount_module.html#parameter-state). Default is `lustre_mount_state`.
    - `mount_options`: Optional mount options. Default is `lustre_mount_options`.
- `lustre_mount_state`: Optional default mount state for all mounts, as for [ansible.posix.mount](https://docs.ansible.com/ansible/latest/collections/ansible/posix/mount_module.html#parameter-state). Default is `mounted`.
- `lustre_mount_options`: Optional default mount options. Default values are systemd defaults from [Lustre client docs](http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes).

The following variables control the package build and install and should not generally need changing:
- `lustre_build_packages`: Optional list. Prerequisite packages required to build Lustre. See `defaults/main.yml`.
- `lustre_build_dir`: Optional str. Path to build lustre at, default `/tmp/lustre-release`.
- `lustre_configure_opts`: Optional list. Options to `./configure` command. Default builds client rpms supporting Mellanox OFED, without support for GSS keys.
- `lustre_rpm_globs`: Optional list. Shell glob patterns for rpms to install. Note order is important as the built RPMs are not in a yum repo. Default is just the `kmod-lustre-client` and `lustre-client` packages.
- `lustre_build_cleanup`: Optional bool. Whether to uninstall prerequisite packages and delete the build directories etc. Default `true`.
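
As an illustration of the variables above, a site might set something like the following in group vars for the nodes that mount Lustre. This is only a sketch: the filesystem names and mount points are hypothetical, the MGS NID is the example value from the list above, and it assumes the client sits on the same LNet network (`tcp1`) as the MGS.

```yaml
# Hypothetical group vars for Lustre client nodes.
# fs_name and mount_point values are made up for illustration.
lustre_lnet_label: tcp1               # assumed to match the network in the MGS NID
lustre_mgs_nid: 192.168.227.11@tcp1   # example NID from the variable list
lustre_mounts:
  - fs_name: scratch                  # mount source becomes 192.168.227.11@tcp1:/scratch
    mount_point: /mnt/scratch
  - fs_name: projects
    mount_point: /mnt/projects
    mount_state: present              # per-mount override: in fstab only, not mounted now
```

Entries that omit `mount_state` or `mount_options` fall back to `lustre_mount_state` and `lustre_mount_options`.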
@@ -0,0 +1,36 @@
lustre_version: '2.15.5' # https://www.lustre.org/lustre-2-15-5-released/
lustre_lnet_label: tcp
#lustre_mgs_nid:
lustre_mounts: []
lustre_mount_state: mounted
lustre_mount_options: 'defaults,_netdev,noauto,x-systemd.automount,x-systemd.requires=lnet.service'

# below variables are for build and should not generally require changes
lustre_build_packages:
  - "kernel-devel-{{ ansible_kernel }}"
  - git
  - gcc
  - libtool
  - python3
  - python3-devel
  - openmpi
  - elfutils-libelf-devel
  - libmount-devel
  - libnl3-devel
  - libyaml-devel
  - rpm-build
  - kernel-abi-stablelists
  - libaio
  - libaio-devel
lustre_build_dir: /tmp/lustre-release
lustre_configure_opts:
  - --disable-server
  - --with-linux=/usr/src/kernels/*
  - --with-o2ib=/usr/src/ofa_kernel/default
  - --disable-maintainer-mode
  - --disable-gss-keyring
  - --enable-mpitests=no
lustre_rpm_globs: # NB: order is important here, as not installing from a repo
  - "kmod-lustre-client-{{ lustre_version | split('.') | first }}*" # only take part of the version as -RC versions produce _RC rpms
  - "lustre-client-{{ lustre_version | split('.') | first }}*"
lustre_build_cleanup: true
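
The build defaults above can be overridden like any other role variable. The following is a sketch only, assuming a TCP-only image with no Mellanox OFED sources under `/usr/src/ofa_kernel`. `--with-o2ib=no` is a Lustre `configure` option, but it has not been tested against this role. Because `lustre_configure_opts` is a list, an override replaces the whole default rather than merging with it.

```yaml
# Sketch of build-time overrides; not part of this change.
lustre_build_cleanup: false   # keep build prerequisites and the build dir for debugging
lustre_configure_opts:
  - --disable-server
  - --with-linux=/usr/src/kernels/*
  - --with-o2ib=no            # assumption: skip the o2ib LND where MOFED is absent
  - --disable-maintainer-mode
  - --disable-gss-keyring
  - --enable-mpitests=no
```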
@@ -0,0 +1,47 @@
- name: Gather Lustre interface info
  shell:
    cmd: |
      ip r get {{ _lustre_mgs_ip }}
  changed_when: false
  register: _lustre_ip_r_mgs
  vars:
    _lustre_mgs_ip: "{{ lustre_mgs_nid | split('@') | first }}"

- name: Set facts for Lustre interface
  set_fact:
    _lustre_interface: "{{ _lustre_ip_r_mgs_info[4] }}"
    _lustre_ip: "{{ _lustre_ip_r_mgs_info[6] }}"
  vars:
    _lustre_ip_r_mgs_info: "{{ _lustre_ip_r_mgs.stdout_lines.0 | split }}"
    # first line e.g. "10.167.128.1 via 10.179.0.2 dev eth0 src 10.179.3.149 uid 1000"

- name: Write LNet configuration file
  template:
    src: lnet.conf.j2
    dest: /etc/lnet.conf # exists from package install, expected by lnet service
    owner: root
    group: root
    mode: u=rw,go=r # from package install
  register: _lnet_conf

- name: Ensure lnet service state
  systemd:
    name: lnet
    state: "{{ 'restarted' if _lnet_conf.changed else 'started' }}"

- name: Ensure mount points exist
  ansible.builtin.file:
    path: "{{ item.mount_point }}"
    state: directory
  loop: "{{ lustre_mounts }}"
  when: "(item.mount_state | default(lustre_mount_state)) != 'absent'"

- name: Mount lustre filesystem
  ansible.posix.mount:
    fstype: lustre
    src: "{{ lustre_mgs_nid }}:/{{ item.fs_name }}"
    path: "{{ item.mount_point }}"
    state: "{{ (item.mount_state | default(lustre_mount_state)) }}"
    opts: "{{ item.mount_options | default(lustre_mount_options) }}"
  loop: "{{ lustre_mounts }}"
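
The fixed indexes `[4]` and `[6]` in the "Set facts for Lustre interface" task match the sample line in its comment, which includes a `via <gateway>` hop. When the MGS is on a directly connected subnet, `ip route get` omits `via`, so the fields shift by two. One possible alternative, offered as a sketch rather than as the role's implementation, takes the words following `dev` and `src` instead. `_lustre_route_words` is a hypothetical helper variable.

```yaml
- name: Set facts for Lustre interface
  ansible.builtin.set_fact:
    # take the word after each keyword rather than a fixed position
    _lustre_interface: "{{ _lustre_route_words[_lustre_route_words.index('dev') + 1] }}"
    _lustre_ip: "{{ _lustre_route_words[_lustre_route_words.index('src') + 1] }}"
  vars:
    # on-link example: "10.179.0.2 dev eth0 src 10.179.3.149 uid 1000" (no "via")
    _lustre_route_words: "{{ _lustre_ip_r_mgs.stdout_lines[0] | split }}"
```

This variant assumes the first line of output always contains both `dev` and `src`; if either is missing, the task fails rather than silently picking the wrong field.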
@@ -0,0 +1,70 @@
- name: Install lustre build prerequisites
  ansible.builtin.dnf:
    name: "{{ lustre_build_packages }}"
  register: _lustre_dnf_build_packages

- name: Clone lustre git repo
  # https://git.whamcloud.com/?p=fs/lustre-release.git;a=summary
  ansible.builtin.git:
    repo: git://git.whamcloud.com/fs/lustre-release.git
    dest: "{{ lustre_build_dir }}"
    version: "{{ lustre_version }}"

- name: Prepare for lustre configuration
  ansible.builtin.command:
    cmd: sh ./autogen.sh
    chdir: "{{ lustre_build_dir }}"

- name: Configure lustre build
  ansible.builtin.command:
    cmd: "./configure {{ lustre_configure_opts | join(' ') }}"
    chdir: "{{ lustre_build_dir }}"

- name: Build lustre
  ansible.builtin.command:
    cmd: make rpms
    chdir: "{{ lustre_build_dir }}"

- name: Find rpms
  ansible.builtin.find:
    paths: "{{ lustre_build_dir }}"
    patterns: "{{ lustre_rpm_globs }}"
    use_regex: false
  register: _lustre_find_rpms

- name: Check rpms found
  assert:
    that: _lustre_find_rpms.files | length
    fail_msg: "No lustre rpms found with lustre_rpm_globs = {{ lustre_rpm_globs }}"

- name: Install lustre rpms
  ansible.builtin.dnf:
    name: "{{ _lustre_find_rpms.files | map(attribute='path') | list }}"
    disable_gpg_check: yes

- block:
    - name: Remove lustre build prerequisites
      # NB Only remove ones this role installed which weren't upgrades
      ansible.builtin.dnf:
        name: "{{ _new_pkgs }}"
        state: absent
      vars:
        _installed_pkgs: |
          {{
            _lustre_dnf_build_packages.results |
            select('match', 'Installed:') |
            map('regex_replace', '^Installed: (.+?)-[0-9].*$', '\1')
          }}
        _removed_pkgs: |
          {{
            _lustre_dnf_build_packages.results |
            select('match', 'Removed:') |
            map('regex_replace', '^Removed: (.+?)-[0-9].*$', '\1')
          }}
        _new_pkgs: "{{ _installed_pkgs | difference(_removed_pkgs) }}"

    - name: Delete lustre build dir
      file:
        path: "{{ lustre_build_dir }}"
        state: absent
  when: lustre_build_cleanup | bool
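
Since the README notes that `install.yml` is not idempotent and should only run during image build, a build-time play could pull in just these tasks with `tasks_from`. A minimal sketch, assuming a hypothetical inventory group named `lustre` and that the role directory is called `lustre`:

```yaml
# Hypothetical image-build play; the group name is an assumption.
- hosts: lustre
  become: true
  gather_facts: true   # ansible_kernel is needed for the kernel-devel package name
  tasks:
    - name: Build and install Lustre client RPMs (image build only)
      ansible.builtin.include_role:
        name: lustre
        tasks_from: install.yml
```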
@@ -0,0 +1,27 @@
- name: Assert using RockyLinux 9
  assert:
    that: ansible_distribution_major_version | int == 9
    fail_msg: The 'lustre' role requires RockyLinux 9

- name: Check kernel-devel package is installed
  command: "dnf list --installed kernel-devel-{{ ansible_kernel }}"
  changed_when: false
  # NB: we don't check here the kernel will remain the same after reboot etc, see ofed/install.yml

- name: Ensure SELinux in permissive mode
  assert:
    that: selinux_state in ['permissive', 'disabled']
    fail_msg: "SELinux must be permissive for Lustre not '{{ selinux_state }}'; see variable selinux_state"

- name: Ensure lustre_mgs_nid is defined
  assert:
    that: lustre_mgs_nid is defined
    fail_msg: Variable lustre_mgs_nid must be defined

- name: Ensure lustre_mounts entries define filesystem name and mount point
  assert:
    that:
      - item.fs_name is defined
      - item.mount_point is defined
    fail_msg: All lustre_mounts entries must specify fs_name and mount_point
  loop: "{{ lustre_mounts }}"
@@ -0,0 +1,6 @@
net:
    - net type: {{ lustre_lnet_label }}
      local NI(s):
        - nid: {{ _lustre_ip }}@{{ lustre_lnet_label }}
          interfaces:
              0: {{ _lustre_interface }}
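
For reference, this is roughly what the template would render to in `/etc/lnet.conf`, using the sample values from the comment in the interface-detection task (`eth0`, `10.179.3.149`) and the default `lustre_lnet_label` of `tcp`. The exact indentation is an assumption, since the diff view does not preserve it.

```yaml
# Illustrative rendering only; values come from the sample `ip r get` line.
net:
    - net type: tcp
      local NI(s):
        - nid: 10.179.3.149@tcp
          interfaces:
              0: eth0
```

Once the `lnet` service has started, `lnetctl net show` should report a matching NID if the configuration was applied.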