Commit 9a07ff4

wtripp180901 and sjpb authored
Stop Lustre deleting rdma packages + add to extrabuild test (#502)
* move cuda tasks to install
* pin nvidia driver to working version and autodetect os/arch
* make install of cuda packages optional
* don't run cuda install tasks unless during build
* move doca install before cuda
* update cuda docs
* add cuda to extra build test CI
* add cuda runtime tasks
* fix typo in extras playbook
* bump extra build size to 30GB for cuda
* pin both cuda package version
* make cuda idempotent/restartable
* allow using computed tasks_from for cuda role
* fix showing image summary
* removed faulty cleanup and added lustre to extrabuild test
* bumped lustre to supported version

---------

Co-authored-by: Steve Brasier <[email protected]>
1 parent fd6eb4f commit 9a07ff4

5 files changed: +11 -38 lines changed

.github/workflows/extra.yml

Lines changed: 4 additions & 2 deletions
@@ -8,12 +8,14 @@ on:
       - 'environments/.stackhpc/terraform/cluster_image.auto.tfvars.json'
       - 'ansible/roles/doca/**'
       - 'ansible/roles/cuda/**'
+      - 'ansible/roles/lustre/**'
       - '.github/workflows/extra.yml'
   pull_request:
     paths:
       - 'environments/.stackhpc/terraform/cluster_image.auto.tfvars.json'
       - 'ansible/roles/doca/**'
       - 'ansible/roles/cuda/**'
+      - 'ansible/roles/lustre/**'
       - '.github/workflows/extra.yml'
 
 jobs:
@@ -29,11 +31,11 @@ jobs:
         build:
           - image_name: openhpc-extra-RL8
             source_image_name_key: RL8 # key into environments/.stackhpc/terraform/cluster_image.auto.tfvars.json
-            inventory_groups: doca,cuda
+            inventory_groups: doca,cuda,lustre
             volume_size: 30 # needed for cuda
           - image_name: openhpc-extra-RL9
             source_image_name_key: RL9
-            inventory_groups: doca,cuda
+            inventory_groups: doca,cuda,lustre
             volume_size: 30 # needed for cuda
     env:
       ANSIBLE_FORCE_COLOR: True

ansible/fatimage.yml

Lines changed: 0 additions & 8 deletions
@@ -230,14 +230,6 @@
         name: cloudalchemy.grafana
         tasks_from: install.yml
 
-- hosts: doca
-  become: yes
-  gather_facts: yes
-  tasks:
-    - name: Install NVIDIA DOCA
-      import_role:
-        name: doca
-
 - name: Run post.yml hook
   vars:
     appliances_environment_root: "{{ lookup('env', 'APPLIANCES_ENVIRONMENT_ROOT') }}"

ansible/roles/lustre/README.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Install and configure a Lustre client. This builds RPM packages from source.
 
 ## Role Variables
 
-- `lustre_version`: Optional str. Version of lustre to build, default `2.15.5` which is the first version with EL9 support
+- `lustre_version`: Optional str. Version of lustre to build, default `2.15.6` which is the first version with EL9.5 support
 - `lustre_lnet_label`: Optional str. The "lnet label" part of the host's NID, e.g. `tcp0`. Only the `tcp` protocol type is currently supported. Default `tcp`.
 - `lustre_mgs_nid`: Required str. The NID(s) for the MGS, e.g. `192.168.227.11@tcp1` (separate mutiple MGS NIDs using `:`).
 - `lustre_mounts`: Required list. Define Lustre filesystems and mountpoints as a list of dicts with keys:

ansible/roles/lustre/defaults/main.yml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-lustre_version: '2.15.5' # https://www.lustre.org/lustre-2-15-5-released/
+lustre_version: '2.15.6' # https://www.lustre.org/lustre-2-15-6-released/
 lustre_lnet_label: tcp
 #lustre_mgs_nid:
 lustre_mounts: []

ansible/roles/lustre/tasks/install.yml

Lines changed: 5 additions & 26 deletions
@@ -41,30 +41,9 @@
   ansible.builtin.dnf:
     name: "{{ _lustre_find_rpms.files | map(attribute='path')}}"
     disable_gpg_check: yes
-
-- block:
-    - name: Remove lustre build prerequisites
-      # NB Only remove ones this role installed which weren't upgrades
-      ansible.builtin.dnf:
-        name: "{{ _new_pkgs }}"
-        state: absent
-      vars:
-        _installed_pkgs: |
-          {{
-            _lustre_dnf_build_packages.results |
-            select('match', 'Installed:') |
-            map('regex_replace', '^Installed: (.+?)-[0-9].*$', '\1')
-          }}
-        _removed_pkgs: |
-          {{
-            _lustre_dnf_build_packages.results |
-            select('match', 'Removed:') |
-            map('regex_replace', '^Removed: (.+?)-[0-9].*$', '\1')
-          }}
-        _new_pkgs: "{{ _installed_pkgs | difference(_removed_pkgs) }}"
-
-    - name: Delete lustre build dir
-      file:
-        path: "{{ lustre_build_dir }}"
-        state: absent
+
+- name: Delete lustre build dir
+  file:
+    path: "{{ lustre_build_dir }}"
+    state: absent
   when: lustre_build_cleanup | bool
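
Reviewer note: a rough Python sketch of what the removed `_new_pkgs` expression appears to compute, assuming `_lustre_dnf_build_packages.results` is a list of strings of the form `Installed: <name>-<version>...` or `Removed: <name>-<version>...`. The function names and the sample strings are hypothetical and not taken from the role. The sketch only mirrors the name extraction and the difference step; it makes no claim about why the cleanup ended up removing rdma packages.

```python
import re


def extract_names(results, label):
    """Return package names from result lines that start with '<label>:'.

    Mirrors select('match', '<label>:') followed by
    map('regex_replace', '^<label>: (.+?)-[0-9].*$', '\\1').
    """
    pattern = re.compile(r"^" + re.escape(label) + r": (.+?)-[0-9].*$")
    return [
        pattern.sub(r"\1", line)
        for line in results
        if re.match(re.escape(label) + ":", line)
    ]


def new_packages(results):
    """Names that were installed but not also reported as removed.

    A package reported under both labels is treated as an upgrade and is
    skipped, which is what the 'NB' comment in the removed task describes.
    """
    removed = set(extract_names(results, "Removed"))
    seen = set()
    names = []
    for name in extract_names(results, "Installed"):
        if name not in removed and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Hypothetical dnf result lines, for illustration only.
sample_results = [
    "Installed: kernel-devel-5.14.0-503.el9.x86_64",
    "Installed: libtool-2.4.6-46.el9.x86_64",
    "Removed: libtool-2.4.6-45.el9.x86_64",
    "Installed: rdma-core-devel-51.0-1.el9.x86_64",
]

if __name__ == "__main__":
    print(new_packages(sample_results))
```

With these sample lines, `libtool` is skipped as an upgrade, while every other newly installed name would have been passed to `dnf` with `state: absent`. After this commit only the build directory is deleted.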
