v2.0
Key Changes
- ⚠️ BACKWARDS-INCOMPATIBLE CHANGE ⚠️ To allow adding support for Multi-Instance GPUs in #656, the variable `openhpc_slurm_partitions` has been replaced by `openhpc_nodegroups` and `openhpc_partitions`. See #666 for full details and examples.
- The `tuned` profile `hpc-compute` is now usable for nodes with hugepages enabled. See #672.
- Support for LBNL's Node Health Checks added in #654. See ansible/roles/nhc/README.md
- Support for configuring Multi-Instance GPUs (MIG) added in #656 - see docs/mig.md for full details.
- Changes to the `stackhpc.openhpc` role for the above also mean that parameters can be removed from slurm.conf. See stackhpc/ansible-role-openhpc#184.
- NVIDIA (open) drivers were upgraded to version 575.57.08 with CUDA 12.9.1 in #703.
- Lustre bug LU-18085 affecting Rocky Linux 9 was fixed in #688.
- The appliance now raises an error if Ansible Galaxy installs do not match `requirements.yml`. See #700.
- OS packages were updated for both Rocky Linux 8 and 9 in #707, with the latter now having the last updates for Rocky Linux 9.5.
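
As a rough illustration of the backwards-incompatible variable change listed above, the sketch below shows how an existing partition definition might be migrated. The group and partition names (`general`, `gpu`, `compute`) are hypothetical, and the field names (`groups`, `nodegroups`, `default`, `maxtime`) are assumptions based on the `stackhpc.openhpc` role documentation. Check them against #666 before relying on this.

```yaml
# Before (v1.x): node groups were nested inside each partition.
# All names and values here are hypothetical examples.
openhpc_slurm_partitions:
  - name: compute
    groups:
      - name: general
      - name: gpu
    default: "YES"
    maxtime: "1-0"

# After (v2.0), assuming the schema described in #666:
# node groups are declared once, then referenced by name from partitions.
openhpc_nodegroups:
  - name: general
  - name: gpu

openhpc_partitions:
  - name: compute
    nodegroups:
      - general
      - gpu
    default: "YES"
    maxtime: "1-0"
```

Only one of the two forms should be present in an environment. The old variable is shown purely for comparison.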
What's Changed
All PRs, oldest first:
- Fix typo in comment by @priteau in #675
- Update appliance for stackhpc.openhpc nodegroup/partition changes by @sjpb in #666
- Bump CUDA to 12.9 and NVIDIA driver to 575 by @priteau in #687
- Fix environment creation from skeleton by @priteau in #682
- Make home volume creation optional by @sjpb in #673
- Add a simple index in the docs README. by @MoteHue in #669
- Make packer var image_disk_format functional by @sjpb in #694
- Remove description of un-implemented dummy interface/default route by @sjpb in #689
- Allow specifying instance root volume type by @sjpb in #693
- Add fix for Lustre bug LU-18085 by @sjpb in #688
- Fix docs for cacerts role to mention cacerts_cert_dir by @technowhizz in #696
- Update operations.md typo by @technowhizz in #697
- Change host definition for cacert play to allow builder by @technowhizz in #698
- Add validation for OpenTofu compute and login variables by @sjpb in #674
- Remove incorrect note re ondemand for demo deployment by @sjpb in #695
- Fix nvidia build at open driver version 575.57.08 with cuda 12.9.1 by @jovial in #703
- Fix tuned hpc-compute with hugepages and verify applied profile by @sjpb in #672
- Support fixed IP addresses for nodes by @priteau in #643
- Ensure Ansible Galaxy installs are up to date by @sjpb in #700
- Bump all Pulp snapshots to latest versions in RL 8.x, RL 9.5 by @priteau in #707
- Add support for Node Health Checks by @sjpb in #654
- Add support for configuring Multi-Instance GPUs (MIG) by @jovial in #656
- Bump fatimage after MIG PR656 merge by @sjpb in #716
Full Changelog: v1.161...test
Images
Two new images are available:
- RL8: openhpc-RL8-250624-0854-75099868
- RL9: openhpc-RL9-250624-0854-75099868