v2.0

@sjpb released this 24 Jun 16:01
· 100 commits to main since this release
5f2f931

Key Changes

  • ⚠️ BACKWARDS-INCOMPATIBLE CHANGE ⚠️ To allow adding support for Multi-Instance GPUs in #656, the variable openhpc_slurm_partitions has been replaced by openhpc_nodegroups and openhpc_partitions. See #666 for full details and examples.
  • The tuned profile hpc-compute is now usable for nodes with hugepages enabled. See #672.
  • Support for LBNL's Node Health Checks was added in #654. See ansible/roles/nhc/README.md.
  • Support for configuring Multi-Instance GPUs (MIG) was added in #656. See docs/mig.md for full details.
  • Changes to the stackhpc.openhpc role for the above also mean that parameters can be removed from slurm.conf. See stackhpc/ansible-role-openhpc#184.
  • NVIDIA (open) drivers were upgraded to version 575.57.08 with CUDA 12.9.1 in #703.
  • Lustre bug LU-18085 affecting Rocky Linux 9 was fixed in #688.
  • The appliance now raises an error if Ansible Galaxy installs do not match requirements.yml. See #700.
  • OS packages were updated for both Rocky Linux 8 and 9 in #707. The Rocky Linux 9 image now has the last updates for Rocky Linux 9.5.

What's Changed

All PRs, oldest first:

  • Fix typo in comment by @priteau in #675
  • Update appliance for stackhpc.openhpc nodegroup/partition changes by @sjpb in #666
  • Bump CUDA to 12.9 and NVIDIA driver to 575 by @priteau in #687
  • Fix environment creation from skeleton by @priteau in #682
  • Make home volume creation optional by @sjpb in #673
  • Add a simple index in the docs README. by @MoteHue in #669
  • Make packer var image_disk_format functional by @sjpb in #694
  • Remove description of un-implemented dummy interface/default route by @sjpb in #689
  • Allow specifying instance root volume type by @sjpb in #693
  • Add fix for Lustre bug LU-18085 by @sjpb in #688
  • Fix docs for cacerts role to mention cacerts_cert_dir by @technowhizz in #696
  • Update operations.md typo by @technowhizz in #697
  • Change host definition for cacert play to allow builder by @technowhizz in #698
  • Add validation for OpenTofu compute and login variables by @sjpb in #674
  • Remove incorrect note re ondemand for demo deployment by @sjpb in #695
  • Fix nvidia build at open driver version 575.57.08 with cuda 12.9.1 by @jovial in #703
  • Fix tuned hpc-compute with hugepages and verify applied profile by @sjpb in #672
  • Support fixed IP addresses for nodes by @priteau in #643
  • Ensure Ansible Galaxy installs are up to date by @sjpb in #700
  • Bump all Pulp snapshots to latest versions in RL 8.x, RL 9.5 by @priteau in #707
  • Add support for Node Health Checks by @sjpb in #654
  • Add support for configuring Multi-Instance GPUs (MIG) by @jovial in #656
  • Bump fatimage after MIG PR656 merge by @sjpb in #716

Full Changelog: v1.161...test

Images

Two new images are available:

  • RL8: openhpc-RL8-250624-0854-75099868
  • RL9: openhpc-RL9-250624-0854-75099868