
Release 15


@aphmschonewille released this on 27 Feb 14:44

New features

  • AlertX: a command-line and graphical application for managing Prometheus alerts and rules, and for managing Node Health Checking (NHC)
  • NHC drainer: nodes that trigger an NHC rule are drained of jobs. Currently only Slurm is supported.
  • Per-job statistics: a detailed per-job breakdown of resource utilization and power consumption
  • Beta ARM support. Currently only homogeneous clusters are supported: the controller(s) and nodes must share the same architecture (ARM+ARM or x86+x86).
  • Additional Prometheus exporters that collect more metrics, including GPU metrics and hardware configuration and state
  • Open OnDemand (OOD) application for changing a user’s password
  • Improved and extended Grafana panels
  • Luna 2.1
  • Open OnDemand 4.0.0
  • Latest OpenHPC releases: 2.9 for EL8 and 3.2.1 for EL9
  • Prometheus 3.1.0
  • HA setups support cross-mounting of shared disk exports, allowing passive/standby controllers to access the shared filesystems