Merged
Commits
23 commits
ba8a38a
nodegroups using nodesets - doesn't handle empty nodegroups
sjpb May 7, 2025
8f9436f
cope with empty nodegroups/partitions
sjpb May 7, 2025
0abbf76
make gres work again
sjpb May 7, 2025
b8c64dc
make node/partition parameters more greppable
sjpb May 8, 2025
6dabb2f
use features to simplify nodeset configuration
sjpb May 8, 2025
ea7902a
add nodegroup.features
sjpb May 8, 2025
d16b6ba
add validation
sjpb May 8, 2025
4f3bbc8
document nodegroup.features to README
sjpb May 8, 2025
f126bba
add better examples in README
sjpb May 8, 2025
e993a54
tidy up README
sjpb May 8, 2025
e41cc84
fix validate task path
sjpb May 8, 2025
3440050
fix lint error
sjpb May 8, 2025
319ddf3
default partitions to nodegroups to make CI easier
sjpb May 8, 2025
c8e73ee
update molecule tests for openhpc_nodegroups
sjpb May 8, 2025
f5d0698
remove checks from runtime now validation defined
sjpb May 8, 2025
9f7b19d
fix NodeName= lines missing newlines between them when multiple hostl…
sjpb May 8, 2025
02ba27c
remove tests for extra_nodes
sjpb May 8, 2025
10a8ace
allow missing inventory groups (as per docs) when validating nodegroups
sjpb May 8, 2025
3c706d7
only run validation once
sjpb May 8, 2025
03dea2e
remove test14 from CI - extra_nodes feature removed
sjpb May 8, 2025
175a1c0
update complex test for new group/partition variables
sjpb May 8, 2025
6038a0c
rename openhpc_partitions.groups -> openhpc_partitions.nodegroups f…
sjpb May 9, 2025
6be1ed6
output NodeName hostlists on single line to improve large BM schedule…
sjpb May 13, 2025
52 changes: 33 additions & 19 deletions README.md
@@ -50,30 +50,44 @@ each list element:

### slurm.conf

`openhpc_slurm_partitions`: Optional. List of one or more slurm partitions, default `[]`. Each partition may contain the following values:
* `groups`: If there are multiple node groups that make up the partition, a list of group objects can be defined here.
Otherwise, `groups` can be omitted and the following attributes can be defined in the partition object:
* `name`: The name of the nodes within this group.
* `cluster_name`: Optional. An override for the top-level definition `openhpc_cluster_name`.
* `extra_nodes`: Optional. A list of additional node definitions, e.g. for nodes in this group/partition not controlled by this role. Each item should be a dict, with keys/values as per the ["NODE CONFIGURATION"](https://slurm.schedmd.com/slurm.conf.html#lbAE) docs for slurm.conf. Note the key `NodeName` must be first.
* `ram_mb`: Optional. The physical RAM available in each node of this group ([slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `RealMemory`) in MiB. This is set using ansible facts if not defined, equivalent to `free --mebi` total * `openhpc_ram_multiplier`.
* `ram_multiplier`: Optional. An override for the top-level definition `openhpc_ram_multiplier`. Has no effect if `ram_mb` is set.
`openhpc_nodegroups`: Optional, default `[]`. List of mappings, each defining a
unique set of homogeneous nodes:
* `name`: Required. Name of node group.
* `ram_mb`: Optional. The physical RAM available in each node of this group
([slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `RealMemory`)
in MiB. This is set using ansible facts if not defined, equivalent to
`free --mebi` total * `openhpc_ram_multiplier`.
* `ram_multiplier`: Optional. An override for the top-level definition
`openhpc_ram_multiplier`. Has no effect if `ram_mb` is set.
* `gres`: Optional. List of dicts defining [generic resources](https://slurm.schedmd.com/gres.html). Each dict must define:
- `conf`: A string with the [resource specification](https://slurm.schedmd.com/slurm.conf.html#OPT_Gres_1) but requiring the format `<name>:<type>:<number>`, e.g. `gpu:A100:2`. Note the `type` is an arbitrary string.
- `file`: A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.

Note [GresTypes](https://slurm.schedmd.com/slurm.conf.html#OPT_GresTypes) must be set in `openhpc_config` if this is used.

* `default`: Optional. A boolean flag for whether this partion is the default. Valid settings are `YES` and `NO`.
* `maxtime`: Optional. A partition-specific time limit following the format of [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime`. The default value is
* `params`: Optional. Mapping of additional parameters and values for
[node configuration](https://slurm.schedmd.com/slurm.conf.html#lbAE).

Each nodegroup will contain hosts from an Ansible inventory group named
`{{ openhpc_cluster_name }}_{{ group_name }}`, where `group_name` is the nodegroup's `name`. Note that:
- Each host may only appear in one nodegroup.
- Hosts in a nodegroup are assumed to be homogeneous in terms of processor and memory.
- Hosts may have arbitrary hostnames, but these should be lowercase to avoid a
mismatch between inventory and actual hostname.
- An inventory group may be missing or empty, in which case the nodegroup
contains no hosts.
- If the inventory group is not empty, the play must include at least one host
from it; that host's facts are used to set `Sockets`, `CoresPerSocket`,
`ThreadsPerCore` and optionally `RealMemory` for the nodegroup (see the example below).

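A minimal sketch of `openhpc_nodegroups`, assuming `openhpc_cluster_name: mycluster` and hypothetical inventory groups `mycluster_general` and `mycluster_gpu` (the GPU type, device paths and extra parameters are illustrative):

```yaml
openhpc_nodegroups:
  - name: general                  # hosts come from inventory group "mycluster_general"
  - name: gpu                      # hosts come from inventory group "mycluster_gpu"
    ram_multiplier: 0.9            # override openhpc_ram_multiplier for this group only
    gres:
      - conf: gpu:A100:2           # <name>:<type>:<number>
        file: '/dev/nvidia[0-1]'   # device path(s) for this resource
    params:
      Weight: 10                   # any additional slurm.conf node parameters
# Remember: GresTypes must be set in openhpc_config when gres is used.
```
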
`openhpc_partitions`: Optional, default `[]`. List of mappings, each defining a
partition. Each partition mapping may contain:
* `name`: Required. Name of partition.
* `groups`: Optional. List of nodegroup names. If omitted, the partition name
is assumed to match a nodegroup name.
* `default`: Optional. A boolean flag for whether this partition is the default. Valid settings are `YES` and `NO`.
* `maxtime`: Optional. A partition-specific time limit following the format of [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime`. The default value is
given by `openhpc_job_maxtime`. The value should be quoted to avoid Ansible conversions.
* `partition_params`: Optional. Mapping of additional parameters and values for [partition configuration](https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION).

For each group (if used) or partition any nodes in an ansible inventory group `<cluster_name>_<group_name>` will be added to the group/partition. Note that:
- Nodes may have arbitrary hostnames but these should be lowercase to avoid a mismatch between inventory and actual hostname.
- Nodes in a group are assumed to be homogenous in terms of processor and memory.
- An inventory group may be empty or missing, but if it is not then the play must contain at least one node from it (used to set processor information).

* `params`: Optional. Mapping of additional parameters and values for
[partition configuration](https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION).

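Continuing the sketch above, hypothetical partitions over those nodegroups might look like this (the time limit and extra parameters are illustrative):

```yaml
openhpc_partitions:
  - name: general            # no "groups" key, so uses the "general" nodegroup
  - name: gpu
    groups:
      - gpu
    default: 'NO'            # quoted so it stays the literal string NO
    maxtime: '4:00:00'       # quoted to avoid Ansible type conversion
    params:
      PriorityTier: 10       # any additional slurm.conf partition parameters
```
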
`openhpc_job_maxtime`: Maximum job time limit, default `'60-0'` (60 days). See [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime` for format. The value should be quoted to avoid Ansible conversions.

3 changes: 2 additions & 1 deletion defaults/main.yml
@@ -4,7 +4,8 @@ openhpc_slurm_service_started: "{{ openhpc_slurm_service_enabled }}"
openhpc_slurm_service:
openhpc_slurm_control_host: "{{ inventory_hostname }}"
#openhpc_slurm_control_host_address:
openhpc_slurm_partitions: []
openhpc_partitions: []
openhpc_nodegroups: []
openhpc_cluster_name:
openhpc_packages:
- slurm-libpmi-ohpc
23 changes: 9 additions & 14 deletions templates/gres.conf.j2
@@ -1,16 +1,11 @@
AutoDetect=off
{% for part in openhpc_slurm_partitions %}
{% set nodelist = [] %}
{% for group in part.get('groups', [part]) %}
{% if 'gres' in group %}
{% for gres in group.gres %}
{% set gres_name, gres_type, _ = gres.conf.split(':') %}
{% set group_name = group.cluster_name|default(openhpc_cluster_name) ~ '_' ~ group.name %}
{% set inventory_group_hosts = groups.get(group_name, []) %}
{% for hostlist in (inventory_group_hosts | hostlist_expression) %}
{% for nodegroup in openhpc_nodegroups %}
{% for gres in nodegroup.gres | default([]) %}
{% set gres_name, gres_type, _ = gres.conf.split(':') %}
{% set inventory_group_name = openhpc_cluster_name ~ '_' ~ nodegroup.name %}
{% set inventory_group_hosts = groups.get(inventory_group_name, []) %}
{% for hostlist in (inventory_group_hosts | hostlist_expression) %}
NodeName={{ hostlist }} Name={{ gres_name }} Type={{ gres_type }} File={{ gres.file }}
{% endfor %}
{% endfor %}
{% endif %}
{% endfor %}
{% endfor %}
{% endfor %}{# hostlists #}
{% endfor %}{# gres #}
{% endfor %}{# nodegroup #}
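As a rough illustration of the new rendering, the hypothetical `gpu` nodegroup sketched earlier, with hosts named `mycluster-gpu-[0-3]`, would produce something like:

```
AutoDetect=off
NodeName=mycluster-gpu-[0-3] Name=gpu Type=A100 File=/dev/nvidia[0-1]
```
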
73 changes: 37 additions & 36 deletions templates/slurm.conf.j2
@@ -135,55 +135,56 @@ SlurmdSyslogDebug=info
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#DebugFlags=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES - NOT SUPPORTED IN THIS APPLIANCE VERSION

# LOGIN-ONLY NODES
# Define slurmd nodes not in partitions for login-only nodes in "configless" mode:
{%if openhpc_login_only_nodes %}{% for node in groups[openhpc_login_only_nodes] %}
NodeName={{ node }}
{% endfor %}{% endif %}

# COMPUTE NODES
# OpenHPC default configuration
PropagateResourceLimitsExcept=MEMLOCK
Epilog=/etc/slurm/slurm.epilog.clean
{% set donehosts = [] %}
{% for part in openhpc_slurm_partitions %}
{% set nodelist = [] %}
{% for group in part.get('groups', [part]) %}
{% set group_name = group.cluster_name|default(openhpc_cluster_name) ~ '_' ~ group.name %}
# openhpc_slurm_partitions group: {{ group_name }}
{% set inventory_group_hosts = groups.get(group_name, []) %}
{% if inventory_group_hosts | length > 0 %}
{% set play_group_hosts = inventory_group_hosts | intersect (play_hosts) %}
{% set first_host = play_group_hosts | first | mandatory('Group "' ~ group_name ~ '" contains no hosts in this play - was --limit used?') %}
{% set first_host_hv = hostvars[first_host] %}
{% set ram_mb = (first_host_hv['ansible_memory_mb']['real']['total'] * (group.ram_multiplier | default(openhpc_ram_multiplier))) | int %}
{% for hostlist in (inventory_group_hosts | hostlist_expression) %}
{% set gres = ' Gres=%s' % (','.join(group.gres | map(attribute='conf') )) if 'gres' in group else '' %}
{% if hostlist not in donehosts %}
NodeName={{ hostlist }} State=UNKNOWN RealMemory={{ group.get('ram_mb', ram_mb) }} Sockets={{first_host_hv['ansible_processor_count']}} CoresPerSocket={{ first_host_hv['ansible_processor_cores'] }} ThreadsPerCore={{ first_host_hv['ansible_processor_threads_per_core'] }}{{ gres }}
{% endif %}
{% set _ = nodelist.append(hostlist) %}
{% set _ = donehosts.append(hostlist) %}
{% endfor %}{# nodes #}
{% endif %}{# inventory_group_hosts #}
{% for extra_node_defn in group.get('extra_nodes', []) %}
{{ extra_node_defn.items() | map('join', '=') | join(' ') }}
{% set _ = nodelist.append(extra_node_defn['NodeName']) %}
{% endfor %}
{% endfor %}{# group #}
{% if not nodelist %}{# empty partition #}
{% set nodelist = ['""'] %}
{% endif %}
PartitionName={{part.name}} Default={{ part.get('default', 'YES') }} MaxTime={{ part.get('maxtime', openhpc_job_maxtime) }} State=UP Nodes={{ nodelist | join(',') }} {{ part.partition_params | default({}) | dict2parameters }}
{% endfor %}{# partitions #}

# COMPUTE NODES
# OpenHPC default configuration
{% for nodegroup in openhpc_nodegroups %}
{% set inventory_group_name = openhpc_cluster_name ~ '_' ~ nodegroup.name %}
{% set inventory_group_hosts = groups.get(inventory_group_name, []) %}
{% if inventory_group_hosts | length > 0 %}
{% set play_group_hosts = inventory_group_hosts | intersect (play_hosts) %}
{% set first_host = play_group_hosts | first | mandatory('Inventory group "' ~ inventory_group_name ~ '" contains no hosts in this play - was --limit used?') %}
{% set first_host_hv = hostvars[first_host] %}
{% set ram_mb = (first_host_hv['ansible_memory_mb']['real']['total'] * (nodegroup.ram_multiplier | default(openhpc_ram_multiplier))) | int %}
{% set hostlists = (inventory_group_hosts | hostlist_expression) %}{# hosts in inventory group aren't necessarily a single hostlist expression #}
{% for hostlist in hostlists %}
NodeName={{ hostlist }} {{ '' -}}
State=UNKNOWN {{ '' -}}
RealMemory={{ nodegroup.ram_mb | default(ram_mb) }} {{ '' -}}
Sockets={{first_host_hv['ansible_processor_count'] }} {{ '' -}}
CoresPerSocket={{ first_host_hv['ansible_processor_cores'] }} {{ '' -}}
ThreadsPerCore={{ first_host_hv['ansible_processor_threads_per_core'] }} {{ '' -}}
{{ nodegroup.params | default({}) | dict2parameters }} {{ '' -}}
{% if 'gres' in nodegroup %}Gres={{ ','.join(nodegroup.gres | map(attribute='conf')) }}{% endif %}
{% endfor %}{# hostlists #}
{% endif %}{# 1 or more hosts in inventory #}

NodeSet={{ nodegroup.name }} Nodes={{ ','.join(hostlists | default(['""'])) }}{# no support for creating nodesets by Feature #}

{% endfor %}

# Define a non-existent node, in no partition, so that slurmctld starts even with all partitions empty
NodeName=nonesuch

# PARTITIONS
{% for partition in openhpc_partitions %}
PartitionName={{partition.name}} {{ '' -}}
Default={{ partition.get('default', 'YES') }} {{ '' -}}
MaxTime={{ partition.get('maxtime', openhpc_job_maxtime) }} {{ '' -}}
State=UP Nodes={{ partition.get('groups', [partition.name]) | join(',') }} {{ '' -}}
{{ partition.params | default({}) | dict2parameters }}
{% endfor %}{# openhpc_partitions #}

{% if openhpc_slurm_configless | bool %}SlurmctldParameters=enable_configless{% endif %}


ReturnToService=2
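
To sketch the overall effect (hostnames, memory and CPU figures are invented, and the exact formatting produced by `dict2parameters` is assumed), the hypothetical nodegroups and partitions above might render roughly as:

```
# general nodegroup: inventory group "mycluster_general" is empty in this sketch
NodeSet=general Nodes=""

# gpu nodegroup
NodeName=mycluster-gpu-[0-3] State=UNKNOWN RealMemory=460800 Sockets=2 CoresPerSocket=32 ThreadsPerCore=1 Weight=10 Gres=gpu:A100:2
NodeSet=gpu Nodes=mycluster-gpu-[0-3]

# Define a non-existent node, in no partition, so that slurmctld starts even with all partitions empty
NodeName=nonesuch

# PARTITIONS
PartitionName=general Default=YES MaxTime=60-0 State=UP Nodes=general
PartitionName=gpu Default=NO MaxTime=4:00:00 State=UP Nodes=gpu PriorityTier=10
```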