Commit 80ce744

add top level topology override + gate plugin on group being enabled
1 parent b00188e commit 80ce744

4 files changed: +22 -2 lines changed

ansible/roles/topology/README.md

Lines changed: 14 additions & 1 deletion
@@ -12,4 +12,17 @@ Role Variables
 
 - `topology_topology_nodes:`: Required list of strs. List of inventory hostnames of nodes to include in topology tree. Must be set to include all compute nodes in Slurm cluster. Default `[]`.
 - `topology_conf_template`: Optional str. Path to Jinja2 template of topology.conf file. Default
-`templates/topology.conf.j2`
+`templates/topology.conf.j2`
+- `topology_above_rack_topology`: Optionally multiline str. Used to define topology above racks/AZs if
+you wish to partition racks further under different logical switches. New switches above should be
+defined as [SwitchName lines](https://slurm.schedmd.com/topology.html#hierarchical) referencing
+rack Availability Zones under that switch in their `Switches fields`. These switches must themselves
+be under a top level switch. e.g
+```
+topology_above_rack_topology: |
+SwitchName=rack-group-1 Switches=rack-az-1,rack-az-2
+SwitchName=rack-group-2 Switches=rack-az-3,rack-az-4
+SwitchName=top-level Switches=rack-group-1,rack-group-2
+```
+Defaults to an empty string, which causes all AZs to be put under a
+single top level switch.
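
The README text above states two constraints on the override: every rack AZ should be referenced in some `Switches` field, and the added switches must themselves sit under one top level switch. A minimal sketch of a pre-flight check for those two constraints follows. It is not part of this commit; `check_override` is a hypothetical helper, and it assumes plain comma-separated switch names (it does not expand bracketed range expressions).

```python
def check_override(override, az_names):
    """Return a list of problems found in an above-rack topology override.

    Hypothetical helper, not part of the role. Assumes each non-empty line
    looks like 'SwitchName=<name> Switches=<a>,<b>,...'.
    """
    switches = {}  # switch name -> list of child switch names
    for line in override.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split 'Key=Value' pairs separated by whitespace into a dict.
        fields = dict(part.split("=", 1) for part in line.split())
        children = fields.get("Switches", "")
        switches[fields["SwitchName"]] = children.split(",") if children else []

    # Every name that appears as a child of some switch.
    referenced = {child for kids in switches.values() for child in kids}

    problems = []
    missing = sorted(set(az_names) - referenced)
    if missing:
        problems.append("AZs not under any switch: " + ",".join(missing))

    # A top level switch is one that no other switch lists as a child.
    roots = sorted(set(switches) - referenced)
    if len(roots) != 1:
        problems.append("expected one top level switch, found: " + ",".join(roots))
    return problems


if __name__ == "__main__":
    example = (
        "SwitchName=rack-group-1 Switches=rack-az-1,rack-az-2\n"
        "SwitchName=rack-group-2 Switches=rack-az-3,rack-az-4\n"
        "SwitchName=top-level Switches=rack-group-1,rack-group-2\n"
    )
    azs = ["rack-az-1", "rack-az-2", "rack-az-3", "rack-az-4"]
    print(check_override(example, azs))
```

The example string is the one from the README hunk; the AZ names would in practice come from whatever the role discovers for `_topology.topology`.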

ansible/roles/topology/defaults/main.yml

Lines changed: 3 additions & 0 deletions
@@ -3,3 +3,6 @@ topology_topology_nodes: []
 
 # Override to use custom topology.conf template
 topology_conf_template: templates/topology.conf.j2
+
+topology_above_rack_topology: ""
+

ansible/roles/topology/templates/topology.conf.j2

Lines changed: 4 additions & 0 deletions
@@ -6,4 +6,8 @@ SwitchName={{ instance_host }} Nodes={{ _topology.topology[az][instance_host] |
 {% endfor %}
 SwitchName={{ az }} Switches={{ _topology.topology[az].keys() | join(",") }}
 {% endfor %}
+{% if topology_above_rack_topology == '' %}
 SwitchName=master Switches={{ _topology.topology.keys() | join(",") }}
+{% else %}
+{{ topology_above_rack_topology }}
+{% endif %}
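
The branch added to the template can be read as a two-way choice: with the default empty string, all AZ switches go under a single `master` switch; otherwise the override text is emitted verbatim in its place. A rough Python equivalent of just that branch, for illustration only (`render_top_level` is a hypothetical name, and Jinja2 whitespace handling is ignored):

```python
def render_top_level(az_names, above_rack_topology=""):
    """Mirror the template's final if/else for the top of the tree."""
    if above_rack_topology == "":
        # Default: every AZ switch hangs off one switch named "master".
        return "SwitchName=master Switches=" + ",".join(az_names)
    # Override: the operator-supplied SwitchName lines replace the master line.
    return above_rack_topology


if __name__ == "__main__":
    print(render_top_level(["rack-az-1", "rack-az-2"]))
    print(render_top_level(
        ["rack-az-1", "rack-az-2"],
        "SwitchName=top-level Switches=rack-az-1,rack-az-2",
    ))
```

Note that when an override is supplied, the template emits no `master` line at all, so the override has to define its own top level switch, as the README example does with `top-level`.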

environments/common/inventory/group_vars/all/openhpc.yml

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ openhpc_config_default:
 - enable_configless
 TaskPlugin: task/cgroup,task/affinity
 ReturnToService: 2 # workaround for templating bug TODO: Remove once on stackhpc.openhpc v1.2.0
-TopologyPlugin: topology/tree
+TopologyPlugin: "topology/{{ 'tree' if (topology_topology_nodes | length) > 0 else 'flat' }}"
 
 # default additional slurm.conf parameters when "rebuild" enabled:
 openhpc_config_rebuild:
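
The new `TopologyPlugin` value is a Jinja2 inline conditional keyed on whether `topology_topology_nodes` is non-empty. As a sketch of the same decision in Python (`select_topology_plugin` is a hypothetical name used only here):

```python
def select_topology_plugin(topology_topology_nodes):
    """Return the TopologyPlugin value chosen by the templated expression."""
    # A non-empty node list selects the tree plugin; an empty list falls back to flat.
    kind = "tree" if len(topology_topology_nodes) > 0 else "flat"
    return "topology/" + kind


if __name__ == "__main__":
    print(select_topology_plugin([]))
    print(select_topology_plugin(["compute-0", "compute-1"]))
```

With the role default of `topology_topology_nodes: []`, the expression selects the flat plugin, so the tree plugin is only configured when nodes have actually been listed for the topology role.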
