# Environments

## Overview

An environment defines the configuration for a single instantiation of this Slurm appliance. Each environment is a directory in `environments/`, containing:
- Any deployment automation required - e.g. OpenTofu configuration or Heat templates.
- An Ansible `inventory/` directory.
- An `activate` script which sets environment variables to point to this configuration (a sketch is shown below).
- Optionally, additional playbooks in `hooks/` to run in addition to the default playbooks.
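
For illustration, sourcing an `activate` script typically exports variables which point subsequent `ansible-playbook` commands at this environment. The sketch below is an assumption about its general shape, not the exact contents of this repository's script; only `ANSIBLE_INVENTORY` is a variable Ansible itself reads, the other name is hypothetical:

```shell
# Hypothetical sketch of environments/<environment>/activate - to be sourced, not executed:
#   source environments/<environment>/activate
export APPLIANCES_ENVIRONMENT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"  # assumed variable name
export ANSIBLE_INVENTORY="${APPLIANCES_ENVIRONMENT_ROOT}/inventory"                 # read by ansible-playbook
echo "Activated environment at ${APPLIANCES_ENVIRONMENT_ROOT}"
```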

All environments load the inventory from the `common` environment first, with the environment-specific inventory then overriding parts of this as required.
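
One way this layering can be achieved - shown here as an illustration of the general mechanism, not a statement of how this repository wires it up - is to pass both inventory directories to `ansible-playbook`, with the later (environment-specific) inventory taking precedence for conflicting variables. The environment name `myenv` and the playbook path are hypothetical:

```shell
ansible-playbook -i environments/common/inventory \
                 -i environments/myenv/inventory \
                 ansible/site.yml
```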

### Environment-specific inventory structure

The Ansible inventory for the environment is in `environments/<environment>/inventory/`. It should generally contain:
- A `hosts` file. This defines the hosts in the appliance. Generally it should be templated out by the deployment automation, so it is also a convenient place to define variables which depend on the deployed hosts, such as connection variables, IP addresses, SSH proxy arguments, etc. A sketch is shown after this list.
- A `groups` file defining Ansible groups, which essentially controls which features of the appliance are enabled and where they are deployed. This repository generally follows a convention where functionality is defined using Ansible roles applied to a group of the same name, e.g. `openhpc` or `grafana`. The meaning and use of each group is described in comments in `environments/common/inventory/groups`. As the groups defined there for the common environment are empty, functionality is disabled by default and must be enabled in a specific environment's `groups` file. Two template examples are provided in `environments/common/layouts/`, demonstrating a minimal appliance with only the Slurm cluster itself, and an appliance with all functionality.
- Optionally, group variable files in `group_vars/<group_name>/overrides.yml`, where the group names match the functional groups described above. These can be used to override the default configuration for each feature, which is defined in `environments/common/inventory/group_vars/all/<group_name>.yml` (the use of `all` here is due to Ansible's precedence rules).
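
For illustration, a templated-out `hosts` file might look like the following sketch. All cluster, host and group names, addresses and connection variables here are hypothetical and will depend on your deployment automation:

```ini
# environments/<environment>/inventory/hosts (illustrative)
[all:vars]
openhpc_cluster_name=mycluster
ansible_user=rocky

[control]
mycluster-control-0 ansible_host=10.0.0.10

[login]
mycluster-login-0 ansible_host=10.0.0.11

[compute]
mycluster-compute-0 ansible_host=10.0.0.20
mycluster-compute-1 ansible_host=10.0.0.21
```

A corresponding `groups` file could then enable a subset of features by adding those hosts as children of the functional groups (again, which groups to populate depends on the features wanted):

```ini
# environments/<environment>/inventory/groups (illustrative)
[openhpc:children]
control
login
compute

[grafana:children]
control
```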

Although most of the inventory uses the group convention described above, there are a few special cases:
- The `control`, `login` and `compute` groups are special as they need to contain actual hosts rather than child groups, and so should generally be defined in the templated-out `hosts` file, as in the example above.
- The cluster name must be set on all hosts using `openhpc_cluster_name`. Using an `[all:vars]` section in the `hosts` file is usually convenient.
- `environments/common/inventory/group_vars/all/defaults.yml` contains some variables which are not associated with a specific role/feature. These are unlikely to need changing, but if necessary this can be done using an `environments/<environment>/inventory/group_vars/all/overrides.yml` file.
- The `ansible/adhoc/generate-passwords.yml` playbook sets secrets for all hosts in `environments/<environment>/inventory/group_vars/all/secrets.yml`.
- The Packer-based pipeline for building compute images creates a VM in the `builder` and `compute` groups, allowing build-specific properties to be set in `environments/common/inventory/group_vars/builder/defaults.yml` or the equivalent path in the environment-specific inventory.
- Each Slurm partition must have:
  - An inventory group `<cluster_name>_<partition_name>` defining the hosts it contains - these must be homogeneous w.r.t. CPU and memory.
  - An entry in the `openhpc_slurm_partitions` mapping in `environments/<environment>/inventory/group_vars/openhpc/overrides.yml` (see the sketch after this list).
  See the [openhpc role documentation](https://github.com/stackhpc/ansible-role-openhpc#slurmconf) for more options.
- On an OpenStack cloud, rebuilding/reimaging compute nodes from Slurm can be enabled by defining a `rebuild` group containing the relevant compute hosts (e.g. in the generated `hosts` file).
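
As a sketch of the partition configuration (cluster and partition names are hypothetical, and the full set of keys is described in the role documentation linked above), with matching inventory groups `[mycluster_small]` and `[mycluster_large]` defined in the `hosts` file:

```yaml
# environments/<environment>/inventory/group_vars/openhpc/overrides.yml (illustrative)
openhpc_slurm_partitions:
  - name: small        # hosts taken from the inventory group mycluster_small
  - name: large        # hosts taken from the inventory group mycluster_large
```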