Commit baf92bc

Merge branch 'main' into feat/rocky-9.6
2 parents a1d03ac + 95005f9 commit baf92bc

8 files changed

+40
-6
lines changed

ansible/fatimage.yml

Lines changed: 1 addition & 1 deletion
@@ -262,7 +262,7 @@
 name: grafana-dashboards

 - name: Add support for NVIDIA GPU auto detection to Slurm
-hosts: cuda
+hosts: slurm_recompile
 become: yes
 tasks:
 - name: Recompile slurm

ansible/roles/dnf_repos/defaults/main.yml

Lines changed: 2 additions & 1 deletion
@@ -47,7 +47,8 @@ dnf_repos_openhpc_repolist:
 file: OpenHPC
 base_url: "{{ dnf_repos_pulp_content_url }}/{{ appliances_pulp_repos.openhpc_updates[ansible_distribution_major_version] | appliances_repo_to_subpath }}"

-dnf_repos_repolist: "{{ dnf_repos_default_repolist + (dnf_repos_openhpc_repolist if (openhpc_install_type | default('ohpc')) == 'ohpc' else []) }}"
+dnf_repos_extra_repolist: []
+dnf_repos_repolist: "{{ dnf_repos_default_repolist + (dnf_repos_openhpc_repolist if (openhpc_install_type | default('ohpc')) == 'ohpc' else []) + dnf_repos_extra_repolist }}"

 dnf_repos_epel_baseurl: "{{ dnf_repos_pulp_content_url }}/{{ appliances_pulp_repos.epel[ansible_distribution_major_version] | appliances_repo_to_subpath }}"
 dnf_repos_epel_description: "epel"
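
The new `dnf_repos_repolist` expression concatenates three lists: the default repos, the OpenHPC repos (only when `openhpc_install_type` is unset or `'ohpc'`), and the new `dnf_repos_extra_repolist`. A minimal Python sketch of that logic follows. The function name `build_repolist` and the example repo entries are hypothetical and used only for illustration; only the `file` and `base_url` keys appear in the diff.

```python
def build_repolist(default_repolist, openhpc_repolist,
                   extra_repolist=None, openhpc_install_type=None):
    """Mirror the Jinja expression for dnf_repos_repolist in plain Python."""
    # `openhpc_install_type | default('ohpc')`: fall back to 'ohpc' when undefined
    install_type = "ohpc" if openhpc_install_type is None else openhpc_install_type
    # OpenHPC repos are included only for the 'ohpc' install type
    openhpc = openhpc_repolist if install_type == "ohpc" else []
    # dnf_repos_extra_repolist defaults to [] and is always appended last
    return default_repolist + openhpc + (extra_repolist or [])


# Hypothetical entries, for illustration only
default = [{"file": "rocky", "base_url": "https://pulp.example.org/rocky"}]
ohpc = [{"file": "OpenHPC", "base_url": "https://pulp.example.org/openhpc"}]
extra = [{"file": "site-extras", "base_url": "https://pulp.example.org/site"}]

print([repo["file"] for repo in build_repolist(default, ohpc, extra)])
```

Because the extra list defaults to empty, existing environments that do not set `dnf_repos_extra_repolist` should see the same repo list as before this change.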

ansible/roles/proxy/defaults/main.yml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # proxy_http_proxy:
 proxy_https_proxy: "{{ proxy_http_proxy }}"
-proxy_no_proxy_defaults: "{{ ['localhost', '127.0.0.1'] + groups['all'] + hostvars.values() | map(attribute='ansible_host') }}"
+proxy_no_proxy_defaults: "{{ ['localhost', '127.0.0.1', '169.254.169.254'] + groups['all'] + hostvars.values() | map(attribute='ansible_host') }}"
 proxy_no_proxy_extras: []
 proxy_no_proxy: "{{ (proxy_no_proxy_defaults + proxy_no_proxy_extras) | unique | sort | join(',') }}"
 proxy_dnf: true
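
The `proxy_no_proxy` value is built by merging the defaults with any extras, removing duplicates, sorting, and joining with commas. The Python sketch below approximates that pipeline. The function name `build_no_proxy`, the host names and the addresses other than the three literal defaults are hypothetical, and the sketch ignores any case-folding the Jinja filters may apply, so it assumes all-lowercase entries.

```python
def build_no_proxy(defaults, extras):
    """Approximate `(defaults + extras) | unique | sort | join(',')`."""
    # set() drops duplicates, sorted() orders lexicographically
    return ",".join(sorted(set(defaults + extras)))


# The first three entries are the literal defaults after this commit;
# the remaining entries stand in for inventory hostnames and ansible_host
# values (hypothetical).
defaults = ["localhost", "127.0.0.1", "169.254.169.254", "login-0", "10.0.0.5"]
extras = ["login-0", "registry.example.org"]

print(build_no_proxy(defaults, extras))
```

The address `169.254.169.254` is the link-local address commonly used by cloud metadata services, which is presumably why it is now excluded from proxying; the commit itself does not state the reason.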

docs/openondemand.md

Lines changed: 7 additions & 1 deletion
@@ -31,7 +31,13 @@ The above functionality is configured by running the `ansible/portal.yml` playbo
 See the [ansible/roles/openondemand/README.md](../ansible/roles/openondemand/README.md) for more details on the variables described below.

 The following variables have been given default values to allow Open OnDemand to work in a newly created environment without additional configuration, but generally should be overridden in `environments/site/inventory/group_vars/all/` with site-specific values:
-- `openondemand_servername` - this must be defined for both `openondemand` and `grafana` hosts (when Grafana is enabled). Default is `ansible_host` (i.e. the IP address) of the first host in the `openondemand` group.
+- `openondemand_servername` - this must be defined for both `openondemand` and
+`grafana` hosts (when Grafana is enabled). The default is `ansible_host` (i.e.
+the IP address) of the first host in the `openondemand` group. For production
+environments this should probably be a DNS name.
+- `openondemand_ssl_cert` and `openondemand_ssl_cert_key` - by default a
+self-signed certificate is generated, which should probably be replaced for
+production environments.
 - `openondemand_auth` and any corresponding options. Defaults to `basic_pam`.
 - `openondemand_desktop_partition` and `openondemand_jupyter_partition` if the corresponding inventory groups are defined. Defaults to the first compute group defined in the `compute` OpenTofu variable in `environments/$ENV/tofu`.

docs/production.md

Lines changed: 15 additions & 1 deletion
@@ -24,6 +24,10 @@ production-ready deployments.
 inventory = ../common/inventory,../site/inventory,inventory
 ```

+In general only the `site` environment will need an `inventory/groups` file -
+this is templated out by cookiecutter and should be modified as required to
+enable features for all environments at the site.
+
 - To avoid divergence of configuration all possible overrides for group/role
 vars should be placed in `environments/site/inventory/group_vars/all/*.yml`
 unless the value really is environment-specific (e.g. DNS names for
@@ -127,7 +131,17 @@ and referenced from the `site` and `production` environments, e.g.:
 set the "attach" options and run `tofu apply` again - this should show there
 are no changes planned.

-- Configure Open OnDemand - see [specific documentation](openondemand.md).
+- Consider whether Prometheus storage configuration is required. By default:
+- A 200GB state volume is provisioned (but see above)
+- The common environment [sets](../environments/common/inventory/group_vars/all/prometheus.yml)
+a maximum retention of 100 GB and 31 days
+These may or may not be appropriate depending on the number of nodes, the
+scrape interval, and other uses of the state volume (primarily the `slurmctld`
+state and the `slurmdbd` database). See [docs/monitoring-and-logging](./monitoring-and-logging.md)
+for more options.
+
+- Configure Open OnDemand - see [specific documentation](openondemand.md) which
+notes specific variables required.

 - Remove the `demo_user` user from `environments/$ENV/inventory/group_vars/all/basic_users.yml`

docs/upgrades.md

Lines changed: 7 additions & 1 deletion
@@ -50,6 +50,10 @@ All other commands should be run on the Ansible deploy host.
 site-specific configuration. In general changes to existing functionality will aim to be
 backward compatible. Alteration of site-specific configuration will usually only be
 necessary to use new functionality or where functionality has been upstreamed as above.
+Note that the `environments/common/layouts/everything` file contains all possible
+groups which can be used to enable features; diff this against your e.g.
+`environments/site/inventory/groups` file to see new features which you may
+wish to enable in the latter file.

 Make changes as necessary.

@@ -60,7 +64,9 @@ All other commands should be run on the Ansible deploy host.

 Note that some releases may not include new images. In this case use the image from the latest previous release with new images.

-1. If required, build an "extra" image with local modifications, see [docs/image-build.md](./image-build.md).
+1. If an "extra" image build with local modifications is required, update the
+Packer build configuration to use the above new image and run a build. See
+[docs/image-build.md](./image-build.md).

 1. Modify your site-specific environment to use this image, e.g. via `cluster_image_id` in `environments/$SITE_ENV/tofu/variables.tf`.
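
The comparison suggested in the added note (the `everything` layout against a site `groups` file) can also be approximated by listing group names present in one file but not the other. The Python sketch below is one hedged way to do that. The helper names `group_names` and `missing_groups` are hypothetical, the sample text is abbreviated from the hunks in this commit, and the sketch compares group names only, not group members.

```python
import re

# Matches INI-style inventory headers such as [cuda] or [eessi:children]
GROUP_HEADER = re.compile(r"^\[([^\]:]+)(?::\w+)?\]")


def group_names(inventory_text):
    """Return the set of group names declared in an INI inventory."""
    names = set()
    for line in inventory_text.splitlines():
        match = GROUP_HEADER.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def missing_groups(everything_text, site_text):
    """Groups declared in the 'everything' layout but absent from the site file."""
    return sorted(group_names(everything_text) - group_names(site_text))


everything = """\
[cuda]
[slurm_recompile:children]
cuda
[eessi:children]
openhpc
"""

site = """\
[cuda]
[eessi:children]
openhpc
"""

print(missing_groups(everything, site))
```

To run this against real files, the two strings could be read with `pathlib.Path(...).read_text()` from `environments/common/layouts/everything` and `environments/site/inventory/groups`. A plain textual diff of the two files remains the approach the documentation itself describes.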

environments/common/inventory/groups

Lines changed: 3 additions & 0 deletions
@@ -118,6 +118,9 @@ freeipa_client
 [cuda]
 # Hosts to install NVIDIA CUDA on - see ansible/roles/cuda/README.md

+[slurm_recompile]
+# Hosts to recompile Slurm for - allows supporting Slurm autodetection method 'nvml'
+
 [vgpu]
 # Hosts where vGPU/MIG should be configured - see docs/mig.md

environments/common/layouts/everything

Lines changed: 4 additions & 0 deletions
@@ -65,6 +65,10 @@ cluster
 [cuda]
 # Hosts to install NVIDIA CUDA on - see ansible/roles/cuda/README.md

+[slurm_recompile:children]
+# Hosts to recompile Slurm for - allows supporting Slurm autodetection method 'nvml'
+cuda
+
 [eessi:children]
 # Hosts on which EESSI stack should be configured
 openhpc
