Commit 93ea6bf

Improve pulp docs (#819)
* improve pulp and related docs
* fix docs spelling
* fix GHFM alerts not being rendered in lists
* fix lint errors
* fix markdown lint errors
* fix markdown prettier errors
* re-add lost changes following review on PR#812
* fix linter whitespace
1 parent b60d457 commit 93ea6bf

File tree

6 files changed: +119 −65 lines changed

ansible/adhoc/sync-pulp.yml

Lines changed: 0 additions & 2 deletions

```diff
@@ -5,7 +5,5 @@
         name: pulp_site
         tasks_from: sync.yml
       vars:
-        pulp_site_target_arch: "x86_64"
-        pulp_site_target_distribution: "rocky"
         # default distribution to *latest* specified for baseos repo:
         pulp_site_target_distribution_version: "{{ dnf_repos_repos['baseos'].keys() | map('float') | sort | last }}"
```
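The default-version expression in this playbook can be illustrated in plain Python (the `dnf_repos_repos` contents here are hypothetical):

```python
# Mimics the Jinja expression:
#   dnf_repos_repos['baseos'].keys() | map('float') | sort | last
# i.e. pick the numerically-latest distribution version defined for baseos.
dnf_repos_repos = {"baseos": {"9.4": "...", "9.5": "..."}}  # hypothetical config

latest = sorted(map(float, dnf_repos_repos["baseos"].keys()))[-1]
print(latest)  # 9.5
```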

docs/experimental/pulp.md

Lines changed: 89 additions & 35 deletions

````diff
@@ -1,54 +1,108 @@
 # Pulp Server
 
-In order to ensure reproducible builds, the appliance can build images using repository mirrors from StackHPC's "Ark" Pulp server. The appliance can sync relevant repositories to a local Pulp server which will then be used instead of Ark.
+In order to ensure reproducibility, by default image builds use mirrors of DNF
+repositories hosted on StackHPC's "Ark" Pulp server. This page describes how to
+use a local Pulp server instead of Ark, which reduces network traffic and speeds
+up builds. The repositories on this local Pulp server are synchronised from Ark so
+that builds still use the same package snapshots.
 
-## Deploying/configuring Pulp Server
+It is also possible to use a local Pulp server to install packages during the
+`site.yml` playbook rather than during image builds, as described in [docs/operations.md](../operations.md#adding-additional-packages).
 
-### Deploying a Pulp server
+## Deploying and Configuring a Local Pulp Server
 
-A playbook is provided to install and configure a Pulp server on a given host. Admin credentials for this server are automatically generated through the `ansible/adhoc/generate-passwords.yml` playbook. To use this, create an inventory file
-defining a group `pulp_server` containing a single host, which requires at least 2 vCPUs and 4GB RAM. The group should be defined in your `site` environment's inventory so that a single Pulp server is shared between all environments and
-the same snapshots are tested in staging and production.
-Deploying and syncing Pulp has been tested on an RL9 host. The hostvar `ansible_host` should be defined, giving the IP address Ansible should use for SSH. For example, you can create an ini file at `environments/site/inventory/pulp` with the contents:
+The appliance can install and configure a local Pulp server on a specified host.
+This host should run RockyLinux 8 or 9 and have at least 2 vCPUs and 8GB RAM.
+Note upgrades etc. of this host will not be managed by the appliance. Access to
+Pulp content is not authenticated so this server should not be externally
+reachable.
 
-```ini
-[pulp_server]
-pulp_host ansible_host=<VM-ip-address>
-```
+> [!IMPORTANT]
+> Commands below should be run with the `staging` environment active, as all
+> Pulp syncs will be done from there.
 
-> [!WARNING]
-> The inventory hostname cannot conflict with group names i.e can't be called `pulp_site` or `pulp_server`.
+1. Define the host in a group `pulp_server` within the `site` inventory. This
+   means clusters in all environments use the same Pulp server, and the synced
+   DNF repository snapshots are tested in staging before use in production. E.g.:
 
-Once complete, it will print a message giving a value to set for `appliances_pulp_url` (see example config below), assuming the `ansible_host` address is also the address the cluster
-should use to reach the Pulp server.
+   ```ini
+   # environments/site/inventory/pulp:
+   [pulp_server]
+   pulp_host ansible_host=<VM-ip-address>
+   ```
 
-Note access to this server's content isn't authenticated so this assumes the `pulp_server` host is not externally reachable.
+   **NB:** The inventory hostname must not conflict with group names, i.e. it
+   cannot be `pulp_site` or `pulp_server`.
 
-### Using an existing Pulp server
+2. If adding Pulp to an existing deployment, ensure Pulp admin credentials
+   exist:
 
-An existing Pulp server can be used to host Ark repos by overriding `pulp_site_password` and `appliances_pulp_url` in the target environment. Note that this assumes the same configuration as the appliance deployed Pulp i.e no content authentication.
+   ```shell
+   ansible-vault decrypt environments/staging/inventory/group_vars/all/secrets.yml
+   ansible-playbook ansible/adhoc/generate-passwords.yml
+   ansible-vault encrypt environments/staging/inventory/group_vars/all/secrets.yml
+   ```
 
-## Syncing Pulp content with Ark
+3. Run the adhoc playbook to install and configure Pulp:
+
+   ```shell
+   ansible-playbook ansible/adhoc/deploy-pulp.yml
+   ```
+
+   Once complete, it will print a message giving a value to set for
+   `appliances_pulp_url`, assuming the inventory `ansible_host` address is
+   also the address the cluster should use to reach the Pulp server.
+
+4. Create group vars files defining `appliances_pulp_url` and dev credentials
+   for StackHPC's "Ark" Pulp server:
+
+   ```yaml
+   # environments/site/inventory/group_vars/all/pulp.yml:
+   appliances_pulp_url: "http://<pulp-host-ip>:8080"
+   pulp_site_upstream_username: your-ark-username
+   pulp_site_upstream_password: "{{ vault_pulp_site_upstream_password }}"
+   ```
+
+   ```yaml
+   # environments/site/inventory/group_vars/all/vault_pulp.yml:
+   vault_pulp_site_upstream_password: your-ark-password
+   ```
 
-If the `pulp_site` group is added to the Packer build groups, the local Pulp server will be synced with Ark on build. You must authenticate with Ark by overriding `pulp_site_upstream_username` and `pulp_site_upstream_password` with your vault encrypted Ark dev credentials. `dnf_repos_username` and `dnf_repos_password` must remain unset to access content from the local Pulp.
+   and vault-encrypt the latter:
 
-Content can also be synced by running `ansible/adhoc/sync-pulp.yml`. By default this syncs repositories for the latest version of Rocky supported by the appliance but this can be overridden by setting extra variables for `pulp_site_target_arch`, `pulp_site_target_distribution` and `pulp_site_target_distribution_version`.
+   ```shell
+   ansible-vault encrypt environments/site/inventory/group_vars/all/vault_pulp.yml
+   ```
+
+   If previously using Ark credentials directly e.g. for image builds, ensure
+   the variables `dnf_repos_username` and `dnf_repos_password` are no longer
+   set in any environment.
+
+5. Commit changes.
+
+## Using an existing Pulp server
+
+Alternatively, an existing Pulp server can be used to host Ark repos by
+setting `appliances_pulp_url` directly. Note that this assumes the same
+configuration as the appliance-deployed Pulp, i.e. no content authentication.
+As above, the `dnf_repos_` variables must not be set in this configuration.
+
+## Syncing Pulp content with Ark
 
-## Example config in site variables
+The appliance can synchronise repositories on a local Pulp server from Ark in
+two ways:
 
-```yaml
-# environments/site/inventory/group_vars/all/pulp_site.yml:
-appliances_pulp_url: "http://<pulp-host-ip>:8080"
-pulp_site_upstream_username: <Ark-username>
-pulp_site_upstream_password: <Ark-password>
-```
+1. If the `pulp_site` group is added to the Packer build groups, the local Pulp
+   server will be synced with Ark during image builds.
 
-## Installing packages from Pulp at runtime
+2. The sync can be manually triggered by running:
 
-By default, system repos are overwritten to point at Pulp repos during [image builds,](../image-build.md) so using a site Pulp server will require a new fatimage. If you instead wish to install packages at runtime,
-you will need to add all host groups on which you will be installing packages to the `dnf_repos` group in `environments/site/inventory/groups` e.g:
+   ```shell
+   ansible-playbook ansible/adhoc/sync-pulp.yml
+   ```
 
-```yaml
-[dnf_repos:children]
-cluster
-```
+By default this method syncs repositories for the latest version of RockyLinux
+supported by the appliance. This can be overridden by setting
+`pulp_site_target_distribution_version` to e.g. `'8.10'`, i.e. the `Major.minor`
+version of RockyLinux the site clusters are using. **NB:** This value
+must be quoted to avoid an incorrect conversion to float.
````
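The quoting caveat above can be checked directly in Python, which applies the same numeric conversion that an unquoted `8.10` undergoes in YAML/Jinja:

```python
# An unquoted 8.10 becomes the float 8.1, silently dropping
# the ".10" minor version:
print(float("8.10"))         # 8.1
print(float("8.10") == 8.1)  # True
# Quoted as the string '8.10', the Major.minor parts survive intact:
print("8.10".split("."))     # ['8', '10']
```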

docs/image-build.md

Lines changed: 5 additions & 5 deletions

````diff
@@ -67,12 +67,12 @@ For either a site-specific fat-image build or an extra-build:
 - Normally the network must provide outbound internet access. However it
   does not need to provide access to resources used by the actual cluster
   nodes (e.g. Slurm control node, network filesystem servers etc.).
-- The flavor used must have sufficent memory for the build tasks (usually
+- The flavor used must have sufficient memory for the build tasks (usually
   8GB), but otherwise does not need to match the actual cluster node
   flavor(s).
 - By default, the build VM is volume-backed to allow control of the root
   disk size (and hence final image size), so the flavor's disk size does not
-  matter. The default volume size is not sufficent if enabling `cuda` and/or
+  matter. The default volume size is not sufficient if enabling `cuda` and/or
   `doca` and should be increased:
   ```terraform
   volume_size = 35 # GB
@@ -93,7 +93,7 @@ For either a site-specific fat-image build or an extra-build:
 All possible groups are listed in `environments/common/groups` but common
 options for this variable will be:
 
-- For a fatimage build: `fatimage`: This is defined in `enviroments/site/inventory/groups`
+- For a fatimage build: `fatimage`: This is defined in `environments/site/inventory/groups`
   and results in an update of all packages in the source image, plus
   installation of packages for default control, login and compute nodes.
@@ -137,15 +137,15 @@
 In summary, Packer creates an OpenStack VM, runs Ansible on that, shuts it down, then creates an image from the root disk.
 
 Many of the Packer variables defined in `openstack.pkr.hcl` control the definition of the build VM and how to SSH to it to run Ansible. These are generic OpenStack builder options
-and are not specific to the Slurm Appliance. Packer varibles can be set in a file at any convenient path; the build example above
+and are not specific to the Slurm Appliance. Packer variables can be set in a file at any convenient path; the build example above
 shows the use of the environment variable `$PKR_VAR_environment_root` (which itself sets the Packer variable
 `environment_root`) to automatically select a variable file from the current environment, but for site-specific builds
 using a path in a "parent" environment is likely to be more appropriate (as builds should not be environment-specific to allow testing before deployment to a production environment).
 
 What is Slurm Appliance-specific are the details of how Ansible is run:
 
 - The build VM is always added to the `builder` inventory group, which differentiates it from nodes in a cluster. This allows
-  Ansible variables to be set differently during Packer builds, e.g. to prevent services starting. The defaults for this are in `environments/common/inventory/group_vars/builder/`, which could be extended or overriden for site-specific fat image builds using `builder` groupvars for the relevant environment. It also runs some builder-specific code (e.g. to clean up the image).
+  Ansible variables to be set differently during Packer builds, e.g. to prevent services starting. The defaults for this are in `environments/common/inventory/group_vars/builder/`, which could be extended or overridden for site-specific fat image builds using `builder` groupvars for the relevant environment. It also runs some builder-specific code (e.g. to clean up the image).
 - The default fat image builds also add the build VM to the "top-level" `compute`, `control` and `login` groups. This ensures
   the Ansible specific to all of these types of nodes run. Note other inventory groups are constructed from these by `environments/common/inventory/groups file` - this is not builder-specific.
 - As noted above, for "extra" builds the additional groups can be specified directly. In this way an existing image can be extended with site-specific Ansible, without modifying the
````
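The `$PKR_VAR_environment_root` mechanism referred to in this diff is generic Packer behaviour: any environment variable named `PKR_VAR_<name>` sets the Packer variable `<name>`. A minimal sketch (the path here is illustrative, not from the repo):

```shell
# Packer picks up PKR_VAR_environment_root as the variable "environment_root";
# here it points a build at a hypothetical site environment checkout.
export PKR_VAR_environment_root="$PWD/environments/site"
echo "$PKR_VAR_environment_root"
```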

docs/operations.md

Lines changed: 23 additions & 21 deletions

````diff
@@ -67,25 +67,24 @@ This is a usually a two-step process:
 
 Deploying the additional nodes and applying these changes requires rerunning both OpenTofu and the Ansible site.yml playbook - follow [Deploying a Cluster](#deploying-a-cluster).
 
-## Package Repositories
-
-```yaml
-[squid:children]
-# Hosts to run squid proxy
-login
-```
+## Adding Additional Packages
 
-Note that many non-default roles include package installations from repositories which the appliance overwrites to point at snapshotted mirrors on a Pulp server (by default StackHPC's Ark server), which are
-disabled during runtime to prevent Ark credentials from being leaked. To enable this functionality, you must therefore either:
+The StackHPC images provided via [GitHub releases](https://github.com/stackhpc/ansible-slurm-appliance/releases)
+have all DNF repositories disabled, because for reproducibility these images are
+built using (authenticated) mirrors hosted on StackHPC's "Ark" Pulp server and
+the credentials are not provided as part of the appliance.
 
-- Create a site-specific fatimage (see [image build docs](image-build.md)) with the appropriate group added to the `inventory_groups` Packer variables.
-- If you instead wish roles to perform their installations during runtime, deploy a site Pulp server and sync it with with mirrors of the snapshots from the upstream Ark server (see [Pulp docs](experimental/pulp.md)).
+This means that when running the `site.yml` playbook, by default:
 
-In both cases, Ark credentials will be required.
+- Features which are not enabled by default, e.g., `freeipa_client`, cannot
+  install the packages they require.
+- It is not possible to install arbitrary packages using e.g. an `ansible.builtin.dnf`
+  task in a hook.
 
-## Adding Additional Packages
+The recommended way to resolve both of these issues is by carrying out a
+site-specific [image build](./image-build.md).
 
-By default, the following utility packages are installed during the StackHPC image build:
+By default, the following utility packages are installed in StackHPC images:
 
 - htop
 - nano
@@ -101,7 +100,7 @@ By default, the following utility packages are installed in StackHPC images:
 
 Additional packages can be added during image builds by:
 
-1. Configuring an [image build](./image-build.md) to enable the
+1. Configuring the [image build](./image-build.md) to enable the
    `extra_packages` group:
 
    ```terraform
@@ -129,10 +128,13 @@ the OpenHPC installation guide (linked from the
 "user-facing" OpenHPC packages such as compilers, MPI libraries etc. include
 corresponding `lmod` modules.
 
-Packages _may_ also be installed during the site.yml, by adding the `cluster`
-group as a child of the `extra_packages` group. An error will occur if Ark
-credential are defined in this case, as they are readable by unprivileged users
-in the `.repo` files and a local Pulp mirror must be used instead.
+If a site-specific image build and cluster reimage is not possible (e.g. for
+an urgent patch), it is possible to install packages directly during the
+`site.yml` playbook by adding the `cluster` group as a child of the
+`extra_packages` group. An error will occur if Ark credentials are defined in
+this case, as they are readable by unprivileged users in the `.repo` files. A
+local Pulp mirror must be used instead, which also has the advantage of making
+this approach more reproducible.
 
 If additional repositories are required, these could be added/enabled as necessary in a play added to `environments/$SITE_ENV/hooks/{pre,post}.yml` as appropriate.
 Note such a play should NOT exclude the builder group, so that the repositories are also added to built images.
@@ -210,7 +212,7 @@ ansible-playbook ansible/adhoc/$PLAYBOOK
 Currently they include the following (see each playbook for links to documentation):
 
 - `hpctests.yml`: MPI-based cluster tests for latency, bandwidth and floating point performance.
-- `rebuild.yml`: Rebuild nodes with existing or new images (NB: this is intended for development not for reimaging nodes on an in-production cluster).
+- `rebuild.yml`: Rebuild nodes with existing or new images (NB: this is intended for development not for re-imaging nodes on an in-production cluster).
 - `restart-slurm.yml`: Restart all Slurm daemons in the correct order.
 - `update-packages.yml`: Update specified packages on cluster nodes (NB: not recommended for routine use).
 
@@ -220,4 +222,4 @@ The `ansible` binary [can be used](https://docs.ansible.com/ansible/latest/comma
 ansible [--become] <group/host> -m shell -a "<shell command>"
 ```
 
-This can be useful for debugging and development but any modifications made this way will be lost if nodes are rebuilt/reimaged.
+This can be useful for debugging and development but any modifications made this way will be lost if nodes are rebuilt/re-imaged.
````
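The runtime-install fallback described in the `extra_packages` hunk amounts to an inventory change along these lines (group names are from the docs; the file placement is a sketch, not prescribed by this commit):

```ini
# environments/site/inventory/groups (sketch):
# Install extra_packages during site.yml on all cluster nodes.
# Requires a local Pulp mirror, with no Ark credentials defined.
[extra_packages:children]
cluster
```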

environments/common/inventory/groups

Lines changed: 1 addition & 1 deletion

```diff
@@ -229,7 +229,7 @@ doca
 # Host to deploy a Pulp server on and sync with mirrors of upstream Ark repositories. Should be a group containing a single VM provisioned
 # separately from the appliance. e.g
 # pulp_host ansible_host=<VM-ip-address>
-# Note the host name can't conflict with group names i.e can't be called `pulp` or `pulp_server`
+# Note the host name can't conflict with group names i.e can't be called `pulp_site` or `pulp_server`
 
 [raid]
 # Add `builder` to configure image for software raid
```

environments/site/inventory/groups

Lines changed: 1 addition & 1 deletion

```diff
@@ -192,7 +192,7 @@ compute
 # Host to deploy a Pulp server on and sync with mirrors of upstream Ark repositories. Should be a group containing a single VM provisioned
 # separately from the appliance. e.g
 # pulp_host ansible_host=<VM-ip-address>
-# Note inventory host name cannot conflict with group names i.e can't be called `pulp` or `pulp_server`.
+# Note inventory host name cannot conflict with group names i.e can't be called `pulp_site` or `pulp_server`.
 
 [raid:children]
 # Add `builder` to configure image for software raid
```
