Commit de48073

committed
wip isolated docs
1 parent 8399793 commit de48073

File tree

1 file changed

+130
-35
lines changed

docs/experimental/isolated-clusters.md

Lines changed: 130 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,114 @@
11
# Isolated Clusters
22

3-
This document explains how to create clusters which do not have outbound internet
4-
access by default.
3+
By default, the appliance assumes and requires that there is outbound internet
4+
access, possibly via a [proxy](../../ansible/roles/proxy/). However, it is
5+
possible to create clusters in more restrictive environments, with some
6+
limitations on functionality.
7+
8+
## No outbound internet
9+
10+
A cluster can be deployed using the upstream image (or one derived from it) without any outbound internet at all.
11+
12+
At present, this supports all roles/groups enabled:
13+
- Directly in the `common` environment
14+
- In the `environments/$ENV/inventory/groups` file created by cookiecutter for
15+
a new environment (from the "everything template").
16+
17+
plus some additional roles/groups not enabled by default listed below.
18+
19+
However the following default features are not available:
20+
21+
1. Configuration of the default Jupyter Notebook server app for Open OnDemand
22+
is not currently supported and must be disabled:
23+
24+
```yaml
25+
# environments/site/inventory/group_vars/all/openondemand.yml:
26+
ood_install_apps: {}
27+
```
28+
29+
2. The `hpl` test from the `ansible/adhoc/hpctests.yml` playbook is not
30+
functional and must be skipped using:
31+
32+
```shell
33+
ansible-playbook ansible/adhoc/hpctests.yml --skip-tags hpl-solo
34+
```
35+
36+
The full list of supported roles/groups is below, with those marked "*" from
37+
the common environment or "everything template":
38+
- alertmanager *
39+
- ansible_init *
40+
- basic_users *
41+
- cacerts
42+
- chrony
43+
- eessi *
44+
- etc_hosts *
45+
- filebeat *
46+
- grafana *
47+
- mysql *
48+
- nfs *
49+
- node_exporter *
50+
- openhpc *
51+
- opensearch *
52+
- podman *
53+
- prometheus *
54+
- proxy
55+
- rebuild
56+
- selinux **
57+
- slurm_exporter *
58+
- slurm_stats *
59+
- systemd **
60+
- tuned
61+
- fail2ban *
62+
- firewalld *
63+
- hpctests *
64+
- openondemand *
65+
- persist_hostkeys *
66+
- compute_init
67+
- nhc *
68+
- openondemand_desktop *
69+
70+
Note that for this to work, all dnf repositories are disabled at the end of
71+
image builds, so that `ansible.builtin.dnf` tasks work when running against
72+
packages already installed in the image.
73+
74+
## Outbound internet via proxy not available to cluster users
75+
If additional functionality is required it is possible to configure Ansible to use
76+
an authenticated http/https proxy (e.g. [squid](https://www.squid-cache.org/)).
77+
The proxy credentials are not written to the cluster nodes so the proxy cannot
78+
be used by cluster users.
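
As a minimal sketch of how this could be sanity-checked, the snippet below lists any proxy variables visible in the current shell on a node. The function name `list_proxy_vars` is hypothetical and not part of the appliance, and the check only inspects the environment of the shell it runs in, not files on disk.

```shell
# Hypothetical helper: list proxy-related variables in the current environment.
# grep exits non-zero when nothing matches, so `|| true` keeps the function
# from reporting a failure in that (expected) case.
list_proxy_vars() {
  env | grep -i -E '^(http|https|no)_proxy=' || true
}

if [ -z "$(list_proxy_vars)" ]; then
  echo "no proxy variables set"
else
  echo "proxy variables present:"
  # Print variable names only, so credentials are never echoed.
  list_proxy_vars | cut -d= -f1
fi
```

Run as a normal cluster user on a login or compute node, the first branch is the outcome this section describes.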
79+
80+
To do this the proxy variables required in the remote environment must be
81+
defined for the Ansible variable `appliances_remote_environment_vars`. Note
82+
that some default proxy variables are provided in `environments/common/inventory/group_vars/all/proxy.yml`, so generally it will be sufficient to set the proxy user, password and address and to add these to the remote environment:
83+
84+
```yaml
85+
# environments/site/inventory/group_vars/all/proxy.yml:
86+
proxy_basic_user: my_squid_user
87+
proxy_basic_password: "{{ vault_proxy_basic_password }}"
88+
proxy_http_address: squid.mysite.org
89+
90+
# environments/site/inventory/group_vars/all/vault_proxy.yml:
91+
# NB: encrypt this file using ansible-vault
92+
vault_proxy_basic_password: super-secret-password
93+
94+
# environments/site/inventory/group_vars/all/default.yml:
95+
appliances_remote_environment_vars:
96+
  http_proxy: "{{ proxy_http_proxy }}"
97+
  https_proxy: "{{ proxy_http_proxy }}"
98+
```
99+
100+
TODO: Do we need to set `no_proxy`??
101+
102+
This uses Ansible's [remote environment support](https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_environment.html). Currently this is supported for the following roles/groups:
103+
- eessi: TODO: is this right though??
104+
- manila
105+
106+
107+
Although EESSI will install with the above configuration, as there is no
108+
outbound internet access except for Ansible tasks, making it functional will
109+
require [configuring a proxy for CVMFS](https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/access/proxy/#client-system-configuration).
5110

6-
The approach is to:
7-
- Create a squid proxy with basic authentication and add a user.
8-
- Configure the appliance to set proxy environment variables via Ansible's
9-
[remote environment support](https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_environment.html).
10111

11-
This means that proxy environment variables are not present on the hosts at all
12-
and are only injected when running Ansible, meaning the basic authentication
13-
credentials are not exposed to cluster users.
14112

15113
## Deploying Squid using the appliance
16114
If an external squid is not available, one can be deployed by the cluster on a
@@ -112,22 +210,17 @@ if the calculated parameters are not appropriate.
112210

113211
## Image build
114212

115-
TODO: probably not currently functional!
213+
TODO: describe proxy setup for that
116214

117215
## EESSI
118216

119-
Although EESSI will install with the above configuration, as there is no
120-
outbound internet access except for Ansible tasks, making it functional will
121-
require [configuring a proxy for CVMFS](https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/access/proxy/#client-system-configuration).
122217

123-
## Isolation Using Security Group Rules
218+
## Network considerations
124219

125-
The below shows the security groups/rules (as displayed by Horizon ) which can
126-
be used to "isolate" a cluster when using a network which has a subnet gateway
127-
provided by a router to an external network. It therefore also indicates what
128-
access is required for a different networking configuration.
220+
Note that even when outbound internet access is not required, the following
221+
outbound access from nodes (shown as OpenStack security groups/rules, as displayed by Horizon) is still required to enable deployment.
129222

130-
Security group `isolated`:
223+
Assuming nodes have a security group `isolated` applied:
131224

132225
# allow outbound DNS
133226
ALLOW IPv4 53/tcp to 0.0.0.0/0
@@ -141,24 +234,26 @@ Security group `isolated`:
141234
ALLOW IPv4 80/tcp to 169.254.169.254/32
142235

143236
# allow hosts to reach squid proxy:
144-
ALLOW IPv4 3128/tcp to 10.179.2.123/32
237+
ALLOW IPv4 3128/tcp to <squid cidr>
238+
239+
Note that DNS is required (and is configured by OpenStack if the subnet has
240+
a gateway) because name resolution happens on the hosts, not on the proxy.
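
As a rough sketch, egress rules like those shown above could be created with the OpenStack CLI. The helper `egress_rule_cmd` is hypothetical: it only prints the command for review rather than running it, and it assumes the security group `isolated` already exists.

```shell
# Hypothetical helper: print (rather than run) the OpenStack CLI command that
# would add one outbound IPv4 TCP rule to a security group.
egress_rule_cmd() {
  port="$1"   # destination TCP port, e.g. 53
  cidr="$2"   # destination CIDR, e.g. 0.0.0.0/0
  group="$3"  # security group name, e.g. isolated
  printf 'openstack security group rule create --egress --ethertype IPv4 --protocol tcp --dst-port %s --remote-ip %s %s\n' \
    "$port" "$cidr" "$group"
}

egress_rule_cmd 53 0.0.0.0/0 isolated            # outbound DNS over TCP
egress_rule_cmd 80 169.254.169.254/32 isolated   # 80/tcp to 169.254.169.254/32
```

Printing the commands first makes it easy to compare them against the rules Horizon displays before applying anything.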
145241

146-
Security group `isolated-ssh-https` allows inbound ssh and https (for OpenOndemand):
242+
For nodes running Open OnDemand, inbound ssh and https are also required:
147243

148244
ALLOW IPv4 443/tcp from 0.0.0.0/0
149245
ALLOW IPv4 22/tcp from 0.0.0.0/0
150246

151-
152-
Then OpenTofu is configured as:
153-
154-
155-
login_security_groups = [
156-
"isolated", # allow all in-cluster services
157-
"isolated-ssh-https", # access via ssh and ondemand
158-
]
159-
nonlogin_security_groups = [
160-
"isolated"
161-
]
162-
163-
Note that DNS is required (and is configured by the cloud when the subnet has
164-
a gateway) because name resolution happens on the hosts, not on the proxy.
247+
Note the OpenTofu variables `login_security_groups` and
248+
`nonlogin_security_groups` can be used to set security groups if required:
249+
250+
```terraform
251+
# environments/site/tofu/cluster.auto.tfvars:
252+
login_security_groups = [
253+
"isolated", # allow all in-cluster services
254+
"isolated-ssh-https", # access via ssh and ondemand
255+
]
256+
nonlogin_security_groups = [
257+
"isolated"
258+
]
259+
```
