Commit da81c14
Author: Matt Pryor

Update example environment + small tweaks to docs (azimuth-cloud#124)

* Update example environment + small tweaks to docs
* Replace local docs URL
* Address review comments

1 parent: bcb4a91

6 files changed: +122 additions, -133 deletions

docs/configuration/03-kubernetes-config.md
Lines changed: 1 addition & 12 deletions

@@ -179,18 +179,7 @@ is enabled, then Kubernetes clusters should be configured to use the OVN provide
 any load-balancers that are created:
 
 ```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
-#### For the HA cluster ####
-
-# The provider for the API server load-balancer created by Cluster API
-capi_cluster_apiserver_loadbalancer_provider: ovn
-# The provider for load-balancers created for LoadBalancer services
-capi_cluster_addons_openstack_loadbalancer_provider: ovn
-
-#### For tenant clusters ####
-
-# Tenant API servers are load-balanced using Zenith
-# This variable applies to load-balancers created for LoadBalancer services
-azimuth_capi_operator_capi_helm_openstack_loadbalancer_provider: ovn
+openstack_loadbalancer_provider: ovn
 ```
 
 !!! tip
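
The net effect for a site configuration is that a single variable now selects the load-balancer provider everywhere. A minimal sketch of the consolidated setting (assuming, as the diff suggests, that the removed per-component variables now default to this umbrella value):

```yaml
# environments/my-site/inventory/group_vars/all/variables.yml
# Selects the provider for the Cluster API API-server load-balancer,
# LoadBalancer services on the HA cluster, and LoadBalancer services
# on tenant clusters
openstack_loadbalancer_provider: ovn
```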

docs/configuration/04-target-cloud.md
Lines changed: 20 additions & 16 deletions

@@ -6,7 +6,11 @@ for the target OpenStack cloud.
 Azimuth uses the
 [Keystone Service Catalog](https://docs.openstack.org/keystone/latest/contributor/service-catalog.html)
 to discover the endpoints for OpenStack services, so only needs to be told where to find the
-Keystone v3 endpoint:
+Keystone v3 endpoint.
+
+By default, the auth URL from the application credential used to deploy Azimuth will be used.
+If you want Azimuth to target a different OpenStack cloud than the one it is deployed in, this
+can be overridden:
 
 ```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
 azimuth_openstack_auth_url: https://openstack.example-cloud.org:5000/v3

@@ -24,8 +28,8 @@ trustroots, TLS verification must be disabled:
 azimuth_openstack_verify_ssl: false
 ```
 
-If you use a domain other than `default`, you will also need to tell Azimuth the name of the
-domain to use when authenticating:
+If you are using the password authenticator and use a domain other than `default`,
+you will also need to tell Azimuth the name of the domain to use when authenticating:
 
 ```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
 azimuth_openstack_domain: my-domain
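
Taken together, a site pointing Azimuth at a different OpenStack cloud would combine the settings from the hunks above. A sketch with illustrative values only:

```yaml
# environments/my-site/inventory/group_vars/all/variables.yml
# Override the auth URL taken from the application credential by default
azimuth_openstack_auth_url: https://openstack.example-cloud.org:5000/v3
# Only needed if the cloud's CA is not in the default trustroots
azimuth_openstack_verify_ssl: false
# Only needed for the password authenticator with a non-default domain
azimuth_openstack_domain: my-domain
```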
@@ -129,25 +133,25 @@ azimuth_openstack_internal_net_cidr: 10.0.3.0/24
 
 ## Monitoring Cloud Capacity
 
-Azimuth is able to federate cloud metrics from a prometheus running within
-your cloud enviroment, such as the one deployed by:
-https://github.com/stackhpc/stackhpc-kayobe-config
+Azimuth is able to federate cloud metrics from a Prometheus running within
+your OpenStack cloud environment, such as the one deployed by
+[stackhpc-kayobe-config](https://github.com/stackhpc/stackhpc-kayobe-config).
+
+We also assume the [os-capacity exporter](https://github.com/stackhpc/os-capacity)
+is being used to query the current capacity of your cloud, mostly using data from
+OpenStack placement.
 
-Typically we also assume the following exporter is being used to
-query the current capacity of your cloud, mostly using data from
-OpenStack placement:
-https://github.com/stackhpc/os-capacity
+First you need to enable the project metrics and cloud metrics links within
+Azimuth by configuring:
 
-First you need to enable the project metrics and cloud metrics
-links within Azimuth by configuring:
-```yaml
+```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
 # Defaults to no
 cloud_metrics_enabled: yes
 ```
 
-To make sure Azimuth knows how to access the prometheus running
-in your cloud, you need to configure:
-```yaml
+You then need to tell Azimuth how to access the OpenStack cloud Prometheus:
+
+```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
 # hostname needed to match TLS certificate name
 cloud_metrics_prometheus_host: "mycloud.example.com"
 # ip that matches the above hostname
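
For reference, a consolidated sketch of the cloud-metrics settings shown above. The diff is truncated after the "ip that matches the above hostname" comment, so the variable holding that IP is an assumption and its name is hypothetical:

```yaml
# environments/my-site/inventory/group_vars/all/variables.yml
# Defaults to no
cloud_metrics_enabled: yes
# hostname needed to match TLS certificate name
cloud_metrics_prometheus_host: "mycloud.example.com"
# ip that matches the above hostname (variable name assumed, not shown in the diff)
cloud_metrics_prometheus_ip: "192.0.2.10"
```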

docs/configuration/10-kubernetes-clusters.md
Lines changed: 1 addition & 0 deletions

@@ -170,6 +170,7 @@ The Harbor registry can be disabled entirely:
 
 ```yaml title="environments/my-site/inventory/group_vars/all/variables.yml"
 harbor_enabled: no
+```
 
 ### Additional proxy caches
 
environments/base/inventory/group_vars/all.yml
Lines changed: 2 additions & 0 deletions

@@ -172,6 +172,8 @@ infra_external_network_id: >-
   if __os_external_networks | length == 1
   else undef(hint = 'Unable to determine external network ID')
   }}
+capi_cluster_external_network_id: "{{ infra_external_network_id }}"
+azimuth_capi_operator_external_network_id: "{{ infra_external_network_id }}"
 
 # If there is only one load balancer provider, use it by default
 # Note that 'octavia' is excluded as it is an alias of 'amphora'
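
With these two lines in the base environment, a site only needs to set the external network once and both Cluster API and the Azimuth CAPI operator inherit it. A sketch of the intended usage:

```yaml
# environments/my-site/inventory/group_vars/all/variables.yml
# Set once; the base environment forwards this value to both
# capi_cluster_external_network_id and azimuth_capi_operator_external_network_id
infra_external_network_id: "<external network id>"
```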

environments/example/inventory/group_vars/all/secrets.yml
Lines changed: 24 additions & 10 deletions

@@ -2,22 +2,36 @@
 # This file contains environment-specific secrets for an Azimuth deployment
 #
 # It should be encrypted if stored in version control
+# https://stackhpc.github.io/azimuth-config/repository/secrets/
 #####
 
-# The password for the Harbor admin account
-harbor_admin_password: "<secure password>"
-# The secret key for Harbor
-harbor_secret_key: "<secure secret key>"
-# The admin password for the cloud metrics Grafana
-cloud_metrics_grafana_admin_password: "<secure password>"
-# The admin password for the Keycloak master realm
-keycloak_admin_password: "<secure password>"
+# https://stackhpc.github.io/azimuth-config/configuration/05-secret-key/
 # The secret key for signing Azimuth cookies
 azimuth_secret_key: "<secure secret key>"
+
+# https://stackhpc.github.io/azimuth-config/configuration/07-platform-identity/#keycloak-admin-password
+# The admin password for the Keycloak master realm
+keycloak_admin_password: "<secure password>"
+
+# https://stackhpc.github.io/azimuth-config/configuration/08-zenith/
 # The secret key for signing Zenith registrar tokens
 zenith_registrar_subdomain_token_signing_key: "<secure secret key>"
+
+# https://stackhpc.github.io/azimuth-config/configuration/10-kubernetes-clusters/#harbor-registry
+# The password for the Harbor admin account
+harbor_admin_password: "<secure password>"
+# The secret key for Harbor
+harbor_secret_key: "<secure secret key>"
+
+# https://stackhpc.github.io/azimuth-config/configuration/14-monitoring/#accessing-web-interfaces
 # The admin password for Azimuth administrative dashboards
 admin_dashboard_ingress_basic_auth_password: "<secure password>"
 
-# The Slack webhook URL for monitoring alerts (optional)
-# alertmanager_config_slack_webhook_url: https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
+# https://stackhpc.github.io/azimuth-config/configuration/14-monitoring/#slack-alerts
+# The Slack webhook URL for monitoring alerts
+alertmanager_config_slack_webhook_url: https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
+
+# https://stackhpc.github.io/azimuth-config/configuration/15-disaster-recovery/
+# The S3 access key and secret for backups
+velero_aws_access_key_id: "<access key id>"
+velero_aws_secret_access_key: "<secret key>"
Original file line numberDiff line numberDiff line change
@@ -1,124 +1,103 @@
11
#####
2-
# Configuration for the seed node (HA) or single node
2+
# This file, combined with secrets.yml, shows an example configuration for a
3+
# minimal, but still best-practice, Azimuth deployment on a "well-behaved" cloud
4+
#
5+
# https://stackhpc.github.io/azimuth-config/best-practice/
6+
#
7+
# It is recommended to read the "Configuration" section of the Azimuth Operator
8+
# Documentation in full to understand all the available options
9+
#
10+
# https://stackhpc.github.io/azimuth-config/configuration/
311
#####
412

5-
# The ID of an existing network to create the node on
6-
infra_network_id: "<internal network id>"
7-
# OR
8-
# The CIDR of the subnet that should be created
9-
infra_network_cidr: 192.168.100.0/24
10-
# The ID of the external network to connect to via a router
11-
infra_external_network_id: "<external network id>"
13+
## Configuration for OpenTofu state
14+
## https://stackhpc.github.io/azimuth-config/repository/opentofu/
1215

13-
# The fixed floating IP to associate with the machine
14-
# This IP must be pre-allocated to the project
15-
# For a single node deployment, this IP should have the wildcard ingress domain assigned to it
16-
infra_fixed_floatingip: "<pre-allocated floating ip>"
17-
# OR
18-
# The name of the floating IP pool to allocate a floating IP from
19-
infra_floatingip_pool: "<floating ip pool>"
20-
# OR
21-
# The ID of a provisioning network that will be used to access the seed node
22-
infra_provisioning_network_id: "<provisioning network id>"
23-
24-
# The image id of an Ubuntu 20.04 image to use for the node
25-
# N.B. This is populated automatically using community images by default
26-
# infra_image_id: "<image id>"
16+
# The Terraform backend type to use (HTTP and S3 supported)
17+
terraform_backend_type: "<http or s3>"
18+
19+
# The backend configuration (depends on the selected backend type)
20+
terraform_backend_config: {}
21+
22+
23+
## Configuration for the seed node (HA) or single node deployment
24+
## https://stackhpc.github.io/azimuth-config/configuration/02-deployment-method/
25+
26+
# The ID of the external network to use
27+
# This network must provide _egress_ to the internet
28+
# https://stackhpc.github.io/azimuth-config/configuration/01-prerequisites/#networking
29+
infra_external_network_id: "<network id>"
2730

2831
# The id of the flavor to use for the node
2932
# For a seed node for an HA cluster, 8GB RAM is fine (maybe even 4GB)
3033
# For a single node deployment, >= 16GB RAM is recommended
3134
infra_flavor_id: "<flavor id>"
3235

33-
# The size in GB for the data volume
34-
# This will hold all cluster data, including Kubernetes resources, and also PVC data
36+
# The size of the volume to use for K3S cluster data
3537
infra_data_volume_size: 100
3638

37-
#####
38-
# Configuration for the HA cluster
39-
#####
39+
# SINGLE NODE DEPLOYMENT ONLY
40+
# The fixed floating IP to associate with the machine
41+
# Must be pre-allocated to the project and have the wildcard ingress domain assigned to it
42+
# infra_fixed_floatingip: "<pre-allocated floating ip>"
43+
44+
45+
## Configuration for the HA cluster
46+
## https://stackhpc.github.io/azimuth-config/configuration/02-deployment-method/
47+
## https://stackhpc.github.io/azimuth-config/configuration/03-kubernetes-config/
4048

41-
# The Kubernetes version that will be used for the HA cluster
42-
# N.B. This is populated automatically using community images by default
43-
# capi_cluster_kubernetes_version: 1.23.8
44-
# The ID of the image that will be used for the nodes of the HA cluster
45-
# N.B. This is populated automatically using community images by default
46-
# capi_cluster_machine_image_id: "<image id>"
4749
# The name of the flavor to use for control plane nodes
50+
# A flavor with at least 2 CPUs, 8GB RAM and 100GB root disk is recommended
4851
capi_cluster_control_plane_flavor: "<flavor name>"
52+
4953
# The name of the flavor to use for worker nodes
54+
# A flavor with at least 4 CPUs, 16GB RAM and 100GB root disk is recommended
5055
capi_cluster_worker_flavor: "<flavor name>"
56+
5157
# The number of worker nodes
5258
capi_cluster_worker_count: 3
53-
# The fixed floating IP to associate with the load balancer for the ingress controller
54-
# This IP must be pre-allocated to the project and should have the wildcard ingress domain assigned to it
55-
capi_cluster_addons_ingress_load_balancer_ip: "<pre-allocated floating ip>"
5659

57-
#####
58-
# Ingress configuration
59-
#####
60-
# The base domain to use for ingress resources
61-
ingress_base_domain: "<base domain>"
62-
63-
# Indicates if cert-manager should be enabled
64-
# Currently, TLS is enabled for ingress iff cert-manager is enabled
65-
certmanager_enabled: yes
60+
# The floating IP to which to wildcard DNS entry has been assigned
61+
capi_cluster_addons_ingress_load_balancer_ip: "<pre-allocated floating ip>"
6662

67-
# Indicates if Harbor should be enabled to provide pull-through caches
68-
harbor_enabled: no
6963

70-
#####
71-
# Azimuth configuration
72-
#####
73-
# Indicates if the Zenith app proxy should be enabled
74-
azimuth_apps_enabled: yes
75-
# Indicates if Kubernetes support should be enabled
76-
azimuth_kubernetes_enabled: yes
77-
# Indicates if Cluster-as-a-Service (CaaS) should be enabled
78-
azimuth_clusters_enabled: yes
64+
## Target cloud configuration
65+
## https://stackhpc.github.io/azimuth-config/configuration/04-target-cloud/
7966

8067
# The name of the current cloud
8168
azimuth_current_cloud_name: example
69+
8270
# The label for the current cloud
8371
azimuth_current_cloud_label: Example
84-
# The auth URL for the target OpenStack cloud
85-
azimuth_openstack_auth_url: https://cloud.example.com:5000/v3
8672

87-
#####
88-
# Configuration of authenticators / authentication methods
89-
#####
90-
# Whether the password authenticator should be enabled (enabled by default)
91-
azimuth_authenticator_password_enabled: true
92-
# The label for the password authenticator
93-
azimuth_authenticator_password_label: "Username + Password"
94-
95-
# Whether the appcred authenticator should be enabled (not enabled by default)
96-
azimuth_authenticator_appcred_enabled: false
97-
# The label for the appcred authenticator
98-
azimuth_authenticator_appcred_label: "Application Credential"
99-
100-
# Whether the federated authenticator should be enabled (not enabled by default)
101-
azimuth_authenticator_federated_enabled: false
102-
# The label for the federated authenticator
103-
azimuth_authenticator_federated_label: "Federated"
104-
# The provider for the federated authenticator
105-
# This should correspond to the Keystone federation URL, e.g. <auth url>/auth/OS-FEDERATION/websso/<provider>
106-
azimuth_authenticator_federated_provider: oidc
10773

108-
#####
109-
# Configuration for CaaS appliances
110-
#####
111-
# If CaaS is enabled and the StackHPC Slurm appliance is enabled (the default), this
112-
# is the id of a Rocky 8 image that will be used for Slurm clusters
113-
# N.B. This is populated automatically using community images by default
114-
# azimuth_caas_stackhpc_slurm_appliance_image: "<image id>"
115-
116-
# The ID of the desktop or webconsole image to use for the workstation appliance
117-
# See https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_f0dc9cb312144d0aa44037c9149d2513/azimuth-images-prerelease/
118-
# N.B. This is populated automatically using community images by default
119-
# azimuth_caas_stackhpc_workstation_image: "<image id>"
120-
121-
# The ID of the repo2docker image to use for the repo2docker appliance
122-
# See https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_f0dc9cb312144d0aa44037c9149d2513/azimuth-images-prerelease/
123-
# N.B. This is populated automatically using community images by default
124-
# azimuth_caas_stackhpc_repo2docker_image: "<image id>"
74+
## Ingress configuration
75+
## https://stackhpc.github.io/azimuth-config/configuration/06-ingress/
76+
77+
# The base domain to use for ingress resources
78+
ingress_base_domain: "<base domain>"
79+
80+
81+
## Persistence and retention for monitoring (HA only)
82+
## https://stackhpc.github.io/azimuth-config/configuration/14-monitoring/#persistence-and-retention
83+
84+
# Prometheus retention and volume size
85+
capi_cluster_addons_monitoring_prometheus_retention: 90d
86+
capi_cluster_addons_monitoring_prometheus_volume_size: 50Gi
87+
88+
# Loki retention and volume size
89+
capi_cluster_addons_monitoring_loki_retention: 744h
90+
capi_cluster_addons_monitoring_loki_volume_size: 50Gi
91+
92+
93+
## Disaster recovery
94+
## https://stackhpc.github.io/azimuth-config/configuration/15-disaster-recovery/
95+
96+
# Enable Velero for backup
97+
velero_enabled: true
98+
99+
# The URL of the S3 endpoint to use for backups
100+
velero_s3_url: "<endpoint URL>"
101+
102+
# The name of the S3 bucket to use for backups (must already exist)
103+
velero_bucket_name: "<bucket name>"
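
The example leaves `terraform_backend_config` as an empty mapping because its keys depend on the chosen backend type. As an illustration only, an S3 backend might be configured with standard Terraform S3 backend options such as the following; none of these key names come from this commit:

```yaml
# Illustrative sketch for terraform_backend_type: s3
terraform_backend_config:
  bucket: "<state bucket name>"
  key: "azimuth/terraform.tfstate"
  endpoint: "<s3 endpoint URL>"
  # Access credentials are typically supplied out-of-band, e.g. via the
  # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables
```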
