(PE-40691) document automated steps for failed postgres and adding compilers

davidmalloncares · davidmalloncares · commit f6752e841e08 · 2025-02-27T14:09:11.000Z
diff --git a/documentation/recovery/automated_recovery.md b/documentation/recovery/automated_recovery.md
@@ -28,7 +28,7 @@ This procedure uses the following placeholder references.
 
 ## Replace failed PE-PostgreSQL server (A or B side)
 
-The procedure for replacing a failed PE-PostgreSQL server is the same regardless of which PE-PostgreSQL server is missing or if the name of the PE-PostgrSQL server is the same or different. This procedure uses the following placeholder references.
+The procedure for replacing a failed PE-PostgreSQL server is the same regardless of which PE-PostgreSQL server is missing, and regardless of whether the replacement server has the same name as the failed server or a different one. This procedure uses the following placeholder references.
 
 * _\<replacement-postgres-server-fqdn\>_ - The FQDN and certname of the new server being brought in to replace the failed PE-PostgreSQL server
 * _\<working-postgres-server-fqdn\>_ - The FQDN and certname of the still-working PE-PostgreSQL server
@@ -38,25 +38,14 @@ The procedure for replacing a failed PE-PostgreSQL server is the same regardless
 
 Procedure:
 
-1. Stop `puppet.service` on Puppet server primary and replica
+1. Run the `peadm::replace_failed_postgresql` plan to replace the failed PE-PostgreSQL server:
 
-        bolt task run service name=puppet.service action=stop --targets <primary-server-fqdn>,<replica-server-fqdn>
-
-2. Temporarily set both primary and replica server nodes so that they use the remaining healthy PE-PostgreSQL server
-
-        bolt plan run peadm::util::update_db_setting --target <primary-server-fqdn>,<replica-server-fqdn> postgresql_host=<working-postgres-server-fqdn> override=true
-
-3. Restart `pe-puppetdb.service` on Puppet server primary and replica
-
-        bolt task run service name=pe-puppetdb.service action=restart --targets <primary-server-fqdn>,<replica-server-fqdn>
-
-4. Purge failed PE-PostgreSQL node from PuppetDB
-
-        bolt command run "/opt/puppetlabs/bin/puppet node purge <failed-postgres-server-fqdn>" --targets <primary-server-fqdn>
-
-5. Run `peadm::add_database` plan to deploy replacement PE-PostgreSQL server
-
-        bolt plan run peadm::add_database -t <replacement-postgres-server-fqdn> primary_host=<primary-server-fqdn>
+        bolt plan run peadm::replace_failed_postgresql \
+                primary_host=<primary-server-fqdn> \
+                replica_host=<replica-server-fqdn> \
+                working_postgresql_host=<working-postgres-server-fqdn> \
+                failed_postgresql_host=<failed-postgres-server-fqdn> \
+                replacement_postgresql_host=<replacement-postgres-server-fqdn>
 
 ## Replace failed replica puppet server AND failed replica pe-postgresql server
 
@@ -71,3 +60,36 @@ This procedure uses the following placeholder references.
 
 2. [Replace failed PE-PostgreSQL server (A or B side)](#replace-failed-pe-postgresql-server-a-or-b-side)
 3. [Replace missing or failed replica Puppet server](#replace-missing-or-failed-replica-puppet-server)
+
+## Add or replace compilers
+
+This procedure uses the following placeholder references.
+
+* _\<avail-group-letter\>_ - Either A or B; whichever of the two letter designations the compiler is being assigned to
+* _\<compiler-hosts\>_ - A comma-separated list of the FQDNs and certnames of the new compilers
+* _\<dns-alt-names\>_ - A comma-separated list of DNS alt names for the compiler
+* _\<primary-server-fqdn\>_ - The FQDN and certname of the primary Puppet server
+* _\<postgresql-server-fqdn\>_ - The FQDN and certname of the PE-PostgreSQL server with availability group _\<avail-group-letter\>_
+
+Procedure:
+
+1. Run the `peadm::add_compilers` plan to add the compilers:
+
+        bolt plan run peadm::add_compilers \
+                primary_host=<primary-server-fqdn> \
+                compiler_hosts=<compiler-hosts> \
+                avail_group_letter=<avail-group-letter> \
+                dns_alt_names=<dns-alt-names> \
+                primary_postgresql_host=<postgresql-server-fqdn>
+
+The plan's optional parameters and their default values are as follows.
+
+<!-- table -->
+
+| Parameter                 | Default value | Description                                                                                                                    |
+| ------------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| `avail_group_letter`      | `A`           | By default, each compiler will be added to the primary group A.                                                                |
+| `dns_alt_names`           | `undef`       | A comma-separated list of DNS alt names for the compilers.                                                                     |
+| `primary_postgresql_host` | `undef`       | By default, the plan determines the required value based on whether your architecture includes HA and/or external databases.   |
+
+For more information about adding compilers to your infrastructure, see [Expanding Your Deployment](expanding.md#adding-compilers-with-peadmadd_compiler).
diff --git a/documentation/recovery/recovery.md b/documentation/recovery/recovery.md
@@ -4,6 +4,8 @@ These instructions all assume that the failed server is destroyed, and being rep
 
 The new system needs to be provisioned with the same certificate name as the system it is replacing.
 
+Automated procedures are documented in [automated_recovery.md](automated_recovery.md).
+
 ## Recover from failed primary Puppet server
 
 1. Promote the replica ([official docs](https://puppet.com/docs/pe/2019.8/dr_configure.html#dr-promote-replica))