`ops/ccs-ops-misc/pg-repack/README.md`:
# Reclaiming Disk Space in FHIRDB Aurora Clusters using pg_repack

Over time our fhirdb Aurora clusters can become bloated with unused (but still billable) storage space, leading to performance issues and wasteful spending. `VACUUM FULL` can reclaim this space, but it takes an exclusive lock on each table while it runs, so it is not something we run on a production database. Instead, we use the `pg_repack` extension to reclaim the space without downtime and with minimal performance impact to the FHIR API.

## Overview

### How it works

`pg_repack` essentially creates a new table with the same schema as the original table, then copies all the rows from the original into it. It then builds the indexes on the new table and applies all changes that accrued in the log table while the copy was running. Finally, it swaps the tables, including indexes and toast tables, using the system catalogs, and drops the original table. You can read more about the process in the [pg_repack documentation](https://reorg.github.io/pg_repack/1.4/).
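The per-table process can be outlined roughly as follows (conceptual sketch only, not runnable; the `repack.table_N` naming is illustrative):

```
1. create repack.table_N with the same schema as the original table
2. install a log table and trigger to capture concurrent writes
3. copy all rows from the original table into repack.table_N
4. build the indexes on repack.table_N
5. replay the writes accumulated in the log table
6. swap heap, indexes, and toast tables in the system catalogs  -- brief ACCESS EXCLUSIVE lock
7. drop the original table
```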

It should be noted that `pg_repack` does take ACCESS EXCLUSIVE locks, but only for very short durations (milliseconds) before and after processing a table. For the rest of its run, `pg_repack` only needs to hold an ACCESS SHARE lock on the original table, so INSERTs, UPDATEs, and DELETEs may proceed as usual. Only DDL commands that would require an ACCESS EXCLUSIVE lock on the original table will be blocked, so all schema changes and flyway migrations should be avoided while a repack is running.

### How often, when, and how long does it take?

**How often** will depend on the amount of churn in a table. The more churn, the more bloat, and the more bloat, the more often we should run `pg_repack` (some tables may need to be repacked more often than others). Also, BFD's data tables are huge, so repacking is an IO intensive process that can take a long time to run and can cause read latency issues. Repack tables only as often as needed to keep storage costs down, and target tables with lots of UPDATEs and DELETEs more frequently. The `repack_all.sh generate` command will build a list of tables sorted by size in a `repack_pick_list.txt` file, which you can then edit to remove tables you don't want to repack. The script will see this file and prompt you to use it before running.
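As a toy illustration of that ordering (the `table|bytes` line format here is an assumption for the demo, not necessarily the real pick list format), smaller tables sort to the front:

```sh
# repack_all.sh processes the pick list smallest-first; sort ascending by size
printf 'beneficiaries|900\nclaims|5000\nsmall_lookup|10\n' | sort -t'|' -k2,2n
# prints small_lookup|10 first and claims|5000 last
```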

At the time of this writing, we have only run `repack_all.sh` on all tables once, and it reclaimed several terabytes of bloat. Without a good understanding of how long it took our tables to become this bloated in the first place, a reasonable starting point is to run it once a quarter and adjust as needed.

**When best to run or not run** is a bit more complicated. While the performance impacts are negligible, they still exist, so it is _safest_ to run this when the database is not under heavy load and between CCW data loads. While this may be hard to predict, we can use the following as a general rule of thumb for all database maintenance activities:

- Avoid overlapping a CCW data load
- Avoid running database migrations
- Keep an eye on the database metrics and look for signs of trouble

Pay particular attention to the Aurora Replication Lag metric. If a reader node gets too far behind the writer node, Aurora may disconnect it and any requests in flight will fail.

Also note that `pg_repack` requires roughly double the size of the original table (including indexes) in free space, as it copies the data into a new table before dropping the original.
TODO: Update the repack_all.sh script to check for replication lag and other key metrics while running. Adding a sleep statement to allow the readers to catch up (similar to the [bulk_delete_aged_claims.sql](../pg-bulk-delete-aged-claims/bulk_delete_aged_claims.sql) script) would be a good start.
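A minimal sketch of what such a check could look like. Assumptions: the `aurora_replica_status()` function available on Aurora PostgreSQL, `PG*` connection variables already set, and the 2-second threshold and `should_pause` helper are illustrative, not part of the current script:

```sh
# Hypothetical lag gate for the repack loop -- not part of repack_all.sh today.
# should_pause: succeeds (exit 0) when lag exceeds a chosen threshold.
should_pause() {
  lag_ms="$1"
  [ "$lag_ms" -gt 2000 ]  # 2s threshold is an arbitrary example
}

# Query the max replica lag in milliseconds across reader nodes.
get_max_replica_lag_ms() {
  psql -t -A -c "SELECT COALESCE(MAX(replica_lag_in_msec)::int, 0) FROM aurora_replica_status();"
}

# Usage inside the per-table loop: wait for the readers to catch up.
# while should_pause "$(get_max_replica_lag_ms)"; do sleep 30; done
```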

Either way, if you are running this on a prod cluster, you should be prepared to monitor the database and be ready to stop the process.

**How long** it takes to repack all the tables will depend on how large and bloated the tables are. For prod, running on 4x `db.r6i.12xlarge` instances, our largest tables took over 5 hours to repack each.

To give you an idea for a baseline, the first time we ran repack on all tables live (March 2024), some tables took a few minutes and others took 8 hours. Expect 2-3 days to repack all tables in the fhirdb database.

## Installing and running

There are two components to the `pg_repack` process:

- The `pg_repack` database extension (installed on the database with `CREATE EXTENSION pg_repack;`)
- And the `pg_repack` command line application, which an operator runs to repack the tables

The former is dictated by AWS Aurora and the engine version we are on. The latter must be built from source, and its version must match the installed extension version.

### Installing Overview

The steps to install and run `pg_repack` are as follows:

1. Install the `pg_repack` extension on the database cluster and note the version that AWS installed
2. Setup a dedicated ec2 instance and install pg_repack requirements
3. Download and build the `pg_repack` binary on the instance
4. Configure and source your .env file
5. Run the `repack_all.sh` script or the `pg_repack` binary to repack the tables
6. Repeat for each database cluster as needed (tip: you can build the binary once and copy it to other instances, but you still need to install the requirements)

TODO: if you find yourself running this often, consider creating a repack AMI or container image using docker, ansible, et al.

### Install `pg_repack` database extension

`pg_repack` is a supported extension in Aurora PostgreSQL, so we can install it using the following command:

```sql
-- install the extension
CREATE EXTENSION pg_repack;
```

We need to get the pg_repack extension version so we know which version of the binary to build. This is important because the extension and binary versions must match and we do not control which version of the extension AWS installs.

To get `pg_repack` extension version, run the following query in the database cluster you want to repack:

```sql
-- get the version
SELECT extversion FROM pg_extension where extname = 'pg_repack';
```
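Since the versions must match, it can be worth scripting the check. A sketch under stated assumptions: `psql` and the built `pg_repack` are on the PATH, `PG*` connection variables are set, and `pg_repack --version` prints a string like `pg_repack 1.4.7`:

```sh
# compare the installed extension version with the binary version
get_ext_version() {
  psql -t -A -c "SELECT extversion FROM pg_extension WHERE extname = 'pg_repack';"
}
get_bin_version() {
  pg_repack --version | awk '{print $NF}'  # "pg_repack 1.4.7" -> "1.4.7"
}
check_versions() {
  [ "$1" = "$2" ] || { echo "version mismatch: extension=$1 binary=$2" >&2; return 1; }
}
# usage: check_versions "$(get_ext_version)" "$(get_bin_version)" && echo "versions match"
```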

Now we download and build the source for the matching binary.

_Note: we will use version `1.4.7` for all examples below_

### Setup a dedicated ec2 instance and install requirements

The repack process **should not be run from your workstation**, as it takes a long time to run and requires superuser access to the database. The easiest way to run it is to use a migrator instance as a host; migrators are ephemeral in nature anyway, and we terminate the instance when done. For example:

```sh
cd ops/terraform # tfswitch or tfenv
cd services/migrator
terraform workspace select test # prod-sbx, prod
terraform apply -var=create_migrator_instance=true -var=migrator_monitor_enabled_override=false
...scp, ssh, repack, etc...
terraform destroy -var=create_migrator_instance=false -var=migrator_monitor_enabled_override=true
```
### Install system requirements

The following instructions assume a fresh migrator instance running Amazon Linux 2 or similar. Once the instance is up and running:

1. Connect to the instance
2. Disable the migrator service
3. Install the required packages

Note: there was a bit of trial and error getting things to build on AL2 and requirements may change over time, so use the following as a general guide but be prepared to figure things out as you go:

```bash
sudo systemctl stop bfd-migrator
sudo systemctl disable bfd-migrator
sudo yum install \
screen curl unzip \
postgresql-devel postgresql-server-devel postgresql-common postgresql-static \
readline-devel openssl-devel lz4-devel zlib1g-devel libzstd-devel
sudo amazon-linux-extras install postgresql14
```

### Download and build the `pg_repack` binary

Note: The versions of the `pg_repack` extension and the `pg_repack` binary must match, and the requirements may change between versions, so use the following as a general guide but be prepared to figure things out as you go.

Using the version of the installed extension noted above:

1. Visit https://reorg.github.io/pg_repack
2. At the top, select the desired version ([1.4.7](https://reorg.github.io/pg_repack/1.4/) in this example)
Then, source the file to set the environment variables:

```sh
. .env
```

You will be prompted for the superuser password when you run the script if you do not provide it in the `.env` file.
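For reference, a minimal `.env` could look like the following (the variable names follow the standard libpq `PG*` conventions used by the script; the host and user values are placeholders):

```sh
# example .env -- values are placeholders
export PGHOST='fhirdb-cluster.cluster-example.us-east-1.rds.amazonaws.com'
export PGUSER='<superuser>'
export PGDATABASE='fhirdb'
# optional; if unset, you will be prompted for the superuser password
# export PGPASSWORD='...'
```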

### Running `pg_repack` on all tables using the provided script

The following describes how to install and run the repack_all.sh script. This script will build a list of all tables sorted by size ascending, and will repack each in the same order. We are processing smaller tables first to help ensure we have enough free space to handle our larger tables.

Once a table is successfully repacked, it will be removed from the repack pick list and the script will continue to the next table. If a table fails to repack, the script will exit and the offending table will be the first in the list, which you should remove before running the script again. Everything is logged to a file called `repack.log`. The script will also log the time it took to repack each table, so you can get an idea of how long it will take to run.
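The pick-list bookkeeping can be illustrated locally (the `table|size` line format and the demo file path are assumptions; repack_all.sh uses the same `sed` deletion pattern):

```sh
# simulate removing a successfully repacked table from the pick list
printf 'small_table|100 MB\nbig_table|2 TB\n' > /tmp/repack_pick_list_demo.txt
table=small_table
sed -i "/^$table|/d" /tmp/repack_pick_list_demo.txt
cat /tmp/repack_pick_list_demo.txt  # only big_table|2 TB remains
```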

We run the script in a `screen` session to ensure it continues running if we disconnect from the instance. Do not run this from your workstation.

1. Copy the repack_all.sh script to the instance
2. Make the file executable with `chmod +x repack_all.sh`
5. Monitor the script by tailing the repack.log file with `tail -f repack.log`
6. Reattach to the screen session with `screen -r repack` to check on the progress


Note:

- If repack_all.sh fails to repack a table, the failed table will be left at the top of the pick list; remove it before running the script again.
- If you stop the script, or a repack fails for any reason, you will need to clean up manually, as pg_repack will leave behind temp tables, triggers, etc.

You can clean up by running `./repack_all.sh reset` or by executing the following SQL (repack_all.sh will do this automatically after a successful run):

```sql
DROP EXTENSION pg_repack CASCADE;
CREATE EXTENSION pg_repack;
```

## References

- https://jira.cms.gov/browse/BFD-3302
`ops/ccs-ops-misc/pg-repack/repack_all.sh` (excerpts; regions collapsed in the diff are marked with `# ...`):

```sh
log() {
    echo "$1" | prepend_timestamp >> "$LOGFILE"
}

cleanup() {
    echo "Cleaning up.." | prepend_timestamp | tee -a "$LOGFILE"
    rm -f "$TO_REPACK" || true
    reset_repack || echo "Failed to reset pg_repack extension.. please reset manually" | prepend_timestamp | tee -a "$LOGFILE"
}

error_exit() {
    # log the error and echo it to stderr (redirecting tee's stdout, not echo's)
    echo "Error: $1" | prepend_timestamp | tee -a "$LOGFILE" >&2
    cleanup
    exit 1
}

# ...

if [ -f "$TO_REPACK" ] ; then
    # ...
        esac
    done
else
    # NOTE: the following repacking-by-schema bit is untested.. YMMV
    # list schemas in the database
    if [ -z "$SCHEMAS" ]; then
        SCHEMAS=$(psql -t -A -c "SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('pg_catalog', 'information_schema');")
    fi
    [ -n "$SCHEMAS" ] || error_exit "No schemas found in the database"

    # select a schema to repack
    echo "Select a schema to repack"
    select schema in $SCHEMAS; do
        SCHEMAS=$schema
    # ...
    generate_repack_file "$SCHEMAS"
    load_tables_from_file
fi
[ -f "$TO_REPACK" ] || error_exit "No $TO_REPACK file found to process.. exiting"

# short circuit if there are no tables to repack
[ -n "$TABLES" ] || error_exit "No tables found to repack.. exiting"

# prompt to continue
echo "Press any key to continue or Ctrl+C to quit"
read -r -n 1 -s

echo "Repacking! Tables will be removed from the $TO_REPACK file after they are successfully repacked. Follow along by tailing the log file: tail -f $LOGFILE"

for table in $TABLES; do
    start_time=$(date '+%Y-%m-%d %H:%M:%S')

    # Run pg_repack and redirect stdout and stderr to the log file
    if pg_repack -e -h "$PGHOST" -U "$PGUSER" -t "$table" -k "$PGDATABASE" 2>&1 | prepend_timestamp >> "$LOGFILE" 2>&1; then
        end_time=$(date '+%Y-%m-%d %H:%M:%S')
        log "${table} Start time: $start_time, End time: $end_time"
        # Remove the table from the TO_REPACK file
        sed -i "/^$table|/d" "$TO_REPACK"
    else
        error_exit "Failed to repack $table"
    fi
done
cleanup
log "repack completed successfully"
```