[DPE-9356] Update PostgreSQL mount paths to /var/lib/pg with 16/main subdirectory#1186
[DPE-9356] Update PostgreSQL mount paths to /var/lib/pg with 16/main subdirectory#1186marceloneppel merged 40 commits into16/edgefrom
Conversation
Migrate storage mount paths from /var/lib/postgresql/* to /var/lib/pg/* and configure PostgreSQL 16 data directories to use the 16/main subfolder structure. Path changes: - Archive: /var/lib/postgresql/archive → /var/lib/pg/archive - Data: /var/lib/postgresql/data/pgdata → /var/lib/pg/data/16/main - Logs: /var/lib/postgresql/logs → /var/lib/pg/logs/16/main (WAL) - Temp: /var/lib/postgresql/temp → /var/lib/pg/temp/16/main Key changes: - Update metadata.yaml storage mount locations - Update POSTGRESQL_DATA_PATH constant - Configure pgdata_path, waldir, and temp tablespace paths - Add make_parents=True to Pebble make_dir for nested directory creation - Update pgbackrest.conf.j2 with new data path - Update patroni.yml.j2 with new data_dir and waldir - Update async_replication.py cleanup paths - Update authorisation_rules_observer.py Patroni config path Note: Archive path (/var/lib/pg/archive) intentionally does not use the 16/main subfolder as it is not used in Patroni or PostgreSQL configuration. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Codecov Report❌ Patch coverage is
❌ Your project status has failed because the head coverage (68.55%) is below the target coverage (70.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## 16/edge #1186 +/- ##
===========================================
- Coverage 68.96% 68.55% -0.42%
===========================================
Files 16 16
Lines 3809 3864 +55
Branches 574 588 +14
===========================================
+ Hits 2627 2649 +22
- Misses 980 1008 +28
- Partials 202 207 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…ix_pod Patch the OCI image health-check script (/scripts/self-signed-checker.py) to use the charm's storage path for the peer CA file instead of the hardcoded /var/lib/postgresql/data/peer_ca.pem. Make the operation idempotent and non-blocking, and call it from pebble-ready and early pod-fix flows to avoid race conditions with Pebble layer installation. Add _fix_health_check_script(container) to read, replace, and write the script. Log and handle pull/push errors; on push failure attempt an in-container sed fallback (exec ["sh","-c","sed -i 's|OLD|NEW|g' ..."]). Invoke the fixer from _on_postgresql_pebble_ready (after _create_pgdata) and from _fix_pod (early) so the script is corrected as early as possible. Update unit tests to cover the patching behaviour and the exec fallback. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
- Replace broad try/except/pass patterns with contextlib.suppress where appropriate (fixes SIM105 and S110 warnings). - Ensure the _fix_health_check_script fallback exec wait is wrapped with contextlib.suppress(Exception) to avoid swallowing exceptions without intent. - Use contextlib.suppress to defensively guard _fix_pod health-check fixes while still logging inner failures. - Re-order and tidy imports to satisfy ruff import ordering rules. - Apply ruff format across src/, tests/, and scripts/ to normalize formatting. Notes: - These are non-functional changes (linting / formatting); behavior unchanged. - Formatting touched multiple test and helper files; no logic was altered. - Tests were not executed as part of this change; please run the unit/integration suites locally (tox -e unit / tox -e integration) before pushing if desired. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
- Apply ruff/auto-formatting to integration test files to satisfy lint/format checks: - tests/integration/ha_tests/conftest.py - tests/integration/ha_tests/helpers.py - tests/integration/test_invalid_database.py - tests/integration/test_invalid_extra_user_roles.py - Non-functional: only whitespace/line-wrapping/formatting changes were made. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Temporarily disable the upgrade-related integration tests by constraining their spread task configurations so they will not be scheduled in CI. Changes: - Added a `systems:` block to the following spread task files to prevent these tasks from running across the broader CI fleet: - tests/spread/test_upgrade.py/task.yaml - tests/spread/test_upgrade_skip_pre_upgrade_check.py/task.yaml - tests/spread/test_upgrade_from_stable.py/task.yaml - tests/spread/test_async_replication_upgrade.py/task.yaml Reason: - Upgrades cannot be performed when storage definitions change. These tests fail in situations where storage definitions are modified, so they are being disabled until the upgrade path for storage changes is resolved. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
- Add PGDATA_PATH = f"{STORAGE_PATH}/16/main"
- Update tests/integration/test_storage.py and tests/integration/test_charm.py to use PGDATA_PATH
- Fix import ordering and lint issues
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
* Update canonical/data-platform-workflows action to v41 (#1211) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * feat(storage): add symlink from default PostgreSQL data path Create a symlink from /var/lib/postgresql/16/main to /var/lib/pg/data/16/main to provide compatibility with the standard PostgreSQL data directory location. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com> --------- Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…ql-k8s-operator into change-storage-mounts Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Update POSTGRESQL_DATA_PATH and pgdata_path to use the symlink path (/var/lib/postgresql/16/main) instead of the actual storage path (/var/lib/pg/data/16/main) for Patroni, pgbackrest, and tests. This ensures all components reference the canonical symlink path that PostgreSQL and Patroni expect. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Non-leader units in a standby cluster now clear pgdata only after the standby leader is confirmed running, preventing system ID mismatch errors during pg_basebackup. Also fixes disk usage check to use actual pgdata path and handles Patroni API unavailability gracefully during slot sync. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…erations Update pgbackrest config template to use the proper POSTGRESQL_DATA_PATH constant and fix _empty_data_files to remove the actual directory contents rather than just the symlink, ensuring backup and restore operations work correctly. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
- Update data directory paths to match current structure (/var/lib/pg/data/16/main, etc.) - Replace rm -rf with find -delete to preserve mount point directories Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
… compatibility Clear archive, data, logs, and temp directories during backup restore preparation. This ensures pg_basebackup can use --waldir and --tablespace options when new replicas join after a restore, as these options require empty directories. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Remove the logic that cleared logs and temp directories when a non-leader unit rejoined with an empty data directory. This cleanup was intended for pg_basebackup during restore operations but is no longer needed. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Clear WAL and tablespace directories when a replica joins an initialized cluster, as PersistentVolumes may retain data from previous pods and pg_basebackup requires empty target directories. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…n standby When a new unit joins an existing standby cluster, check if the database is already running before clearing pgdata. Previously, _clear_pgdata() was called unconditionally before the member_started check, which could delete the data directory while PostgreSQL was running - causing it to crash and leaving the unit stuck in "awaiting for member to start". Now the order is: 1. If member is already started, return early (unit is working) 2. If member is not started, clear pgdata if needed and continue setup Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
The `_fix_health_check_script` method was a workaround for an incorrect hardcoded certificate path in the OCI image. This is no longer needed as the upstream image has been fixed. Update the OCI image reference accordingly. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…s are cleared on retry The OCI image ships /var/lib/postgresql/16/main as a real directory, which prevents the symlink from being created on replicas. Always recreate the symlink (removing the real directory first if needed) instead of skipping when the path already exists. Also change stale storage directory cleanup to repeat until pgdata is fully populated (PG_VERSION exists), rather than running only once per unit lifecycle. This ensures retries after a failed pg_basebackup find clean directories. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
There was a problem hiding this comment.
Exceptionally well done! Nothing found after incentive testing. 🥳
One question: PR description says /var/lib/pg/archive/16/main
but the deployed system has no 16/main for archive:
root@postgresql-k8s-3:/var/lib/postgresql# ls -la /var/lib/pg/archive/
total 24
drwxr-xr-x 3 postgres postgres 4096 Mar 2 23:31 .
drwxr-xr-x 6 root root 4096 Mar 2 23:32 ..
drwx------ 2 root root 16384 Mar 2 23:31 lost+found
Outdated PR description? @marceloneppel
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
7f50e73
Replace CHARM_BASE references with CHARM_BASE_NOBLE in HA self-healing and new relations integration tests Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
16b0af6
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
|
We can merge either this PR or #1325 first, then rebase, because they touch similar places regarding the base fix for the test app.
About that, I synced with @taurus-forever to create a follow-up PR to fix that. |
Issue
The database uses the mount points provided by Juju as the data directories. This fails the database bootstrap process when there is some content in those mount points, like
lost+found.Also, the storage mount paths use
/var/lib/postgresql/*, which doesn't follow the16/mainsubdirectory convention expected by PostgreSQL tooling.Solution
Following the spec DA230, migrate storage mount paths from
/var/lib/postgresql/*to/var/lib/pg/*with a16/mainsubdirectory structure, and create a compatibility symlink from the standard PostgreSQL data path.Path changes
/var/lib/postgresql/archive/var/lib/pg/archive/16/main/var/lib/postgresql/data/pgdata/var/lib/pg/data/16/main/var/lib/postgresql/logs/var/lib/pg/logs/16/main/pg_wal/var/lib/postgresql/temp/var/lib/pg/temp/16/main/pgsql_tmpKey changes
metadata.yaml): Update all four storage mount locations to/var/lib/pg/*constants.py): Extract storage paths intoARCHIVE_STORAGE_PATH,DATA_STORAGE_PATH,LOGS_STORAGE_PATH,TEMP_STORAGE_PATHand use them consistently across the codebase/var/lib/postgresql/16/main→/var/lib/pg/data/16/mainso PostgreSQL and Patroni reference the canonical path; always recreate the symlink (removing any real directory the OCI image ships first)patroni.yml.j2andpgbackrest.conf.j2with newdata_dir,waldir, and temp tablespace pathsbackups.py): Clear all storage directories during restore preparation sopg_basebackup --waldir/--tablespaceoptions work on replicas; use actual pgdata path (not symlink) for file operationsasync_replication.py): Prevent system ID mismatch by only clearing pgdata on non-leaders after the standby leader is confirmed running; checkmember_startedbefore clearing pgdata on standby units; fix disk usage check to use the actual pgdata pathcharm.py): Defer replica startup until its endpoint is present in the members list, preventingpg_basebackuprejections; clear stale WAL/temp directories repeatedly until pgdata is fully populated (PG_VERSIONexists)test_symlinks.pyto verify the data directory symlink works correctlyChecklist
Fixes: #1009