
Commit c6df39e

Merge pull request #769 from EnterpriseDB/DOCS-3266
PSR to PGD migration guide
2 parents ef59c39 + aac52aa commit c6df39e

File tree

14 files changed

+865
-11
lines changed


advocacy_docs/supported-open-source/warehousepg/warehousepg/admin_guide/index.mdx

Lines changed: 1 addition & 1 deletion

@@ -30,4 +30,4 @@ Information about configuring, managing and monitoring WarehousePG installations
  - **[Loading and Unloading Data](load/topics)**
    The topics in this section describe methods for loading and writing data into and out of a WarehousePG, and how to format data files.
  - **[Managing Performance](performance.mdx)**
-   The topics in this section cover WarehousePG performance management, including how to monitor performance and how to configure workloads to prioritize resource utilization.
+   The topics in this section cover WarehousePG performance management, including how to monitor performance and how to configure workloads to prioritize resource utilization.

product_docs/docs/pgd/6.2/index.mdx

Lines changed: 0 additions & 5 deletions

@@ -14,11 +14,6 @@ navigation:
   - overview
   - planning
   - lifecycle
- - deployments
- - installing
- - connections
- - production-best-practices
- - terminology
   - "#Configuration and Management"
   - nodes
   - node_management
Lines changed: 13 additions & 0 deletions

---
title: Migrating to Postgres Distributed (PGD)
navTitle: Migrating to PGD
description: Migrate from PSR to EDB Postgres Distributed.
navigation:
- migration-psr
---

Moving from a traditional architecture to EDB Postgres Distributed (PGD) allows your organization to achieve high availability, geographically distributed workloads, and multi-master write capabilities.

The following guide details the tested and supported pathways for transitioning your existing infrastructure into a highly available PGD cluster.

- [Migrate from Postgres Physical Streaming Replication (PSR) to PGD:](/pgd/latest/lifecycle/migrating/migration-psr/) Transition a standard primary-standby architecture to a PGD cluster using a seed-node approach to minimize downtime during data transfer and version upgrades.
Lines changed: 53 additions & 0 deletions

---
title: Preparing your environment
navTitle: Preparing your environment
description: Preparing your environment to perform a migration from PSR to PGD.
---

Before beginning the migration from Physical Streaming Replication (PSR) to Postgres Distributed (PGD), you must identify the key roles of your nodes and establish your target topology. This ensures the environment can handle the temporary overhead of the migration and provides a clear path for the cluster transition.

## Choosing a source node

You must choose a source node from your existing cluster to provide the initial data and subsequent updates for the PGD cluster. Because logical replication cannot be cascaded from a standby node, only the current primary node is a viable source.

Consider the following:

- **Resource overhead:** During migration, the source node must perform additional logical decoding for at least two PGD nodes, requiring extra CPU capacity.
- **Storage:** The source node will need to retain additional WAL data to keep replicas in sync, necessitating more disk space than standard operations.
- **Optimization:** If the current primary is underpowered, consider a switchover (promotion) to a more favorable node before starting the migration.
- **Failover handling:** If the primary node fails during the migration, you must abort and restart the process using a new primary as the source.
## Choosing a seed node

The seed node is a new node selected from the group that will form your PGD cluster. This node serves a unique transitional role during the migration:

1. It receives a full copy of the database from the source node.
1. It performs the in-place major version upgrade.
1. It initializes the final PGD cluster.
## Planning your PGD cluster topology

Establish your target topology in advance. Define locations, hostnames, PGD node names, and connection strings (DSNs) before proceeding. This guide assumes a two-node PGD start: the seed node (first node) and one additional node. Adding further nodes follows the same process as the second node.

Use the following table to map the required environment variables for your cluster, ensuring consistency across all configuration scripts.

| Variable | Description | Example |
| -------- | ----------- | ------- |
| `${SEED_NODE_PGD_NAME}` | The PGD-specific name assigned to the seed node. | node-1 |
| `${SEED_NODE_DSN}` | Connection string to reach the seed node. | host=node-1 port=5444 dbname=pgddb user=postgres |
| `${ADD_NODE_PGD_NAME}` | The PGD-specific name for the second node. | node-2 |
| `${ADD_NODE_DSN}` | Connection string to reach the additional node. | host=node-2 port=5432 dbname=pgddb user=postgres |
| `${PGD_CLUSTER_NAME}` | The name of the PGD node group containing all nodes. | pgd_cluster_main |
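To keep later commands consistent, these variables can be exported once per shell session. The following sketch uses the illustrative values from the table, which you should substitute with your own topology:

```shell
# Illustrative values from the table above; replace with your own hostnames,
# ports, and names before running any migration commands.
export SEED_NODE_PGD_NAME="node-1"
export SEED_NODE_DSN="host=node-1 port=5444 dbname=pgddb user=postgres"
export ADD_NODE_PGD_NAME="node-2"
export ADD_NODE_DSN="host=node-2 port=5432 dbname=pgddb user=postgres"
export PGD_CLUSTER_NAME="pgd_cluster_main"
```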
## Command environment setup

The migration steps specify which node to use for each command.

To ensure the migration commands execute correctly, verify your environment meets the following requirements:

- **Shell permissions:** Execute all commands as `root` unless the instructions explicitly specify using `su`.
- **Environment variables:** Ensure `${PGDATA}` is set to your Postgres data directory (e.g., `/var/lib/edb/as14/data` for EPAS 14). You may also need to set `${PGPORT}`.
- **Target database:** Connect to the specific database to migrate (not the default `postgres` or `edb` databases) by setting `${PGDATABASE}` before running `psql`.
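The requirements above can be satisfied with a few exports per node. This sketch assumes EPAS 14 paths; the database name `appdb` is a hypothetical placeholder for the database you are migrating:

```shell
# Example for EPAS 14. The path, port, and database name are illustrative;
# adjust them to match your installation.
export PGDATA=/var/lib/edb/as14/data
export PGPORT=5444
export PGDATABASE=appdb   # the database being migrated, not 'postgres' or 'edb'
```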
Next step: [Transfer data to the seed node](2-data-transfer).
Lines changed: 155 additions & 0 deletions

---
title: Transferring data to the seed node
navTitle: Transferring data to the seed node
description: Initialize the seed node through manual physical replication.
---

In this phase, you initialize the seed node by creating a standby copy of the database that mirrors the source node via physical replication. This standby becomes the seed node for the eventual PGD cluster. To minimize interference with the existing cluster—particularly if it is managed by Failover Manager (EFM)—this replica is added manually.
## Preparing the source node for logical replication

You must prepare the source node for the transition to logical replication. If not already enabled, set the `wal_level` to `logical` on the source node. Since this change requires a database restart, it is best to perform this step ahead of time.

1. Run the following command on the source node:

    ```SQL
    ALTER SYSTEM SET wal_level = 'logical';
    ```

1. Restart the source node Postgres service for the configuration change to take effect:

    ```bash
    su enterprisedb --command "pg_ctl restart"
    ```

1. After the restart, verify that the `wal_level` is correct and that there are sufficient replication slots and worker processes available. For a setup with **M** existing standbys and **N** future PGD nodes, ensure these parameters are at least the values indicated below:

    ```SQL
    SHOW wal_level; -- Expected 'logical'

    SHOW max_replication_slots; -- Must be at least M + N
    SHOW max_wal_senders; -- Must be at least M + N
    SHOW max_logical_replication_workers; -- Must be at least N
    ```
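If any of these values fall short, they can be raised the same way as `wal_level`, followed by a restart. The numbers below are a hypothetical sizing for M = 2 existing standbys and N = 3 future PGD nodes, not recommendations:

```SQL
-- Hypothetical sizing: M = 2 existing standbys, N = 3 future PGD nodes.
ALTER SYSTEM SET max_replication_slots = 5;            -- at least M + N
ALTER SYSTEM SET max_wal_senders = 5;                  -- at least M + N
ALTER SYSTEM SET max_logical_replication_workers = 3;  -- at least N
-- These parameters also require a restart to take effect.
```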
## Allowing outbound logical replication

Configure the source environment to permit external logical connections and establish the necessary authentication credentials for the migration stream.

1. The source node requires a role with replication privileges to facilitate data migration. You can use an existing role or create a separate role, for example:

    ```SQL
    CREATE ROLE repl LOGIN NOSUPERUSER REPLICATION PASSWORD 'your_password_here';
    ```

    EDB recommends using a `.pgpass` file to manage credentials securely without including passwords in connection strings (DSNs).

1. Verify connectivity from the PGD nodes using the following command to ensure they can connect without interactive password entry:

    ```bash
    su enterprisedb -c "psql --no-password '${SOURCE_DSN} replication=1' \
      --command 'IDENTIFY_SYSTEM'"
    ```

    Ensure you test connectivity from the seed node and all future PGD nodes to the source node; this may require updating `pg_hba.conf` on the source node.
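A `.pgpass` entry for the migration role might look like the following sketch. The hostname `source-node`, port `5444`, and database `pgddb` are placeholders; the special database name `replication` matches streaming replication connections:

```shell
# Placeholder values; substitute your source host, port, database, and password.
# Entry format: hostname:port:database:username:password
PGPASSFILE="${PGPASSFILE:-$HOME/.pgpass}"
cat >> "$PGPASSFILE" <<'EOF'
source-node:5444:replication:repl:your_password_here
source-node:5444:pgddb:repl:your_password_here
EOF
# libpq ignores the file unless group/other permissions are removed.
chmod 0600 "$PGPASSFILE"
```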
## Preparing the seed node

Install the same distribution and version of Postgres on the seed node to accommodate a physical backup. See [Installing EPAS](/epas/latest/installing/), [Installing PGE](/pge/latest/installing/), or [Installing Postgres](/supported-open-source/postgresql/installing/) for details.

!!! Note
Don't install EFM on the seed node, as it must be managed manually during the migration.

## Creating a physical backup

Once the source is configured, execute the following command from the seed node to pull a physical snapshot of the database from the source node. Ensure the destination `${PGDATA}` directory does not exist before starting the transfer.
```bash
su enterprisedb -c "pg_basebackup --dbname='${SOURCE_DSN}' \
  --pgdata=${PGDATA} \
  --write-recovery-conf \
  --wal-method=stream \
  --create-slot \
  --slot=migration_phy_slot \
  --progress \
  --verbose"
```

This command uses a spread checkpoint by default to minimize the I/O impact on the source node. For faster transfers at the cost of higher disk I/O, you can append `--checkpoint=fast`.

The transfer duration depends on the total volume of data and the network bandwidth between the source and seed nodes. For environments with limited bandwidth, you can enable compression using the `--format=tar` and `--compress=[0-9]` options to reduce the amount of data sent; refer to the official [pg_basebackup documentation](https://www.postgresql.org/docs/14/app-pgbasebackup.html) for detailed configuration options.
## Configuring and starting the seed node

1. Once the transfer is complete, adjust the seed node's `postgresql.conf` and `pg_hba.conf` to reflect its local environment (hostnames, IP addresses, and paths). Validate that the replication settings on the seed node match the requirements for PGD:

    ```SQL
    SHOW wal_level; -- Expected: 'logical'
    SHOW max_replication_slots; -- Should be at least M + N
    SHOW max_wal_senders; -- Should be at least M + N
    SHOW max_logical_replication_workers; -- Should be at least N
    ```

1. Start the database on the seed node to initiate streaming:

    ```bash
    su enterprisedb --command "pg_ctl start"
    ```
## Verifying physical replication

Monitor the replication status from both the source and seed nodes to ensure they are synchronized.

On the source node:

1. Check the migration slot status. It must show the slot assigned to the seed node with a low or decreasing `lag_size`:

    ```sql
    SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag_size
    FROM pg_replication_slots
    WHERE slot_name = 'migration_phy_slot';
    ```

1. Verify active connections. Confirm that the seed node's IP address appears in the replication statistics:

    ```sql
    SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
           pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS replay_lag
    FROM pg_stat_replication;
    ```

On the seed node:

1. Confirm the recovery mode. The node must be correctly acting as a standby:

    ```sql
    SELECT pg_is_in_recovery(); -- Must return true
    ```

1. Verify the WAL receiver status. Check the health of the connection to the primary (source node):

    ```sql
    SELECT slot_name, sender_host || ':' || sender_port AS sender, status,
           now() - last_msg_send_time AS last_msg_send_age,
           now() - last_msg_receipt_time AS last_msg_receipt_age,
           now() - latest_end_time AS last_end_age,
           pg_size_pretty(pg_wal_lsn_diff(latest_end_lsn, written_lsn)) AS write_lag_size,
           pg_size_pretty(pg_wal_lsn_diff(latest_end_lsn, flushed_lsn)) AS flush_lag_size
    FROM pg_stat_wal_receiver;
    ```

1. Compare receive vs. replay position. This identifies the replay lag — the amount of data received but not yet applied to the database:

    ```sql
    SELECT pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                                          pg_last_wal_replay_lsn())) AS lag_bytes;
    ```

1. Monitor lag by time. Measure the time difference between the current time and the last replayed transaction:

    ```sql
    SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
    ```
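As background, the byte lag that `pg_wal_lsn_diff` reports can be reproduced by hand: an LSN such as `0/3000120` is a 64-bit WAL position whose two hex halves are the high and low 32 bits. The following sketch, using made-up LSN values, shows the arithmetic:

```shell
# Convert an LSN of the form HI/LO (both hex) into an absolute byte position,
# then subtract two positions. This mirrors what pg_wal_lsn_diff computes.
lsn_to_bytes() {
  hi=${1%/*}
  lo=${1#*/}
  echo $(( (0x$hi << 32) + 0x$lo ))
}

received="0/3000120"   # example value of pg_last_wal_receive_lsn()
replayed="0/3000060"   # example value of pg_last_wal_replay_lsn()
echo "$(( $(lsn_to_bytes "$received") - $(lsn_to_bytes "$replayed") )) bytes behind"
# prints "192 bytes behind"
```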
Next step: [Convert to logical replication](3-convert-to-logical-replication).
Lines changed: 114 additions & 0 deletions

---
title: Converting to logical replication
navTitle: Converting to logical replication
description: Transitioning the seed node from physical to logical replication.
---

To facilitate a major version upgrade of your Postgres distribution and the subsequent installation of Postgres Distributed (PGD), you must convert the replication stream to the seed node from physical to logical replication. For a seamless transition without data loss, logical replication must resume at the exact Log Sequence Number (LSN) within the WAL stream of the source node where physical replication was interrupted. This is known as the switch-over LSN.
## Creating a publication on the source node

Prepare the source node by defining which data will be replicated and creating a dedicated replication slot so the source retains the required WAL.

1. Create a publication for all tables in the database:

    ```SQL
    CREATE PUBLICATION migration_seed_pub FOR ALL TABLES;
    ```

1. Verify the publications:

    ```sql
    SELECT * FROM pg_publication;
    SELECT * FROM pg_publication_tables;
    ```

1. Create a logical replication slot:

    ```sql
    SELECT pg_create_logical_replication_slot('migration_node_${SEED_NODE_NAME}', 'pgoutput');
    ```

Creating the slot at this stage ensures that the source node begins retaining the necessary WAL data before the switch-over operation. Because the source node will eventually support multiple PGD nodes, each slot is named specifically to identify the target node it serves.
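To confirm the slot exists and is retaining WAL, a query along these lines can be run on the source node. The `migration_node_` pattern is an assumption based on the naming convention above:

```sql
SELECT slot_name, plugin, slot_type, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'migration_node_%';
```

The slot shows `active = false` until the subscription is enabled later; what matters here is that `restart_lsn` is set, indicating WAL retention has begun.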
## Promoting the seed node

Promoting the seed node stops the incoming physical replication stream and converts the node into a standalone, writable instance.

!!! Warning
To maintain data integrity, you must prevent any application writes to the seed node at this stage. Applications should continue to write exclusively to the source node.

On the seed node, run the following command:

```bash
su enterprisedb --command "pg_ctl promote"
```
## Enabling logical replication to the seed node

After promotion, you must align the new logical replication slot with the point where physical replication ended.

1. On the seed node, identify the last LSN replayed from the physical stream:

    ```sql
    SELECT pg_last_wal_replay_lsn();
    ```

1. On the source node, manually move the logical slot forward to that LSN. Replace `${SWITCH_OVER_LSN}` with the value retrieved in the previous step:

    ```sql
    SELECT pg_replication_slot_advance('migration_node_${SEED_NODE_NAME}', '${SWITCH_OVER_LSN}');
    ```

1. Create a logical subscription on the seed node:

    ```SQL
    CREATE SUBSCRIPTION migration_seed_sub
    CONNECTION '${SOURCE_DSN}'
    PUBLICATION migration_seed_pub
    WITH (
        enabled = false,
        copy_data = false,
        create_slot = false,
        slot_name = 'migration_node_${SEED_NODE_NAME}'
    );
    ```

1. Verify that the subscription has correctly identified the tables for replication:

    ```sql
    SELECT n.nspname AS schemaname,
           c.relname AS tablename,
           sr.srsubstate AS state,
           s.subname AS subscription_name
    FROM pg_subscription_rel sr
    JOIN pg_class c ON sr.srrelid = c.oid
    JOIN pg_namespace n ON c.relnamespace = n.oid
    JOIN pg_subscription s ON sr.srsubid = s.oid
    WHERE s.subname = 'migration_seed_sub'
    ORDER BY n.nspname, c.relname;
    ```

1. Enable logical replication:

    ```SQL
    ALTER SUBSCRIPTION migration_seed_sub ENABLE;
    ```
## Cleaning up physical replication

1. Once the logical stream is verified and active, run the following command on the source node to remove the legacy physical replication slot and free up resources:

    ```SQL
    SELECT pg_drop_replication_slot('migration_phy_slot');
    ```

1. On the seed node, run the following commands to remove any remaining configuration parameters for physical replication:

    ```SQL
    ALTER SYSTEM RESET primary_conninfo;
    ALTER SYSTEM RESET primary_slot_name;
    ```

Next step: [Upgrade the seed node](4-upgrade-seed-node).
