Commit b3db254

More resharding docs (#28)
1 parent 6a58eb1 commit b3db254

File tree

5 files changed: +120 −45 lines changed


docs/features/sharding/resharding/hash.md

Lines changed: 105 additions & 34 deletions
@@ -4,55 +4,126 @@ icon: material/database-export-outline

# Move data

-If you're using the `HASH` sharding function, adding a new node to the cluster will change the modulo number by 1. The number returned by the hash function is uniformly distributed across the entire integer range, which makes it considerably larger than the modulo. Therefore, changing it will more often than not result in most rows remapped to different shard numbers.
+Moving data from the source to the destination database is done using logical replication. This is an online operation and doesn't require a maintenance window or pausing query traffic.

-You can visualize this phenomenon with a bit of Python:
+The underlying mechanism is very similar to Postgres [subscriptions](https://www.postgresql.org/docs/current/sql-createsubscription.html), with some improvements, and happens in two steps:

-=== "2 shards"
+1. Copy data in the [publication](schema.md#publication) to the destination database
+2. Stream row changes in real time

-    ```python
-    >>> list(map(lambda x: x % 2, [1000, 1001, 1002, 1003, 1004]))
-    [0, 1, 0, 1, 0]
-    ```
+Once the replication stream synchronizes the two database clusters, the data on the destination cluster will be identical to the source cluster, lagging behind by only a few milliseconds.

-=== "3 shards"
-    ```python
-    >>> list(map(lambda x: x % 3, [1000, 1001, 1002, 1003, 1004]))
-    [1, 2, 0, 1, 2]
-    ```
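
For context, this is roughly what the equivalent native logical replication setup looks like in plain Postgres. The publication name `all_tables`, the subscription name `resharding`, and the connection string are placeholders; PgDog performs the subscription side itself so it can reshard rows in-flight instead of copying them verbatim:

```postgresql
-- On the source database: publish the tables that should be copied.
CREATE PUBLICATION all_tables FOR ALL TABLES;

-- On a plain Postgres destination, a subscription would copy the published
-- tables and then stream changes; PgDog replaces this step with its own
-- data-sync process and shards each row before writing it.
CREATE SUBSCRIPTION resharding
    CONNECTION 'host=10.0.0.1 dbname=prod user=postgres'
    PUBLICATION all_tables;
```
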
+## CLI
+
+PgDog has a command line interface you can call by running it directly:
+
+```bash
+pgdog data-sync \
+    --from-database <name> \
+    --from-user <name> \
+    --to-database <name> \
+    --to-user <name> \
+    --publication <publication>
+```

-Since most rows will have to be moved, resharding a cluster in-place would put a lot of load on an already overextended system.
+Required (*) and optional parameters for this command are as follows:

-PgDog's strategy for resharding is to **move data** from an existing cluster to a completely new one, while rehashing the rows in-flight. This process is parallelizable and fast, and since most of the work is done by the new cluster, production databases are not affected.
+| Option | Description |
+|-|-|
+| `--from-database`* | Name of the source database cluster. |
+| `--from-user`* | Name of the user configured in `users.toml` for the source database cluster. |
+| `--to-database`* | Name of the destination database cluster. |
+| `--to-user`* | Name of the user configured in `users.toml` for the destination database cluster. |
+| `--publication`* | Name of the Postgres [publication](schema.md#publication) for tables to be copied and sharded. It should exist on the **source** database. |
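
For example, assuming a source cluster named `prod`, a destination cluster named `prod-sharded`, a `postgres` user configured in `users.toml` for both, and a publication named `all_tables` that already exists on the source, the invocation could look like this (all names are placeholders for your own configuration):

```bash
pgdog data-sync \
    --from-database prod \
    --from-user postgres \
    --to-database prod-sharded \
    --to-user postgres \
    --publication all_tables
```
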

-## Data sync
+## How it works

-Moving data online is a 2-step process:
+The first thing PgDog does when data sync is started is create a replication slot on each primary database in the source cluster. This prevents Postgres from removing WAL segments while data for each table is copied to the destination.

-1. Copy data from tables using Postgres `COPY`
-2. Stream real-time changes using logical replication
+Next, each table is copied in parallel to the destination database using [sharded COPY](../copy.md). Once that's done, table changes are synchronized in real time with logical replication from the replication slot created earlier.

-To make sure no rows are lost in the process, PgDog follows a similar strategy used by Postgres in logical replication subscriptions, with some improvements.
+The whole process happens entirely online and doesn't require database reboots or pausing writes to the source database.

-### Copying tables
+### Replication slot

-Copying table data from the source database cluster is done using Postgres `COPY` and logical replication slots. This is implemented in the `data-sync` command:
+PostgreSQL replication works on the basis of slots. A slot is a marker in the Write-Ahead Log (WAL) that prevents Postgres from recycling WAL segments and deleting the history of changes made to the database.

-```bash
-pgdog data-sync --help
+<center>
+<img src="/images/resharding-slot-2.png" width="75%" alt="Replication slot" />
+</center>
+
+With logical replication, any client that speaks the protocol (like PgDog) can connect to the server and stream changes made to the database, starting at the position marked by the slot.
+
+Before copying table data, we create a slot to mark a consistent starting point for our replication process. The slot is **permanent**, so even if resharding is interrupted, Postgres won't lose any of the WAL segments we need to resume it.
+
+!!! note "Unused replication slots"
+    Permanent replication slots are not automatically removed by Postgres. If you abort the resharding process, make sure to drop the slot manually to prevent excessive WAL accumulation on the source database.
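
If you do need to clean up, the slot can be inspected and dropped with plain SQL on the source primary. The slot name below is a placeholder; check `pg_replication_slots` for the name PgDog actually created:

```postgresql
-- List existing replication slots and whether they are in use.
SELECT slot_name, slot_type, active, confirmed_flush_lsn
FROM pg_replication_slots;

-- Drop an abandoned slot so Postgres can recycle WAL again (placeholder name).
SELECT pg_drop_replication_slot('pgdog_resharding');
```
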
+
+Once the slot is created, PgDog starts copying data from all tables in the [publication](schema.md#publication), and resharding it in-flight.
+
+### Copying data
+
+Tables are copied from the source to the destination database using standard PostgreSQL `COPY` commands, with a few improvements.
+
+#### Parallelization
+
+If you are running PostgreSQL 16 or later and have configured replicas on the source database, PgDog can copy multiple tables in parallel, dramatically accelerating this process.
+
+<center>
+<img src="/images/resharding-16x.png" width="75%" alt="Parallel table copy" />
+</center>
+
+To set this up, make sure to add your read replicas to [`pgdog.toml`](../../../configuration/pgdog.toml/databases.md), for example:
+
+```toml
+[[databases]]
+name = "source"
+host = "10.0.0.1"
+role = "replica"
+
+[[databases]]
+name = "source"
+host = "10.0.0.2"
+role = "replica"
```

-| Option | Description | Example |
-|-|-|-|
-| `--from-database` | Name of the **source** database cluster. | `prod` |
-| `--from-user` | Name of the user configured in `users.toml` for the **source** database cluster. | `postgres` |
-| `--to-database` | Name of the **destination** database cluster. | `prod-sharded` |
-| `--to-user` | Name of the user configured in `users.toml` for the **destination** database cluster. | `postgres` |
-| `--publication` | Name of the Postgres [publication](https://www.postgresql.org/docs/current/sql-createpublication.html) for tables to be copied and sharded. It should exist on the **source** database. | `all_tables` |
+PgDog will distribute the table copy load evenly between all replicas in the configuration. The more replicas are available for resharding, the faster it will complete.
+
+!!! note "Dedicated replicas"
+    To prevent the resharding process from impacting production queries, you can create a separate set of replicas just for resharding.
+
+    Managed clouds (e.g., AWS RDS) make this especially easy, but new replicas need a warm-up period to fetch their data from the backup snapshot before they can read at the full speed of their storage volumes.
+
+#### Binary `COPY`
+
+PgDog uses the binary `COPY` format to send and receive data, which has been shown to be consistently faster than text because it avoids the (de)serialization overhead of sending tuples in text form. PgDog decodes tuples in-flight and splits them evenly between destination shards, using the [sharded COPY](../copy.md) implementation.
+
+!!! note "Binary compatibility"
+    While the data format used by PostgreSQL databases hasn't changed in decades, binary `COPY` sends rows exactly as they are stored on disk.
+
+    Therefore, sending binary data between two PostgreSQL databases running different versions of Postgres could, however unlikely, result in incompatibilities. We recommend _not_ changing the server's major version while resharding.
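
To illustrate the difference, both `COPY` formats can be tried directly in `psql`; `users` is a hypothetical table name:

```postgresql
-- Text format (the default): every value is serialized to its text representation.
COPY users TO STDOUT (FORMAT text);

-- Binary format: tuples are sent in their binary representation,
-- skipping most of the (de)serialization work on both ends.
COPY users TO STDOUT (FORMAT binary);
```
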
+
+Once all tables are copied and resharded on the destination database cluster, PgDog will begin streaming real-time row updates from the [replication slot](#replication-slot).
+
+### Streaming updates
+
+Once tables are copied over to the destination database, PgDog will stream any changes made to those tables from the [replication slot](#replication-slot) created previously. If it took a while to copy the tables and the source database received a large volume of writes, this process could take some time.
+
+You can check on the streaming progress and estimate its ETA by querying the `pg_replication_slots` view on the __source__ database:
+
+=== "Source database"
+    ```postgresql
+    SELECT confirmed_flush_lsn, pg_current_wal_lsn() FROM pg_replication_slots;
+    ```
+
+| Column | Description |
+|-|-|
+| `confirmed_flush_lsn` | The latest position in the WAL (LSN) confirmed as received by the destination database cluster. |
+| `pg_current_wal_lsn()` | Current position in the Write-Ahead Log for this database. |

-All databases and users must be configured in `pgdog.toml` and `users.toml`.
+The replication delay between the two database clusters is measured in bytes. When that number reaches zero, the destination cluster has fully caught up with the source, and traffic can be [cut over](cutover.md) to the destination database.
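
The delta can also be computed directly in SQL; a minimal sketch using `pg_wal_lsn_diff`:

```postgresql
-- Replication lag of each slot, in bytes; zero means the destination has caught up.
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots;
```
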

-### Real time changes
+## Next steps

-After data sync is complete, changes for all tables in the publication will be streamed in real-time. Keep this connection
-open until you are ready to cut traffic over to the new database cluster.
+- [Traffic cutover](cutover.md)

docs/features/sharding/resharding/schema.md

Lines changed: 14 additions & 10 deletions
@@ -3,12 +3,12 @@ icon: material/database-edit-outline
---
# Schema sync

-PgDog can copy tables, indices and other entities from your production database to the new, sharded database automatically. This is faster than using `pg_dump`, because we separate this process into two parts:
+PgDog can copy tables, indexes and other entities from your production database to the new, sharded database automatically. This is faster than using `pg_dump`, because we separate this process into two parts:

-1. [Create tables](#tables-and-primary-keys), primary key indices, and sequences
-2. Create [secondary indices](#secondary-indices)
+1. [Create tables](#tables-and-primary-keys), primary key indexes, and sequences
+2. Create [secondary indexes](#secondary-indexes)

-The first step needs to be performed first, before [copying data](hash.md). The second step is performed once the data sync is almost complete.
+The create tables step needs to be performed before [copying data](hash.md). The second step is performed once the data sync is almost complete.

## CLI

@@ -30,11 +30,11 @@ Required (*) and optional parameters for this command are as follows:
| `--publication`* | The name of the Postgres table [publication](#publication) with the tables you want to sync. |
| `--dry-run` | Print the SQL statements that will be executed on the destination database and exit. |
| `--ignore-errors` | Execute SQL statements and ignore any errors. |
-| `--data-sync-complete` | Run the second step to create secondary indices and sequences. |
+| `--data-sync-complete` | Run the second step to create secondary indexes and sequences. |

## Tables and primary keys

-The first step in the schema sync copies over tables and their primary key indices from the source database to the new, resharded cluster. This has to be done separately, because Postgres's logical replication only copies data and doesn't manage table schemas.
+The first step in the schema sync copies over tables and their primary key indexes from the source database to the new, resharded cluster. This has to be done separately, because Postgres's logical replication only copies data and doesn't manage table schemas.

### Primary keys

@@ -60,7 +60,7 @@ This will make sure _all_ tables in your database will be copied and resharded i

## Schema admin

-Schema sync creates tables, indices, and other entities on the destination database. To make sure that's done with a user with sufficient privileges (e.g., `CREATE` permission on the database), you need to add it to [`users.toml`](../../../configuration/users.toml/users.md) and mark it as the schema administrator:
+Schema sync creates tables, indexes, and other entities on the destination database. To make sure that's done with a user with sufficient privileges (e.g., `CREATE` permission on the database), you need to add it to [`users.toml`](../../../configuration/users.toml/users.md) and mark it as the schema administrator:

```toml
[[users]]
@@ -74,7 +74,7 @@ PgDog will use that user to connect to the source and destination databases, so

### `pg_dump` version

-PgDog is using `pg_dump` under the hood to export schema definitions. Postgres requires the version of `pg_dump` and the server to be identical. Our [Docker image](../../../installation.md) comes with `pg_dump` for PostgreSQL 16, but your database server may run a different version.
+PgDog uses `pg_dump` under the hood to export schema definitions. Postgres requires the version of `pg_dump` and the Postgres server to be identical. Our [Docker image](../../../installation.md) comes with `pg_dump` for PostgreSQL 16, but your database server may run a different version.

Before proceeding, make sure to install the correct version of `pg_dump` for your source database. If you have multiple versions of `pg_dump` installed on the same host, you can specify the path to the right one in `pgdog.toml`:

@@ -83,6 +83,10 @@ Before proceeding, make sure to install the correct version of `pg_dump` for you
pg_dump_path = "/path/to/pg_dump"
```
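
A quick way to verify that the versions match before running the sync; the host and database names are placeholders:

```bash
# Version of the pg_dump binary on this host.
pg_dump --version

# Version of the source Postgres server.
psql -h source.example.com -d prod -c "SELECT version();"
```
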

-## Secondary indices
+## Secondary indexes

-This step is performed after [data sync](hash.md) is complete. Running this step will create secondary indices on all your tables, which will take some time.
+This step is performed after [data sync](hash.md) is complete. Running this step will create secondary indexes on all your tables, which will take some time, depending on the number of indexes in your schema.
+
+## Next steps
+
+- [Move data](hash.md)

docs/images/resharding-16x.png

53.9 KB

docs/images/resharding-slot-2.png

23.3 KB

docs/roadmap.md

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ Support for sorting rows in [cross-shard](features/sharding/cross-shard.md) quer

| Feature | Status | Notes |
|-|-|-|
-| [Data sync](features/sharding/resharding/hash.md#data-sync) | :material-wrench: | Sync table data with logical replication. Not benchmarked yet. |
+| [Data sync](features/sharding/resharding/hash.md) | :material-wrench: | Sync table data with logical replication. Not benchmarked yet. |
| [Schema sync](features/sharding/resharding/schema.md) | :material-wrench: | Sync table, index and constraint definitions. Not benchmarked yet. |
| Online rebalancing | :material-calendar-check: | Not automated yet, requires manual orchestration. |
