Commit 3cd29c6

v2.14.2: Significantly speed up moving of large microshards; queries refactoring in the code
1 parent 3220818 commit 3cd29c6

85 files changed

+2387
-6153
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@ dist
44
node_modules
55
package-lock.json
66
yarn.lock
7+
pnpm-lock.yaml
78
.DS_Store
89
*.log
910
*.tmp

.npmignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ dist/*.tsbuildinfo
77
node_modules
88
package-lock.json
99
yarn.lock
10+
pnpm-lock.yaml
1011
.DS_Store
1112
*.log
1213
*.tmp

README.md

Lines changed: 69 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -128,18 +128,26 @@ pg-microsharding list
128128

129129
This action prints the list of all PostgreSQL islands (pointed to by DSNs), microshards and some statistics.
130130

131-
In `--verbose` mode, also prints detailed statistics anout insert/update/delete, index scans and seqscans.
131+
In `--verbose` mode, also prints detailed statistics about insert/update/delete, index scans and seqscans.
132+
133+
<div align="center"><figure><img src="https://raw.githubusercontent.com/dimikot/ent-framework/refs/heads/main/gitbook/.gitbook/assets/pg-microsharding-list.png" alt="" width="563"><figcaption></figcaption></figure></div>
132134

133135
### Allocate New Microshards: pg-microsharding allocate
134136

135137
```bash
136-
pg-microsharding allocate --shards=301-399 --activate=yes
138+
pg-microsharding allocate --shards=301-309 --activate=yes
137139
```
138140

139141
This action allows you to create more microshard schemas in the cluster. The microshards are created on the PostgreSQL host pointed to by the 1st DSN, so after it's done, run `pg-microsharding rebalance` to spread those new schemas across other nodes.
140142
141143
Each microshard can be either "active" or "inactive". When you create them, you tell the tool whether the microshards should become active immediately (and thus visible to the `microsharding_list_active_shards()` API) or not. You can always activate the schemas later using the exact same command (it is idempotent).
142144
145+
The tool runs the `--migrate-cmd` command right after creating the inactive microshards, assuming that your migration tool will initialize them properly.
146+
147+
<figure><img src="https://raw.githubusercontent.com/dimikot/ent-framework/refs/heads/main/gitbook/.gitbook/assets/pg-microsharding-allocate.png" alt=""><figcaption></figcaption></figure>
148+
149+
<figure><img src="https://raw.githubusercontent.com/dimikot/ent-framework/refs/heads/main/gitbook/.gitbook/assets/pg-microsharding-allocate-list.png" alt="" width="375"><figcaption></figcaption></figure>
150+
143151
### Move One Microshard: pg-microsharding move
144152
145153
```bash
@@ -148,12 +156,22 @@ pg-microsharding move \
148156
--activate-on-destination=yes
149157
```
150158
151-
Microshards can be moved from one PostgreSQL node to another. There is no need to stop writes while moving microshards: the tool uses PostgreSQL logical replication to stream each microshard table's data, and in the very end, acquires a quick exclusive lock to finalize the move.
159+
Microshards can be moved from one PostgreSQL node to another. There is no need to stop writes while moving microshards: the tool uses PostgreSQL logical replication to stream each microshard table's data, and in the very end, acquires a quick write lock to finalize the move.
160+
161+
There are many aspects and corner cases addressed in the move action; here are some of them:
162+
163+
* The move is fast even for large microshards. The tool internally uses the same approach for data copying as `pg_dump`. First, it recreates the table structure on the destination, except most of the indexes and foreign key constraints (only the primary key indexes or REPLICA IDENTITY indexes are created at this stage, since they are required for the logical replication to work). Then, it copies the data, utilizing the built-in PostgreSQL tablesync worker; this process is fast, since it inserts the data in bulk and doesn't update indexes. In the end, the tool creates the remaining indexes and foreign key constraints (this is where you may want to increase [maintenance\_work\_mem](https://www.postgresql.org/docs/current/runtime-config-resource.html) for the role you pass to pg-microsharding, since it directly affects the index creation time). Overall, this approach speeds up the copying by \~10x compared to the naive way of using logical subscriptions.
164+
* At each long-running step, the tool shows descriptive progress information: how many tuples have been copied so far, the elapsed percentage, how much time is left, which SQL queries it executes (in a dynamically updated block in the console), etc.
165+
* It also shows replication lag statistics for all physical replicas of the source and the destination, plus the logical replication lag of the temporary subscription.
166+
* In the end, the tool activates the microshard on the destination and deactivates it on the source, but only once the replication lag in seconds has dropped below some reasonable threshold (defaults to 20 seconds, but you can pass a lower value to be on the safe side). So the write lock is guaranteed to be acquired for only a brief moment.
167+
* The tool runs it all in an automatically created tmux session. If you accidentally disconnect, then just connect back and rerun the same command line: instead of running another move action, it will bring you back into the existing session.
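
The lag-gated activation from the list above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the `LagSample` type and the `canActivate()` helper are hypothetical names, and the sketch assumes that every measured lag (logical subscription and physical replicas) must be below the threshold.

```typescript
// Hypothetical type, not part of pg-microsharding: one lag measurement.
interface LagSample {
  name: string; // e.g. "logical subscription" or a replica host name
  lagSec: number; // replication lag in seconds
}

// Returns true only when every sample is below the threshold. The default
// of 20 seconds mirrors the default mentioned in the README.
function canActivate(samples: LagSample[], maxLagSec = 20): boolean {
  // With no samples we cannot prove that the lag is low, so stay on hold.
  return samples.length > 0 && samples.every((s) => s.lagSec < maxLagSec);
}

const samples: LagSample[] = [
  { name: "logical subscription", lagSec: 3.5 },
  { name: "destination replica", lagSec: 25 },
];
console.log(canActivate(samples)); // false: one lag is above 20 seconds
console.log(canActivate(samples, 30)); // true: both lags are below 30 seconds
```

Waiting for a low lag before taking the write lock is what keeps the lock short: the remaining changes to replay are small by the time writes are paused.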
152168
153169
If you're unsure, you can practice with the move without activating the microshard on the destination (and without deactivating it on the source) by passing `--activate-on-destination=no` option. This is like a "dry-run" mode, where the tool does all the work, except the very last step. The moved schema on the destination won't be activated, and it will also be renamed using some long descriptive prefix (including the move date).
154170
155171
At any moment, you can abort the move with ^C. It is safe: half-moved data will remain on the destination, but the microshard schema will remain invisible there to e.g. the `microsharding_list_active_shards()` API (see below). If you then rerun the `move` action, it will start from scratch.
156172
173+
<figure><img src="https://raw.githubusercontent.com/dimikot/ent-framework/refs/heads/main/gitbook/.gitbook/assets/pg-microsharding-move.png" alt=""><figcaption></figcaption></figure>
174+
157175
### Clean Old Moved Copies: pg-microsharding cleanup
158176
159177
```bash
@@ -174,18 +192,30 @@ This action runs multiple "move" sub-actions in parallel, utilizing [tmux](https
174192

175193
Before running the moves, the action calculates the weight of each shard (by default, the weight is the microshard tables size in bytes, multiplied by the per-shard "weight factor"; see below). Then, it estimates which microshards need to be moved to which islands to achieve a more or less uniform distribution. The algorithm is complicated: among other heuristics, it tries to make sure that each island gets approximately the same number of microshards with comparable sizes (e.g. if you allocate 100 new empty microshards, then rebalancing will spread them across islands uniformly).
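
To make the default weight formula concrete, here is a minimal sketch of how per-island totals could be computed from shard sizes and weight factors. The `Shard` type and both functions are hypothetical names chosen for illustration; the sketch covers only the weight calculation, not the tool's actual rebalancing heuristics.

```typescript
// Hypothetical shape of the per-shard statistics (names are assumptions).
interface Shard {
  no: number; // microshard number
  island: string; // host the microshard currently lives on
  sizeBytes: number; // total size of the microshard tables
  factor: number; // per-shard weight factor
}

// Default weight as described above: size in bytes times the weight factor.
function shardWeight(shard: Shard): number {
  return shard.sizeBytes * shard.factor;
}

// Sums the weights per island, to see how uneven the current layout is.
function islandWeights(shards: Shard[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const shard of shards) {
    totals[shard.island] = (totals[shard.island] ?? 0) + shardWeight(shard);
  }
  return totals;
}

const totals = islandWeights([
  { no: 1, island: "host1", sizeBytes: 100, factor: 1 },
  { no: 2, island: "host1", sizeBytes: 300, factor: 2 },
  { no: 3, island: "host2", sizeBytes: 50, factor: 1 },
]);
console.log(totals); // totals.host1 is 700, totals.host2 is 50
```

A large gap between the island totals, as in this example, is the kind of imbalance a rebalancing plan would try to reduce.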
176194

195+
<figure><img src="https://raw.githubusercontent.com/dimikot/ent-framework/refs/heads/main/gitbook/.gitbook/assets/pg-microsharding-rebalance-plan.png" alt="" width="563"><figcaption></figcaption></figure>
196+
177197
Once the rebalancing plan is ready, the tool will print it and ask for your confirmation. You can always run `pg-microsharding rebalance` and then press ^C to just see what _would_ happen if you rebalance.
178198

199+
After rebalancing, the result may look like:
200+
201+
<figure><img src="https://raw.githubusercontent.com/dimikot/ent-framework/refs/heads/main/gitbook/.gitbook/assets/pg-microsharding-rebalance-after.png" alt="" width="375"><figcaption></figcaption></figure>
202+
179203
At any time, you can abort the rebalancing with ^C in any of the tmux panes. It is as safe as aborting the `move` action.
180204

181-
### Evacuate All Microshards from an Island
205+
### Evacuate All Microshards from an Island (Decommissioning)
182206

183207
```bash
184208
pg-microsharding rebalance \
185-
--decommission=host1 --activate-on-destination=yes
209+
--decommission=host2 --activate-on-destination=yes
186210
```
187211

188-
This mode of "rebalance" action allows you to remove a PostgreSQL host from the cluster, or even upgrade PostgreSQL to the next major version with no downtime. It moves all the microshards from the provided DSN, so the host becomes "empty". After the decommissioning is done, you can remove the host from the cluster or upgrade PostgreSQL, then rebalance the microshards back (rebalancing works fine across different major PostgreSQL versions).
212+
This mode of "rebalance" action allows you to remove a PostgreSQL host from the cluster, or even upgrade PostgreSQL to the next major version with no downtime. It moves all the microshards from the provided DSN, so the host becomes "empty".
213+
214+
E.g. after decommissioning, the result may look like (notice that one node became empty):
215+
216+
<figure><img src="https://raw.githubusercontent.com/dimikot/ent-framework/refs/heads/main/gitbook/.gitbook/assets/pg-microsharding-decommission.png" alt="" width="375"><figcaption></figcaption></figure>
217+
218+
Now, you can remove the host from the cluster or upgrade PostgreSQL, then rebalance the microshards back (rebalancing works fine across different major PostgreSQL versions).
189219

190220
### Tweak Island Weights: pg-microsharding factor
191221

@@ -244,3 +274,36 @@ export const cluster = new Cluster({
244274
...
245275
});
246276
```
277+
278+
### Microsharding Debug Views
279+
280+
The `microsharding_migration_after()` function creates so-called "debug views" for each sharded table in your cluster. For instance, if you have `sh0001.users`, `sh0002.users` etc. tables, then it will create a debug view `public.users` with a definition like:
281+
282+
```sql
283+
-- This is what pg-microsharding creates automatically.
284+
CREATE VIEW public.users AS
285+
SELECT * FROM sh0001.users
286+
UNION ALL
287+
SELECT * FROM sh0002.users
288+
UNION ALL
289+
...;
290+
```
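
For illustration only, a view definition of the shape shown above could be generated like this. `debugViewSql()` is a hypothetical helper, not a pg-microsharding function; it assumes simple schema and table names that need no quoting, and it mirrors only the `UNION ALL` shape of the example, not whatever the real function emits.

```typescript
// Hypothetical helper: builds a UNION ALL view over per-shard tables.
// Assumes identifiers are plain lowercase names that need no quoting.
function debugViewSql(table: string, shardSchemas: string[]): string {
  if (shardSchemas.length === 0) {
    throw new Error("at least one shard schema is required");
  }
  // One SELECT per microshard schema, e.g. "SELECT * FROM sh0001.users".
  const selects = shardSchemas.map(
    (schema) => `SELECT * FROM ${schema}.${table}`,
  );
  return `CREATE VIEW public.${table} AS\n${selects.join("\nUNION ALL\n")};`;
}

console.log(debugViewSql("users", ["sh0001", "sh0002"]));
```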
291+
292+
Moreover, if you pass the list of all PostgreSQL hosts, and those hosts can access each other without a password (e.g. they have `/var/lib/postgresql/N/.pgpass` files), then those debug views will work **across all shards on all nodes, including the remote ones** (using the [foreign-data wrapper](https://www.postgresql.org/docs/current/postgres-fdw.html) functionality).
293+
294+
So **for debugging purposes**, you'll be able to run queries across all microshards in your `psql` sessions. This is typically very convenient.
295+
296+
Of course, those **debug views are not suitable for production traffic**: cross-node communication in PostgreSQL, as well as query planning, are not efficient enough. Do not even try; use application-level microshard routing instead, such as the one [Ent Framework](https://ent-framework.org/) provides.
297+
298+
```
299+
$ psql
300+
postgres=# SELECT shard, email FROM users
301+
WHERE created_at > now() - '1 hour'::interval;
302+
-- Prints all recent users from all microshards, including
303+
-- the microshards on other PostgreSQL nodes! Use for
304+
-- debugging purposes only.
305+
```
306+
307+
As for `microsharding_migration_before()`, you must call it before any changes are applied to your microsharded tables. The function drops all of the debug views mentioned above. E.g. if you remove a column from a table, PostgreSQL would not allow you to do it if this column is mentioned in any of the views, so it's important to drop the views and re-create them afterwards.
308+
309+
Typically, you just call `microsharding_migration_before()` in your pre-migration sequence and then call `microsharding_migration_after()` in your post-migration steps.
