Commit 05aa4b5

Update parallel_initial_load.md
1 parent 76afd35 commit 05aa4b5

1 file changed (+7, -1 lines)

docs/integrations/data-ingestion/clickpipes/postgres/parallel_initial_load.md

@@ -6,6 +6,7 @@ sidebar_label: 'How parallel snapshot works'
---

import snapshot_params from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/snapshot_params.png'
+import Image from '@theme/IdealImage';

This document explains how parallelized snapshot (initial load) works in the Postgres ClickPipe and describes the snapshot parameters that can be used to control it.

@@ -22,20 +23,25 @@ The Postgres ClickPipe uses the CTID column to logically partition source tables

Let's look at the settings below:

-<img src={snapshot_params} alt="Snapshot parameters" />
+<Image src={snapshot_params} alt="Snapshot parameters" size="md"/>

#### Snapshot number of rows per partition {#numrows-pg-snapshot}
+
This setting controls how many rows make up a partition. The ClickPipe reads the source table in chunks of this size, and these chunks are processed in parallel according to the configured initial load parallelism. The default value is 100,000 rows per partition.

#### Initial load parallelism {#parallelism-pg-snapshot}
+
This setting controls how many partitions are processed in parallel. The default value is 4, which means the ClickPipe reads 4 partitions of the source table in parallel. Increasing it can speed up the initial load, but keep it at a value your source instance can sustain so that you don't overwhelm the source database. The ClickPipe determines the number of partitions automatically from the size of the source table and the number of rows per partition.

#### Snapshot number of tables in parallel {#tables-parallel-pg-snapshot}
+
Although not strictly part of parallel snapshot, this setting controls how many tables are processed in parallel during the initial load. The default value is 1. Note that this multiplies with the partition parallelism: with an initial load parallelism of 4 and 2 tables in parallel, the ClickPipe reads up to 8 partitions in parallel.

### Monitoring parallel snapshot in Postgres {#monitoring-parallel-pg-snapshot}
+
You can query **pg_stat_activity** to see parallel snapshot in action. The ClickPipe opens multiple connections to the source database, and each one reads a different partition of the source table. **FETCH** queries with different CTID ranges indicate that the ClickPipe is reading the source tables. You can also see the COUNT(*) query and the partitioning query there.

### Limitations {#limitations-parallel-pg-snapshot}
+
- The snapshot parameters cannot be edited after pipe creation. If you want to change them, you will have to create a new ClickPipe.
- When adding tables to an existing ClickPipe, you cannot change the snapshot parameters. The ClickPipe will use the existing parameters for the new tables.
