docs/integrations/data-ingestion/clickpipes/postgres/controlling_sync.md
Lines changed: 8 additions & 0 deletions
@@ -20,32 +20,40 @@ Database ClickPipes have an architecture that consists of two parallel processes
There are two main ways to control the sync of a database ClickPipe. The ClickPipe starts pushing as soon as either of the settings below is reached.
### Sync interval {#interval-pg-sync}
+
The sync interval of the pipe is the amount of time (in seconds) during which the ClickPipe pulls records from the source database. The time taken to push the pulled records to ClickHouse is not included in this interval.
The default is **1 minute**.
The sync interval can be set to any positive integer value, but keeping it above 10 seconds is recommended.
### Pull batch size {#batch-size-pg-sync}
+
The pull batch size is the number of records that the ClickPipe pulls from the source database in one batch. Records here are the inserts, updates, and deletes performed on the tables that are part of the pipe.
The default is **100,000** records.
A safe maximum is 10 million.
### An exception: Long-running transactions on source {#transactions-pg-sync}
+
When a transaction runs on the source database, the ClickPipe waits until it receives the COMMIT of the transaction before moving forward. This **overrides** both the sync interval and the pull batch size.
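
As a rough illustration of how the two settings and the transaction exception interact, the decision to stop pulling and start pushing can be modeled as below. This is a minimal sketch and not the ClickPipes implementation: `SyncSettings`, `should_push`, and all field names are hypothetical, and only the default values come from the text above.

```python
from dataclasses import dataclass


@dataclass
class SyncSettings:
    # Defaults mirror the documented ones; the field names are hypothetical.
    sync_interval_s: int = 60
    pull_batch_size: int = 100_000


def should_push(elapsed_s: float, pulled_records: int,
                in_open_transaction: bool, settings: SyncSettings) -> bool:
    """Return True when this toy model would stop pulling and push."""
    # An uncommitted source transaction overrides both settings:
    # keep pulling until its COMMIT has been received.
    if in_open_transaction:
        return False
    interval_reached = elapsed_s >= settings.sync_interval_s
    batch_full = pulled_records >= settings.pull_batch_size
    # Whichever setting is reached first triggers the push.
    return interval_reached or batch_full
```

For example, with the defaults, `should_push(5, 100_000, False, SyncSettings())` is true because the batch is full, while the same call with `in_open_transaction=True` is false no matter how much time has passed.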
### Tweaking the sync settings to help with replication slot growth {#tweaking-pg-sync}
+
Let's talk about how to use these settings to handle a large replication slot of a CDC pipe.
The pushing time to ClickHouse does not scale linearly with the pulling time from the source database. This can be leveraged to reduce the size of a large replication slot.
By increasing both the sync interval and the pull batch size, the ClickPipe pulls a large amount of data from the source database in one go and then pushes it to ClickHouse.
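
One way to picture why larger batches can help is a toy cost model in which each push carries a fixed overhead plus a small per-record cost. This model is purely an assumption made for illustration: the function name, the cost structure, and every number below are invented and are not measurements of ClickPipes or ClickHouse.

```python
def drain_rate(batch_size: int, pull_s_per_record: float,
               push_overhead_s: float, push_s_per_record: float) -> float:
    """Records drained per second over one pull-then-push cycle (toy model)."""
    pull_time = batch_size * pull_s_per_record
    # Assumed: push time = fixed overhead + per-record cost, so it grows
    # more slowly than the batch itself.
    push_time = push_overhead_s + batch_size * push_s_per_record
    return batch_size / (pull_time + push_time)


# Hypothetical costs, chosen only to show the shape of the effect.
small = drain_rate(100_000, 1e-5, 5.0, 1e-6)
large = drain_rate(1_000_000, 1e-5, 5.0, 1e-6)
print(f"small batch: {small:.0f} records/s, large batch: {large:.0f} records/s")
```

Under these assumed costs the larger batch drains more records per second, because the fixed push overhead is spread over more records. Whether the same holds for a real pipe depends on its actual pull and push costs.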