Commit 97a134b

Merge pull request #4160 from ClickHouse/Blargian-patch-72
Image fix
2 parents fcef07d + fd3b577 commit 97a134b

8 files changed

+49
-29
lines changed

docs/integrations/data-ingestion/clickpipes/mysql/controlling_sync.md

Lines changed: 13 additions & 4 deletions
@@ -9,6 +9,7 @@ import edit_sync_button from '@site/static/images/integrations/data-ingestion/cl
 import create_sync_settings from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/create_sync_settings.png'
 import edit_sync_settings from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/sync_settings_edit.png'
 import cdc_syncs from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/cdc_syncs.png'
+import Image from '@theme/IdealImage';

 This document describes how to control the sync of a database ClickPipe (Postgres, MySQL etc.) when the ClickPipe is in **CDC (Running) mode**.

@@ -19,32 +20,40 @@ Database ClickPipes have an architecture that consists of two parallel processes
 There are two main ways to control the sync of a database ClickPipe. The ClickPipe will start pushing when one of the below settings kicks in.

 ### Sync interval {#interval-mysql-sync}
+
 The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe will pull records from the source database. The time to push what we have to ClickHouse is not included in this interval.

 The default is **1 minute**.
 Sync interval can be set to any positive integer value, but it is recommended to keep it above 10 seconds.

 ### Pull batch size {#batch-size-mysql-sync}
+
 The pull batch size is the number of records that the ClickPipe will pull from the source database in one batch. Records mean inserts, updates and deletes done on the tables that are part of the pipe.

 The default is **100,000** records.
 A safe maximum is 10 million.

 ### An exception: Long-running transactions on source {#transactions-pg-sync}
+
 When a transaction is run on the source database, the ClickPipe waits until it receives the COMMIT of the transaction before it moves forward. This **overrides** both the sync interval and the pull batch size.

 ### Configuring sync settings {#configuring-mysql-sync}
+
 You can set the sync interval and pull batch size when you create a ClickPipe or edit an existing one.
 When creating a ClickPipe, these settings appear in the second step of the creation wizard, as shown below:
-<img src={create_sync_settings} alt="Create sync settings" />
+
+<Image img={create_sync_settings} alt="Create sync settings" size="md"/>

 When editing an existing ClickPipe, you can head over to the **Settings** tab of the pipe, pause the pipe and then click on **Configure** here:
-<img src={edit_sync_button} alt="Edit sync button" />
+
+<Image img={edit_sync_button} alt="Edit sync button" size="md"/>

 This will open a flyout with the sync settings, where you can change the sync interval and pull batch size:
-<img src={edit_sync_settings} alt="Edit sync settings" />
+
+<Image img={edit_sync_settings} alt="Edit sync settings" size="md"/>

 ### Monitoring sync control behaviour {#monitoring-mysql-sync}
+
 You can see how long each batch takes in the **CDC Syncs** table in the **Metrics** tab of the ClickPipe. Note that the duration here includes push time; if there are no incoming rows, the ClickPipe waits, and that wait time is also included in the duration.

-<img src={cdc_syncs} alt="CDC Syncs table" />
+<Image img={cdc_syncs} alt="CDC Syncs table" size="md"/>
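The push trigger described in controlling_sync.md above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not ClickPipes code: `SyncSettings` and `should_push` are hypothetical names, and the defaults are the ones quoted in the doc (1 minute, 100,000 records). The sketch assumes that pulling stops and pushing starts as soon as either limit is reached, unless a source transaction is still waiting for its COMMIT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncSettings:
    """Hypothetical container for the two documented sync settings."""
    sync_interval_s: int = 60        # documented default: 1 minute
    pull_batch_size: int = 100_000   # documented default: 100,000 records


def should_push(elapsed_s: float, pulled_records: int,
                in_open_transaction: bool, settings: SyncSettings) -> bool:
    """Return True when the pulled batch would be pushed to ClickHouse.

    Assumption: an uncommitted source transaction overrides both limits,
    so nothing is pushed until its COMMIT has been received.
    """
    if in_open_transaction:
        return False
    return (elapsed_s >= settings.sync_interval_s
            or pulled_records >= settings.pull_batch_size)


if __name__ == "__main__":
    defaults = SyncSettings()
    print(should_push(61, 10, False, defaults))         # interval reached
    print(should_push(5, 100_000, False, defaults))     # batch size reached
    print(should_push(120, 500_000, True, defaults))    # still inside a transaction
```

Raising both values, as the Postgres version of this page suggests for large replication slots, simply moves both thresholds so that each pull collects more records before a push.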

docs/integrations/data-ingestion/clickpipes/mysql/parallel_initial_load.md

Lines changed: 3 additions & 2 deletions
@@ -7,6 +7,7 @@ sidebar_label: 'How parallel snapshot works'

 import snapshot_params from '@site/static/images/integrations/data-ingestion/clickpipes/mysql/snapshot_params.png'
 import partition_key from '@site/static/images/integrations/data-ingestion/clickpipes/mysql/partition_key.png'
+import Image from '@theme/IdealImage';

 This document explains how parallelized snapshot/initial load in the MySQL ClickPipe works and talks about the snapshot parameters that can be used to control it.

@@ -18,7 +19,7 @@ However, the MySQL ClickPipe can parallelize this process, which can significant
 ### Partition key column {#key-mysql-snapshot}

 Once we've enabled the feature flag, you should see the below setting in the ClickPipe table picker (both during creation and editing of a ClickPipe):
-<img src={partition_key} alt="Partition key column" />
+<Image img={partition_key} alt="Partition key column" size="md"/>

 The MySQL ClickPipe uses a column on your source table to logically partition the source tables. This column is called the **partition key column**. It is used to divide the source table into partitions, which can then be processed in parallel by the ClickPipe.

@@ -30,7 +31,7 @@ The partition key column must be indexed in the source table to see a good perfo

 Let's talk about the below settings:

-<img src={snapshot_params} alt="Snapshot parameters" />
+<Image img={snapshot_params} alt="Snapshot parameters" size="md"/>

 #### Snapshot number of rows per partition {#numrows-mysql-snapshot}
 This setting controls how many rows constitute a partition. The ClickPipe will read the source table in chunks of this size, and chunks will be processed in parallel based on the initial load parallelism set. The default value is 100,000 rows per partition.

docs/integrations/data-ingestion/clickpipes/mysql/pause_and_resume.md

Lines changed: 0 additions & 5 deletions
@@ -19,31 +19,26 @@ There are scenarios where it would be useful to pause a MySQL ClickPipe. For exa
 1. In the Data Sources tab, click on the MySQL ClickPipe you wish to pause.
 2. Head over to the **Settings** tab.
 3. Click on the **Pause** button.
-<br/>

 <Image img={pause_button} border size="md"/>

 4. A dialog box should appear for confirmation. Click on Pause again.
-<br/>

 <Image img={pause_dialog} border size="md"/>

 5. Head over to the **Metrics** tab.
 6. In around 5 seconds (and also on page refresh), the status of the pipe should be **Paused**.
-<br/>

 <Image img={pause_status} border size="md"/>

 ## Steps to resume a MySQL ClickPipe {#resume-clickpipe-steps}
 1. In the Data Sources tab, click on the MySQL ClickPipe you wish to resume. The status of the mirror should be **Paused** initially.
 2. Head over to the **Settings** tab.
 3. Click on the **Resume** button.
-<br/>

 <Image img={resume_button} border size="md"/>

 4. A dialog box should appear for confirmation. Click on Resume again.
-<br/>

 <Image img={resume_dialog} border size="md"/>

docs/integrations/data-ingestion/clickpipes/mysql/resync.md

Lines changed: 7 additions & 3 deletions
@@ -6,10 +6,12 @@ sidebar_label: 'Resync ClickPipe'
 ---

 import resync_button from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/resync_button.png'
+import Image from '@theme/IdealImage';

 ### What does Resync do? {#what-mysql-resync-do}

 Resync involves the following operations in order:
+
 1. The existing ClickPipe is dropped, and a new "resync" ClickPipe is kicked off. Thus, changes to source table structures will be picked up when you resync.
 2. The resync ClickPipe creates (or replaces) a new set of destination tables which have the same names as the original tables except with a `_resync` suffix.
 3. Initial load is performed on the `_resync` tables.
@@ -18,6 +20,7 @@ Resync involves the following operations in order:
 All the settings of the original ClickPipe are retained in the resync ClickPipe. The statistics of the original ClickPipe are cleared in the UI.

 ### Use cases for resyncing a ClickPipe {#use-cases-mysql-resync}
+
 Here are a few scenarios:

 1. You may need to perform major schema changes on the source tables which would break the existing ClickPipe and you would need to restart. You can just click Resync after performing the changes.
@@ -29,13 +32,14 @@ since initial load with parallel threads is involved each time.
 :::

 ### Resync ClickPipe Guide {#guide-mysql-resync}
+
 1. In the Data Sources tab, click on the MySQL ClickPipe you wish to resync.
 2. Head over to the **Settings** tab.
 3. Click on the **Resync** button.
-<br/>
-<img img={resync_button} border size="md"/>
+
+<Image img={resync_button} border size="md"/>
+
 4. A dialog box should appear for confirmation. Click on Resync again.
-<br/>
 5. Head over to the **Metrics** tab.
 6. In around 5 seconds (and also on page refresh), the status of the pipe should be **Setup** or **Snapshot**.
 7. The initial load of the resync can be monitored in the **Tables** tab - in the **Initial Load Stats** section.

docs/integrations/data-ingestion/clickpipes/postgres/controlling_sync.md

Lines changed: 13 additions & 4 deletions
@@ -9,6 +9,7 @@ import edit_sync_button from '@site/static/images/integrations/data-ingestion/cl
 import create_sync_settings from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/create_sync_settings.png'
 import edit_sync_settings from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/sync_settings_edit.png'
 import cdc_syncs from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/cdc_syncs.png'
+import Image from '@theme/IdealImage';

 This document describes how to control the sync of a database ClickPipe (Postgres, MySQL etc.) when the ClickPipe is in **CDC (Running) mode**.

@@ -19,37 +20,45 @@ Database ClickPipes have an architecture that consists of two parallel processes
 There are two main ways to control the sync of a database ClickPipe. The ClickPipe will start pushing when one of the below settings kicks in.

 ### Sync interval {#interval-pg-sync}
+
 The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe will pull records from the source database. The time to push what we have to ClickHouse is not included in this interval.

 The default is **1 minute**.
 Sync interval can be set to any positive integer value, but it is recommended to keep it above 10 seconds.

 ### Pull batch size {#batch-size-pg-sync}
+
 The pull batch size is the number of records that the ClickPipe will pull from the source database in one batch. Records mean inserts, updates and deletes done on the tables that are part of the pipe.

 The default is **100,000** records.
 A safe maximum is 10 million.

 ### An exception: Long-running transactions on source {#transactions-pg-sync}
+
 When a transaction is run on the source database, the ClickPipe waits until it receives the COMMIT of the transaction before it moves forward. This **overrides** both the sync interval and the pull batch size.

 ### Configuring sync settings {#configuring-pg-sync}
+
 You can set the sync interval and pull batch size when you create a ClickPipe or edit an existing one.
 When creating a ClickPipe, these settings appear in the second step of the creation wizard, as shown below:
-<img src={create_sync_settings} alt="Create sync settings" />
+
+<Image img={create_sync_settings} alt="Create sync settings" size="md"/>

 When editing an existing ClickPipe, you can head over to the **Settings** tab of the pipe, pause the pipe and then click on **Configure** here:
-<img src={edit_sync_button} alt="Edit sync button" />
+
+<Image img={edit_sync_button} alt="Edit sync button" size="md"/>

 This will open a flyout with the sync settings, where you can change the sync interval and pull batch size:
-<img src={edit_sync_settings} alt="Edit sync settings" />
+
+<Image img={edit_sync_settings} alt="Edit sync settings" size="md"/>

 ### Tweaking the sync settings to help with replication slot growth {#tweaking-pg-sync}
+
 Let's talk about how to use these settings to handle a large replication slot of a CDC pipe.
 The pushing time to ClickHouse does not scale linearly with the pulling time from the source database. This can be leveraged to reduce the size of a large replication slot.
 When you increase both the sync interval and the pull batch size, the ClickPipe pulls a large amount of data from the source database in one go and then pushes it to ClickHouse.

 ### Monitoring sync control behaviour {#monitoring-pg-sync}
 You can see how long each batch takes in the **CDC Syncs** table in the **Metrics** tab of the ClickPipe. Note that the duration here includes push time; if there are no incoming rows, the ClickPipe waits, and that wait time is also included in the duration.

-<img src={cdc_syncs} alt="CDC Syncs table" />
+<Image img={cdc_syncs} alt="CDC Syncs table" size="md"/>

docs/integrations/data-ingestion/clickpipes/postgres/parallel_initial_load.md

Lines changed: 7 additions & 1 deletion
@@ -6,6 +6,7 @@ sidebar_label: 'How parallel snapshot works'
 ---

 import snapshot_params from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/snapshot_params.png'
+import Image from '@theme/IdealImage';

 This document explains how parallelized snapshot/initial load in the Postgres ClickPipe works and talks about the snapshot parameters that can be used to control it.

@@ -22,20 +23,25 @@ The Postgres ClickPipe uses the CTID column to logically partition source tables

 Let's talk about the below settings:

-<img src={snapshot_params} alt="Snapshot parameters" />
+<Image img={snapshot_params} alt="Snapshot parameters" size="md"/>

 #### Snapshot number of rows per partition {#numrows-pg-snapshot}
+
 This setting controls how many rows constitute a partition. The ClickPipe will read the source table in chunks of this size, and chunks will be processed in parallel based on the initial load parallelism set. The default value is 100,000 rows per partition.

 #### Initial load parallelism {#parallelism-pg-snapshot}
+
 This setting controls how many partitions will be processed in parallel. The default value is 4, which means that the ClickPipe will read 4 partitions of the source table in parallel. This can be increased to speed up the initial load, but it is recommended to keep it to a reasonable value depending on your source instance specs to avoid overwhelming the source database. The ClickPipe will automatically adjust the number of partitions based on the size of the source table and the number of rows per partition.

 #### Snapshot number of tables in parallel {#tables-parallel-pg-snapshot}
+
 Not really related to parallel snapshot, but this setting controls how many tables will be processed in parallel during the initial load. The default value is 1. Note that this is on top of the parallelism of the partitions, so with a parallelism of 4 partitions and 2 tables in parallel, the ClickPipe will read 8 partitions in parallel.

 ### Monitoring parallel snapshot in Postgres {#monitoring-parallel-pg-snapshot}
+
 You can analyze **pg_stat_activity** to see the parallel snapshot in action. The ClickPipe will create multiple connections to the source database, each reading a different partition of the source table. If you see **FETCH** queries with different CTID ranges, it means that the ClickPipe is reading the source tables. You can also see the COUNT(*) and the partitioning query in here.

 ### Limitations {#limitations-parallel-pg-snapshot}
+
 - The snapshot parameters cannot be edited after pipe creation. If you want to change them, you will have to create a new ClickPipe.
 - When adding tables to an existing ClickPipe, you cannot change the snapshot parameters. The ClickPipe will use the existing parameters for the new tables.
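As a rough worked example of the snapshot parameters in parallel_initial_load.md above, the arithmetic can be sketched in Python. The function names `partition_count` and `max_concurrent_partition_reads` are hypothetical, and the ceiling division is an assumption about how a table splits into partitions; the ClickPipe may size its partitions differently. The defaults are the ones quoted in the doc (100,000 rows per partition, parallelism 4, 1 table in parallel).

```python
def partition_count(table_rows: int, rows_per_partition: int = 100_000) -> int:
    """Estimate how many partitions a table splits into (assumed ceiling division)."""
    if rows_per_partition <= 0:
        raise ValueError("rows_per_partition must be positive")
    return -(-table_rows // rows_per_partition)  # integer ceiling division


def max_concurrent_partition_reads(initial_load_parallelism: int = 4,
                                   tables_in_parallel: int = 1) -> int:
    """Upper bound on partitions read at once: per-table parallelism times tables."""
    return initial_load_parallelism * tables_in_parallel


if __name__ == "__main__":
    # A 1,000,000-row table at the default partition size.
    print(partition_count(1_000_000))
    # The doc's own example: parallelism 4 with 2 tables in parallel.
    print(max_concurrent_partition_reads(4, 2))
```

The second function mirrors the doc's statement that table-level parallelism is applied on top of partition-level parallelism, which is why both settings should be sized with the source instance's capacity in mind.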

docs/integrations/data-ingestion/clickpipes/postgres/pause_and_resume.md

Lines changed: 0 additions & 7 deletions
@@ -19,37 +19,30 @@ There are scenarios where it would be useful to pause a Postgres ClickPipe. For
 1. In the Data Sources tab, click on the Postgres ClickPipe you wish to pause.
 2. Head over to the **Settings** tab.
 3. Click on the **Pause** button.
-<br/>

 <Image img={pause_button} border size="md"/>

 4. A dialog box should appear for confirmation. Click on Pause again.
-<br/>

 <Image img={pause_dialog} border size="md"/>

 5. Head over to the **Metrics** tab.
 6. In around 5 seconds (and also on page refresh), the status of the pipe should be **Paused**.
-<br/>

 :::warning
 Pausing a Postgres ClickPipe will not pause the growth of replication slots.
 :::

-<br/>
-
 <Image img={pause_status} border size="md"/>

 ## Steps to resume a Postgres ClickPipe {#resume-clickpipe-steps}
 1. In the Data Sources tab, click on the Postgres ClickPipe you wish to resume. The status of the mirror should be **Paused** initially.
 2. Head over to the **Settings** tab.
 3. Click on the **Resume** button.
-<br/>

 <Image img={resume_button} border size="md"/>

 4. A dialog box should appear for confirmation. Click on Resume again.
-<br/>

 <Image img={resume_dialog} border size="md"/>

docs/integrations/data-ingestion/clickpipes/postgres/resync.md

Lines changed: 6 additions & 3 deletions
@@ -6,6 +6,7 @@ sidebar_label: 'Resync ClickPipe'
 ---

 import resync_button from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/resync_button.png'
+import Image from '@theme/IdealImage';

 ### What does Resync do? {#what-postgres-resync-do}

@@ -18,6 +19,7 @@ Resync involves the following operations in order:
 All the settings of the original ClickPipe are retained in the resync ClickPipe. The statistics of the original ClickPipe are cleared in the UI.

 ### Use cases for resyncing a ClickPipe {#use-cases-postgres-resync}
+
 Here are a few scenarios:

 1. You may need to perform major schema changes on the source tables which would break the existing ClickPipe and you would need to restart. You can just click Resync after performing the changes.
@@ -30,13 +32,14 @@ since initial load with parallel threads is involved each time.
 :::

 ### Resync ClickPipe Guide {#guide-postgres-resync}
+
 1. In the Data Sources tab, click on the Postgres ClickPipe you wish to resync.
 2. Head over to the **Settings** tab.
 3. Click on the **Resync** button.
-<br/>
-<img img={resync_button} border size="md"/>
+
+<Image img={resync_button} border size="md"/>
+
 4. A dialog box should appear for confirmation. Click on Resync again.
-<br/>
 5. Head over to the **Metrics** tab.
 6. In around 5 seconds (and also on page refresh), the status of the pipe should be **Setup** or **Snapshot**.
 7. The initial load of the resync can be monitored in the **Tables** tab - in the **Initial Load Stats** section.
