
Commit fdfdc98

Merge pull request #4048 from ChocZoe/bp_integration
Add BladePipe to integration/data ingestion
2 parents 188c357 + 71bae3b commit fdfdc98

File tree

21 files changed: +598 −5 lines changed


docs/chdb/guides/index.md

Lines changed: 1 addition & 1 deletion
@@ -14,4 +14,4 @@ in the table of contents, please edit the frontmatter of the files directly.
 -->

 <!--AUTOGENERATED_START-->
-<!--AUTOGENERATED_END-->
+<!--AUTOGENERATED_END-->

docs/cloud/manage/jan2025_faq/index.md

Lines changed: 1 addition & 1 deletion
@@ -12,4 +12,4 @@ in the table of contents, please edit the frontmatter of the files directly.
 -->

 <!--AUTOGENERATED_START-->
-<!--AUTOGENERATED_END-->
+<!--AUTOGENERATED_END-->

docs/cloud/reference/release-notes-index.md

Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@ description: 'Landing page for Cloud release notes'
 -->

 <!--AUTOGENERATED_START-->
-<!--AUTOGENERATED_END-->
+<!--AUTOGENERATED_END-->

docs/getting-started/index.md

Lines changed: 1 addition & 0 deletions
@@ -23,4 +23,5 @@ functions in ClickHouse. The sample datasets include:
 by https://github.com/ClickHouse/clickhouse-docs/blob/main/scripts/autogenerate-table-of-contents.sh -->

 <!--AUTOGENERATED_START-->
+
 <!--AUTOGENERATED_END-->

docs/integrations/data-ingestion/data-ingestion-index.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ For more information check out the pages below:
 | [Azure Synapse](/integrations/azure-synapse) | A fully managed, cloud-based analytics service provided by Microsoft Azure, combining big data and data warehousing to simplify data integration, transformation, and analytics at scale using SQL, Apache Spark, and data pipelines. |
 | [Azure Data Factory](/integrations/azure-data-factory) | A cloud-based data integration service that enables you to create, schedule, and orchestrate data workflows at scale. |
 | [Apache Beam](/integrations/apache-beam) | An open-source, unified programming model that enables developers to define and execute both batch and stream (continuous) data processing pipelines. |
+| [BladePipe](/integrations/bladepipe) | A real-time end-to-end data integration tool with sub-second latency, enabling seamless data flow across platforms. |
 | [dbt](/integrations/dbt) | Enables analytics engineers to transform data in their warehouses by simply writing select statements. |
 | [dlt](/integrations/data-ingestion/etl-tools/dlt-and-clickhouse) | An open-source library that you can add to your Python scripts to load data from various and often messy data sources into well-structured, live datasets. |
 | [Fivetran](/integrations/fivetran) | An automated data movement platform moving data out of, into and across your cloud data platforms. |
Lines changed: 120 additions & 0 deletions

@@ -0,0 +1,120 @@
---
sidebar_label: 'BladePipe'
sidebar_position: 20
keywords: ['clickhouse', 'BladePipe', 'connect', 'integrate', 'cdc', 'etl', 'data integration']
slug: /integrations/bladepipe
description: 'Stream data into ClickHouse using BladePipe data pipelines'
title: 'Connect BladePipe to ClickHouse'
---

import Image from '@theme/IdealImage';
import bp_ck_1 from '@site/static/images/integrations/data-ingestion/etl-tools/bp_ck_1.png';
import bp_ck_2 from '@site/static/images/integrations/data-ingestion/etl-tools/bp_ck_2.png';
import bp_ck_3 from '@site/static/images/integrations/data-ingestion/etl-tools/bp_ck_3.png';
import bp_ck_4 from '@site/static/images/integrations/data-ingestion/etl-tools/bp_ck_4.png';
import bp_ck_5 from '@site/static/images/integrations/data-ingestion/etl-tools/bp_ck_5.png';
import bp_ck_6 from '@site/static/images/integrations/data-ingestion/etl-tools/bp_ck_6.png';
import bp_ck_7 from '@site/static/images/integrations/data-ingestion/etl-tools/bp_ck_7.png';
import bp_ck_8 from '@site/static/images/integrations/data-ingestion/etl-tools/bp_ck_8.png';
import bp_ck_9 from '@site/static/images/integrations/data-ingestion/etl-tools/bp_ck_9.png';
import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';

# Connect BladePipe to ClickHouse

<CommunityMaintainedBadge/>

<a href="https://www.bladepipe.com/" target="_blank">BladePipe</a> is a real-time end-to-end data integration tool with sub-second latency, enabling seamless data flow across platforms.

ClickHouse is one of BladePipe's pre-built connectors, allowing users to integrate data from various sources into ClickHouse automatically. This page shows, step by step, how to load data into ClickHouse in real time.

## Supported sources {#supported-sources}

Currently, BladePipe supports data integration to ClickHouse from the following sources:

- MySQL/MariaDB/AuroraMySQL
- Oracle
- PostgreSQL/AuroraPostgreSQL
- MongoDB
- Kafka
- PolarDB-MySQL
- OceanBase
- TiDB

More sources will be supported in the future.

<VerticalStepper headerLevel="h2">

## Download and run BladePipe {#1-run-bladepipe}

1. Log in to <a href="https://www.bladepipe.com/" target="_blank">BladePipe Cloud</a>.

2. Follow the instructions in <a href="https://doc.bladepipe.com/productOP/byoc/installation/install_worker_docker" target="_blank">Install Worker (Docker)</a> or <a href="https://doc.bladepipe.com/productOP/byoc/installation/install_worker_binary" target="_blank">Install Worker (Binary)</a> to download and install a BladePipe Worker.

:::note
Alternatively, you can download and deploy <a href="https://doc.bladepipe.com/productOP/onPremise/installation/install_all_in_one_binary" target="_blank">BladePipe Enterprise</a>.
:::

## Add ClickHouse as a target {#2-add-clickhouse-as-a-target}

:::note
1. BladePipe supports ClickHouse version `20.12.3.3` or above.
2. To use ClickHouse as a target, make sure that the user has SELECT, INSERT and common DDL permissions.
:::
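For illustration, the grants implied by the note above might look like the following on the ClickHouse side. This is a minimal sketch, not part of the commit: the user name `bp_user` and database name `target_db` are placeholders, and the exact DDL privilege set you need may vary with your setup.

```sql
-- Hypothetical example: grant the target-side privileges the note describes.
-- 'bp_user' and 'target_db' are placeholder names.
GRANT SELECT, INSERT ON target_db.* TO bp_user;
-- "Common DDL" permissions, so tables can be created and evolved on the target:
GRANT CREATE TABLE, ALTER, DROP TABLE ON target_db.* TO bp_user;
```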
1. In BladePipe, click "DataSource" > "Add DataSource".

2. Select `ClickHouse`, and fill out the settings by providing your ClickHouse host and port, username and password, and click "Test Connection".

<Image img={bp_ck_1} size="lg" border alt="Add ClickHouse as a target" />

3. Click "Add DataSource" at the bottom, and a ClickHouse instance is added.

## Add MySQL as a source {#3-add-mysql-as-a-source}

In this tutorial, we use a MySQL instance as the source and walk through loading MySQL data into ClickHouse.

:::note
To use MySQL as a source, make sure that the user has the <a href="https://doc.bladepipe.com/dataMigrationAndSync/datasource_func/MySQL/privs_for_mysql" target="_blank">required permissions</a>.
:::
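As a rough sketch of what such source-side permissions typically involve for MySQL CDC (the linked BladePipe page is authoritative; `bp_user` and this exact privilege list are assumptions for illustration):

```sql
-- Hypothetical example: a typical MySQL CDC user needs to read data and the binlog.
-- 'bp_user' is a placeholder; replace the password and host pattern for your environment.
CREATE USER 'bp_user'@'%' IDENTIFIED BY '<password>';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'bp_user'@'%';
FLUSH PRIVILEGES;
```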
1. In BladePipe, click "DataSource" > "Add DataSource".

2. Select `MySQL`, and fill out the settings by providing your MySQL host and port, username and password, and click "Test Connection".

<Image img={bp_ck_2} size="lg" border alt="Add MySQL as a source" />

3. Click "Add DataSource" at the bottom, and a MySQL instance is added.

## Create a pipeline {#4-create-a-pipeline}

1. In BladePipe, click "DataJob" > "Create DataJob".

2. Select the added MySQL and ClickHouse instances and click "Test Connection" to ensure BladePipe is connected to the instances. Then, select the databases to be moved.

<Image img={bp_ck_3} size="lg" border alt="Select source and target" />

3. Select "Incremental" for DataJob Type, together with the "Full Data" option.

<Image img={bp_ck_4} size="lg" border alt="Select sync type" />

4. Select the tables to be replicated.

<Image img={bp_ck_5} size="lg" border alt="Select tables" />

5. Select the columns to be replicated.

<Image img={bp_ck_6} size="lg" border alt="Select columns" />

6. Confirm the DataJob creation, and the DataJob runs automatically.

<Image img={bp_ck_8} size="lg" border alt="DataJob is running" />

## Verify the data {#5-verify-the-data}

1. Stop writing data to the MySQL instance and wait for ClickHouse to merge data.

:::note
Because the timing of ClickHouse's automatic merging is unpredictable, you can manually trigger a merge by running the `OPTIMIZE TABLE xxx FINAL;` command. Note that this manual merge may not always succeed.

Alternatively, you can run the `CREATE VIEW xxx_v AS SELECT * FROM xxx FINAL;` command to create a view and query the view to ensure the data is fully merged.
:::
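Concretely, with a hypothetical replicated table named `orders` (a placeholder for your own table), the two approaches from the note look like:

```sql
-- Option 1: manually trigger a merge (may not always succeed):
OPTIMIZE TABLE orders FINAL;

-- Option 2: query through a view that applies FINAL at read time,
-- so results reflect fully merged data without forcing a merge:
CREATE VIEW orders_v AS SELECT * FROM orders FINAL;
SELECT count() FROM orders_v;
```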
2. Create a <a href="https://doc.bladepipe.com/operation/job_manage/create_job/create_period_verification_correction_job" target="_blank">Verification DataJob</a>. Once the Verification DataJob is completed, review the results to confirm that the data in ClickHouse is the same as the data in MySQL.

<Image img={bp_ck_9} size="lg" border alt="Verify data" />

</VerticalStepper>

docs/integrations/index.mdx

Lines changed: 4 additions & 2 deletions
@@ -11,11 +11,12 @@ import Amazonmsksvg from '@site/static/images/integrations/logos/amazon_msk.svg'
 import Astratosvg from '@site/static/images/integrations/logos/astrato_logo.svg';
 import apachestreamparkpng from '@site/static/images/integrations/logos/apache-streampark.png';
 import Azureeventhubssvg from '@site/static/images/integrations/logos/azure_event_hubs.svg';
+import BladePipesvg from '@site/static/images/integrations/logos/bladepipe.svg';
 import blinkopspng from '@site/static/images/integrations/logos/blinkops_logo.png';
+import CloudCanalsvg from '@site/static/images/integrations/logos/cloudcanal.svg';
 import Chdbsvg from '@site/static/images/integrations/logos/chdb.svg';
 import Clickhousesvg from '@site/static/images/integrations/logos/clickhouse.svg';
 import Clickhousemonitoringdashboardsvg from '@site/static/images/integrations/logos/clickhouse-monitoring-dashboard.svg';
-import Cloudcanalsvg from '@site/static/images/integrations/logos/clougence.svg';
 import Cloudquerysvg from '@site/static/images/integrations/logos/cloudquery_logo.svg';
 import Confluentsvg from '@site/static/images/integrations/logos/confluent.svg';
 import Csharpsvg from '@site/static/images/integrations/logos/csharp.svg';
import Csharpsvg from '@site/static/images/integrations/logos/csharp.svg';
@@ -264,13 +265,14 @@ We are actively compiling this list of ClickHouse integrations below, so it's no
 |Atlas|<Image img={atlas_logo} size="logo" alt="Atlas logo"/>|Schema management|Manage your ClickHouse schema as code. |[Documentation](https://atlasgo.io/guides/clickhouse?utm_source=clickhouse&utm_term=docs)|
 |Astrato|<Astratosvg alt="Astrato logo" style={{width: '3rem', height: '48px'}}/>|Data visualization|Astrato brings true Self-Service BI to Enterprises & Data Businesses by putting analytics in the hands of every user, enabling them to build dashboards, reports & data apps without IT.|[Documentation](/integrations/astrato)|
 |AutoMQ|<Image img={automq_logo} size="logo" alt="AutoMQ logo"/>|Data ingestion|Cloud-native Kafka and RocketMQ alternatives that decouples durability to S3 and EBS|[Website](https://www.automq.com/)|
+|BladePipe|<BladePipesvg alt="BladePipe logo" style={{width: '5rem'}}/>|Data ingestion|A real-time end-to-end data integration tool with sub-second latency.|[Documentation](/integrations/data-ingestion/etl-tools/bladepipe-and-clickhouse.md)|
 |BlinkOps|<Image img={blinkops_logo} size="logo" alt="BlinkOps Logo"/>|Security automation|Create automations to manage data and user permissions.|[Documentation](https://docs.blinkops.com/docs/integrations/clickhouse)|
 |Bytewax|<Bytewaxsvg alt="ByteWax Logo" style={{width: '3rem'}}/>|Data ingestion|Open source Python stream processor for transforming and ingesting data to ClickHouse|[Documentation](https://bytewax.io/blog/building-a-click-house-sink-for-bytewax)|
 |Calyptia (Fluent Bit)|<Image img={calyptia_logo} size="logo" alt="Calyptia logo"/>|Data ingestion|CNCF graduated open-source project for the collection, processing, and delivery of logs, metrics, and traces|[Blog](https://clickhouse.com/blog/kubernetes-logs-to-clickhouse-fluent-bit)|
 |Chartbrew|<Image img={chartbrew_logo} size="logo" alt="Chartbrew logo" style={{width: '3rem', 'backgroundColor': 'transparent', 'boxShadow': 'none'}}/>|Data visualization|Chartbrew is a data visualization platform that allows users to create dashboards and monitor data in real time.|[Documentation](/integrations/chartbrew-and-clickhouse),<br />[Website](https://chartbrew.com/integrations/clickhouse),<br />[Blog](https://chartbrew.com/blog/visualizing-clickhouse-data-with-chartbrew-a-step-by-step-guide/)|
-|CloudCanal|<Cloudcanalsvg className="image" alt="CloudCanal logo" style={{width: '3rem'}}/>|Data integration|A data synchronization and migration tool.|[Website](https://www.cloudcanalx.com/us/)|
 |CloudQuery|<Cloudquerysvg className="image" alt="CloudQuery logo" style={{width: '3rem'}}/>|Data ingestion|Open source high-performance ELT framework.|[Documentation](https://www.cloudquery.io/docs/plugins/destinations/clickhouse/overview)|
 |Cube.js|<Cubejssvg alt="Cubejs logo" style={{width: '3rem'}}/>|Data visualization|Cube is the Semantic Layer for building data apps.|[Website](https://cube.dev/for/clickhouse-dashboard)|
+|CloudCanal|<CloudCanalsvg alt="CloudCanal logo" style={{width: '5rem'}}/>|Data ingestion|A real-time data replication tool for seamless and easy data flow.|[Website](https://www.clougence.com/)|
 |DBeaver|<Dbeaversvg alt="Dbeaver logo" style={{width: '3rem'}}/>|SQL client|Free multi-platform database administration tool. Connects to Clickhouse through JDBC driver.|[Documentation](/integrations/sql-clients/dbeaver.md)|
 |DataGrip|<Datagripsvg alt="Datagrip logo" style={{width: '3rem'}}/>|SQL client|DataGrip is a powerful database IDE with dedicated support for ClickHouse.|[Documentation](/integrations/sql-clients/datagrip.md)|
 |Dataddo|<Image img={dataddo_logo} size="logo" alt="Dataddo logo"/>|Data integration|Data integration platform|[Website](https://www.dataddo.com/storage/clickhouse)|

scripts/aspell-ignore/en/aspell-dict.txt

Lines changed: 10 additions & 0 deletions
@@ -53,6 +53,8 @@ Authy
 AutoFDO
 AutoML
 Autocompletion
+AuroraMySQL
+AuroraPostgreSQL
 AvroConfluent
 AzureQueue
 Azurite

@@ -98,6 +100,8 @@ BetaBadge
 Beyonce's
 Bigtable
 Blazingly
+BladePipe
+BladePipe's
 BlockActiveTime
 BlockDiscardBytes
 BlockDiscardMerges

@@ -274,6 +278,8 @@ Databricks
 Datadog
 Datasource
 Datasources
+DataSource
+DataJob
 DateTime
 DateTimeInputFormat
 DateTimeOutputFormat

@@ -797,6 +803,7 @@ NumberOfDatabases
 NumberOfDetachedByUserParts
 NumberOfDetachedParts
 NumberOfTables
+OceanBase
 ODBCDriver
 OFNS
 OLAP

@@ -890,6 +897,7 @@ PREWHERE
 PROCESSLIST
 PROXYv
 PRQL
+prql
 PSUN
 Packetbeat
 PagerDuty

@@ -928,6 +936,7 @@ Poess
 PointDistKm
 PointDistM
 PointDistRads
+PolarDB
 PostHistory
 PostHog
 PostLink

@@ -2079,6 +2088,7 @@ generateULID
 generateUUIDv
 geoDistance
 geoToH
+geotoh
 geoToS
 geobase
 geobases

sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -1029,6 +1029,7 @@ const sidebars = {
 },
 ],
 },
+"integrations/data-ingestion/etl-tools/bladepipe-and-clickhouse",
 "integrations/data-ingestion/etl-tools/dbt/index",
 "integrations/data-ingestion/etl-tools/dlt-and-clickhouse",
 "integrations/data-ingestion/etl-tools/fivetran/index",
