Commit abd8e3c

Merge pull request #3524 from ClickHouse/supported_badges
Community Maintained + ClickHouse Supported Badges
2 parents (6d58818 + e5a21d4), commit abd8e3c

Note: large commits have some content hidden by default; only a subset of the changed files is shown below.

44 files changed: 316 additions, 66 deletions

docs/integrations/data-ingestion/etl-tools/airbyte-and-clickhouse.md

Lines changed: 3 additions & 0 deletions
@@ -17,9 +17,12 @@ import airbyte06 from '@site/static/images/integrations/data-ingestion/etl-tools
 import airbyte07 from '@site/static/images/integrations/data-ingestion/etl-tools/airbyte_07.png';
 import airbyte08 from '@site/static/images/integrations/data-ingestion/etl-tools/airbyte_08.png';
 import airbyte09 from '@site/static/images/integrations/data-ingestion/etl-tools/airbyte_09.png';
+import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
 
 # Connect Airbyte to ClickHouse
 
+<CommunityMaintainedBadge/>
+
 :::note
 Please note that the Airbyte source and destination for ClickHouse are currently in Alpha status and not suitable for moving large datasets (> 10 million rows)
 :::

docs/integrations/data-ingestion/etl-tools/apache-beam.md

Lines changed: 7 additions & 3 deletions
@@ -5,8 +5,12 @@ description: 'Users can ingest data into ClickHouse using Apache Beam'
 title: 'Integrating Apache Beam and ClickHouse'
 ---
 
+import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
+
 # Integrating Apache Beam and ClickHouse
 
+<ClickHouseSupportedBadge/>
+
 **Apache Beam** is an open-source, unified programming model that enables developers to define and execute both batch and stream (continuous) data processing pipelines. The flexibility of Apache Beam lies in its ability to support a wide range of data processing scenarios, from ETL (Extract, Transform, Load) operations to complex event processing and real-time analytics.
 This integration leverages ClickHouse's official [JDBC connector](https://github.com/ClickHouse/clickhouse-java) for the underlying insertion layer.
 
@@ -77,7 +81,7 @@ public class Main {
     PCollection<Row> rows = lines.apply("ConvertToRow", ParDo.of(new DoFn<String, Row>() {
         @ProcessElement
         public void processElement(@Element String line, OutputReceiver<Row> out) {
-
+
             String[] values = line.split(",");
             Row row = Row.withSchema(SCHEMA)
                     .addValues(values[0], Short.parseShort(values[1]), DateTime.now())
@@ -141,10 +145,10 @@ You can adjust the `ClickHouseIO.Write` configuration with the following setter
 
 Please consider the following limitations when using the connector:
 * As of today, only Sink operation is supported. The connector doesn't support Source operation.
-* ClickHouse performs deduplication when inserting into a `ReplicatedMergeTree` or a `Distributed` table built on top of a `ReplicatedMergeTree`. Without replication, inserting into a regular MergeTree can result in duplicates if an insert fails and then successfully retries. However, each block is inserted atomically, and the block size can be configured using `ClickHouseIO.Write.withMaxInsertBlockSize(long)`. Deduplication is achieved by using checksums of the inserted blocks. For more information about deduplication, please visit [Deduplication](/guides/developer/deduplication) and [Deduplicate insertion config](/operations/settings/settings#insert_deduplicate).
+* ClickHouse performs deduplication when inserting into a `ReplicatedMergeTree` or a `Distributed` table built on top of a `ReplicatedMergeTree`. Without replication, inserting into a regular MergeTree can result in duplicates if an insert fails and then successfully retries. However, each block is inserted atomically, and the block size can be configured using `ClickHouseIO.Write.withMaxInsertBlockSize(long)`. Deduplication is achieved by using checksums of the inserted blocks. For more information about deduplication, please visit [Deduplication](/guides/developer/deduplication) and [Deduplicate insertion config](/operations/settings/settings#insert_deduplicate).
 * The connector doesn't perform any DDL statements; therefore, the target table must exist prior to insertion.
 
 
 ## Related Content {#related-content}
 * `ClickHouseIO` class [documentation](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/clickhouse/ClickHouseIO.html).
-* `Github` repository of examples [clickhouse-beam-connector](https://github.com/ClickHouse/clickhouse-beam-connector).
+* `Github` repository of examples [clickhouse-beam-connector](https://github.com/ClickHouse/clickhouse-beam-connector).
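
For readers following the `ClickHouseIO` hunks above, a minimal end-to-end pipeline built around the `DoFn` shown in the diff might look roughly like the sketch below. The input file, JDBC URL, table name, and three-column schema are illustrative assumptions and are not part of this commit.

```java
// Illustrative sketch only — assumes a CSV of "name,age" lines, a local ClickHouse
// reachable at localhost:8123, and a pre-created table `user_events`.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.clickhouse.ClickHouseIO;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.joda.time.DateTime;

public class Main {
    // Schema matching the Row built in the diff: name, age, insertion time.
    static final Schema SCHEMA = Schema.builder()
            .addStringField("name")
            .addInt16Field("age")
            .addDateTimeField("insertion_time")
            .build();

    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();

        PCollection<String> lines = pipeline.apply("ReadLines", TextIO.read().from("input.csv"));

        PCollection<Row> rows = lines.apply("ConvertToRow", ParDo.of(new DoFn<String, Row>() {
            @ProcessElement
            public void processElement(@Element String line, OutputReceiver<Row> out) {
                String[] values = line.split(",");
                Row row = Row.withSchema(SCHEMA)
                        .addValues(values[0], Short.parseShort(values[1]), DateTime.now())
                        .build();
                out.output(row);
            }
        })).setRowSchema(SCHEMA); // ClickHouseIO writes schema'd Rows, so attach the schema.

        rows.apply("WriteToClickHouse",
                ClickHouseIO.<Row>write("jdbc:clickhouse://localhost:8123/default", "user_events")
                        // Setter referenced in the limitations bullet; the value here is arbitrary.
                        .withMaxInsertBlockSize(1_000_000L));

        pipeline.run().waitUntilFinish();
    }
}
```

The `withMaxInsertBlockSize` setter is the one mentioned in the limitations above: it controls how much data goes into each atomically inserted block, which in turn affects deduplication behaviour.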

docs/integrations/data-ingestion/etl-tools/dbt/index.md

Lines changed: 3 additions & 0 deletions
@@ -15,9 +15,12 @@ import dbt_04 from '@site/static/images/integrations/data-ingestion/etl-tools/db
 import dbt_05 from '@site/static/images/integrations/data-ingestion/etl-tools/dbt/dbt_05.png';
 import dbt_06 from '@site/static/images/integrations/data-ingestion/etl-tools/dbt/dbt_06.png';
 import dbt_07 from '@site/static/images/integrations/data-ingestion/etl-tools/dbt/dbt_07.png';
+import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
 
 # Integrating dbt and ClickHouse
 
+<ClickHouseSupportedBadge/>
+
 **dbt** (data build tool) enables analytics engineers to transform data in their warehouses by simply writing select statements. dbt handles materializing these select statements into objects in the database in the form of tables and views - performing the T of [Extract Load and Transform (ELT)](https://en.wikipedia.org/wiki/Extract,_load,_transform). Users can create a model defined by a SELECT statement.
 
 Within dbt, these models can be cross-referenced and layered to allow the construction of higher-level concepts. The boilerplate SQL required to connect models is automatically generated. Furthermore, dbt identifies dependencies between models and ensures they are created in the appropriate order using a directed acyclic graph (DAG).

docs/integrations/data-ingestion/etl-tools/dlt-and-clickhouse.md

Lines changed: 7 additions & 4 deletions
@@ -6,15 +6,19 @@ title: 'Connect dlt to ClickHouse'
 slug: /integrations/data-ingestion/etl-tools/dlt-and-clickhouse
 ---
 
+import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
+
 # Connect dlt to ClickHouse
 
+<CommunityMaintainedBadge/>
+
 <a href="https://dlthub.com/docs/intro" target="_blank">dlt</a> is an open-source library that you can add to your Python scripts to load data from various and often messy data sources into well-structured, live datasets.
 
 ## Install dlt with ClickHouse {#install-dlt-with-clickhouse}
 
 ### To Install the `dlt` library with ClickHouse dependencies: {#to-install-the-dlt-library-with-clickhouse-dependencies}
 ```bash
-pip install "dlt[clickhouse]"
+pip install "dlt[clickhouse]"
 ```
 
 ## Setup Guide {#setup-guide}
@@ -98,7 +102,7 @@ All [write dispositions](https://dlthub.com/docs/general-usage/incremental-loadi
 
 Write dispositions in the dlt library define how the data should be written to the destination. There are three types of write dispositions:
 
-**Replace**: This disposition replaces the data in the destination with the data from the resource. It deletes all the classes and objects and recreates the schema before loading the data. You can learn more about it <a href="https://dlthub.com/docs/general-usage/full-loading">here</a>.
+**Replace**: This disposition replaces the data in the destination with the data from the resource. It deletes all the classes and objects and recreates the schema before loading the data. You can learn more about it <a href="https://dlthub.com/docs/general-usage/full-loading">here</a>.
 
 **Merge**: This write disposition merges the data from the resource with the data at the destination. For `merge` disposition, you would need to specify a `primary_key` for the resource. You can learn more about it <a href="https://dlthub.com/docs/general-usage/incremental-loading">here</a>.
 
@@ -107,7 +111,7 @@ Write dispositions in the dlt library define how the data should be written to t
 ## Data Loading {#data-loading}
 Data is loaded into ClickHouse using the most efficient method depending on the data source:
 
-- For local files, the `clickhouse-connect` library is used to directly load files into ClickHouse tables using the `INSERT` command.
+- For local files, the `clickhouse-connect` library is used to directly load files into ClickHouse tables using the `INSERT` command.
 - For files in remote storage like `S3`, `Google Cloud Storage`, or `Azure Blob Storage`, ClickHouse table functions like s3, gcs and azureBlobStorage are used to read the files and insert the data into tables.
 
 ## Datasets {#datasets}
@@ -220,4 +224,3 @@ Integration with <a href="https://dlthub.com/docs/dlt-ecosystem/transformations/
 
 ### Syncing of `dlt` state {#syncing-of-dlt-state}
 This destination fully supports <a href="https://dlthub.com/docs/general-usage/state#syncing-state-with-destination">dlt</a> state sync.
-

docs/integrations/data-ingestion/etl-tools/fivetran/index.md

Lines changed: 5 additions & 1 deletion
@@ -6,11 +6,15 @@ description: 'Users can transform and model their data in ClickHouse using dbt'
 title: 'Fivetran and ClickHouse Cloud'
 ---
 
+import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
+
 # Fivetran and ClickHouse Cloud
 
+<ClickHouseSupportedBadge/>
+
 ## Overview {#overview}
 
-[Fivetran](https://www.fivetran.com) is the automated data movement platform moving data out of, into and across your cloud data platforms.
+[Fivetran](https://www.fivetran.com) is the automated data movement platform moving data out of, into and across your cloud data platforms.
 
 [ClickHouse Cloud](https://clickhouse.com/cloud) is supported as a [Fivetran destination](https://fivetran.com/docs/destinations/clickhouse), allowing users to load data from various sources into ClickHouse.
 

docs/integrations/data-ingestion/etl-tools/nifi-and-clickhouse.md

Lines changed: 3 additions & 0 deletions
@@ -24,9 +24,12 @@ import nifi12 from '@site/static/images/integrations/data-ingestion/etl-tools/ni
 import nifi13 from '@site/static/images/integrations/data-ingestion/etl-tools/nifi_13.png';
 import nifi14 from '@site/static/images/integrations/data-ingestion/etl-tools/nifi_14.png';
 import nifi15 from '@site/static/images/integrations/data-ingestion/etl-tools/nifi_15.png';
+import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
 
 # Connect Apache NiFi to ClickHouse
 
+<CommunityMaintainedBadge/>
+
 <a href="https://nifi.apache.org/" target="_blank">Apache NiFi</a> is an open-source workflow management software designed to automate data flow between software systems. It allows the creation of ETL data pipelines and is shipped with more than 300 data processors. This step-by-step tutorial shows how to connect Apache NiFi to ClickHouse as both a source and destination, and to load a sample dataset.
 
 ## 1. Gather your connection details {#1-gather-your-connection-details}

docs/integrations/data-ingestion/etl-tools/vector-to-clickhouse.md

Lines changed: 3 additions & 0 deletions
@@ -9,9 +9,12 @@ title: 'Integrating Vector with ClickHouse'
 import Image from '@theme/IdealImage';
 import vector01 from '@site/static/images/integrations/data-ingestion/etl-tools/vector_01.png';
 import vector02 from '@site/static/images/integrations/data-ingestion/etl-tools/vector_02.png';
+import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
 
 # Integrating Vector with ClickHouse
 
+<CommunityMaintainedBadge/>
+
 Being able to analyze your logs in real time is critical for production applications. Have you ever wondered if ClickHouse is good at storing and analyzing log data? Just check out <a href="https://eng.uber.com/logging/" target="_blank">Uber's experience</a> with converting their logging infrastructure from ELK to ClickHouse.
 
 This guide shows how to use the popular data pipeline <a href="https://vector.dev/docs/about/what-is-vector/" target="_blank">Vector</a> to tail an Nginx log file and send it to ClickHouse. The steps below would be similar for tailing any type of log file. We will assume you already have ClickHouse up and running and Vector installed (no need to start it yet though).

docs/integrations/data-ingestion/google-dataflow/dataflow.md

Lines changed: 5 additions & 1 deletion
@@ -6,8 +6,12 @@ description: 'Users can ingest data into ClickHouse using Google Dataflow'
 title: 'Integrating Google Dataflow with ClickHouse'
 ---
 
+import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
+
 # Integrating Google Dataflow with ClickHouse
 
+<ClickHouseSupportedBadge/>
+
 [Google Dataflow](https://cloud.google.com/dataflow) is a fully managed stream and batch data processing service. It supports pipelines written in Java or Python and is built on the Apache Beam SDK.
 
 There are two main ways to use Google Dataflow with ClickHouse, both of which leverage the [`ClickHouseIO Apache Beam connector`](/integrations/apache-beam):
@@ -29,4 +33,4 @@ ClickHouse offers [predefined templates](./templates) designed for specific use
 - Quick and easy setup for simple use cases.
 - Also suitable for users with minimal programming expertise.
 
-Both approaches are fully compatible with Google Cloud and the ClickHouse ecosystem, offering flexibility depending on your technical expertise and project requirements.
+Both approaches are fully compatible with Google Cloud and the ClickHouse ecosystem, offering flexibility depending on your technical expertise and project requirements.

docs/integrations/data-ingestion/google-dataflow/java-runner.md

Lines changed: 7 additions & 3 deletions
@@ -6,16 +6,20 @@ description: 'Users can ingest data into ClickHouse using Google Dataflow Java R
 title: 'Dataflow Java Runner'
 ---
 
+import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
+
 # Dataflow Java Runner
 
+<ClickHouseSupportedBadge/>
+
 The Dataflow Java Runner lets you execute custom Apache Beam pipelines on Google Cloud's Dataflow service. This approach provides maximum flexibility and is well-suited for advanced ETL workflows.
 
 ## How It Works {#how-it-works}
 
-1. **Pipeline Implementation**
+1. **Pipeline Implementation**
 To use the Java Runner, you need to implement your Beam pipeline using the `ClickHouseIO` - our official Apache Beam connector. For code examples and instructions on how to use the `ClickHouseIO`, please visit [ClickHouse Apache Beam](/integrations/apache-beam).
 
-2. **Deployment**
+2. **Deployment**
 Once your pipeline is implemented and configured, you can deploy it to Dataflow using Google Cloud's deployment tools. Comprehensive deployment instructions are provided in the [Google Cloud Dataflow documentation - Java Pipeline](https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-java).
 
-**Note**: This approach assumes familiarity with the Beam framework and coding expertise. If you prefer a no-code solution, consider using [ClickHouse's predefined templates](./templates).
+**Note**: This approach assumes familiarity with the Beam framework and coding expertise. If you prefer a no-code solution, consider using [ClickHouse's predefined templates](./templates).
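
As a rough illustration of the two steps described above, the sketch below shows how a `ClickHouseIO` pipeline could be pointed at the Dataflow service through the standard Dataflow pipeline options. The project ID, region, and temp bucket are placeholder values, not settings taken from this commit.

```java
// Illustrative sketch only — my-gcp-project, us-central1, and the gs:// bucket are placeholders.
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class DeployToDataflow {
    public static void main(String[] args) {
        // Parse --project=..., --region=..., --tempLocation=... from the command line,
        // or set them programmatically as below.
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation()
                .as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);
        options.setProject("my-gcp-project");
        options.setRegion("us-central1");
        options.setTempLocation("gs://my-bucket/dataflow/temp");

        Pipeline pipeline = Pipeline.create(options);
        // Placeholder transform — replace with the ClickHouseIO pipeline from the
        // Apache Beam integration page.
        pipeline.apply("Placeholder", Create.of("replace with your ClickHouseIO pipeline"));

        pipeline.run();
    }
}
```

Omitting the runner setting falls back to Beam's local direct runner, which is a convenient way to test the pipeline before deploying it to Dataflow.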

docs/integrations/data-ingestion/google-dataflow/templates.md

Lines changed: 5 additions & 1 deletion
@@ -6,8 +6,12 @@ description: 'Users can ingest data into ClickHouse using Google Dataflow Templa
 title: 'Google Dataflow Templates'
 ---
 
+import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
+
 # Google Dataflow Templates
 
+<ClickHouseSupportedBadge/>
+
 Google Dataflow templates provide a convenient way to execute prebuilt, ready-to-use data pipelines without the need to write custom code. These templates are designed to simplify common data processing tasks and are built using [Apache Beam](https://beam.apache.org/), leveraging connectors like `ClickHouseIO` for seamless integration with ClickHouse databases. By running these templates on Google Dataflow, you can achieve highly scalable, distributed data processing with minimal effort.
 
 ## Why Use Dataflow Templates? {#why-use-dataflow-templates}
@@ -25,4 +29,4 @@ For detailed step-by-step instructions, refer to the [Google Dataflow Run Pipeli
 ## List of ClickHouse Templates {#list-of-clickhouse-templates}
 * [BigQuery To ClickHouse](./templates/bigquery-to-clickhouse)
 * [GCS To ClickHouse](https://github.com/ClickHouse/DataflowTemplates/issues/3) (coming soon!)
-* [Pub Sub To ClickHouse](https://github.com/ClickHouse/DataflowTemplates/issues/4) (coming soon!)
+* [Pub Sub To ClickHouse](https://github.com/ClickHouse/DataflowTemplates/issues/4) (coming soon!)
