---
sidebar_label: 'Integrating Kafka with ClickHouse'
sidebar_position: 1
slug: /integrations/kafka
description: 'Introduction to Kafka with ClickHouse'
title: 'Integrating Kafka with ClickHouse'
---
import Kafkasvg from '@site/static/images/integrations/logos/kafka.svg';
import Confluentsvg from '@site/static/images/integrations/logos/confluent.svg';
import Msksvg from '@site/static/images/integrations/logos/msk.svg';
import Azureeventhubssvg from '@site/static/images/integrations/logos/azure_event_hubs.svg';
import Warpstreamsvg from '@site/static/images/integrations/logos/warpstream.svg';
import redpanda_logo from '@site/static/images/integrations/logos/logo_redpanda.png';
import Image from '@theme/IdealImage';
# Integrating Kafka with ClickHouse
[Apache Kafka](https://kafka.apache.org/) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. ClickHouse provides multiple options to **read from** and **write to** Kafka and other Kafka API-compatible brokers (e.g., Redpanda, Amazon MSK).
## Available options {#available-options}
Choosing the right option for your use case depends on multiple factors, including your ClickHouse deployment type, data flow direction, and operational requirements. This guide covers the following options:

* [ClickPipes for Kafka](#clickpipes-for-kafka)
* [Kafka Connect Sink](#kafka-connect-sink)
* [Kafka Table Engine](#kafka-table-engine)

For a more detailed comparison between these options, see [Choosing an option](#choosing-an-option).
### ClickPipes for Kafka {#clickpipes-for-kafka}
[ClickPipes](../clickpipes/index.md) is a managed integration platform that makes ingesting data from a diverse set of sources as simple as clicking a few buttons. Because it is fully managed and purpose-built for production workloads, ClickPipes significantly lowers infrastructure (CAPEX) and operational (OPEX) costs, removing the need for external data streaming and ETL tools.

:::tip
This is the recommended option if you're a ClickHouse Cloud user. ClickPipes is **fully managed** and purpose-built to deliver the **best performance** in Cloud environments.
:::
#### Main features {#clickpipes-for-kafka-main-features}
[//]: #"TODO(morsapaes) It isn't optimal to link to a static alpha-release of the Terraform provider. Link to a Terraform guide once that's available."
* Optimized for ClickHouse Cloud, delivering blazing-fast performance
* Horizontal and vertical scalability for high-throughput workloads
* Built-in fault tolerance with configurable replicas and automatic retries
* Deployment and management via ClickHouse Cloud UI, [Open API](../../../cloud/manage/api/api-overview.md), or [Terraform](https://registry.terraform.io/providers/ClickHouse/clickhouse/3.3.3-alpha2/docs/resources/clickpipe)
* Enterprise-grade security with support for cloud-native authorization (IAM) and private connectivity (PrivateLink)
* Support for a wide range of [data sources](../clickpipes/kafka.md#supported-data-sources), including Confluent Cloud, Amazon MSK, Redpanda Cloud, and Azure Event Hubs
#### Getting started {#clickpipes-for-kafka-getting-started}
To get started using ClickPipes for Kafka, see the [reference documentation](../clickpipes/kafka.md) or navigate to the `Data Sources` tab in the ClickHouse Cloud UI.
### Kafka Connect Sink {#kafka-connect-sink}

Kafka Connect is an open-source framework that works as a centralized data hub for simple data integration between Kafka and other data systems. The [ClickHouse Kafka Connect Sink](https://github.com/ClickHouse/clickhouse-kafka-connect) connector provides a scalable and highly configurable option to read data from Apache Kafka and other Kafka API-compatible brokers.
:::tip
This is the recommended option if you're already a Kafka Connect user. The Kafka Connect Sink offers a rich set of features and configuration options for **advanced tuning**.
:::
#### Main features {#kafka-connect-sink-main-features}
* Can be configured to support exactly-once semantics
* Supports all ClickHouse data types
* Handles structured data with declared schemas and unstructured JSON data
* Tested continuously against ClickHouse Cloud
#### Getting started {#kafka-connect-sink-getting-started}
To get started using the ClickHouse Kafka Connect Sink, see the [reference documentation](./kafka-clickhouse-connect-sink.md).
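
As a rough orientation, a Kafka Connect worker picks up the sink through a standard connector configuration. The snippet below is a minimal sketch only: the hostname, credentials, topic, and table mapping are placeholders, and the exact property names and connector class should be confirmed against the reference documentation for the connector version you deploy.

```json
{
  "name": "clickhouse-sink",
  "config": {
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "tasks.max": "1",
    "topics": "events",
    "hostname": "my-service.clickhouse.cloud",
    "port": "8443",
    "ssl": "true",
    "database": "default",
    "username": "default",
    "password": "<password>",
    "exactlyOnce": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}
```

Submitting a configuration like this to the Kafka Connect REST API (`POST /connectors`) starts the sink, which then streams records from the configured topic into ClickHouse.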
### Kafka Table Engine {#kafka-table-engine}
The [Kafka table engine](./kafka-table-engine.md) can be used to read data from and write data to Apache Kafka and other Kafka API-compatible brokers. This option is bundled with open-source ClickHouse and is available across all deployment types.
:::tip
This is the recommended option if you're self-hosting ClickHouse and need a **low entry barrier** option, or if you need to **write** data to Kafka.
:::
#### Main features {#kafka-table-engine-main-features}
* Can be used for reading and writing data
* Bundled with open-source ClickHouse
* Supports all ClickHouse data types
#### Getting started {#kafka-table-engine-getting-started}
To get started using the Kafka Table Engine, see the [reference documentation](./kafka-table-engine.md).
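
As a rough sketch of how the engine is typically wired up (the broker address, topic, and table names below are placeholders), reading usually pairs a Kafka engine table with a materialized view that persists consumed rows into a regular MergeTree table:

```sql
-- Destination table that stores the consumed rows
CREATE TABLE logs
(
    `timestamp` DateTime,
    `message` String
)
ENGINE = MergeTree
ORDER BY timestamp;

-- Kafka engine table that subscribes to the topic
CREATE TABLE logs_queue
(
    `timestamp` DateTime,
    `message` String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'logs',
         kafka_group_name = 'clickhouse_logs_consumer',
         kafka_format = 'JSONEachRow';

-- Materialized view that moves each consumed batch into the destination table
CREATE MATERIALIZED VIEW logs_mv TO logs AS
SELECT timestamp, message
FROM logs_queue;
```

Writing works in the opposite direction: rows inserted into the Kafka engine table (for example, `INSERT INTO logs_queue VALUES (now(), 'hello')`) are produced to the configured topic.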
### Choosing an option {#choosing-an-option}
| Product | Deployment | Strengths | Weaknesses |
|---------|------------|-----------|------------|
|**ClickPipes for Kafka**|[CH Cloud]| • Scalable architecture for high throughput and low latency<br/>• Built-in monitoring and schema management<br/>• Private networking connections (via PrivateLink)<br/>• Supports SSL/TLS authentication and IAM authorization<br/>• Supports programmatic configuration (Terraform, API endpoints) | • Does not support pushing data to Kafka<br/>• At-least-once semantics |
|**Kafka Connect Sink**|[CH Cloud]<br/>[CH BYOC]<br/>[CH OSS]| • Exactly-once semantics<br/>• Allows granular control over data transformation, batching and error handling<br/>• Can be deployed in private networks<br/>• Allows real-time replication from databases not yet supported in ClickPipes via Debezium | • Does not support pushing data to Kafka<br/>• Operationally complex to set up and maintain<br/>• Requires Kafka and Kafka Connect expertise |
|**Kafka Table Engine**|[CH Cloud]<br/>[CH BYOC]<br/>[CH OSS]| • Supports pushing data to Kafka<br/>• Allows real-time replication from databases not yet supported in ClickPipes via Debezium | • At-least-once semantics<br/>• Limited horizontal scaling for consumers. Cannot be scaled independently from the CH server<br/>• Limited error handling and debugging options<br/>• Requires Kafka expertise |
### Other options {#other-options}
* [**JDBC Connect Sink**](./kafka-connect-jdbc.md) - The Kafka Connect JDBC Sink connector allows you to export data from Kafka topics to any relational database with a JDBC driver.
* **Custom code** - Custom code using the respective client libraries for Kafka and ClickHouse may be appropriate in cases where custom processing of events is required. This is beyond the scope of this documentation.