Commit 23c0e17 (parent 7f9a5cf): Apply review suggestions and other improvements

1 file changed: docs/integrations/data-ingestion/kafka (+66 additions, −40 deletions)

# Integrating Kafka with ClickHouse

[Apache Kafka](https://kafka.apache.org/) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. ClickHouse provides multiple options to **read from** and **write to** Kafka and other Kafka API-compatible brokers (e.g., Redpanda, Amazon MSK).

## Available options {#available-options}

Choosing the right option for your use case depends on multiple factors, including your ClickHouse deployment type, data flow direction, and operational requirements.

| Option | Deployment type | Kafka to ClickHouse | ClickHouse to Kafka | Fully managed |
|--------|-----------------|:-------------------:|:-------------------:|:-------------:|
| [ClickPipes for Kafka](../clickpipes/kafka.md) | [CH Cloud] | ✅ | | ✅ |
| [Kafka Connect Sink](./kafka-clickhouse-connect-sink.md) | [CH Cloud], [CH BYOC], [CH OSS] | ✅ | | |
| [Kafka table engine](./kafka-table-engine.md) | [CH Cloud], [CH BYOC], [CH OSS] | ✅ | ✅ | |

For a more detailed comparison between these options, see [Choosing an option](#choosing-an-option).

### ClickPipes for Kafka {#clickpipes-for-kafka}

[ClickPipes](../clickpipes/index.md) is a managed integration platform that makes ingesting data from a diverse set of sources as simple as clicking a few buttons. Because it is fully managed and purpose-built for production workloads, ClickPipes significantly lowers infrastructure (CAPEX) and operational (OPEX) costs, removing the need for external data streaming and ETL tools.

:::tip
This is the recommended option if you're a ClickHouse Cloud user. ClickPipes is **fully managed** and purpose-built to deliver the **best performance** in Cloud environments.
:::

#### Main features {#clickpipes-for-kafka-main-features}

[//]: # "TODO(morsapaes) It isn't optimal to link to a static alpha-release of the Terraform provider. Link to a Terraform guide once that's available."

* Optimized for ClickHouse Cloud, delivering blazing-fast performance
* Horizontal and vertical scalability for high-throughput workloads
* Built-in fault tolerance with configurable replicas and automatic retries
* Deployment and management via the ClickHouse Cloud UI, [Open API](../../../cloud/manage/api/api-overview.md), or [Terraform](https://registry.terraform.io/providers/ClickHouse/clickhouse/3.3.3-alpha2/docs/resources/clickpipe)
* Enterprise-grade security with support for cloud-native authorization (IAM) and private connectivity (PrivateLink)
* Support for a wide range of [data sources](../clickpipes/kafka.md#supported-data-sources), including Confluent Cloud, Amazon MSK, Redpanda Cloud, and Azure Event Hubs

#### Getting started {#clickpipes-for-kafka-getting-started}

To get started using ClickPipes for Kafka, see the [reference documentation](../clickpipes/kafka.md) or navigate to the `Data Sources` tab in the ClickHouse Cloud UI.

### Kafka Connect Sink {#kafka-connect-sink}

Kafka Connect is an open-source framework that works as a centralized data hub for simple data integration between Kafka and other data systems. The [ClickHouse Kafka Connect Sink](https://github.com/ClickHouse/clickhouse-kafka-connect) connector provides a scalable and highly configurable option to read data from Apache Kafka and other Kafka API-compatible brokers.

:::tip
This is the recommended option if you're already a Kafka Connect user. The Kafka Connect Sink offers a rich set of features and configuration options for **advanced tuning**.
:::

#### Main features {#kafka-connect-sink-main-features}

* Can be configured to support exactly-once semantics
* Supports all ClickHouse data types
* Handles structured data with declared schemas and unstructured JSON data
* Tested continuously against ClickHouse Cloud

#### Getting started {#kafka-connect-sink-getting-started}

To get started using the ClickHouse Kafka Connect Sink, see the [reference documentation](./kafka-clickhouse-connect-sink.md).
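As a rough sketch of what deployment involves, a minimal connector configuration submitted to a Kafka Connect cluster might resemble the following. The hostname, topic, and credentials are placeholders for your own environment; consult the connector's reference documentation for the authoritative list of properties:

```json
{
  "name": "clickhouse-sink",
  "config": {
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "tasks.max": "1",
    "topics": "my_topic",
    "hostname": "my-service.clickhouse.cloud",
    "port": "8443",
    "ssl": "true",
    "database": "default",
    "username": "default",
    "password": "<PASSWORD>",
    "exactlyOnce": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}
```

The connector writes records from `my_topic` into a ClickHouse table of the same name; enabling `exactlyOnce` activates the connector's exactly-once delivery mode mentioned above.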

### Kafka Table Engine {#kafka-table-engine}

The [Kafka table engine](./kafka-table-engine.md) can be used to read data from and write data to Apache Kafka and other Kafka API-compatible brokers. This option is bundled with open-source ClickHouse and is available across all deployment types.

:::tip
This is the recommended option if you're self-hosting ClickHouse and need an option with a **low entry barrier**, or if you need to **write** data to Kafka.
:::

#### Main features {#kafka-table-engine-main-features}

* Can be used for reading and writing data
* Bundled with open-source ClickHouse
* Supports all ClickHouse data types

#### Getting started {#kafka-table-engine-getting-started}

To get started using the Kafka table engine, see the [reference documentation](./kafka-table-engine.md).
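As an illustrative sketch (the broker address, topic, and table names below are hypothetical), the typical read path combines a Kafka engine table with a materialized view that persists consumed rows into a regular MergeTree table:

```sql
-- 1. A Kafka engine table that consumes from the topic
--    (broker, topic, and consumer group names are placeholders):
CREATE TABLE kafka_queue
(
    timestamp DateTime,
    message String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka-broker:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow';

-- 2. A MergeTree table to persist the data:
CREATE TABLE events
(
    timestamp DateTime,
    message String
)
ENGINE = MergeTree
ORDER BY timestamp;

-- 3. A materialized view that continuously moves consumed rows into storage:
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT timestamp, message
FROM kafka_queue;
```

Writing to Kafka works in the opposite direction: an `INSERT` into a Kafka engine table publishes rows to the configured topic.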
### Choosing an option {#choosing-an-option}

| Product | Deployment | Strengths | Weaknesses |
|---------|------------|-----------|------------|
| **ClickPipes for Kafka** | [CH Cloud] | • Scalable architecture for high throughput and low latency<br/>• Built-in monitoring and schema management<br/>• Private networking connections (via PrivateLink)<br/>• Supports SSL/TLS authentication and IAM authorization<br/>• Supports programmatic configuration (Terraform, API endpoints) | • Does not support pushing data to Kafka<br/>• At-least-once semantics |
| **Kafka Connect Sink** | [CH Cloud]<br/>[CH BYOC]<br/>[CH OSS] | • Exactly-once semantics<br/>• Allows granular control over data transformation, batching, and error handling<br/>• Can be deployed in private networks<br/>• Allows real-time replication from databases not yet supported in ClickPipes via Debezium | • Does not support pushing data to Kafka<br/>• Operationally complex to set up and maintain<br/>• Requires Kafka and Kafka Connect expertise |
| **Kafka Table Engine** | [CH Cloud]<br/>[CH BYOC]<br/>[CH OSS] | • Supports pushing data to Kafka<br/>• Allows real-time replication from databases not yet supported in ClickPipes via Debezium | • At-least-once semantics<br/>• Limited horizontal scaling for consumers; cannot be scaled independently from the ClickHouse server<br/>• Limited error handling and debugging options<br/>• Requires Kafka expertise |

### Other options {#other-options}

* [**JDBC Connect Sink**](./kafka-connect-jdbc.md) - The Kafka Connect JDBC Sink connector allows you to export data from Kafka topics to any relational database with a JDBC driver.

* **Custom code** - Custom code using the respective client libraries for Kafka and ClickHouse may be appropriate in cases where custom processing of events is required. This is beyond the scope of this documentation.

[CH BYOC]: ../../../cloud/reference/byoc.md
[CH Cloud]: ../../../cloud-index.md
[CH OSS]: ../../../intro.md
