Commit 940eb89

Merge pull request #3998 from morsapaes/docs-kafka_table_engine
integrations: update Kafka integration overview
2 parents cb54bd4 + 3b9a460

5 files changed: +86 −31 lines

docs/integrations/data-ingestion/clickpipes/kafka.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -112,7 +112,7 @@ without an embedded schema id, then the specific schema ID or subject must be sp
 | Azure Event Hubs |<Azureeventhubssvg class="image" alt="Azure Event Hubs logo" style={{width: '3rem'}}/>|Streaming| Stable | Configure ClickPipes and start ingesting streaming data from Azure Event Hubs into ClickHouse Cloud. |
 | WarpStream |<Warpstreamsvg class="image" alt="WarpStream logo" style={{width: '3rem'}}/>|Streaming| Stable | Configure ClickPipes and start ingesting streaming data from WarpStream into ClickHouse Cloud. |
 
-More connectors are will get added to ClickPipes, you can find out more by [contacting us](https://clickhouse.com/company/contact?loc=clickpipes).
+More connectors will get added to ClickPipes in the future. You can find out more by [contacting us](https://clickhouse.com/company/contact?loc=clickpipes).
 
 ## Supported data formats {#supported-data-formats}
 
```

docs/integrations/data-ingestion/kafka/index.md

Lines changed: 80 additions & 29 deletions
```diff
@@ -8,47 +8,98 @@ title: 'Integrating Kafka with ClickHouse'
 
 # Integrating Kafka with ClickHouse
 
-[Apache Kafka](https://kafka.apache.org/) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. In most cases involving Kafka and ClickHouse, users will wish to insert Kafka based data into ClickHouse. Below we outline several options for both use cases, identifying the pros and cons of each approach.
+[Apache Kafka](https://kafka.apache.org/) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. ClickHouse provides multiple options to **read from** and **write to** Kafka and other Kafka API-compatible brokers (e.g., Redpanda, Amazon MSK).
 
-## Choosing an option {#choosing-an-option}
+## Available options {#available-options}
 
-When integrating Kafka with ClickHouse, you will need to make early architectural decisions about the high-level approach used. We outline the most common strategies below:
+Choosing the right option for your use case depends on multiple factors, including your ClickHouse deployment type, data flow direction and operational requirements.
 
-### ClickPipes for Kafka (ClickHouse Cloud) {#clickpipes-for-kafka-clickhouse-cloud}
-* [**ClickPipes**](../clickpipes/kafka.md) offers the easiest and most intuitive way to ingest data into ClickHouse Cloud. With support for Apache Kafka, Confluent Cloud and Amazon MSK today, and many more data sources coming soon.
+| Option | Deployment type | Fully managed | Kafka to ClickHouse | ClickHouse to Kafka |
+|---------|------------|:-------------------:|:-------------------:|:------------------:|
+| [ClickPipes for Kafka](../clickpipes/kafka.md) | [Cloud], [BYOC] (coming soon!) | ✅ | ✅ | |
+| [Kafka Connect Sink](./kafka-clickhouse-connect-sink.md) | [Cloud], [BYOC], [Self-hosted] | | ✅ | |
+| [Kafka table engine](./kafka-table-engine.md) | [Cloud], [BYOC], [Self-hosted] | | ✅ | ✅ |
 
-### 3rd-Party Cloud-based Kafka Connectivity {#3rd-party-cloud-based-kafka-connectivity}
-* [**Confluent Cloud**](./confluent/index.md) - Confluent platform provides an option to upload and [run ClickHouse Connector Sink on Confluent Cloud](./confluent/custom-connector.md) or use [HTTP Sink Connector for Confluent Platform](./confluent/kafka-connect-http.md) that integrates Apache Kafka with an API via HTTP or HTTPS.
+For a more detailed comparison between these options, see [Choosing an option](#choosing-an-option).
 
-* [**Amazon MSK**](./msk/index.md) - support Amazon MSK Connect framework to forward data from Apache Kafka clusters to external systems such as ClickHouse. You can install ClickHouse Kafka Connect on Amazon MSK.
+### ClickPipes for Kafka {#clickpipes-for-kafka}
 
-* [**Redpanda Cloud**](https://cloud.redpanda.com/) - Redpanda is a Kafka API-compatible streaming data platform that can be used as an upstream data source for ClickHouse. The hosted cloud platform, Redpanda Cloud, integrates with ClickHouse over Kafka protocol, enabling real-time data ingestion for streaming analytics workloads
+[ClickPipes](../clickpipes/index.md) is a managed integration platform that makes ingesting data from a diverse set of sources as simple as clicking a few buttons. Because it is fully managed and purpose-built for production workloads, ClickPipes significantly lowers infrastructure and operational costs, removing the need for external data streaming and ETL tools.
 
-### Self-managed Kafka Connectivity {#self-managed-kafka-connectivity}
-* [**Kafka Connect**](./kafka-clickhouse-connect-sink.md) - Kafka Connect is a free, open-source component of Apache Kafka that works as a centralized data hub for simple data integration between Kafka and other data systems. Connectors provide a simple means of scalable and reliably streaming data to and from Kafka. Source Connectors inserts data to Kafka topics from other systems, whilst Sink Connectors delivers data from Kafka topics into other data stores such as ClickHouse.
-* [**Vector**](./kafka-vector.md) - Vector is a vendor agnostic data pipeline. With the ability to read from Kafka, and send events to ClickHouse, this represents a robust integration option.
-* [**JDBC Connect Sink**](./kafka-connect-jdbc.md) - The Kafka Connect JDBC Sink connector allows you to export data from Kafka topics to any relational database with a JDBC driver
-* **Custom code** - Custom code using respective client libraries for Kafka and ClickHouse may be appropriate cases where custom processing of events is required. This is beyond the scope of this documentation.
-* [**Kafka table engine**](./kafka-table-engine.md) provides a Native ClickHouse integration (not available on ClickHouse Cloud). This table engine **pulls** data from the source system. This requires ClickHouse to have direct access to Kafka.
-* [**Kafka table engine with named collections**](./kafka-table-engine-named-collections.md) - Using named collections provides native ClickHouse integration with Kafka. This approach allows secure connections to multiple Kafka clusters, centralizing configuration management and improving scalability and security.
+:::tip
+This is the recommended option if you're a ClickHouse Cloud user. ClickPipes is **fully managed** and purpose-built to deliver the **best performance** in Cloud environments.
+:::
 
-### Choosing an approach {#choosing-an-approach}
-It comes down to a few decision points:
+#### Main features {#clickpipes-for-kafka-main-features}
 
-* **Connectivity** - The Kafka table engine needs to be able to pull from Kafka if ClickHouse is the destination. This requires bi-directional connectivity. If there is a network separation, e.g. ClickHouse is in the Cloud and Kafka is self-managed, you may be hesitant to remove this for compliance and security reasons. (This approach is not currently supported in ClickHouse Cloud.) The Kafka table engine utilizes resources within ClickHouse itself, utilizing threads for the consumers. Placing this resource pressure on ClickHouse may not be possible due to resource constraints, or your architects may prefer a separation of concerns. In this case, tools such as Kafka Connect, which run as a separate process and can be deployed on different hardware may be preferable. This allows the process responsible for pulling Kafka data to be scaled independently of ClickHouse.
+[//]: # "TODO It isn't optimal to link to a static alpha-release of the Terraform provider. Link to a Terraform guide once that's available."
 
-* **Hosting on Cloud** - Cloud vendors may set limitations on Kafka components available on their platform. Follow the guide to explore recommended options for each Cloud vendor.
+* Optimized for ClickHouse Cloud, delivering blazing-fast performance
+* Horizontal and vertical scalability for high-throughput workloads
+* Built-in fault tolerance with configurable replicas and automatic retries
+* Deployment and management via ClickHouse Cloud UI, [Open API](../../../cloud/manage/api/api-overview.md), or [Terraform](https://registry.terraform.io/providers/ClickHouse/clickhouse/3.3.3-alpha2/docs/resources/clickpipe)
+* Enterprise-grade security with support for cloud-native authorization (IAM) and private connectivity (PrivateLink)
+* Supports a wide range of [data sources](../clickpipes/kafka.md#supported-data-sources), including Confluent Cloud, Amazon MSK, Redpanda Cloud, and Azure Event Hubs
+* Supports most common serialization formats (JSON, Avro, Protobuf coming soon!)
 
-* **External enrichment** - Whilst messages can be manipulated before insertion into ClickHouse, through the use of functions in the select statement of the materialized view, users may prefer to move complex enrichment external to ClickHouse.
+#### Getting started {#clickpipes-for-kafka-getting-started}
 
-* **Data flow direction** - Vector only supports the transfer of data from Kafka to ClickHouse.
+To get started using ClickPipes for Kafka, see the [reference documentation](../clickpipes/kafka.md) or navigate to the `Data Sources` tab in the ClickHouse Cloud UI.
 
-## Assumptions {#assumptions}
+### Kafka Connect Sink {#kafka-connect-sink}
 
-The user guides linked above assume the following:
+Kafka Connect is an open-source framework that works as a centralized data hub for simple data integration between Kafka and other data systems. The [ClickHouse Kafka Connect Sink](https://github.com/ClickHouse/clickhouse-kafka-connect) connector provides a scalable and highly-configurable option to read data from Apache Kafka and other Kafka API-compatible brokers.
 
-* You are familiar with the Kafka fundamentals, such as producers, consumers and topics.
-* You have a topic prepared for these examples. We assume all data is stored in Kafka as JSON, although the principles remain the same if using Avro.
-* We utilise the excellent [kcat](https://github.com/edenhill/kcat) (formerly kafkacat) in our examples to publish and consume Kafka data.
-* Whilst we reference some python scripts for loading sample data, feel free to adapt the examples to your dataset.
-* You are broadly familiar with ClickHouse materialized views.
+:::tip
+This is the recommended option if you prefer **high configurability** or are already a Kafka Connect user.
+:::
+
+#### Main features {#kafka-connect-sink-main-features}
+
+* Can be configured to support exactly-once semantics
+* Supports most common serialization formats (JSON, Avro, Protobuf)
+* Tested continuously against ClickHouse Cloud
+
+#### Getting started {#kafka-connect-sink-getting-started}
+
+To get started using the ClickHouse Kafka Connect Sink, see the [reference documentation](./kafka-clickhouse-connect-sink.md).
+
+### Kafka table engine {#kafka-table-engine}
+
+The [Kafka table engine](./kafka-table-engine.md) can be used to read data from and write data to Apache Kafka and other Kafka API-compatible brokers. This option is bundled with open-source ClickHouse and is available across all deployment types.
+
+:::tip
+This is the recommended option if you're self-hosting ClickHouse and need a **low entry barrier** option, or if you need to **write** data to Kafka.
+:::
+
+#### Main features {#kafka-table-engine-main-features}
+
+* Can be used for [reading](./kafka-table-engine.md/#kafka-to-clickhouse) and [writing](./kafka-table-engine.md/#clickhouse-to-kafka) data
+* Bundled with open-source ClickHouse
+* Supports most common serialization formats (JSON, Avro, Protobuf)
+
+#### Getting started {#kafka-table-engine-getting-started}
+
+To get started using the Kafka table engine, see the [reference documentation](./kafka-table-engine.md).
+
+### Choosing an option {#choosing-an-option}
+
+| Product | Strengths | Weaknesses |
+|---------|-----------|------------|
+| **ClickPipes for Kafka** | • Scalable architecture for high throughput and low latency<br/>• Built-in monitoring and schema management<br/>• Private networking connections (via PrivateLink)<br/>• Supports SSL/TLS authentication and IAM authorization<br/>• Supports programmatic configuration (Terraform, API endpoints) | • Does not support pushing data to Kafka<br/>• At-least-once semantics |
+| **Kafka Connect Sink** | • Exactly-once semantics<br/>• Allows granular control over data transformation, batching and error handling<br/>• Can be deployed in private networks<br/>• Allows real-time replication from databases not yet supported in ClickPipes via Debezium | • Does not support pushing data to Kafka<br/>• Operationally complex to set up and maintain<br/>• Requires Kafka and Kafka Connect expertise |
+| **Kafka table engine** | • Supports [pushing data to Kafka](./kafka-table-engine.md/#clickhouse-to-kafka)<br/>• Operationally simple to set up | • At-least-once semantics<br/>• Limited horizontal scaling for consumers. Cannot be scaled independently from the ClickHouse server<br/>• Limited error handling and debugging options<br/>• Requires Kafka expertise |
+
+### Other options {#other-options}
+
+* [**Confluent Cloud**](./confluent/index.md) - Confluent Platform provides an option to upload and [run ClickHouse Connector Sink on Confluent Cloud](./confluent/custom-connector.md) or use [HTTP Sink Connector for Confluent Platform](./confluent/kafka-connect-http.md) that integrates Apache Kafka with an API via HTTP or HTTPS.
+
+* [**Vector**](./kafka-vector.md) - Vector is a vendor-agnostic data pipeline. With the ability to read from Kafka, and send events to ClickHouse, this represents a robust integration option.
+
+* [**JDBC Connect Sink**](./kafka-connect-jdbc.md) - The Kafka Connect JDBC Sink connector allows you to export data from Kafka topics to any relational database with a JDBC driver.
+
+* **Custom code** - Custom code using Kafka and ClickHouse [client libraries](../../language-clients/index.md) may be appropriate in cases where custom processing of events is required.
+
+[BYOC]: ../../../cloud/reference/byoc.md
+[Cloud]: ../../../cloud-index.md
+[Self-hosted]: ../../../intro.md
```
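
The rewritten overview above describes the Kafka table engine's read and write paths without showing them. As a minimal sketch of the read path it documents — assuming a hypothetical topic `events` on a broker at `kafka:9092` carrying JSON messages with `timestamp`, `user_id`, and `message` fields (all names are illustrative, not from this commit) — the usual pattern pairs a Kafka engine table with a MergeTree table and a materialized view:

```sql
-- Kafka engine table: consumes messages from the topic but stores nothing itself.
CREATE TABLE events_queue
(
    timestamp DateTime,
    user_id   UInt64,
    message   String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse_events_consumer',
         kafka_format = 'JSONEachRow';

-- Durable destination table for the consumed rows.
CREATE TABLE events
(
    timestamp DateTime,
    user_id   UInt64,
    message   String
)
ENGINE = MergeTree
ORDER BY (user_id, timestamp);

-- Materialized view: continuously moves consumed rows into the MergeTree table.
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT timestamp, user_id, message
FROM events_queue;
```

The write path the overview mentions is the reverse of this pattern: inserting into the Kafka engine table publishes rows to the topic, e.g. `INSERT INTO events_queue VALUES (now(), 42, 'hello');`.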

docs/tutorial.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -12,7 +12,7 @@ show_related_blogs: true
 
 ## Overview {#overview}
 
-Learn how to ingest and query data in ClickHouse using a New York City taxi example dataset.
+Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
 
 ### Prerequisites {#prerequisites}
 
```

scripts/aspell-dict-file.txt

Lines changed: 2 additions & 0 deletions
```diff
@@ -919,6 +919,8 @@ supabase
 artitecture
 emqx
 mqttx
+--docs/integrations/data-ingestion/kafka/index.md--
+configurability
 --docs/integrations/data-ingestion/kafka/confluent/custom-connector.md--
 AddCustomConnectorPlugin
 --docs/integrations/data-ingestion/kafka/confluent/kafka-connect-http.md--
```

styles/ClickHouse/Headings.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -62,6 +62,8 @@ exceptions:
 - Time To Live
 - Docker Compose
 - Kafka
+- Kafka Connect
+- Kafka Connect Sink
 - Google Cloud Run
 - NPM
 - OTel
```
