Commit 5c723c5

Merge pull request #3941 from ClickHouse/ks/add-best-practices
Add Kafka ClickPipe best practices
2 parents 51fc786 + cd87d03 commit 5c723c5


docs/integrations/data-ingestion/clickpipes/kafka.md

Lines changed: 15 additions & 9 deletions
@@ -101,7 +101,7 @@ without an embedded schema id, then the specific schema ID or subject must be sp
 
 11. **Congratulations!** you have successfully set up your first ClickPipe. If this is a streaming ClickPipe it will be continuously running, ingesting data in real-time from your remote data source.
 
-## Supported Data Sources {#supported-data-sources}
+## Supported data sources {#supported-data-sources}
 
 | Name |Logo|Type| Status | Description |
 |----------------------|----|----|-----------------|------------------------------------------------------------------------------------------------------|
@@ -114,14 +114,14 @@ without an embedded schema id, then the specific schema ID or subject must be sp
 
 More connectors will be added to ClickPipes; you can find out more by [contacting us](https://clickhouse.com/company/contact?loc=clickpipes).
 
-## Supported Data Formats {#supported-data-formats}
+## Supported data formats {#supported-data-formats}
 
 The supported formats are:
 - [JSON](../../../interfaces/formats.md/#json)
 - [AvroConfluent](../../../interfaces/formats.md/#data-format-avro-confluent)
 
 
-### Supported Data Types {#supported-data-types}
+### Supported data types {#supported-data-types}
 
 #### Standard types support {#standard-types-support}
 The following standard ClickHouse data types are currently supported in ClickPipes:
@@ -169,7 +169,7 @@ Note that you will have to manually change the destination column to the desired
 
 ClickPipes supports all Avro Primitive and Complex types, and all Avro Logical types except `time-millis`, `time-micros`, `local-timestamp-millis`, `local-timestamp-micros`, and `duration`. Avro `record` types are converted to Tuple, `array` types to Array, and `map` to Map (string keys only). In general the conversions listed [here](/interfaces/formats/Avro#data-types-matching) are available. We recommend using exact type matching for Avro numeric types, as ClickPipes does not check for overflow or precision loss on type conversion.
 
-#### Nullable Types and Avro Unions {#nullable-types-and-avro-unions}
+#### Nullable types and Avro unions {#nullable-types-and-avro-unions}
 
 Nullable types in Avro are defined by using a Union schema of `(T, null)` or `(null, T)` where T is the base Avro type. During schema inference, such unions will be mapped to a ClickHouse "Nullable" column. Note that ClickHouse does not support
 `Nullable(Array)`, `Nullable(Map)`, or `Nullable(Tuple)` types. Avro null unions for these types will be mapped to non-nullable versions (Avro Record types are mapped to a ClickHouse named Tuple). Avro "nulls" for these types will be inserted as:
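
To make the union mapping in the hunk above concrete, here is a minimal sketch of an Avro schema containing nullable unions; the record and field names are hypothetical, not taken from the docs:

```python
# A hypothetical Avro schema illustrating the (null, T) / (T, null) unions
# described above; all names are made up for this sketch.
import json

schema = {
    "type": "record",
    "name": "Event",  # hypothetical record name
    "fields": [
        # (null, T) union -> inferred as Nullable(String) in ClickHouse
        {"name": "user_id", "type": ["null", "string"]},
        # (T, null) union -> inferred as Nullable(Int64)
        {"name": "score", "type": ["long", "null"]},
        # Nullable(Array) is unsupported, so this maps to a plain Array(String);
        # Avro nulls arrive as the non-nullable "zero" value (an empty array)
        {"name": "tags", "type": ["null", {"type": "array", "items": "string"}]},
    ],
}

print(json.dumps(schema, indent=2))
```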
@@ -179,7 +179,7 @@ Nullable types in Avro are defined by using a Union schema of `(T, null)` or `(n
 
 ClickPipes does not currently support schemas that contain other Avro Unions (this may change in the future with the maturity of the new ClickHouse Variant and JSON data types). If the Avro schema contains a "non-null" union, ClickPipes will generate an error when attempting to calculate a mapping between the Avro schema and ClickHouse column types.
 
-#### Avro Schema Management {#avro-schema-management}
+#### Avro schema management {#avro-schema-management}
 
 ClickPipes dynamically retrieves and applies the Avro schema from the configured Schema Registry using the schema ID embedded in each message/event. Schema updates are detected and processed automatically.
 
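
For readers unfamiliar with how the schema ID travels with each message: in the Confluent wire format used by AvroConfluent, every message body starts with a magic byte of 0 followed by a 4-byte big-endian schema ID. A minimal sketch of extracting it (the payload bytes below are fabricated for illustration):

```python
# A minimal sketch of reading the embedded schema ID from a Confluent-framed
# Kafka message; the payload is fabricated for illustration.
import struct

payload = b"\x00" + struct.pack(">I", 42) + b"<avro-encoded record bytes>"

magic = payload[0]                                # always 0x00 in the Confluent wire format
(schema_id,) = struct.unpack(">I", payload[1:5])  # 4-byte big-endian schema ID

assert magic == 0
print(f"schema ID to look up in the registry: {schema_id}")  # -> 42
```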
@@ -190,7 +190,7 @@ The following rules are applied to the mapping between the retrieved Avro schema
 - If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that [DEFAULT](/sql-reference/statements/create/table#default) expressions are not currently evaluated for ClickPipes inserts (this is a temporary limitation pending updates to the ClickHouse server default processing).
 - If the Avro schema field and the ClickHouse column are incompatible, inserts of that row/message will fail, and the failure will be recorded in the ClickPipes errors table. Note that several implicit conversions are supported (like between numeric types), but not all (for example, an Avro `record` field cannot be inserted into an `Int32` ClickHouse column).
 
-## Kafka Virtual Columns {#kafka-virtual-columns}
+## Kafka virtual columns {#kafka-virtual-columns}
 
 The following virtual columns are supported for Kafka-compatible streaming data sources. When creating a new destination table, virtual columns can be added by using the `Add Column` button.
 
@@ -208,6 +208,12 @@ The following virtual columns are supported for Kafka-compatible streaming data
 Note that the `_raw_message` column is only recommended for JSON data. For use cases where only the JSON string is required (such as using ClickHouse [`JsonExtract*`](/sql-reference/functions/json-functions#jsonextract-functions) functions to populate a downstream materialized
 view), it may improve ClickPipes performance to delete all the "non-virtual" columns.
 
+## Best practices {#best-practices}
+
+### Message compression {#compression}
+We strongly recommend using compression for your Kafka topics. Compression can result in significant savings in data transfer costs with virtually no performance hit.
+To learn more about message compression in Kafka, we recommend starting with this [guide](https://www.confluent.io/blog/apache-kafka-message-compression/).
+
 ## Limitations {#limitations}
 
 - [DEFAULT](/sql-reference/statements/create/table#default) is not supported.
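
To illustrate the compression recommendation this commit adds: compression is typically enabled on the producer side with a single configuration property, and the broker and ClickPipes consumer need no changes. A hedged sketch using the confluent-kafka Python client (broker address, topic, and message are placeholders):

```python
# A minimal sketch of producer-side compression, assuming the confluent-kafka
# Python client; the broker address and topic name are placeholders.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker-1.example.com:9092",  # placeholder broker
    # librdkafka supports gzip, snappy, lz4, and zstd; batches are compressed
    # by the producer and decompressed transparently by consumers
    "compression.type": "zstd",
})

producer.produce("events", value=b'{"user_id": "abc", "score": 1}')
producer.flush()  # wait for delivery so the example is self-contained
```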
@@ -269,7 +275,7 @@ Below is an example of the required IAM policy for Apache Kafka APIs for MSK:
 }
 ```
 
-#### Configuring a Trusted Relationship {#configuring-a-trusted-relationship}
+#### Configuring a trusted relationship {#configuring-a-trusted-relationship}
 
 If you are authenticating to MSK with an IAM role ARN, you will need to add a trusted relationship between the role and your ClickHouse Cloud instance so the role can be assumed.
 
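
For orientation, a trusted relationship on an IAM role is a trust-policy document permitting `sts:AssumeRole`. The sketch below shows only the generic shape; the principal ARN is a placeholder, not the actual ClickHouse Cloud principal, which the full docs page describes:

```python
# A generic sketch of an IAM trust-relationship document allowing a role to be
# assumed; the principal ARN is a placeholder, not the real ClickHouse Cloud value.
import json

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/placeholder-clickhouse-cloud-role"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```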
@@ -343,11 +349,11 @@ the ClickPipe will automatically restart the consumer and continue processing me
 
 - **What are the requirements for using ClickPipes for Kafka?**
 
-  In order to use ClickPipes for Kafka, you will need a running Kafka broker and a ClickHouse Cloud service with ClickPipes enabled. You will also need to ensure that ClickHouse Cloud can access your Kafka broker. This can be achieved by allowing remote connections on the Kafka side, whitelisting [ClickHouse Cloud Egress IP addresses](/manage/security/cloud-endpoints-api) in your Kafka setup.
+  In order to use ClickPipes for Kafka, you will need a running Kafka broker and a ClickHouse Cloud service with ClickPipes enabled. You will also need to ensure that ClickHouse Cloud can access your Kafka broker. This can be achieved by allowing remote connections on the Kafka side, whitelisting [ClickHouse Cloud Egress IP addresses](/manage/security/cloud-endpoints-api) in your Kafka setup. Alternatively, you can use [AWS PrivateLink](/integrations/clickpipes/aws-privatelink) to connect ClickPipes for Kafka to your Kafka brokers.
 
 - **Does ClickPipes for Kafka support AWS PrivateLink?**
 
-  AWS PrivateLink is supported. Please [contact us](https://clickhouse.com/company/contact?loc=clickpipes) for more information.
+  AWS PrivateLink is supported. See [the documentation](/integrations/clickpipes/aws-privatelink) for more information on how to set it up.
 
 - **Can I use ClickPipes for Kafka to write data to a Kafka topic?**
353359
