Commit acb0cca (2 parents: 6f3115e + 3c09852)
Merge pull request #4000 from ClickHouse/ks/update-kafka-clickpipe-setup
Update kafka ClickPipe setup

File tree: 9 files changed (+60, -60 lines)

docs/integrations/data-ingestion/clickpipes/kafka.md

Lines changed: 60 additions & 60 deletions
@@ -12,19 +12,14 @@ import Msksvg from '@site/static/images/integrations/logos/msk.svg';
 import Azureeventhubssvg from '@site/static/images/integrations/logos/azure_event_hubs.svg';
 import Warpstreamsvg from '@site/static/images/integrations/logos/warpstream.svg';
 import redpanda_logo from '@site/static/images/integrations/logos/logo_redpanda.png';
-import cp_service from '@site/static/images/integrations/data-ingestion/clickpipes/cp_service.png';
 import cp_step0 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step0.png';
 import cp_step1 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step1.png';
 import cp_step2 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step2.png';
 import cp_step3 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step3.png';
 import cp_step4a from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4a.png';
-import cp_step4a3 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4a3.png';
-import cp_step4b from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4b.png';
 import cp_step5 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step5.png';
-import cp_success from '@site/static/images/integrations/data-ingestion/clickpipes/cp_success.png';
-import cp_remove from '@site/static/images/integrations/data-ingestion/clickpipes/cp_remove.png';
-import cp_destination from '@site/static/images/integrations/data-ingestion/clickpipes/cp_destination.png';
 import cp_overview from '@site/static/images/integrations/data-ingestion/clickpipes/cp_overview.png';
+import cp_table_settings from '@site/static/images/integrations/data-ingestion/clickpipes/cp_table_settings.png';
 import Image from '@theme/IdealImage';

 # Integrating Kafka with ClickHouse Cloud
@@ -33,73 +28,91 @@ You have familiarized yourself with the [ClickPipes intro](./index.md).

 ## Creating your first Kafka ClickPipe {#creating-your-first-kafka-clickpipe}

-1. Access the SQL Console for your ClickHouse Cloud Service.
+<VerticalStepper type="numbered" headerLevel="h3">

-<Image img={cp_service} alt="ClickPipes service" size="md" border/>
+### Navigate to data sources {#1-load-sql-console}
+Select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe".
+<Image img={cp_step0} alt="Select imports" size="md"/>

+### Select a data source {#2-select-data-source}
+Select your Kafka data source from the list.
+<Image img={cp_step1} alt="Select data source type" size="md"/>

-2. Select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe"
+### Configure the data source {#3-configure-data-source}
+Fill out the form by providing your ClickPipe with a name, a description (optional), your credentials, and other connection details.
+<Image img={cp_step2} alt="Fill out connection details" size="md"/>

-<Image img={cp_step0} alt="Select imports" size="lg" border/>
+### Configure a schema registry (optional) {#4-configure-your-schema-registry}
+A valid schema is required for Avro streams. See [Schema registries](#schema-registries) for more details on how to configure a schema registry.

-3. Select your data source.
+### Configure a reverse private endpoint (optional) {#5-configure-reverse-private-endpoint}
+Configure a Reverse Private Endpoint to allow ClickPipes to connect to your Kafka cluster using AWS PrivateLink.
+See our [AWS PrivateLink documentation](./aws-privatelink.md) for more information.

-<Image img={cp_step1} alt="Select data source type" size="lg" border/>
+### Select your topic {#6-select-your-topic}
+Select your topic, and the UI will display a sample document from the topic.
+<Image img={cp_step3} alt="Set your topic" size="md"/>

-4. Fill out the form by providing your ClickPipe with a name, a description (optional), your credentials, and other connection details.
+### Configure your destination table {#7-configure-your-destination-table}

-<Image img={cp_step2} alt="Fill out connection details" size="lg" border/>
+In the next step, you can select whether you want to ingest data into a new ClickHouse table or reuse an existing one. Follow the instructions on the screen to modify your table name, schema, and settings. You can see a real-time preview of your changes in the sample table at the top.

-5. Configure the schema registry. A valid schema is required for Avro streams and optional for JSON. This schema will be used to parse [AvroConfluent](../../../interfaces/formats.md/#data-format-avro-confluent) or validate JSON messages on the selected topic.
-   - Avro messages that cannot be parsed or JSON messages that fail validation will generate an error.
-   - the "root" path of the schema registry. For example, a Confluent Cloud schema registry URL is just an HTTPS url with no path, like `https://test-kk999.us-east-2.aws.confluent.cloud` If only the root
-     path is specified, the schema used to determine column names and types in step 4 will be determined by the id embedded in the sampled Kafka messages.
-   - the path `/schemas/ids/[ID]` to the schema document by the numeric schema id. A complete url using a schema id would be `https://registry.example.com/schemas/ids/1000`
-   - the path `/subjects/[subject_name]` to the schema document by subject name. Optionally, a specific version can be referenced by appending `/versions/[version]` to the url (otherwise ClickPipes
-     will retrieve the latest version). A complete url using a schema subject would be `https://registry.example.com/subjects/events` or `https://registry/example.com/subjects/events/versions/4`
+<Image img={cp_step4a} alt="Set table, schema, and settings" size="md"/>

-   Note that in all cases ClickPipes will automatically retrieve an updated or different schema from the registry if indicated by the schema ID embedded in the message. If the message is written
-   without an embedded schema id, then the specific schema ID or subject must be specified to parse all messages.
+You can also customize the advanced settings using the controls provided.

-6. Select your topic and the UI will display a sample document from the topic.
+<Image img={cp_table_settings} alt="Set advanced controls" size="md"/>

-<Image img={cp_step3} alt="Set data format and topic" size="lg" border/>

-7. In the next step, you can select whether you want to ingest data into a new ClickHouse table or reuse an existing one. Follow the instructions in the screen to modify your table name, schema, and settings. You can see a real-time preview of your changes in the sample table at the top.
-
-<Image img={cp_step4a} alt="Set table, schema, and settings" size="lg" border/>
-
-You can also customize the advanced settings using the controls provided
-
-<Image img={cp_step4a3} alt="Set advanced controls" size="lg" border/>
+### Configure permissions {#8-configure-permissions}
+ClickPipes will create a dedicated user for writing data into the destination table. You can select a role for this internal user using a custom role or one of the predefined roles:
+- `Full access`: with full access to the cluster. This may be useful if you use a Materialized View or a Dictionary with the destination table.
+- `Only destination table`: with `INSERT` permissions to the destination table only.

-8. Alternatively, you can decide to ingest your data in an existing ClickHouse table. In that case, the UI will allow you to map fields from the source to the ClickHouse fields in the selected destination table.
+<Image img={cp_step5} alt="Permissions" size="md"/>

-<Image img={cp_step4b} alt="Use an existing table" size="lg" border/>
+### Complete setup {#9-complete-setup}
+Clicking on "Create ClickPipe" will create and run your ClickPipe. It will now be listed in the Data Sources section.

-9. Finally, you can configure permissions for the internal ClickPipes user.
+<Image img={cp_overview} alt="View overview" size="md"/>

-**Permissions:** ClickPipes will create a dedicated user for writing data into a destination table. You can select a role for this internal user using a custom role or one of the predefined role:
-- `Full access`: with the full access to the cluster. It might be useful if you use Materialized View or Dictionary with the destination table.
-- `Only destination table`: with the `INSERT` permissions to the destination table only.
+</VerticalStepper>

-<Image img={cp_step5} alt="Permissions" size="lg" border/>
+## Schema registries {#schema-registries}
+ClickPipes supports schema registries for Avro data streams.

-10. By clicking on "Complete Setup", the system will register you ClickPipe, and you'll be able to see it listed in the summary table.
+### Supported registries {#supported-schema-registries}
+Schema registries that use the Confluent Schema Registry API are supported. This includes:
+- Confluent Kafka and Cloud
+- Redpanda
+- AWS MSK
+- Upstash

-<Image img={cp_success} alt="Success notice" size="sm" border/>
+ClickPipes is not currently compatible with the AWS Glue Schema Registry or the Azure Schema Registry.

-<Image img={cp_remove} alt="Remove notice" size="lg" border/>
+### Configuration {#schema-registry-configuration}

-The summary table provides controls to display sample data from the source or the destination table in ClickHouse
+A ClickPipe ingesting Avro data requires a schema registry. This can be configured in one of three ways:
+
+1. Providing a complete path to the schema subject (e.g. `https://registry.example.com/subjects/events`)
+   - Optionally, a specific version can be referenced by appending `/versions/[version]` to the URL (otherwise ClickPipes will retrieve the latest version).
+2. Providing a complete path to the schema id (e.g. `https://registry.example.com/schemas/ids/1000`)
+3. Providing the root schema registry URL (e.g. `https://registry.example.com`)

-<Image img={cp_destination} alt="View destination" size="lg" border/>
+### How it works {#how-schema-registries-work}
+ClickPipes dynamically retrieves and applies the Avro schema from the configured Schema Registry.
+- If there's a schema id embedded in the message, it will use that to retrieve the schema.
+- If there's no schema id embedded in the message, it will use the schema id or subject name specified in the ClickPipe configuration to retrieve the schema.
+- If the message is written without an embedded schema id, and no schema id or subject name is specified in the ClickPipe configuration, then the schema will not be retrieved and the message will be skipped with a `SOURCE_SCHEMA_ERROR` logged in the ClickPipes errors table.
+- If the message does not conform to the schema, then the message will be skipped with a `DATA_PARSING_ERROR` logged in the ClickPipes errors table.

-As well as controls to remove the ClickPipe and display a summary of the ingest job.
+### Schema mapping {#schema-mapping}
+The following rules are applied to the mapping between the retrieved Avro schema and the ClickHouse destination table:

-<Image img={cp_overview} alt="View overview" size="lg" border/>
+- If the Avro schema contains a field that is not included in the ClickHouse destination mapping, that field is ignored.
+- If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that DEFAULT expressions are not currently evaluated for ClickPipes inserts (this is a temporary limitation pending updates to the ClickHouse server default processing).
+- If the Avro schema field and the ClickHouse column are incompatible, inserts of that row/message will fail, and the failure will be recorded in the ClickPipes errors table. Note that several implicit conversions are supported (like between numeric types), but not all (for example, an Avro record field cannot be inserted into an Int32 ClickHouse column).

-11. **Congratulations!** you have successfully set up your first ClickPipe. If this is a streaming ClickPipe it will be continuously running, ingesting data in real-time from your remote data source.

 ## Supported data sources {#supported-data-sources}

@@ -177,19 +190,6 @@ Nullable types in Avro are defined by using a Union schema of `(T, null)` or `(null, T)`
 - An empty Map for a null Avro Map
 - A named Tuple with all default/zero values for a null Avro Record

-ClickPipes does not currently support schemas that contain other Avro Unions (this may change in the future with the maturity of the new ClickHouse Variant and JSON data types). If the Avro schema contains a "non-null" union, ClickPipes will generate an error when attempting to calculate a mapping between the Avro schema and Clickhouse column types.
-
-#### Avro schema management {#avro-schema-management}
-
-ClickPipes dynamically retrieves and applies the Avro schema from the configured Schema Registry using the schema ID embedded in each message/event. Schema updates are detected and processed automatically.
-
-At this time ClickPipes is only compatible with schema registries that use the [Confluent Schema Registry API](https://docs.confluent.io/platform/current/schema-registry/develop/api.html). In addition to Confluent Kafka and Cloud, this includes the Redpanda, AWS MSK, and Upstash schema registries. ClickPipes is not currently compatible with the AWS Glue Schema registry or the Azure Schema Registry (coming soon).
-
-The following rules are applied to the mapping between the retrieved Avro schema and the ClickHouse destination table:
-- If the Avro schema contains a field that is not included in the ClickHouse destination mapping, that field is ignored.
-- If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that [DEFAULT](/sql-reference/statements/create/table#default) expressions are not currently evaluated for ClickPipes inserts (this is temporary limitation pending updates to the ClickHouse server default processing).
-- If the Avro schema field and the ClickHouse column are incompatible, inserts of that row/message will fail, and the failure will be recorded in the ClickPipes errors table. Note that several implicit conversions are supported (like between numeric types), but not all (for example, an Avro `record` field can not be inserted into an `Int32` ClickHouse column).
-
 ## Kafka virtual columns {#kafka-virtual-columns}

 The following virtual columns are supported for Kafka compatible streaming data sources. When creating a new destination table virtual columns can be added by using the `Add Column` button.
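The schema-mapping rules this commit relocates (extra source fields ignored, missing fields zero-filled, no DEFAULT evaluation) can be sketched as a toy function. The function name and the type-to-zero table below are illustrative assumptions, not ClickPipes internals:

```python
# Hypothetical sketch of the documented mapping rules, not ClickPipes source.
# Illustrative subset of ClickHouse types and their "zero" values:
ZERO_VALUES = {"Int32": 0, "String": "", "Float64": 0.0}

def map_record(record: dict, table_columns: dict) -> dict:
    """Map a decoded Avro record onto a ClickHouse-style column set."""
    row = {}
    for name, ch_type in table_columns.items():
        if name in record:
            row[name] = record[name]          # field present: copy it through
        else:
            row[name] = ZERO_VALUES[ch_type]  # missing: "zero" value, no DEFAULTs
    return row  # record keys absent from table_columns are simply dropped

columns = {"id": "Int32", "name": "String", "score": "Float64"}
print(map_record({"id": 7, "name": "a", "extra": True}, columns))
# {'id': 7, 'name': 'a', 'score': 0.0}
```

Note that `extra` is ignored and `score` is zero-filled rather than taking any table-level DEFAULT, mirroring the limitation called out in the doc text.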
The remaining 8 changed files are binary images (ClickPipes setup screenshots added, updated, or removed); their diffs are not shown.
