### Configure a schema registry (optional) {#4-configure-your-schema-registry}
A valid schema is required for Avro streams. See [Schema registries](#schema-registries) for more details on how to configure a schema registry.
### Configure a reverse private endpoint (optional) {#5-configure-reverse-private-endpoint}
Configure a reverse private endpoint to allow ClickPipes to connect to your Kafka cluster using AWS PrivateLink.
See our [AWS PrivateLink documentation](./aws-privatelink.md) for more information.
### Select your topic {#6-select-your-topic}
Select your topic and the UI will display a sample document from the topic.
<Image img={cp_step3} alt="Set your topic" size="md"/>
### Configure your destination table {#7-configure-your-destination-table}
In the next step, you can select whether you want to ingest data into a new ClickHouse table or reuse an existing one. Follow the instructions on the screen to modify your table name, schema, and settings. You can see a real-time preview of your changes in the sample table at the top.
<Image img={cp_step4a} alt="Set table, schema, and settings" size="md"/>
You can also customize the advanced settings using the controls provided.
### Configure permissions {#8-configure-permissions}

ClickPipes will create a dedicated user for writing data into the destination table. You can select a role for this internal user, using either a custom role or one of the predefined roles:
- `Full access`: has full access to the cluster. This might be useful if you use a Materialized View or Dictionary with the destination table.
- `Only destination table`: has `INSERT` permissions on the destination table only.
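For a rough sense of what these two roles correspond to in ClickHouse terms, here is a minimal sketch using the `clickhouse-connect` Python client. The host, credentials, table, and user names are hypothetical, and the `GRANT` statements are an approximation of the predefined roles, not how ClickPipes actually provisions them:

```python
import clickhouse_connect

# Connect as an admin user (hypothetical credentials).
client = clickhouse_connect.get_client(
    host="your-service.clickhouse.cloud", username="default", password="..."
)

# Roughly what `Only destination table` implies: INSERT on one table only.
client.command("GRANT INSERT ON mydb.my_destination_table TO clickpipes_user")

# Roughly what `Full access` implies: all privileges on the cluster, which
# matters when materialized views or dictionaries hang off the table.
client.command("GRANT ALL ON *.* TO clickpipes_user")
```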
<Image img={cp_step5} alt="Permissions" size="md"/>
### Complete setup {#9-complete-setup}
Clicking on "Create ClickPipe" will create and run your ClickPipe. It will now be listed in the Data Sources section.
The summary table provides controls to display sample data from the source or the destination table in ClickHouse, as well as controls to remove the ClickPipe and display a summary of the ingest job.
## Schema registries {#schema-registries}
ClickPipes with Avro data require a schema registry. This can be configured in one of three ways:
1. Providing a complete path to the schema subject (e.g. `https://registry.example.com/subjects/events`)
   - Optionally, a specific version can be referenced by appending `/versions/[version]` to the URL (otherwise ClickPipes will retrieve the latest version).
2. Providing a complete path to the schema id (e.g. `https://registry.example.com/schemas/ids/1000`)
3. Providing the root schema registry URL (e.g. `https://registry.example.com`)

At this time ClickPipes is only compatible with schema registries that use the [Confluent Schema Registry API](https://docs.confluent.io/platform/current/schema-registry/develop/api.html). In addition to Confluent Kafka and Cloud, this includes the Redpanda, AWS MSK, and Upstash schema registries. ClickPipes is not currently compatible with the AWS Glue Schema Registry or the Azure Schema Registry (coming soon).
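To illustrate how these three forms map onto the Confluent Schema Registry API, here is a minimal Python sketch. The registry URL, subject name, and schema id are the hypothetical examples from the list above, and the `requests` library is assumed to be installed:

```python
import requests

REGISTRY = "https://registry.example.com"  # hypothetical root registry URL

# 1. Complete path to a schema subject (latest version unless one is pinned)
latest = requests.get(f"{REGISTRY}/subjects/events/versions/latest").json()
pinned = requests.get(f"{REGISTRY}/subjects/events/versions/4").json()

# 2. Complete path to a schema by its numeric id
by_id = requests.get(f"{REGISTRY}/schemas/ids/1000").json()

# 3. Root URL only: the schema id embedded in each Kafka message is used,
#    so the lookup is the same as (2) with the id taken from the message.
print(latest["schema"], by_id["schema"])
```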
ClickPipes dynamically retrieves and applies the Avro schema from the configured Schema Registry.
- If there's a schema id embedded in the message, it will use that to retrieve the schema.
- If there's no schema id embedded in the message, it will use the schema id or subject name specified in the ClickPipe configuration to retrieve the schema.
- If the message is written without an embedded schema id, and no schema id or subject name is specified in the ClickPipe configuration, then the schema will not be retrieved and the message will be skipped with a `SOURCE_SCHEMA_ERROR` logged in the ClickPipes errors table.
- If the message does not conform to the schema, then the message will be skipped with a `DATA_PARSING_ERROR` logged in the ClickPipes errors table.
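The retrieval rules above can be summarized as a short Python sketch of the decision flow. This is illustrative only, not ClickPipes internals; the function and parameter names are hypothetical:

```python
from typing import Optional

def resolve_schema(message_schema_id: Optional[int],
                   configured_id_or_subject: Optional[str],
                   fetch_schema) -> Optional[str]:
    """Sketch of how a schema is chosen for an incoming message."""
    if message_schema_id is not None:
        # An embedded schema id always wins.
        return fetch_schema(f"/schemas/ids/{message_schema_id}")
    if configured_id_or_subject is not None:
        # Fall back to the id or subject from the ClickPipe configuration.
        return fetch_schema(configured_id_or_subject)
    # No embedded id and nothing configured: the message is skipped and a
    # SOURCE_SCHEMA_ERROR is logged in the ClickPipes errors table.
    return None
```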
### Schema mapping {#schema-mapping}
The following rules are applied to the mapping between the retrieved Avro schema and the ClickHouse destination table:
- If the Avro schema contains a field that is not included in the ClickHouse destination mapping, that field is ignored.
- If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that [DEFAULT](/sql-reference/statements/create/table#default) expressions are not currently evaluated for ClickPipes inserts (this is a temporary limitation pending updates to the ClickHouse server default processing).
- If the Avro schema field and the ClickHouse column are incompatible, inserts of that row/message will fail, and the failure will be recorded in the ClickPipes errors table. Note that several implicit conversions are supported (like between numeric types), but not all (for example, an Avro `record` field cannot be inserted into an `Int32` ClickHouse column).
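To make the three rules concrete, here is a simplified Python sketch of the mapping step. The zero-value table and column types are illustrative placeholders, not the actual ClickPipes implementation:

```python
# Simplified zero values per ClickHouse type (illustrative, not exhaustive).
ZERO_VALUES = {"Int32": 0, "Int64": 0, "Float64": 0.0, "String": ""}

def map_record(avro_record: dict, table_columns: dict[str, str]) -> dict:
    """Map one decoded Avro record onto a row for the destination table."""
    row = {}
    for column, ch_type in table_columns.items():
        if column in avro_record:
            # Rule 3 applies later: an incompatible value fails at insert
            # time and is recorded in the ClickPipes errors table.
            row[column] = avro_record[column]
        else:
            # Rule 2: a missing field gets a "zero" value
            # (DEFAULT expressions are not evaluated).
            row[column] = ZERO_VALUES.get(ch_type)
    # Rule 1: Avro fields absent from table_columns are simply ignored.
    return row

print(map_record({"id": 7, "extra": "ignored"},
                 {"id": "Int64", "name": "String"}))
# -> {'id': 7, 'name': ''}
```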
## Supported data sources {#supported-data-sources}
Nullable types in Avro are defined by using a Union schema of `(T, null)` or `(null, T)`. Since ClickHouse does not support `Nullable` versions of the `Map` or `Tuple` types, a null value for these Avro types is inserted as:
- An empty Map for a null Avro Map
- A named Tuple with all default/zero values for a null Avro Record
ClickPipes does not currently support schemas that contain other Avro Unions (this may change in the future with the maturity of the new ClickHouse Variant and JSON data types). If the Avro schema contains a "non-null" union, ClickPipes will generate an error when attempting to calculate a mapping between the Avro schema and ClickHouse column types.
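As an illustration (Avro schemas are JSON, shown here as Python dicts; the field names are hypothetical), the first field uses the supported nullable pattern, while the second is a "non-null" union that ClickPipes would reject:

```python
# Supported: a nullable field, i.e. a union of exactly (null, T).
nullable_field = {"name": "city", "type": ["null", "string"], "default": None}

# Not supported: a "non-null" union of two concrete types. ClickPipes has
# no single ClickHouse column type to map this to and will raise an error.
mixed_union_field = {"name": "value", "type": ["string", "long"]}
```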
## Kafka virtual columns {#kafka-virtual-columns}
The following virtual columns are supported for Kafka-compatible streaming data sources. When creating a new destination table, virtual columns can be added by using the `Add Column` button.