
Commit a675f4d

Schema registry updates
1 parent 5d843e9 commit a675f4d

File tree

1 file changed: +39 -23 lines changed
  • docs/integrations/data-ingestion/clickpipes

docs/integrations/data-ingestion/clickpipes/kafka.md

Lines changed: 39 additions & 23 deletions
@@ -31,28 +31,19 @@ You have familiarized yourself with the [ClickPipes intro](./index.md).
<VerticalStepper type="numbered" headerLevel="h3">

### Navigate to data sources {#1-load-sql-console}
-Select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe"
+Select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe".
<Image img={cp_step0} alt="Select imports" size="md"/>

### Select a data source {#2-select-data-source}
-Select your data source.
+Select your Kafka data source from the list.
<Image img={cp_step1} alt="Select data source type" size="md"/>

### Configure the data source {#3-configure-data-source}
Fill out the form by providing your ClickPipe with a name, a description (optional), your credentials, and other connection details.
<Image img={cp_step2} alt="Fill out connection details" size="md"/>

### Configure a schema registry (optional) {#4-configure-your-schema-registry}
-A valid schema is required for Avro streams and optional for JSON. This schema will be used to parse [AvroConfluent](../../../interfaces/formats.md/#data-format-avro-confluent) or validate JSON messages on the selected topic.
-- Avro messages that cannot be parsed or JSON messages that fail validation will generate an error.
-- The "root" path of the schema registry. For example, a Confluent Cloud schema registry URL is just an HTTPS url with no path, like `https://test-kk999.us-east-2.aws.confluent.cloud` If only the root
-path is specified, the schema used to determine column names and types in step 4 will be determined by the id embedded in the sampled Kafka messages.
-- the path `/schemas/ids/[ID]` to the schema document by the numeric schema id. A complete url using a schema id would be `https://registry.example.com/schemas/ids/1000`
-- the path `/subjects/[subject_name]` to the schema document by subject name. Optionally, a specific version can be referenced by appending `/versions/[version]` to the url (otherwise ClickPipes
-will retrieve the latest version). A complete url using a schema subject would be `https://registry.example.com/subjects/events` or `https://registry/example.com/subjects/events/versions/4`
-
-Note that in all cases ClickPipes will automatically retrieve an updated or different schema from the registry if indicated by the schema ID embedded in the message. If the message is written
-without an embedded schema id, then the specific schema ID or subject must be specified to parse all messages.
+A valid schema is required for Avro streams. See [Schema registries](#schema-registries) for more details on how to configure a schema registry.

### Configure a reverse private endpoint (optional) {#5-configure-reverse-private-endpoint}
Configure a Reverse Private Endpoint to allow ClickPipes to connect to your Kafka cluster using AWS PrivateLink.
@@ -87,6 +78,42 @@ Clicking on "Create ClickPipe" will create and run your ClickPipe. It will now b

</VerticalStepper>

+## Schema registries {#schema-registries}
+ClickPipes supports schema registries for Avro data streams.
+
+### Supported registries
+Schema registries that use the Confluent Schema Registry API are supported. This includes:
+- Confluent Kafka and Cloud
+- Redpanda
+- AWS MSK
+- Upstash
+
+ClickPipes is not currently compatible with the AWS Glue Schema Registry or the Azure Schema Registry.
+
+### Configuration
+
+A schema registry can be configured when setting up a ClickPipe, in one of three ways (see the example after this list):
+
+1. Providing the root schema registry URL (e.g. `https://registry.example.com`). **This is the preferred method.**
+2. Providing a complete path to the schema id (e.g. `https://registry.example.com/schemas/ids/1000`)
+3. Providing a complete path to the schema subject (e.g. `https://registry.example.com/subjects/events`)
+   - Optionally, a specific version can be referenced by appending `/versions/[version]` to the URL (otherwise ClickPipes will retrieve the latest version).
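For illustration, a minimal sketch of how these three forms resolve against a Confluent-compatible registry API, using Python's `requests`; the registry host, schema id `1000`, and subject `events` reuse the hypothetical values from the list above:

```python
import requests

REGISTRY_ROOT = "https://registry.example.com"  # hypothetical registry host

# 1. Root URL only: ClickPipes appends the lookup path itself, using the schema id
#    embedded in the sampled Kafka messages (1000 stands in for that id here).
by_embedded_id = requests.get(f"{REGISTRY_ROOT}/schemas/ids/1000").json()["schema"]

# 2. Complete path to a schema id: the same document, addressed directly.
by_id_path = requests.get("https://registry.example.com/schemas/ids/1000").json()["schema"]

# 3. Complete path to a subject: the latest version is used unless
#    /versions/[version] is appended to pin a specific one.
latest = requests.get(f"{REGISTRY_ROOT}/subjects/events/versions/latest").json()["schema"]
pinned = requests.get(f"{REGISTRY_ROOT}/subjects/events/versions/4").json()["schema"]
```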
101+
102+
### How it works
103+
ClickPipes dynamically retrieves and applies the Avro schema from the configured Schema Registry.
104+
- If there's a schema id embedded in the message, it will use that to retrieve the schema.
105+
- If there's no schema id embedded in the message, it will use the schema id or subject name specified in the ClickPipe configuration to retrieve the schema.
106+
- If the message is written without an embedded schema id, and no schema id or subject name is specified in the ClickPipe configuration, then the schema will not be retrieved and the message will be skipped with a `SOURCE_SCHEMA_ERROR` logged in the ClickPipes errors table.
107+
- If the message does not conform to the schema, then the message will be skipped with a `DATA_PARSING_ERROR` logged in the ClickPipes errors table.
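A rough sketch of the id-resolution logic described above, assuming messages use the Confluent wire format (a `0x00` magic byte followed by a 4-byte big-endian schema id before the Avro payload); the configured fallback id is hypothetical:

```python
# Minimal sketch of schema-id resolution for one Kafka message value.
# Assumes the Confluent wire format: 0x00 magic byte, 4-byte big-endian
# schema id, then the Avro-encoded payload.
import struct

def resolve_schema_id(value: bytes, configured_id: int | None = None) -> int | None:
    if len(value) > 5 and value[0] == 0:
        # Embedded schema id: use it to look up (or re-fetch) the schema.
        return struct.unpack(">I", value[1:5])[0]
    if configured_id is not None:
        # No embedded id: fall back to the id/subject from the ClickPipe configuration.
        return configured_id
    # Neither available: the message would be skipped with SOURCE_SCHEMA_ERROR.
    return None
```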
+
+### Schema mapping
+The following rules are applied to the mapping between the retrieved Avro schema and the ClickHouse destination table (a worked example follows the list):
+
+- If the Avro schema contains a field that is not included in the ClickHouse destination mapping, that field is ignored.
+- If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that [DEFAULT](/sql-reference/statements/create/table#default) expressions are not currently evaluated for ClickPipes inserts (this is a temporary limitation pending updates to the ClickHouse server default processing).
+- If the Avro schema field and the ClickHouse column are incompatible, inserts of that row/message will fail, and the failure will be recorded in the ClickPipes errors table. Note that several implicit conversions are supported (like between numeric types), but not all (for example, an Avro `record` field cannot be inserted into an `Int32` ClickHouse column).
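To make these rules concrete, a small sketch of how one decoded Avro record might be shaped to fit a destination table; the field names, columns, and zero values are hypothetical:

```python
# Hypothetical illustration of the mapping rules above.
# Avro record decoded from a message; "debug_tag" has no matching column.
avro_record = {"id": 42, "name": "click", "debug_tag": "abc"}

# ClickHouse destination columns with their "zero" values.
table_columns = {"id": 0, "name": "", "created_at": 0}

# Extra Avro fields are ignored; columns missing from the record get a zero value.
row = {col: avro_record.get(col, zero) for col, zero in table_columns.items()}
# row == {"id": 42, "name": "click", "created_at": 0}
```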
+
+
## Supported data sources {#supported-data-sources}

| Name |Logo|Type| Status | Description |
@@ -165,17 +192,6 @@ Nullable types in Avro are defined by using a Union schema of `(T, null)` or `(n

ClickPipes does not currently support schemas that contain other Avro Unions (this may change in the future with the maturity of the new ClickHouse Variant and JSON data types). If the Avro schema contains a "non-null" union, ClickPipes will generate an error when attempting to calculate a mapping between the Avro schema and ClickHouse column types.
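As a hypothetical illustration of the two cases, with Avro field definitions shown as Python literals and invented field names:

```python
# Supported: a nullable field expressed as a union of null with a single type
# (a nullable Avro long would typically land in a Nullable(Int64) column).
nullable_field = {"name": "user_id", "type": ["null", "long"]}

# Not supported: a union of two non-null types; ClickPipes reports an error
# when it tries to map this field to a ClickHouse column type.
non_null_union = {"name": "payload", "type": ["long", "string"]}
```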

-#### Avro schema management {#avro-schema-management}
-
-ClickPipes dynamically retrieves and applies the Avro schema from the configured Schema Registry using the schema ID embedded in each message/event. Schema updates are detected and processed automatically.
-
-At this time ClickPipes is only compatible with schema registries that use the [Confluent Schema Registry API](https://docs.confluent.io/platform/current/schema-registry/develop/api.html). In addition to Confluent Kafka and Cloud, this includes the Redpanda, AWS MSK, and Upstash schema registries. ClickPipes is not currently compatible with the AWS Glue Schema registry or the Azure Schema Registry (coming soon).
-
-The following rules are applied to the mapping between the retrieved Avro schema and the ClickHouse destination table:
-- If the Avro schema contains a field that is not included in the ClickHouse destination mapping, that field is ignored.
-- If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that [DEFAULT](/sql-reference/statements/create/table#default) expressions are not currently evaluated for ClickPipes inserts (this is temporary limitation pending updates to the ClickHouse server default processing).
-- If the Avro schema field and the ClickHouse column are incompatible, inserts of that row/message will fail, and the failure will be recorded in the ClickPipes errors table. Note that several implicit conversions are supported (like between numeric types), but not all (for example, an Avro `record` field can not be inserted into an `Int32` ClickHouse column).
-

## Kafka virtual columns {#kafka-virtual-columns}

The following virtual columns are supported for Kafka compatible streaming data sources. When creating a new destination table virtual columns can be added by using the `Add Column` button.
