Commit acb0cca (2 parents: 6f3115e + 3c09852)
Merge pull request #4000 from ClickHouse/ks/update-kafka-clickpipe-setup
Update kafka ClickPipe setup

File tree: 9 files changed (+60, -60 lines)

docs/integrations/data-ingestion/clickpipes/kafka.md

Lines changed: 60 additions & 60 deletions
@@ -12,19 +12,14 @@ import Msksvg from '@site/static/images/integrations/logos/msk.svg';
 import Azureeventhubssvg from '@site/static/images/integrations/logos/azure_event_hubs.svg';
 import Warpstreamsvg from '@site/static/images/integrations/logos/warpstream.svg';
 import redpanda_logo from '@site/static/images/integrations/logos/logo_redpanda.png';
-import cp_service from '@site/static/images/integrations/data-ingestion/clickpipes/cp_service.png';
 import cp_step0 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step0.png';
 import cp_step1 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step1.png';
 import cp_step2 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step2.png';
 import cp_step3 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step3.png';
 import cp_step4a from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4a.png';
-import cp_step4a3 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4a3.png';
-import cp_step4b from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4b.png';
 import cp_step5 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step5.png';
-import cp_success from '@site/static/images/integrations/data-ingestion/clickpipes/cp_success.png';
-import cp_remove from '@site/static/images/integrations/data-ingestion/clickpipes/cp_remove.png';
-import cp_destination from '@site/static/images/integrations/data-ingestion/clickpipes/cp_destination.png';
 import cp_overview from '@site/static/images/integrations/data-ingestion/clickpipes/cp_overview.png';
+import cp_table_settings from '@site/static/images/integrations/data-ingestion/clickpipes/cp_table_settings.png';
 import Image from '@theme/IdealImage';

 # Integrating Kafka with ClickHouse Cloud
@@ -33,73 +28,91 @@ You have familiarized yourself with the [ClickPipes intro](./index.md).

 ## Creating your first Kafka ClickPipe {#creating-your-first-kafka-clickpipe}

-1. Access the SQL Console for your ClickHouse Cloud Service.
+<VerticalStepper type="numbered" headerLevel="h3">

-<Image img={cp_service} alt="ClickPipes service" size="md" border/>
+### Navigate to data sources {#1-load-sql-console}
+Select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe".
+<Image img={cp_step0} alt="Select imports" size="md"/>

+### Select a data source {#2-select-data-source}
+Select your Kafka data source from the list.
+<Image img={cp_step1} alt="Select data source type" size="md"/>

-2. Select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe"
+### Configure the data source {#3-configure-data-source}
+Fill out the form by providing your ClickPipe with a name, a description (optional), your credentials, and other connection details.
+<Image img={cp_step2} alt="Fill out connection details" size="md"/>

-<Image img={cp_step0} alt="Select imports" size="lg" border/>
+### Configure a schema registry (optional) {#4-configure-your-schema-registry}
+A valid schema is required for Avro streams. See [Schema registries](#schema-registries) for more details on how to configure a schema registry.

-3. Select your data source.
+### Configure a reverse private endpoint (optional) {#5-configure-reverse-private-endpoint}
+Configure a Reverse Private Endpoint to allow ClickPipes to connect to your Kafka cluster using AWS PrivateLink.
+See our [AWS PrivateLink documentation](./aws-privatelink.md) for more information.

-<Image img={cp_step1} alt="Select data source type" size="lg" border/>
+### Select your topic {#6-select-your-topic}
+Select your topic, and the UI will display a sample document from the topic.
+<Image img={cp_step3} alt="Set your topic" size="md"/>

-4. Fill out the form by providing your ClickPipe with a name, a description (optional), your credentials, and other connection details.
+### Configure your destination table {#7-configure-your-destination-table}

-<Image img={cp_step2} alt="Fill out connection details" size="lg" border/>
+In the next step, you can select whether you want to ingest data into a new ClickHouse table or reuse an existing one. Follow the instructions on the screen to modify your table name, schema, and settings. You can see a real-time preview of your changes in the sample table at the top.

-5. Configure the schema registry. A valid schema is required for Avro streams and optional for JSON. This schema will be used to parse [AvroConfluent](../../../interfaces/formats.md/#data-format-avro-confluent) or validate JSON messages on the selected topic.
-   - Avro messages that cannot be parsed or JSON messages that fail validation will generate an error.
-   - the "root" path of the schema registry. For example, a Confluent Cloud schema registry URL is just an HTTPS url with no path, like `https://test-kk999.us-east-2.aws.confluent.cloud` If only the root
-     path is specified, the schema used to determine column names and types in step 4 will be determined by the id embedded in the sampled Kafka messages.
-   - the path `/schemas/ids/[ID]` to the schema document by the numeric schema id. A complete url using a schema id would be `https://registry.example.com/schemas/ids/1000`
-   - the path `/subjects/[subject_name]` to the schema document by subject name. Optionally, a specific version can be referenced by appending `/versions/[version]` to the url (otherwise ClickPipes
-     will retrieve the latest version). A complete url using a schema subject would be `https://registry.example.com/subjects/events` or `https://registry/example.com/subjects/events/versions/4`
+<Image img={cp_step4a} alt="Set table, schema, and settings" size="md"/>

-   Note that in all cases ClickPipes will automatically retrieve an updated or different schema from the registry if indicated by the schema ID embedded in the message. If the message is written
-   without an embedded schema id, then the specific schema ID or subject must be specified to parse all messages.
+You can also customize the advanced settings using the controls provided.

-6. Select your topic and the UI will display a sample document from the topic.
+<Image img={cp_table_settings} alt="Set advanced controls" size="md"/>

-<Image img={cp_step3} alt="Set data format and topic" size="lg" border/>

-7. In the next step, you can select whether you want to ingest data into a new ClickHouse table or reuse an existing one. Follow the instructions in the screen to modify your table name, schema, and settings. You can see a real-time preview of your changes in the sample table at the top.
-
-<Image img={cp_step4a} alt="Set table, schema, and settings" size="lg" border/>
-
-You can also customize the advanced settings using the controls provided
-
-<Image img={cp_step4a3} alt="Set advanced controls" size="lg" border/>
+### Configure permissions {#8-configure-permissions}
+ClickPipes will create a dedicated user for writing data into the destination table. You can select a role for this internal user using a custom role or one of the predefined roles:
+- `Full access`: with full access to the cluster. This may be useful if you use a Materialized View or a Dictionary with the destination table.
+- `Only destination table`: with `INSERT` permissions to the destination table only.

-8. Alternatively, you can decide to ingest your data in an existing ClickHouse table. In that case, the UI will allow you to map fields from the source to the ClickHouse fields in the selected destination table.
+<Image img={cp_step5} alt="Permissions" size="md"/>

-<Image img={cp_step4b} alt="Use an existing table" size="lg" border/>
+### Complete setup {#9-complete-setup}
+Clicking on "Create ClickPipe" will create and run your ClickPipe. It will now be listed in the Data Sources section.

-9. Finally, you can configure permissions for the internal ClickPipes user.
+<Image img={cp_overview} alt="View overview" size="md"/>

-**Permissions:** ClickPipes will create a dedicated user for writing data into a destination table. You can select a role for this internal user using a custom role or one of the predefined role:
-- `Full access`: with the full access to the cluster. It might be useful if you use Materialized View or Dictionary with the destination table.
-- `Only destination table`: with the `INSERT` permissions to the destination table only.
+</VerticalStepper>

-<Image img={cp_step5} alt="Permissions" size="lg" border/>
+## Schema registries {#schema-registries}
+ClickPipes supports schema registries for Avro data streams.

-10. By clicking on "Complete Setup", the system will register you ClickPipe, and you'll be able to see it listed in the summary table.
+### Supported registries {#supported-schema-registries}
+Schema registries that use the Confluent Schema Registry API are supported. This includes:
+- Confluent Kafka and Cloud
+- Redpanda
+- AWS MSK
+- Upstash

-<Image img={cp_success} alt="Success notice" size="sm" border/>
+ClickPipes is not currently compatible with the AWS Glue Schema Registry or the Azure Schema Registry.

-<Image img={cp_remove} alt="Remove notice" size="lg" border/>
+### Configuration {#schema-registry-configuration}

-The summary table provides controls to display sample data from the source or the destination table in ClickHouse
+A ClickPipe ingesting Avro data requires a schema registry. This can be configured in one of three ways:
+
+1. Providing a complete path to the schema subject (e.g. `https://registry.example.com/subjects/events`)
+   - Optionally, a specific version can be referenced by appending `/versions/[version]` to the URL (otherwise ClickPipes will retrieve the latest version).
+2. Providing a complete path to the schema id (e.g. `https://registry.example.com/schemas/ids/1000`)
+3. Providing the root schema registry URL (e.g. `https://registry.example.com`)

-<Image img={cp_destination} alt="View destination" size="lg" border/>
+### How it works {#how-schema-registries-work}
+ClickPipes dynamically retrieves and applies the Avro schema from the configured Schema Registry.
+- If there's a schema id embedded in the message, it will use that to retrieve the schema.
+- If there's no schema id embedded in the message, it will use the schema id or subject name specified in the ClickPipe configuration to retrieve the schema.
+- If the message is written without an embedded schema id, and no schema id or subject name is specified in the ClickPipe configuration, then the schema will not be retrieved and the message will be skipped with a `SOURCE_SCHEMA_ERROR` logged in the ClickPipes errors table.
+- If the message does not conform to the schema, then the message will be skipped with a `DATA_PARSING_ERROR` logged in the ClickPipes errors table.

-As well as controls to remove the ClickPipe and display a summary of the ingest job.
+### Schema mapping {#schema-mapping}
+The following rules are applied to the mapping between the retrieved Avro schema and the ClickHouse destination table:

-<Image img={cp_overview} alt="View overview" size="lg" border/>
+- If the Avro schema contains a field that is not included in the ClickHouse destination mapping, that field is ignored.
+- If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that DEFAULT expressions are not currently evaluated for ClickPipes inserts (this is a temporary limitation pending updates to the ClickHouse server default processing).
+- If the Avro schema field and the ClickHouse column are incompatible, inserts of that row/message will fail, and the failure will be recorded in the ClickPipes errors table. Note that several implicit conversions are supported (like between numeric types), but not all (for example, an Avro record field cannot be inserted into an Int32 ClickHouse column).

-11. **Congratulations!** you have successfully set up your first ClickPipe. If this is a streaming ClickPipe it will be continuously running, ingesting data in real-time from your remote data source.

 ## Supported data sources {#supported-data-sources}

@@ -177,19 +190,6 @@ Nullable types in Avro are defined by using a Union schema of `(T, null)` or `(null, T)`
 - An empty Map for a null Avro Map
 - A named Tuple with all default/zero values for a null Avro Record

-ClickPipes does not currently support schemas that contain other Avro Unions (this may change in the future with the maturity of the new ClickHouse Variant and JSON data types). If the Avro schema contains a "non-null" union, ClickPipes will generate an error when attempting to calculate a mapping between the Avro schema and Clickhouse column types.
-
-#### Avro schema management {#avro-schema-management}
-
-ClickPipes dynamically retrieves and applies the Avro schema from the configured Schema Registry using the schema ID embedded in each message/event. Schema updates are detected and processed automatically.
-
-At this time ClickPipes is only compatible with schema registries that use the [Confluent Schema Registry API](https://docs.confluent.io/platform/current/schema-registry/develop/api.html). In addition to Confluent Kafka and Cloud, this includes the Redpanda, AWS MSK, and Upstash schema registries. ClickPipes is not currently compatible with the AWS Glue Schema registry or the Azure Schema Registry (coming soon).
-
-The following rules are applied to the mapping between the retrieved Avro schema and the ClickHouse destination table:
-- If the Avro schema contains a field that is not included in the ClickHouse destination mapping, that field is ignored.
-- If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that [DEFAULT](/sql-reference/statements/create/table#default) expressions are not currently evaluated for ClickPipes inserts (this is temporary limitation pending updates to the ClickHouse server default processing).
-- If the Avro schema field and the ClickHouse column are incompatible, inserts of that row/message will fail, and the failure will be recorded in the ClickPipes errors table. Note that several implicit conversions are supported (like between numeric types), but not all (for example, an Avro `record` field can not be inserted into an `Int32` ClickHouse column).
-
 ## Kafka virtual columns {#kafka-virtual-columns}

 The following virtual columns are supported for Kafka compatible streaming data sources. When creating a new destination table virtual columns can be added by using the `Add Column` button.
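The schema-mapping rules this commit relocates (extra source fields ignored, missing fields zero-filled, no DEFAULT evaluation) can be sketched as a toy function. The function name and the type-to-zero table below are illustrative assumptions, not ClickPipes internals:

```python
# Hypothetical sketch of the documented mapping rules, not ClickPipes source.
# Illustrative subset of ClickHouse types and their "zero" values:
ZERO_VALUES = {"Int32": 0, "String": "", "Float64": 0.0}

def map_record(record: dict, table_columns: dict) -> dict:
    """Map a decoded Avro record onto a ClickHouse-style column set."""
    row = {}
    for name, ch_type in table_columns.items():
        if name in record:
            row[name] = record[name]          # field present: copy it through
        else:
            row[name] = ZERO_VALUES[ch_type]  # missing: "zero" value, no DEFAULTs
    return row  # record keys absent from table_columns are simply dropped

columns = {"id": "Int32", "name": "String", "score": "Float64"}
print(map_record({"id": 7, "name": "a", "extra": True}, columns))
# {'id': 7, 'name': 'a', 'score': 0.0}
```

Note that `extra` is ignored and `score` is zero-filled rather than taking any table-level DEFAULT, mirroring the limitation called out in the doc text.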
The remaining 8 changed files are binary images (ClickPipes setup screenshots added, updated, or removed); their diffs are not shown.
