## Kafka Connect for Azure Cosmos DB: sink connector
Kafka Connect for Azure Cosmos DB is a connector for reading data from and writing data to Azure Cosmos DB. The Azure Cosmos DB sink connector allows you to export data from Apache Kafka topics to an Azure Cosmos DB database. The connector polls data from Kafka and writes it to containers in the database, based on the topics it subscribes to.
## Prerequisites
* Start with the [Confluent platform setup](https://github.com/microsoft/kafka-connect-cosmosdb/blob/dev/doc/Confluent_Platform_Setup.md) because it gives you a complete environment to work with. If you don't wish to use Confluent Platform, you need to install and configure ZooKeeper, Apache Kafka, and Kafka Connect yourself. You'll also need to install and configure the Azure Cosmos DB connectors manually.
* Create an Azure Cosmos DB account and container by following the [setup guide](https://github.com/microsoft/kafka-connect-cosmosdb/blob/dev/doc/CosmosDB_Setup.md).
* A Bash shell, tested on GitHub Codespaces, macOS, Ubuntu, and Windows with WSL2. This shell doesn't work in Cloud Shell or WSL1.
## Install the sink connector

If you're using the recommended [Confluent platform setup](https://github.com/microsoft/kafka-connect-cosmosdb/blob/dev/doc/Confluent_Platform_Setup.md), the Azure Cosmos DB sink connector is included in the installation, and you can skip this step.
Otherwise, you can download the JAR file from the latest [Release](https://github.com/microsoft/kafka-connect-cosmosdb/releases), or package this repo to build a new JAR file from the source code. To install the connector manually using the JAR file, refer to these [instructions](https://docs.confluent.io/current/connect/managing/install.html#install-connector-manually).
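If you build from source, the repo is a Maven project; the following is a minimal sketch of the build steps, assuming Maven and a JDK are installed:

```bash
# Clone the connector source and build an uber-JAR with all dependencies
git clone https://github.com/microsoft/kafka-connect-cosmosdb.git
cd kafka-connect-cosmosdb
mvn clean package

# Copy the resulting JAR into your Kafka Connect plugin path
ls target/*dependencies.jar
```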
## Create a Kafka topic and write data
If you're using the Confluent Platform, the easiest way to create a Kafka topic is by using the supplied Control Center UX. Otherwise, you can create a Kafka topic manually using the following syntax:
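For example, using the standard Kafka CLI (the broker address, partition count, and replication factor are assumptions; adjust them for your cluster):

```bash
kafka-topics --bootstrap-server localhost:9092 --create \
  --topic hotels --partitions 1 --replication-factor 1
```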
For this scenario, we'll create a Kafka topic named “hotels” and write non-schema embedded JSON data to the topic. To create a topic inside Control Center, see the [Confluent guide](https://docs.confluent.io/platform/current/quickstart/ce-docker-quickstart.html#step-2-create-ak-topics).
Next, start the Kafka console producer to write a few records to the “hotels” topic.
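A sketch using the stock console producer, assuming a broker on localhost; each line you type is sent to the topic as one record:

```bash
# Records are plain JSON, one per line, for example (contents are illustrative):
#   {"id": "h1", "HotelName": "Marriott", "Description": "Marriott description"}
#   {"id": "h2", "HotelName": "HolidayInn", "Description": "HolidayInn description"}
#   {"id": "h3", "HotelName": "Motel8", "Description": "Motel8 description"}
kafka-console-producer --bootstrap-server localhost:9092 --topic hotels
```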
The three records entered are published to the “hotels” Kafka topic in JSON format.
## Create the sink connector
Create an Azure Cosmos DB sink connector in Kafka Connect. The following JSON body defines the config for the sink connector. Make sure to replace the values for `connect.cosmos.connection.endpoint` and `connect.cosmos.master.key` with the values you saved from the Azure Cosmos DB setup guide in the prerequisites.
For more information on each of these configuration properties, see [sink properties](#sink-configuration-properties).
```json
{
  "name": "cosmosdb-sink-connector",
  "config": {
    "connector.class": "com.azure.cosmos.kafka.connect.sink.CosmosDBSinkConnector",
    "tasks.max": "1",
    "topics": "hotels",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "connect.cosmos.connection.endpoint": "https://<cosmosinstance-name>.documents.azure.com:443/",
    "connect.cosmos.master.key": "<cosmosdbprimarykey>",
    "connect.cosmos.databasename": "<database-name>",
    "connect.cosmos.containers.topicmap": "hotels#<container-name>"
  }
}
```

Once you have all the values filled out, save the JSON file somewhere locally. You can use this file to create the connector using the REST API.
### Create connector using Control Center
An easy option to create the connector is by going through the Control Center webpage. Follow this [installation guide](https://docs.confluent.io/platform/current/quickstart/ce-docker-quickstart.html#step-3-install-a-ak-connector-and-generate-sample-data) to create a connector from Control Center. Instead of using the `DatagenConnector` option, use the `CosmosDBSinkConnector` tile. When configuring the sink connector, fill out the values just as you filled them in the JSON file.
Alternatively, on the connectors page, you can upload the JSON file created earlier by using the **Upload connector config file** option.
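You can also create the connector by posting the saved JSON file to the Kafka Connect REST API. A sketch, assuming a Connect worker on the default port and a hypothetical file name:

```bash
curl -H "Content-Type: application/json" -X POST \
  -d @cosmosdb-sink-connector.json \
  http://localhost:8083/connectors
```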
## Delete the connector

To delete the connector from the Confluent Control Center, navigate to the sink connector you created and select the **Delete** icon.

:::image type="content" source="./media/kafka-connector-sink/delete-sink-connector.png" lightbox="./media/kafka-connector-sink/delete-sink-connector.png" alt-text="Screenshot of delete option in the sink connector dialog.":::
Alternatively, use the Connect REST API to delete:
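A sketch, assuming the connector was registered under the name `cosmosdb-sink-connector` on a local Connect worker:

```bash
curl -X DELETE http://localhost:8083/connectors/cosmosdb-sink-connector
```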
## Sink configuration properties

The following settings are used to configure the Azure Cosmos DB Kafka sink connector. These values determine which Kafka topics to consume data from, which Azure Cosmos DB containers to write data into, and the formats in which to serialize the data. For an example configuration file with the default values, refer to [this config](https://github.com/microsoft/kafka-connect-cosmosdb/blob/dev/src/docker/resources/sink.example.json).
| Name | Type | Description | Required/Optional |
| :--- | :--- | :--- | :--- |
| `connector.class` | string | Class name of the sink connector. Set to `com.azure.cosmos.kafka.connect.sink.CosmosDBSinkConnector`. | Required |
| `topics` | list | The Kafka topics to consume data from. | Required |
| `connect.cosmos.connection.endpoint` | uri | The Azure Cosmos DB endpoint URI. | Required |
| `connect.cosmos.master.key` | string | The Azure Cosmos DB primary key. | Required |
| `connect.cosmos.databasename` | string | The name of the Azure Cosmos DB database to write into. | Required |
| `connect.cosmos.containers.topicmap` | string | A comma-delimited mapping of Kafka topics to Azure Cosmos DB containers, in `topic#container` format. | Required |
| `key.converter` | string | Serialization format for the key data in the Kafka topic. | Required |
| `value.converter` | string | Serialization format for the value data in the Kafka topic. | Required |
| `key.converter.schemas.enable` | string | Set to `"true"` if the key data has an embedded schema. | Optional |
| `value.converter.schemas.enable` | string | Set to `"true"` if the value data has an embedded schema. | Optional |
| `tasks.max` | int | Maximum number of sink connector tasks. Default is `1`. | Optional |
## Single Message Transforms (SMTs)
Along with the sink connector settings, you can specify the use of Single Message Transformations (SMTs) to modify messages flowing through the Kafka Connect platform. For more information, see the [Confluent SMT documentation](https://docs.confluent.io/platform/current/connect/transforms/overview.html).
### Using the InsertUUID SMT
You can use the `InsertUUID` SMT to automatically add item IDs. This custom SMT inserts an `id` field with a random UUID value into each message before it's written to Azure Cosmos DB.
> [!WARNING]
> Use this SMT only if the messages don't contain the `id` field. Otherwise, the `id` values will be overwritten and you might end up with duplicate items in your database. Using UUIDs as the message ID can be quick and easy, but they're [not an ideal partition key](https://stackoverflow.com/questions/49031461/would-using-a-substring-of-a-guid-in-cosmosdb-as-partitionkey-be-a-bad-idea) to use in Azure Cosmos DB.
### Install the SMT
Before you can use the `InsertUUID` SMT, you'll need to install this transform in your Confluent Platform setup. If you're using the Confluent Platform setup from this repo, the transform is already included in the installation, and you can skip this step.
Alternatively, you can package the [InsertUUID source](https://github.com/confluentinc/kafka-connect-insert-uuid) to create a new JAR file. To install the transform manually using the JAR file, refer to these [instructions](https://docs.confluent.io/current/connect/managing/install.html#install-connector-manually).
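Once installed, the transform is referenced from the sink connector config. The following sketch assumes the transform class name from the linked kafka-connect-insert-uuid project; verify it against the version you install:

```json
"transforms": "insertID",
"transforms.insertID.type": "com.github.cjmatta.kafka.connect.smt.InsertUuid$Value",
"transforms.insertID.uuid.field.name": "id"
```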
## Troubleshooting common issues

Here are solutions to some common problems that you may encounter when working with the sink connector.
### Read non-JSON data with JsonConverter
If you have non-JSON data on your source topic in Kafka and attempt to read it using the `JsonConverter`, you'll see the following exception:
```console
org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
```

This error is likely caused by data in the source topic being serialized in either Avro or another format, such as a CSV string.
### Read non-Avro data with AvroConverter
This scenario applies when you try to use the Avro converter to read data from a topic that isn't in Avro format. It also applies to data written by an Avro serializer other than the Confluent Schema Registry's Avro serializer, which has its own wire format.
### Read JSON data without the expected schema/payload structure

Kafka Connect supports a special structure of JSON messages that contains both the payload and its schema, for example (field names and values here are illustrative):

```json
{
  "schema": {
    "type": "struct",
    "fields": [
      { "type": "string", "optional": false, "field": "name" }
    ]
  },
  "payload": { "name": "Sam" }
}
```
If you try to read JSON data that doesn't contain the data in this structure, you'll get the following error:
```none
org.apache.kafka.connect.errors.DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
```

To be clear, the only JSON structure that's valid for `schemas.enable=true` has `schema` and `payload` fields as the top-level elements, as shown above.
## Limitations
* Autocreation of databases and containers in Azure Cosmos DB isn't supported. The database and containers must already exist, and they must be configured correctly.
## Kafka Connect for Azure Cosmos DB: source connector

Kafka Connect for Azure Cosmos DB is a connector for reading data from and writing data to Azure Cosmos DB. The Azure Cosmos DB source connector provides the capability to read data from the Azure Cosmos DB change feed and publish this data to a Kafka topic.
## Prerequisites
* Start with the [Confluent platform setup](https://github.com/microsoft/kafka-connect-cosmosdb/blob/dev/doc/Confluent_Platform_Setup.md) because it gives you a complete environment to work with. If you don't wish to use Confluent Platform, you need to install and configure ZooKeeper, Apache Kafka, and Kafka Connect yourself. You'll also need to install and configure the Azure Cosmos DB connectors manually.
* Create an Azure Cosmos DB account and container by following the [setup guide](https://github.com/microsoft/kafka-connect-cosmosdb/blob/dev/doc/CosmosDB_Setup.md).
* A Bash shell, tested on GitHub Codespaces, macOS, Ubuntu, and Windows with WSL2. This shell doesn't work in Cloud Shell or WSL1.
## Install the source connector

If you're using the recommended [Confluent platform setup](https://github.com/microsoft/kafka-connect-cosmosdb/blob/dev/doc/Confluent_Platform_Setup.md), the Azure Cosmos DB source connector is included in the installation, and you can skip this step.
Otherwise, you can use the JAR file from the latest [Release](https://github.com/microsoft/kafka-connect-cosmosdb/releases) and install the connector manually. To learn more, see these [instructions](https://docs.confluent.io/current/connect/managing/install.html#install-connector-manually). You can also package a new JAR file from the source code:
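As with the sink connector, the repo builds with Maven; a minimal sketch:

```bash
git clone https://github.com/microsoft/kafka-connect-cosmosdb.git
cd kafka-connect-cosmosdb
mvn clean package

# Copy the resulting JAR into your Kafka Connect plugin path
ls target/*dependencies.jar
```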
## Create a Kafka topic
Create a Kafka topic using Confluent Control Center. For this scenario, we'll create a Kafka topic named "apparels" and write non-schema embedded JSON data to the topic. To create a topic inside the Control Center, see [create Kafka topic doc](https://docs.confluent.io/platform/current/quickstart/ce-docker-quickstart.html#step-2-create-ak-topics).
## Create the source connector
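The original walkthrough drives this step from a JSON config file. The following is a representative sketch, not the verbatim original: property names mirror the sink config and the connector's example files, and the endpoint, key, database, and container values are placeholders you must replace.

```json
{
  "name": "cosmosdb-source-connector",
  "config": {
    "connector.class": "com.azure.cosmos.kafka.connect.source.CosmosDBSourceConnector",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "connect.cosmos.connection.endpoint": "https://<cosmosinstance-name>.documents.azure.com:443/",
    "connect.cosmos.master.key": "<cosmosdbprimarykey>",
    "connect.cosmos.databasename": "<database-name>",
    "connect.cosmos.containers.topicmap": "apparels#<container-name>"
  }
}
```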
For more information on each of the above configuration properties, see the [source properties](#source-configuration-properties) section.
### Create connector using Control Center
An easy option to create the connector is from the Confluent Control Center portal. Follow the [Confluent setup guide](https://docs.confluent.io/platform/current/quickstart/ce-docker-quickstart.html#step-3-install-a-ak-connector-and-generate-sample-data) to create a connector from Control Center. When setting up, instead of using the `DatagenConnector` option, use the `CosmosDBSourceConnector` tile. When configuring the source connector, fill out the values just as you filled them in the JSON file.
Alternatively, on the connectors page, you can upload the JSON file built in the previous section by using the **Upload connector config file** option.
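As with the sink connector, you can instead post the saved JSON file to the Connect REST API (a sketch, assuming a local worker and a hypothetical file name):

```bash
curl -H "Content-Type: application/json" -X POST \
  -d @cosmosdb-source-connector.json \
  http://localhost:8083/connectors
```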
## Delete the connector

To delete the connector from the Confluent Control Center, navigate to the source connector you created and select the **Delete** icon.
:::image type="content" source="./media/kafka-connector-source/delete-source-connector.png" lightbox="./media/kafka-connector-source/delete-source-connector.png" alt-text="Screenshot of delete option in the source connector dialog.":::