articles/event-hubs/event-hubs-kafka-connect-debezium.md
---
description: This article provides information on how to use Debezium with Azure Event Hubs.
ms.topic: how-to
author: abhirockzz
ms.author: abhishgu
ms.date: 10/18/2021
---

# Integrate Apache Kafka Connect support on Azure Event Hubs with Debezium for Change Data Capture
> [!WARNING]
> Use of the Apache Kafka Connect framework, as well as the Debezium platform and its connectors, is **not eligible for product support through Microsoft Azure**.
>
> Apache Kafka Connect assumes that its dynamic configuration is held in compacted topics with otherwise unlimited retention. Event Hubs [doesn't implement compaction as a broker feature](event-hubs-federation-overview.md#log-projections) and always imposes a time-based retention limit on retained events, because Event Hubs is a real-time event streaming engine, not a long-term data or configuration store.
>
> While the Apache Kafka project might be comfortable with mixing these roles, Azure believes that such information is best managed in a proper database or configuration store.
>
> Many Apache Kafka Connect scenarios will be functional, but these conceptual differences between Apache Kafka's and Event Hubs' retention models may cause certain configurations not to work as expected.

This tutorial walks you through setting up a change data capture based system on Azure using [Event Hubs](./event-hubs-about.md?WT.mc_id=devto-blog-abhishgu) (for Kafka), [Azure Database for PostgreSQL](../postgresql/overview.md), and Debezium. It uses the [Debezium PostgreSQL connector](https://debezium.io/documentation/reference/1.2/connectors/postgresql.html) to stream database modifications from PostgreSQL to Kafka topics in Event Hubs.

> [!NOTE]
> This article contains references to the term *whitelist*, a term that Microsoft no longer uses. When the term is removed from the software, we'll remove it from this article.
In this tutorial, you take the following steps:

> * (Optional) Consume change data events with a `FileStreamSink` connector

## Prerequisites
To complete this walkthrough, you need:

- An Azure subscription. If you don't have one, [create a free account](https://azure.microsoft.com/free/).
- Linux/macOS
## Create an Event Hubs namespace
An Event Hubs namespace is required to send and receive from any Event Hubs service. See [Creating an event hub](event-hubs-create.md) for instructions on creating a namespace and an event hub. Get the Event Hubs connection string and fully qualified domain name (FQDN) for later use. For instructions, see [Get an Event Hubs connection string](event-hubs-get-connection-string.md).

## Set up and configure Azure Database for PostgreSQL
[Azure Database for PostgreSQL](../postgresql/overview.md) is a relational database service based on the community version of the open-source PostgreSQL database engine, and is available in two deployment options: Single Server and Hyperscale (Citus). [Follow these instructions](../postgresql/quickstart-create-server-database-portal.md) to create an Azure Database for PostgreSQL server using the Azure portal.
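Later steps in this tutorial insert and update rows in a `todos` table. As a point of reference, the following is a minimal, illustrative schema for that table; the exact column definitions in your setup may differ, but the `id` and `todo_status` columns are referenced by the change events shown later.

```sql
-- Illustrative only: a minimal todos table matching the columns
-- referenced later in this tutorial (id, description, todo_status).
CREATE TABLE todos (
    id SERIAL PRIMARY KEY,
    description VARCHAR(50),
    todo_status VARCHAR(12)
);

INSERT INTO todos (description, todo_status)
VALUES ('configure and install connector', 'pending');
```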
## Set up and run Kafka Connect
This section covers the following topics:

- Start a Kafka Connect cluster with the Debezium connector

### Download and set up the Debezium connector
Follow the latest instructions in the [Debezium documentation](https://debezium.io/documentation/reference/1.2/connectors/postgresql.html#postgresql-deploying-a-connector) to download and set up the connector.

- Download the connector's plug-in archive. For example, to download version `1.2.0` of the connector, use this link: https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/1.2.0.Final/debezium-connector-postgres-1.2.0.Final-plugin.tar.gz
- Extract the JAR files and copy them to the [Kafka Connect plugin.path](https://kafka.apache.org/documentation/#connectconfigs).
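For example, if you extracted the archive under `/opt/connectors` (an illustrative path — use whatever directory you chose), the Kafka Connect worker configuration would reference it like this:

```properties
# In connect-distributed.properties: point plugin.path at the directory
# that contains the extracted connector folder (path is illustrative).
plugin.path=/opt/connectors
```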
### Configure Kafka Connect for Event Hubs
Minimal reconfiguration is necessary when redirecting Kafka Connect throughput from Kafka to Event Hubs. The following `connect-distributed.properties` sample illustrates how to configure Connect to authenticate and communicate with the Kafka endpoint on Event Hubs:

> [!IMPORTANT]
> Debezium auto-creates a topic per table, plus several metadata topics. Each Kafka **topic** corresponds to an Event Hubs instance (event hub). For Apache Kafka to Azure Event Hubs mappings, see [Kafka and Event Hubs conceptual mapping](event-hubs-for-kafka-ecosystem-overview.md#kafka-and-event-hub-conceptual-mapping). There are different **limits** on the number of event hubs in an Event Hubs namespace depending on the tier (Basic, Standard, Premium, or Dedicated). For these limits, see [Quotas](compare-tiers.md#quotas).

```properties
bootstrap.servers={YOUR.EVENTHUBS.FQDN}:9093 # e.g. namespace.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";
```
You should see the JSON payloads representing the change data events generated in PostgreSQL in response to the rows you added to the `todos` table. Here's a snippet of the payload:

```json
{
    "schema": { ... },
    "payload": {
        "before": null,
        "after": { ... },
        "source": { ... },
        "op": "c"
    }
}
```
The event consists of the `payload` along with its `schema` (omitted here for brevity). In the `payload` section, notice how the create operation (`"op": "c"`) is represented: `"before": null` means the row was newly `INSERT`ed, `after` provides the values of the columns in the row, and `source` provides metadata about the PostgreSQL instance where this event was picked up.
You can try the same with update or delete operations and introspect the change data events. For example, to update the task status for `configure and install connector` (assuming its `id` is `3`):

```sql
UPDATE todos SET todo_status = 'complete' WHERE id = 3;
```
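The change event emitted for this `UPDATE` carries `"op": "u"`. The following is an illustrative, trimmed payload (the field values shown are assumptions based on the sample data; also note that, by default, the Debezium PostgreSQL connector emits `"before": null` for updates unless the table's `REPLICA IDENTITY` is set to `FULL`):

```json
{
  "payload": {
    "before": null,
    "after": {
      "id": 3,
      "description": "configure and install connector",
      "todo_status": "complete"
    },
    "op": "u"
  }
}
```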
## (Optional) Install the FileStreamSink connector
Now that all the `todos` table changes are being captured in an Event Hubs topic, you'll use the FileStreamSink connector (available by default in Kafka Connect) to consume these events.

Create a configuration file (`file-sink-connector.json`) for the connector; replace the `file` attribute with a path that suits your file system.
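A minimal sketch of what this configuration might look like, assuming the connector reads from the `my-server.public.todos` topic (the event hub Debezium creates for the `todos` table); the connector name and `file` path here are placeholders:

```json
{
  "name": "file-sink-connector",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "tasks.max": "1",
    "topics": "my-server.public.todos",
    "file": "/path/to/todos-sink.txt"
  }
}
```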
Kafka Connect creates Event Hubs topics to store configurations, offsets, and status that persist even after the Connect cluster has been taken down. Unless you want to keep this persisted state, we recommend that you delete these topics. You may also want to delete the `my-server.public.todos` event hub created during this walkthrough.