Commit 4541248

Merge pull request #176199 from spelluru/ehubkafka1018

Information about limits/quotas

2 parents ffeb981 + acbe1a1
articles/event-hubs/event-hubs-kafka-connect-debezium.md (15 additions, 12 deletions)
````diff
@@ -4,7 +4,7 @@ description: This article provides information on how to use Debezium with Azure
 ms.topic: how-to
 author: abhirockzz
 ms.author: abhishgu
-ms.date: 01/06/2021
+ms.date: 10/18/2021
 ---

 # Integrate Apache Kafka Connect support on Azure Event Hubs with Debezium for Change Data Capture
````
````diff
@@ -14,13 +14,13 @@ ms.date: 01/06/2021
 > [!WARNING]
 > Use of the Apache Kafka Connect framework as well as the Debezium platform and its connectors are **not eligible for product support through Microsoft Azure**.
 >
-> Apache Kafka Connect assumes for its dynamic configuration to be held in compacted topics with otherwise unlimited retention. Azure Event Hubs [does not implement compaction as a broker feature](event-hubs-federation-overview.md#log-projections) and always imposes a time-based retention limit on retained events, rooting from the principle that Azure Event Hubs is a real-time event streaming engine and not a long-term data or configuration store.
+> Apache Kafka Connect assumes for its dynamic configuration to be held in compacted topics with otherwise unlimited retention. Event Hubs [does not implement compaction as a broker feature](event-hubs-federation-overview.md#log-projections) and always imposes a time-based retention limit on retained events, rooting from the principle that Event Hubs is a real-time event streaming engine and not a long-term data or configuration store.
 >
 > While the Apache Kafka project might be comfortable with mixing these roles, Azure believes that such information is best managed in a proper database or configuration store.
 >
-> Many Apache Kafka Connect scenarios will be functional, but these conceptual differences between Apache Kafka's and Azure Event Hubs' retention models may cause certain configurations not to work as expected.
+> Many Apache Kafka Connect scenarios will be functional, but these conceptual differences between Apache Kafka's and Event Hubs' retention models may cause certain configurations not to work as expected.

-This tutorial walks you through how to set up a change data capture based system on Azure using [Azure Event Hubs](./event-hubs-about.md?WT.mc_id=devto-blog-abhishgu) (for Kafka), [Azure DB for PostgreSQL](../postgresql/overview.md) and Debezium. It will use the [Debezium PostgreSQL connector](https://debezium.io/documentation/reference/1.2/connectors/postgresql.html) to stream database modifications from PostgreSQL to Kafka topics in Azure Event Hubs
+This tutorial walks you through how to set up a change data capture based system on Azure using [Event Hubs](./event-hubs-about.md?WT.mc_id=devto-blog-abhishgu) (for Kafka), [Azure DB for PostgreSQL](../postgresql/overview.md) and Debezium. It will use the [Debezium PostgreSQL connector](https://debezium.io/documentation/reference/1.2/connectors/postgresql.html) to stream database modifications from PostgreSQL to Kafka topics in Event Hubs

 > [!NOTE]
 > This article contains references to the term *whitelist*, a term that Microsoft no longer uses. When the term is removed from the software, we'll remove it from this article.
````
````diff
@@ -35,7 +35,7 @@ In this tutorial, you take the following steps:
 > * (Optional) Consume change data events with a `FileStreamSink` connector

 ## Pre-requisites
-To complete this walk through, you will require:
+To complete this walk through, you'll require:

 - Azure subscription. If you don't have one, [create a free account](https://azure.microsoft.com/free/).
 - Linux/MacOS
````
````diff
@@ -45,7 +45,7 @@ To complete this walk through, you will require:
 ## Create an Event Hubs namespace
 An Event Hubs namespace is required to send and receive from any Event Hubs service. See [Creating an event hub](event-hubs-create.md) for instructions to create a namespace and an event hub. Get the Event Hubs connection string and fully qualified domain name (FQDN) for later use. For instructions, see [Get an Event Hubs connection string](event-hubs-get-connection-string.md).

-## Setup and configure Azure Database for PostgreSQL
+## Set up and configure Azure Database for PostgreSQL
 [Azure Database for PostgreSQL](../postgresql/overview.md) is a relational database service based on the community version of open-source PostgreSQL database engine, and is available in two deployment options: Single Server and Hyperscale (Citus). [Follow these instructions](../postgresql/quickstart-create-server-database-portal.md) to create an Azure Database for PostgreSQL server using the Azure portal.

 ## Setup and run Kafka Connect
````
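The namespace step above can also be scripted. A minimal Azure CLI sketch, assuming the CLI is installed and signed in; the resource group and namespace names here are hypothetical, and the Kafka endpoint needs the Standard tier or above:

```bash
# Hypothetical resource group and namespace names.
az group create --name cdc-demo-rg --location eastus
az eventhubs namespace create --resource-group cdc-demo-rg --name cdc-demo-ns \
    --location eastus --sku Standard
# Print the connection string that Kafka Connect will later use for SASL auth.
az eventhubs namespace authorization-rule keys list --resource-group cdc-demo-rg \
    --namespace-name cdc-demo-ns --name RootManageSharedAccessKey \
    --query primaryConnectionString --output tsv
```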
````diff
@@ -56,7 +56,7 @@ This section will cover the following topics:
 - Start Kafka Connect cluster with Debezium connector

 ### Download and setup Debezium connector
-Please follow the latest instructions in the [Debezium documentation](https://debezium.io/documentation/reference/1.2/connectors/postgresql.html#postgresql-deploying-a-connector) to download and set up the connector.
+Follow the latest instructions in the [Debezium documentation](https://debezium.io/documentation/reference/1.2/connectors/postgresql.html#postgresql-deploying-a-connector) to download and set up the connector.

 - Download the connector’s plug-in archive. For example, to download version `1.2.0` of the connector, use this link - https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/1.2.0.Final/debezium-connector-postgres-1.2.0.Final-plugin.tar.gz
 - Extract the JAR files and copy them to the [Kafka Connect plugin.path](https://kafka.apache.org/documentation/#connectconfigs).
````
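The two download/extract bullets in the hunk above translate to roughly the following shell commands; the target directory is an assumption and should match the `plugin.path` entry in your Connect worker configuration:

```bash
# Download the 1.2.0 plug-in archive referenced in the article.
wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/1.2.0.Final/debezium-connector-postgres-1.2.0.Final-plugin.tar.gz
# Extract into a directory listed on Kafka Connect's plugin.path
# (/opt/connectors is an assumed location).
sudo mkdir -p /opt/connectors
sudo tar -xzf debezium-connector-postgres-1.2.0.Final-plugin.tar.gz -C /opt/connectors
```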
````diff
@@ -65,6 +65,9 @@ Please follow the latest instructions in the [Debezium documentation](https://de
 ### Configure Kafka Connect for Event Hubs
 Minimal reconfiguration is necessary when redirecting Kafka Connect throughput from Kafka to Event Hubs. The following `connect-distributed.properties` sample illustrates how to configure Connect to authenticate and communicate with the Kafka endpoint on Event Hubs:

+> [!IMPORTANT]
+> Debezium will auto-create a topic per table and a bunch of metadata topics. Kafka **topic** corresponds to an Event Hubs instance (event hub). For Apache Kafka to Azure Event Hubs mappings, see [Kafka and Event Hubs conceptual mapping](event-hubs-for-kafka-ecosystem-overview.md#kafka-and-event-hub-conceptual-mapping). There are different **limits** on number of event hubs in an Event Hubs namespace depending on the tier (Basic, Standard, Premium, or Dedicated). For these limits, See [Quotas](compare-tiers.md#quotas).
+
 ```properties
 bootstrap.servers={YOUR.EVENTHUBS.FQDN}:9093 # e.g. namespace.servicebus.windows.net:9093
 group.id=connect-cluster-group
````
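The `connect-distributed.properties` sample is truncated by the hunk above; the Event Hubs-specific part of that file is the SASL section. A sketch of those lines, following the documented Event Hubs-for-Kafka authentication pattern (the placeholder mirrors the sample's style, and the full sample typically repeats these settings under `producer.` and `consumer.` prefixes):

```bash
# Sketch: Event Hubs' Kafka endpoint uses SASL PLAIN over TLS, with the literal
# username "$ConnectionString" and the namespace connection string as password.
cat >> connect-distributed.properties <<'EOF'
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";
EOF
```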
````diff
@@ -159,7 +162,7 @@ curl -s http://localhost:8083/connectors/todo-connector/status
 ```

 ## Test change data capture
-To see change data capture in action, you will need to create/update/delete records in the Azure PostgreSQL database.
+To see change data capture in action, you'll need to create/update/delete records in the Azure PostgreSQL database.

 Start by connecting to your Azure PostgreSQL database (the example below uses [psql](https://www.postgresql.org/docs/12/app-psql.html))

````
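The psql invocation the article refers to isn't visible in the hunk; for an Azure Database for PostgreSQL Single Server it follows the usual host and user@servername pattern. A sketch with hypothetical server and admin names:

```bash
# Hypothetical server and admin names; Single Server logins use user@servername.
PGSSLMODE=require psql -h cdc-demo-pg.postgres.database.azure.com -p 5432 \
    -U pgadmin@cdc-demo-pg -d postgres
```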

````diff
@@ -211,7 +214,7 @@ export TOPIC=my-server.public.todos
 kafkacat -b $BROKER -t $TOPIC -o beginning
 ```

-You should see the JSON payloads representing the change data events generated in PostgreSQL in response to the rows you had just added to the `todos` table. Here is a snippet of the payload:
+You should see the JSON payloads representing the change data events generated in PostgreSQL in response to the rows you had added to the `todos` table. Here's a snippet of the payload:


 ```json
````
````diff
@@ -243,7 +246,7 @@ You should see the JSON payloads representing the change data events generated i
 }
 ```

-The event consists of the `payload` along with its `schema` (omitted for brevity). In `payload` section, notice how the create operation (`"op": "c"`) is represented - `"before": null` means that it was a newly `INSERT`ed row, `after` provides values for the columns in the row, `source` provides the PostgreSQL instance metadata from where this event was picked up etc.
+The event consists of the `payload` along with its `schema` (omitted for brevity). In `payload` section, notice how the create operation (`"op": "c"`) is represented - `"before": null` means that it was a newly `INSERT`ed row, `after` provides values for the columns in the row, `source` provides the PostgreSQL instance metadata from where this event was picked up and so on.

 You can try the same with update or delete operations as well and introspect the change data events. For example, to update the task status for `configure and install connector` (assuming its `id` is `3`):

````
````diff
@@ -252,7 +255,7 @@ UPDATE todos SET todo_status = 'complete' WHERE id = 3;
 ```

 ## (Optional) Install FileStreamSink connector
-Now that all the `todos` table changes are being captured in Event Hubs topic, we will use the FileStreamSink connector (that is available by default in Kafka Connect) to consume these events.
+Now that all the `todos` table changes are being captured in Event Hubs topic, you'll use the FileStreamSink connector (that is available by default in Kafka Connect) to consume these events.

 Create a configuration file (`file-sink-connector.json`) for the connector - replace the `file` attribute as per your file system

````
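A plausible shape for the `file-sink-connector.json` mentioned above, registered through the same Connect REST endpoint used earlier in the article; the connector class is stock Kafka, the connector name is hypothetical, and the `file` path matches the `tail -f` example in the next hunk:

```bash
# Sketch: write the sink config and register it with the Connect REST API.
cat > file-sink-connector.json <<'EOF'
{
  "name": "cdc-file-sink",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "topics": "my-server.public.todos",
    "file": "/Users/foo/todos-cdc.txt"
  }
}
EOF
curl -X POST -H "Content-Type: application/json" \
    --data @file-sink-connector.json http://localhost:8083/connectors
```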

````diff
@@ -284,7 +287,7 @@ tail -f /Users/foo/todos-cdc.txt


 ## Cleanup
-Kafka Connect creates Event Hub topics to store configurations, offsets, and status that persist even after the Connect cluster has been taken down. Unless this persistence is desired, it is recommended that these topics are deleted. You may also want to delete the `my-server.public.todos` Event Hub that were created during the course of this walk through.
+Kafka Connect creates Event Hub topics to store configurations, offsets, and status that persist even after the Connect cluster has been taken down. Unless this persistence is desired, it's recommended that these topics are deleted. You may also want to delete the `my-server.public.todos` Event Hub that were created during this walk through.

 ## Next steps

````
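The cleanup can be scripted as well. A hedged sketch reusing the hypothetical names from the CLI examples above; the three internal topic names depend on the `config.storage.topic`, `offset.storage.topic`, and `status.storage.topic` values in your worker config, so `connect-cluster-*` is an assumption:

```bash
# Delete the event hubs backing the CDC topic and Connect's internal topics.
for eh in my-server.public.todos connect-cluster-configs \
          connect-cluster-offsets connect-cluster-status; do
  az eventhubs eventhub delete --resource-group cdc-demo-rg \
      --namespace-name cdc-demo-ns --name "$eh"
done
```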
