debezium/dbz#1691 Add Python example demonstrating Debezium Connect-mode #400

Open
KMohnishM wants to merge 1 commit into debezium:main from KMohnishM:dbz-1691

@KMohnishM

Description

This PR adds a Python example demonstrating how to use Debezium in Connect-mode using the pydbzengine integration.

The example shows how Debezium can be run in Connect mode without JSON serialization overhead, allowing direct access to structured change events from Python applications.

The example includes a minimal setup for running a PostgreSQL connector and consuming CDC events using Python. It also provides a utility script to download the required Debezium dependencies via Maven.
Added a Python example project under debezium-python/
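
As a rough illustration of the dependency-download step described above, here is a minimal sketch. It is not the PR's setup_jars.py: the PR's script goes through Maven, while this sketch swaps in a direct HTTP download from Maven Central using its standard repository layout. The function names, the target directory, and the version string in the usage note are hypothetical.

```python
import urllib.request
from pathlib import Path

MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build the Maven Central URL of a JAR from its coordinates."""
    group_path = group_id.replace(".", "/")
    file_name = f"{artifact_id}-{version}.jar"
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact_id}/{version}/{file_name}"


def download_jar(group_id: str, artifact_id: str, version: str, libs_dir: str) -> Path:
    """Download one JAR into libs_dir, skipping files that already exist.

    Assumption: libs_dir is a local directory that is not committed to Git.
    """
    target_dir = Path(libs_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{artifact_id}-{version}.jar"
    if not target.exists():
        urllib.request.urlretrieve(maven_jar_url(group_id, artifact_id, version), target)
    return target
```

A call such as `download_jar("io.debezium", "debezium-connector-postgres", "3.0.0.Final", "debezium/libs")` would fetch a single artifact; the version here is only a placeholder, and transitive dependencies are not resolved, which is one reason a real script may prefer Maven.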

@KMohnishM KMohnishM marked this pull request as ready for review March 11, 2026 09:34
@kmos kmos self-requested a review March 12, 2026 12:28
Member

@kmos kmos left a comment

I don't think that the best approach is to commit all the Java dependencies in the directory. Could you please find another way?

@jpechane
Contributor

@kmos This is definitely a good point. Unfortunately, this is how pydbzengine does it - see https://github.com/memiiso/pydbzengine/tree/main/pydbzengine/debezium/libs

But yes, it would be really nice if the Python code could just obtain the Java libs somehow.
This is not in scope of the PR now but something we should keep an eye on.

@jpechane jpechane marked this pull request as draft March 12, 2026 13:06
@Naros
Member

Naros commented Mar 13, 2026

But @jpechane, unless I am missing something, isn't the setup_jars.py what is responsible for seeding the contents of the debezium/libs directory? In this case, would it not make sense to introduce a .gitignore that ignores debezium/libs for this example?

In addition, I don't believe we need the log directory or its contents either. wdyt?

@KMohnishM
Author

KMohnishM commented Mar 13, 2026

@Naros That's true. The setup_jars.py script should indeed download the required JARs into the debezium/libs directory.

Let me verify that the example works correctly without committing the JAR files and update the PR accordingly. I will also remove the log directory, as it will be created when the example runs.

I will push an update shortly.

@jpechane jpechane requested a review from vjuranek March 18, 2026 10:17
# Fallback: unknown schema type, return string representation
return str(value)

if schema_type == "STRUCT":
Contributor

Debezium supports many more data types, such as timestamps. Could you extend the mapping so that they are converted properly?

Author

Good point, thanks for calling this out.

I have extended the conversion logic to support Kafka Connect and Debezium logical types, including Timestamp, Date, Time, Decimal, and Debezium-specific types like MicroTimestamp and ZonedTimestamp.

The implementation is in connect_message.py. Let me know if there are any additional types you’d like to see covered.
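
To make the kind of conversion discussed here concrete, the following is a minimal sketch and not the code in connect_message.py. It assumes the field's schema name is available as a string and covers only Debezium's semantic types (`io.debezium.time.*`), whose values are plain integers or strings. Kafka Connect's own logical types are not handled. The function name `convert_temporal` is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def convert_temporal(schema_name, value):
    """Convert a Debezium temporal value to a Python date/datetime.

    Values with an unrecognized schema name are returned unchanged.
    """
    if value is None:
        return None
    if schema_name == "io.debezium.time.Date":
        # Number of days since the epoch.
        return date(1970, 1, 1) + timedelta(days=value)
    if schema_name == "io.debezium.time.Timestamp":
        # Milliseconds since the epoch; tagged as UTC here for convenience.
        return EPOCH + timedelta(milliseconds=value)
    if schema_name == "io.debezium.time.MicroTimestamp":
        # Microseconds since the epoch; tagged as UTC here for convenience.
        return EPOCH + timedelta(microseconds=value)
    if schema_name == "io.debezium.time.ZonedTimestamp":
        # ISO-8601 string; a trailing "Z" is rewritten so that older Python
        # versions can parse it. Unusual fractional-second widths may still
        # need extra handling before Python 3.11.
        text = value[:-1] + "+00:00" if value.endswith("Z") else value
        return datetime.fromisoformat(text)
    return value
```

Returning timezone-aware datetimes for `Timestamp` and `MicroTimestamp` is a design choice of this sketch; a converter for columns without time zone information might prefer naive datetimes.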

elif schema_type == "STRING":
    return str(value)

elif schema_type in ("INT8", "INT16", "INT32", "INT64"):
Contributor

I wonder if we can support conversion to either Python native types or numpy types, depending on user needs.
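
One possible shape for this, offered only as a sketch: a flag selects between Python's `int` and the matching fixed-width numpy scalar type. The function name `convert_int` and the `use_numpy` parameter are hypothetical and not part of the PR. numpy is imported lazily so that users who want native types do not need it installed.

```python
# Connect integer schema types mapped to the numpy scalar type of the same width.
_NUMPY_NAMES = {"INT8": "int8", "INT16": "int16", "INT32": "int32", "INT64": "int64"}


def convert_int(schema_type, value, use_numpy=False):
    """Convert an integer field to a native int or a numpy scalar."""
    if schema_type not in _NUMPY_NAMES:
        raise ValueError(f"not an integer schema type: {schema_type}")
    if value is None:
        return None
    if not use_numpy:
        return int(value)
    import numpy as np  # imported only when numpy output is requested

    return getattr(np, _NUMPY_NAMES[schema_type])(value)
```

With `use_numpy=True`, an `INT16` field would come back as `numpy.int16`, which preserves the column width when events are later collected into arrays.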

    field_value = value.get(field)
except Exception:
    field_value = None
result[field_name] = struct_to_dict(field_value, field.schema())
Contributor

For some of the tables we might prefer pre-defined static classes. Can we support both struct_to_dict and a configuration where we define a topic/table name to class name mapping?

Author

Yes, I have added support for that.

The default behavior still returns dictionaries via struct_to_dict, but I introduced an optional topic/table-to-class mapping.

If a mapping is provided, records can be materialized into typed Python objects. The mapping supports full topic name, short topic, or table name for flexibility.
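
The lookup described above could work roughly as follows. This is a sketch under assumptions, not the PR's implementation: it assumes topics are named `server.schema.table`, takes "short topic" to mean `schema.table`, and assumes the record has already been converted to a dictionary whose keys match the class fields. `Customer`, `resolve_class`, and `materialize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    email: str


def resolve_class(topic, class_map):
    """Look up a class by full topic name, then short topic, then table name."""
    parts = topic.split(".")
    candidates = [topic, ".".join(parts[1:]), parts[-1]]
    for key in candidates:
        if key and key in class_map:
            return class_map[key]
    return None


def materialize(topic, record, class_map):
    """Return a typed object if the topic is mapped, else the dict unchanged."""
    cls = resolve_class(topic, class_map)
    return cls(**record) if cls is not None else record
```

For example, `materialize("dbserver1.inventory.customers", row, {"customers": Customer})` would build a `Customer`, while an unmapped topic keeps the default dictionary behavior.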

@jpechane
Contributor

@KMohnishM Please squash the commits so the JARs are not recorded in the Git history.

… Mode

Signed-off-by: Mohnish <kmohnishm@gmail.com>
@kmos kmos marked this pull request as ready for review March 20, 2026 08:53

4 participants