Expose xrepl_origin_id on DML change events #400

Open
kgalieva wants to merge 5 commits into yugabyte:main from Shopify:expose-xrepl-origin-id-on-dml

@kgalieva (Author) commented Mar 10, 2026

Expose xrepl_origin_id on CDC change events

Summary

  • Surface the xrepl_origin_id field from the YugabyteDB CDC RowMessage proto on every DML (INSERT/UPDATE/DELETE) and COMMIT Debezium change event
  • The field appears as xrepl_origin_id (optional int32) in the source block of the Kafka message envelope, and only when it is non-zero
  • Companion to the server-side PR that attaches xrepl_origin_id to every DML CDC event
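
The "only when non-zero" behaviour in the second bullet can be sketched as follows. This is a minimal illustration that uses a plain Map as a stand-in for the Kafka Connect Struct the connector actually builds; the class and method names are hypothetical, and only the key names are taken from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SourceBlockSketch {
    /** Builds a simplified source block; the origin key is added only for non-zero ids. */
    public static Map<String, Object> sourceBlock(String tableId, String tabletId, int xreplOriginId) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("table_id", tableId);
        source.put("tablet_id", tabletId);
        if (xreplOriginId != 0) {
            // 0 is treated as "no origin", so the optional field is left out entirely.
            source.put("xrepl_origin_id", xreplOriginId);
        }
        return source;
    }

    public static void main(String[] args) {
        System.out.println(sourceBlock("tbl", "tab", 42));
        System.out.println(sourceBlock("tbl", "tab", 0));
    }
}
```

Consumers therefore have to treat a missing key and an origin of 0 as the same case.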

Motivation

Previously, xrepl_origin_id was only set on COMMIT records in the CDC stream and the Debezium connector did not surface it at all. The upstream YugabyteDB change now attaches it to every DML record as well. This connector change threads that value through to the Debezium output on both DML and COMMIT events so consumers can use the replication origin to filter or route events (e.g. for loop-prevention in bidirectional replication).
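
As an illustration of the loop-prevention use case, a consumer could drop events whose origin matches a peer it already replicates from. The sketch below assumes the source block has been decoded into a Map and that the field is named xrepl_origin_id as in the summary; the class, the method names, and the origin id 42 are hypothetical.

```java
import java.util.Map;
import java.util.Set;

public class OriginFilter {
    // Field name as given in the PR summary; the review thread may change it.
    public static final String ORIGIN_FIELD = "xrepl_origin_id";

    /** Reads the origin id from a decoded source block; an absent field means 0 (no origin). */
    public static int originOf(Map<String, Object> source) {
        Object value = source.get(ORIGIN_FIELD);
        return (value instanceof Number) ? ((Number) value).intValue() : 0;
    }

    /** Returns true unless the event carries one of the origin ids we want to drop. */
    public static boolean shouldForward(Map<String, Object> source, Set<Integer> droppedOrigins) {
        return !droppedOrigins.contains(originOf(source));
    }

    public static void main(String[] args) {
        Set<Integer> dropped = Set.of(42); // hypothetical id of the peer we replicate from
        Map<String, Object> local = Map.of("table_id", "t1");
        Map<String, Object> replicated = Map.of("table_id", "t1", ORIGIN_FIELD, 42);
        System.out.println(shouldForward(local, dropped));
        System.out.println(shouldForward(replicated, dropped));
    }
}
```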

Changes

| File | Change |
| --- | --- |
| pom.xml | Bump yb-client from 0.8.108 to 0.8.111-20260312.173733-1 (includes xrepl_origin_id in proto) |
| ReplicationMessage.java | Added default getXreplOriginId() method returning 0 |
| YbProtoReplicationMessage.java | Override to return rawMessage.getXreplOriginId() |
| SourceInfo.java | Added xreplOriginId field, constant, accessor, and parameter to update() |
| YugabyteDBOffsetContext.java | Added overloaded updateRecordPosition() accepting xreplOriginId |
| YugabyteDBStreamingChangeEventSource.java | DML and COMMIT paths pass message.getXreplOriginId() |
| YugabyteDBConsistentStreamingSource.java | DML and COMMIT paths pass message.getXreplOriginId() |
| YugabyteDBSourceInfoStructMaker.java | Added xrepl_origin_id (optional int32) to schema and struct |
| SourceInfoTest.java | Updated to match new update() signature and schema |
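
The ReplicationMessage.java and YbProtoReplicationMessage.java changes listed above follow a default-method-plus-override pattern. The following is a rough, self-contained sketch of that pattern, not the connector's actual code: RawRow is a hypothetical stand-in for the generated RowMessage class, and the surrounding class names are invented for illustration.

```java
public class OriginIdPlumbing {
    /** Stand-in for the connector's ReplicationMessage interface. */
    public interface ReplicationMessage {
        // The default keeps existing implementations compiling; 0 means "no origin".
        default int getXreplOriginId() {
            return 0;
        }
    }

    /** Hypothetical stand-in for the generated RowMessage proto class. */
    public static final class RawRow {
        private final int xreplOriginId;

        public RawRow(int xreplOriginId) {
            this.xreplOriginId = xreplOriginId;
        }

        public int getXreplOriginId() {
            return xreplOriginId;
        }
    }

    /** Mirrors the override described for YbProtoReplicationMessage. */
    public static final class ProtoBackedMessage implements ReplicationMessage {
        private final RawRow rawMessage;

        public ProtoBackedMessage(RawRow rawMessage) {
            this.rawMessage = rawMessage;
        }

        @Override
        public int getXreplOriginId() {
            return rawMessage.getXreplOriginId();
        }
    }

    /** An implementation that predates the change and simply inherits the default. */
    public static final class LegacyMessage implements ReplicationMessage { }

    public static void main(String[] args) {
        System.out.println(new LegacyMessage().getXreplOriginId());
        System.out.println(new ProtoBackedMessage(new RawRow(7)).getXreplOriginId());
    }
}
```

Using a default method means only the proto-backed message needs to change; other implementations keep reporting 0.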

Surface the replication origin ID from YugabyteDB CDC RowMessage
proto onto every DML (INSERT/UPDATE/DELETE) Debezium change event.
The field appears as `xrepl_origin_id` in the `source` block of
the Kafka message envelope, only when non-zero.
@shishir2001-yb shishir2001-yb self-requested a review March 17, 2026 17:04
public static final String TABLE_ID = "table_id";
public static final String TABLET_ID = "tablet_id";
public static final String PARTITION_ID_KEY = "partition_id";
public static final String XREPL_ORIGIN_ID = "xrepl_origin_id";

@shishir2001-yb (Member) commented Mar 18, 2026

Change this to the following to match upstream Debezium behaviour:
public static final String XREPL_ORIGIN_ID = "origin";
https://github.com/debezium/debezium/pull/7009/files

private String tabletId;
private Long commitTime;
private Long recordTime;
private int xreplOriginId;

@shishir2001-yb (Member) commented Mar 18, 2026

I wonder whether we should make this a String instead of an int, so that it matches upstream Debezium. Thoughts?

@kgalieva (Author)

Good question. In upstream Postgres Debezium, origin is a string because Postgres's CDC stream provides the replication origin name. In YugabyteDB, the CDC proto only surfaces xrepl_origin_id as a uint32 — the origin name lives in the PG catalog (pg_replication_origin) and isn't accessible from the CDC/DocDB layer.

We could stringify the int to match the upstream schema type, but then consumers would get "42" instead of a meaningful name like "my_origin" - which feels misleading. I'd prefer to keep it as int since that's what YB actually provides, and rename the field to "origin" for partial alignment with upstream.
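
One practical consequence of carrying a uint32 in a Java int: protobuf's Java mapping stores uint32 values in a signed int, so an id above 2^31 - 1 would read as negative unless it is widened. Whether ids that large occur in practice is not stated in this thread, so treat the following as an illustrative sketch with hypothetical names.

```java
public class UnsignedOriginSketch {
    /** Widens a uint32 carried in a Java int to its non-negative value. */
    public static long asUnsigned(int rawOriginId) {
        return Integer.toUnsignedLong(rawOriginId);
    }

    /** Renders the id as an unsigned decimal string, e.g. for logging. */
    public static String render(int rawOriginId) {
        return Integer.toUnsignedString(rawOriginId);
    }

    public static void main(String[] args) {
        System.out.println(asUnsigned(42));
        System.out.println(render(-1)); // -1 holds the bit pattern of the largest uint32
    }
}
```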

@shishir2001-yb (Member)

Hmm, true. In that case I think we should not use the name below, since it could cause confusion when a cluster consumes data through both the logical and gRPC replication protocols:

public static final String XREPL_ORIGIN_ID = "origin";

Maybe we can change it to below?

public static final String XREPL_ORIGIN_ID = "origin_id";

@shishir2001-yb (Member)

Minor comments but overall looks good

- Rename source info field from "xrepl_origin_id" to "origin" to align
  with upstream Debezium naming convention
- Add xreplOriginId to SourceInfo.toString() when non-zero for debugging
- Add YugabyteDBReplicationOriginTest with integration tests covering
  origin propagation on DML events, omission when zero, and multiple
  origins in the same session

@shishir2001-yb (Member)

@kgalieva, which yb-client version are you using? Compiling your PR changes gives me this error:

[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  8.312 s
[INFO] Finished at: 2026-03-20T08:58:50Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project debezium-connector-yugabytedb: Compilation failure
[ERROR] /home/ec2-user/code/debezium-connector-yugabytedb/src/main/java/io/debezium/connector/yugabytedb/connection/pgproto/YbProtoReplicationMessage.java:[180,26] cannot find symbol
[ERROR]   symbol:   method getXreplOriginId()
[ERROR]   location: variable rawMessage of type org.yb.cdc.CdcService.RowMessage
[ERROR] 
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

@kgalieva (Author)

@shishir2001-yb I updated the client version, ready to run the tests

@shishir2001-yb (Member)

@kgalieva, just one comment needs to be addressed #400 (comment)

sb.append(", table=").append(tableName);
}
if (xreplOriginId != 0) {
sb.append(", origin=").append(xreplOriginId);

@shishir2001-yb (Member)

Change this to origin_id
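
With the requested rename applied, the toString() fragment might look roughly like the sketch below. Only the two append calls and the non-zero check come from the diff; the surrounding method, the class name, the null check on tableName, and the "SourceInfo [...]" framing are assumptions made to keep the example self-contained.

```java
public class SourceInfoToStringSketch {
    /** Builds a debug string; the origin is appended only when it is non-zero. */
    public static String describe(String tabletId, String tableName, int xreplOriginId) {
        StringBuilder sb = new StringBuilder("SourceInfo [tabletId=").append(tabletId);
        if (tableName != null) {
            sb.append(", table=").append(tableName);
        }
        if (xreplOriginId != 0) {
            // Label renamed from "origin" to "origin_id" as requested in review.
            sb.append(", origin_id=").append(xreplOriginId);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("t1", "orders", 42));
        System.out.println(describe("t1", "orders", 0));
    }
}
```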
