feat: support for schemas in JSON record builder #174
Conversation
Some of the existing tests create a JsonRecordBuilder and start using it without configuring it. This is not valid and would not happen in real use as configure() is a required step in the Connect lifecycle. As I want to introduce new config for the record builder, I need to correct this mistake first. Signed-off-by: Dale Lane <[email protected]>
I've chosen to support Kafka Connect's JSON schema format, rather than the different (and more widely understood) standard JSON Schema. While supporting "standard" JSON Schema would have simplified the user config in some respects, it would have left the MQ Connector responsible for performing the (ambiguous) conversion from the user-provided JSON Schema to the schema used in Connect. As there is not a 1:1 mapping between these two schema types, I think it would be difficult to do such a conversion in a way that always meets user expectations. Instead, by making the user provide a Connect JSON schema, I'm proposing to force the user to manually convert any JSON Schema they may already have into a Connect schema - forcing them to make the appropriate choices in mapping between the two type systems. This was a difficult trade-off to make, as I'm favouring unambiguity of config over ease of config (if we assume that more users are comfortable writing "standard" JSON Schemas than Connect JSON schemas). To try to catch confusion here, I've included a unit test to ensure that we reject non-Connect schemas. Signed-off-by: Dale Lane <[email protected]>
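To make the distinction between the two dialects concrete, here is a hypothetical sketch of the same single-field record described both ways (the field names are illustrative only, not taken from the connector's tests):

```java
public class SchemaDialects {
    public static void main(String[] args) {
        // Standard JSON Schema (draft-style) -- NOT the format accepted here
        String standardJsonSchema =
            "{ \"type\": \"object\", \"properties\": { \"id\": { \"type\": \"integer\" } } }";

        // Kafka Connect JSON schema -- the format the record builder expects
        String connectJsonSchema =
            "{ \"type\": \"struct\", \"fields\": [ { \"field\": \"id\", \"type\": \"int64\", \"optional\": false } ] }";

        // A simple marker that distinguishes the two dialects: Connect schemas
        // describe records as "struct" with a "fields" array, rather than
        // "object" with "properties"
        System.out.println(connectJsonSchema.contains("\"struct\""));
    }
}
```

Note how even in this tiny example the mapping is not mechanical: the user has to decide that JSON Schema's `integer` should become Connect's `int64` (rather than `int8`, `int16`, or `int32`), which is exactly the kind of choice the design above pushes back onto the user.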
This commit introduces support for emitting structured records from the JSON record builder. This will allow the MQ Source Connector to read JSON string messages from MQ, and produce them to Kafka using any standard Converter (e.g. to produce them in Avro or Protobuf formats if desired). The JsonConverter dependency used in the JSON record builder has support for this from Kafka Connect v4.2, so the simplest implementation would be to update the dependency in pom.xml to version 4.2, pass the schemas.enable and schema.content configuration properties through to the converter, and leave the Converter to do everything. That felt like an overly aggressive dependency jump, so in the interest of continuing to support Connect 3.x versions, I've implemented a fall-back implementation that reuses the schema "envelope" approach present in JsonConverter 3.x. The additional string operations this will incur for every message will almost certainly impact performance, so I see this as a temporary workaround that we should remove as soon as we feel that Connect 4.x adoption is sufficient. Signed-off-by: Dale Lane <[email protected]>
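As a rough illustration of the envelope fall-back described above: the shape follows JsonConverter 3.x's `{"schema": ..., "payload": ...}` wrapper, but the schema and message content here are invented for the example, and the real implementation in this PR may differ in detail:

```java
public class EnvelopeSketch {
    public static void main(String[] args) {
        // Connect JSON schema supplied by the user (illustrative)
        String schema =
            "{\"type\":\"struct\",\"fields\":[{\"field\":\"id\",\"type\":\"int64\",\"optional\":false}]}";

        // JSON string message as read from MQ (illustrative)
        String payload = "{\"id\":42}";

        // The fall-back wraps every message in the JsonConverter 3.x envelope
        // using string concatenation -- this is the per-message cost noted above
        String envelope = "{\"schema\":" + schema + ",\"payload\":" + payload + "}";

        System.out.println(envelope);
    }
}
```

The resulting envelope is what JsonConverter 3.x already knows how to parse when `schemas.enable=true`, which is why the fall-back avoids any new dependency while still producing structured records.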
Signed-off-by: Dale Lane <[email protected]>
I didn't realise we were still using Java 8 to build the connector. Given that it has been deprecated in Kafka since v3.0 this doesn't seem like the right choice, but that feels like too significant a change to include in this pull request, so for now I'll just remove the newer syntax I'd used. Signed-off-by: Dale Lane <[email protected]>
For consistency with other places where this is done Signed-off-by: Dale Lane <[email protected]>
mark-VIII
left a comment
This looks reasonable to me, but I'll defer to @Joel-hanson as the SME to approve.
Thanks @mark-VIII - I really wanted an independent perspective on the decisions I've taken (around how users provide schemas, and the decision to use string concatenation as a fall-back).
@dalelane Okay. Some pieces sparked some recollection of our previous conversation regarding this, but let me see if I can build a more complete picture to pass comment...
If the commit messages + pull request comment don't give a clear enough description of the (contentious?) decisions that need to be reviewed, please let me know and I'll try to rewrite it.
src/test/java/com/ibm/eventstreams/connect/mqsource/MQSourceConnectorTest.java
Joel-hanson
left a comment
LGTM. I don’t see any issues with this approach. I’ve left a few comments that might be helpful.
Signed-off-by: Dale Lane <[email protected]>
Signed-off-by: Dale Lane <[email protected]>
I've done my best to digest what is going on here and here are my comments based on my understanding:
I agree that we should force the user to comply with the Kafka Connect JSON schema as trying to do some form of mapping sounds overly complex and likely to introduce unexpected behaviour.
I can also see that we automatically detect the support for schemas in the JSON converter for compatibility, which is nice. However, I think I'm a little unclear on the embedded schema vs schema supplied via config behaviour. What happens if the user provides a schema via config AND embeds a schema within each message payload? Which of the schemas would take precedence?
Generally, I think the decisions that have been made for the design here are the right ones, but could we give the user a link out to somewhere that gives them a starting point for mapping their schema to Kafka Connect's format?
That's an interesting point - I hadn't thought of that. We'd inherit the JsonConverter behaviour for this (which I had to go look up to find out). My reading of JsonConverter is that if you provide a schema it will be used, and if there is no provided schema, the schema embedded in the message is the fall-back.
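A minimal sketch of that precedence rule as I read it (the variable names are hypothetical, not taken from JsonConverter's source, and the schemas are placeholders):

```java
public class SchemaPrecedence {
    public static void main(String[] args) {
        // Schema provided via the schema.content config option (may be null
        // if the user did not configure one)
        String configSchema = "{\"type\":\"struct\",\"fields\":[]}";

        // Schema embedded in the message envelope, if the payload carries one
        String embeddedSchema =
            "{\"type\":\"struct\",\"fields\":[{\"field\":\"id\",\"type\":\"int64\"}]}";

        // If a schema was supplied via config it is used; otherwise the
        // schema embedded in the message payload is the fall-back
        String effectiveSchema = (configSchema != null) ? configSchema : embeddedSchema;

        System.out.println(effectiveSchema.equals(configSchema));
    }
}
```

So in the case Joel raises, where the user supplies both, the config schema would win under this reading.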
|
@dalelane Yes, I'd agree with that from the logic there. Is it worth calling out that behaviour in the README for the schema content config option? I think anything that might clarify how you decide to set those two new options is useful.
Signed-off-by: Dale Lane <[email protected]>
mark-VIII
left a comment
Just a suggestion to make the message clearer/more assertive, but otherwise I'm happy to approve at this stage given that Joel's feedback has been addressed.
src/main/java/com/ibm/eventstreams/connect/mqsource/MQSourceConnector.java
Co-authored-by: Mark S Taylor <[email protected]> Signed-off-by: Dale Lane <[email protected]>