feat: support for schemas in JSON record builder #174
Conversation
Some of the existing tests create a JsonRecordBuilder and start using it without configuring it. This is not valid and would not happen in real use as configure() is a required step in the Connect lifecycle. As I want to introduce new config for the record builder, I need to correct this mistake first. Signed-off-by: Dale Lane <[email protected]>
I've chosen to support Kafka Connect's JSON schema format, rather than the different (and more widely understood) standard JSON Schema. While supporting "standard" JSON Schema would have simplified the user config in some respects, it would have left the MQ Connector responsible for performing the (ambiguous) conversion from the user-provided JSON Schema to the schema used in Connect. As there is not a 1:1 mapping between these two schema types, I think it would be difficult to do such a conversion in a way that always meets user expectations. Instead, by making the user provide a Connect JSON schema, I'm proposing to force the user to manually convert any JSON Schema they may already have into a Connect schema - forcing them to make the appropriate choices in mapping between the two type systems. This was a difficult trade-off to make, as I'm favouring unambiguity of config over ease of config (if we assume that more users are comfortable writing "standard" JSON Schemas than Connect JSON schemas). To try to catch confusion here, I've included a unit test to ensure that we reject non-Connect schemas. Signed-off-by: Dale Lane <[email protected]>
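To make the distinction between the two dialects concrete, here is a hypothetical sketch of the same single-field record described both ways (the field names are illustrative only, not taken from the connector's tests):

```java
public class SchemaDialects {
    public static void main(String[] args) {
        // Standard JSON Schema (draft-style) -- NOT the format accepted here
        String standardJsonSchema =
            "{ \"type\": \"object\", \"properties\": { \"id\": { \"type\": \"integer\" } } }";

        // Kafka Connect JSON schema -- the format the record builder expects
        String connectJsonSchema =
            "{ \"type\": \"struct\", \"fields\": [ { \"field\": \"id\", \"type\": \"int64\", \"optional\": false } ] }";

        // A simple marker that distinguishes the two dialects: Connect schemas
        // describe records as "struct" with a "fields" array, rather than
        // "object" with "properties"
        System.out.println(connectJsonSchema.contains("\"struct\""));
    }
}
```

Note how even in this tiny example the mapping is not mechanical: the user has to decide that JSON Schema's `integer` should become Connect's `int64` (rather than `int8`, `int16`, or `int32`), which is exactly the kind of choice the design above pushes back onto the user.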
This commit introduces support for emitting structured records from the JSON record builder. This will allow the MQ Source Connector to read JSON string messages from MQ, and produce them to Kafka using any standard Converter (e.g. to produce them in Avro or Protobuf formats if desired). The JsonConverter dependency used in the JSON record builder has support for this from Kafka Connect v4.2, so the simplest implementation would be to update the dependency in pom.xml to version 4.2, pass the schemas.enable and schema.content configuration properties through to the converter, and leave the Converter to do everything. That felt like an overly aggressive dependency jump, so in the interest of continuing to support Connect 3.x versions, I've implemented a fall-back implementation that reuses the schema "envelope" approach present in JsonConverter 3.x. The additional string operations this will incur for every message will almost certainly impact performance, so I see this as a temporary workaround that we should remove as soon as we feel that Connect 4.x adoption is sufficient. Signed-off-by: Dale Lane <[email protected]>
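As a rough illustration of the envelope fall-back described above: the shape follows JsonConverter 3.x's `{"schema": ..., "payload": ...}` wrapper, but the schema and message content here are invented for the example, and the real implementation in this PR may differ in detail:

```java
public class EnvelopeSketch {
    public static void main(String[] args) {
        // Connect JSON schema supplied by the user (illustrative)
        String schema =
            "{\"type\":\"struct\",\"fields\":[{\"field\":\"id\",\"type\":\"int64\",\"optional\":false}]}";

        // JSON string message as read from MQ (illustrative)
        String payload = "{\"id\":42}";

        // The fall-back wraps every message in the JsonConverter 3.x envelope
        // using string concatenation -- this is the per-message cost noted above
        String envelope = "{\"schema\":" + schema + ",\"payload\":" + payload + "}";

        System.out.println(envelope);
    }
}
```

The resulting envelope is what JsonConverter 3.x already knows how to parse when `schemas.enable=true`, which is why the fall-back avoids any new dependency while still producing structured records.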
Signed-off-by: Dale Lane <[email protected]>
I didn't realise we were still using Java 8 to build the connector. Given that it has been deprecated in Kafka since v3.0 this doesn't seem like the right choice, but that feels like too significant a change to include in this pull request, so for now I'll just remove the newer syntax I'd used. Signed-off-by: Dale Lane <[email protected]>
For consistency with other places where this is done Signed-off-by: Dale Lane <[email protected]>
mark-VIII
left a comment
This looks reasonable to me, but I'll defer to @Joel-hanson as the SME to approve.
Thanks @mark-VIII - I really wanted an independent perspective on the decisions I've taken (around how users provide schemas, and the decision to use string concatenation as a fall-back).
@dalelane Okay. Some pieces sparked some recollection of our previous conversation regarding this, but let me see if I can build a more complete picture to pass comment...
If the commit messages + pull request comment don't give a clear enough description of the (contentious?) decisions that need to be reviewed, please let me know and I'll try to rewrite it.
src/test/java/com/ibm/eventstreams/connect/mqsource/MQSourceConnectorTest.java
Joel-hanson
left a comment
LGTM. I don’t see any issues with this approach. I’ve left a few comments that might be helpful.
Signed-off-by: Dale Lane <[email protected]>
Signed-off-by: Dale Lane <[email protected]>
I've done my best to digest what is going on here and here are my comments based on my understanding:
I agree that we should force the user to comply with the Kafka Connect JSON schema as trying to do some form of mapping sounds overly complex and likely to introduce unexpected behaviour.
I can also see that we automatically detect the support for schemas in the JSON converter for compatibility, which is nice. However, I think I'm a little unclear on the embedded schema vs schema supplied via config behaviour. What happens if the user provides a schema via config AND embeds a schema within each message payload? Which of the schemas would take precedence?
Generally, I think the decisions that have been made for the design here are the right ones, but could we give the user a link out to somewhere that gives them a starting point for mapping their schema to Kafka Connect's format?
That's an interesting point - I hadn't thought of that. We'd inherit the JsonConverter behaviour for this (which I had to go look up to find out). My reading of JsonConverter is that if you provide a schema it will be used, and if there is no provided schema, the schema embedded in the message is the fall-back.
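A minimal sketch of that precedence rule as I read it (the variable names are hypothetical, not taken from JsonConverter's source, and the schemas are placeholders):

```java
public class SchemaPrecedence {
    public static void main(String[] args) {
        // Schema provided via the schema.content config option (may be null
        // if the user did not configure one)
        String configSchema = "{\"type\":\"struct\",\"fields\":[]}";

        // Schema embedded in the message envelope, if the payload carries one
        String embeddedSchema =
            "{\"type\":\"struct\",\"fields\":[{\"field\":\"id\",\"type\":\"int64\"}]}";

        // If a schema was supplied via config it is used; otherwise the
        // schema embedded in the message payload is the fall-back
        String effectiveSchema = (configSchema != null) ? configSchema : embeddedSchema;

        System.out.println(effectiveSchema.equals(configSchema));
    }
}
```

So in the case Joel raises, where the user supplies both, the config schema would win under this reading.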
|
@dalelane Yes, I'd agree with that from the logic there. Is it worth calling out that behaviour in the README for the schema content config option? I think anything that might clarify how you decide to set those two new options is useful.
Signed-off-by: Dale Lane <[email protected]>
mark-VIII
left a comment
Just a suggestion to make the message clearer/more assertive, but otherwise I'm happy to approve at this stage given that Joel's feedback has been addressed.
src/main/java/com/ibm/eventstreams/connect/mqsource/MQSourceConnector.java
Co-authored-by: Mark S Taylor <[email protected]> Signed-off-by: Dale Lane <[email protected]>