Kafka Connect Expand JSON Transform (SMT)

Kafka Connect Single Message Transform (SMT) to parse JSON objects from given source field strings and expand them into appropriate Connect API structures.

Background

Inspired by RedHatInsights ExpandJSON, but aims to overcome the following issues:

Reliance on several additional dependencies including BSON for handling parsing of the actual JSON
Arrays always receive a hard-coded additional Struct layer with a single field with the name array in them and there is no way to configure or remove this from the result.
Schema name and namespace of the generated Structs are empty and cannot be controlled easily for nested structures by other mainstream SMTs (namely SetSchemaMetadata)
ExpandJSON only supports Schema records and only in the Value; schemaless records and keys are not supported.

This new ExpandJson SMT tries address the above as follows:

Use of no additional dependencies outside of what will already be present in the Connect runtime (namely, using only the org.apache.kafka artifacts connect-api, connect-transforms, and connect-json).
Uses existing functionality to parse and convert JSON data via JsonConverter, and aims to infer the schema in a "standardized" way by implementing roughly the same logic that is suggested in KIP-301: Schema Inferencing for JsonConverter; it should be quite trivial to switch to using this feature directly if/once it is implemented.
No hard-coded struct layer with the field name array for arrays (again matching the proposal from KIP-301).
Ability to set a schema.name.prefix in order to control the namespace and local name of the inferred nested schemas.
Support for expanding JSON in Key as well as value.
Support for expanding entire key or value content in case of schemaless records.

Potential improvements

Support to address nested fields, potentially using a dot-based notation (currently only root-level fields are supported).
Ability to support specific Map or potentially other fields for schemaless records instead of only whole-record parsing and expansion.
Possible ability to control which data types are chosen for each inferred type (e.g. should inferred integers be stored as long instead? etc)
Possible ability to control if fields should be inferred as optional vs required

Installation

I have not yet really deployed this anywhere, but as this is a single and very tiny JAR file with no other non-Kafka dependencies, then it is quite easy to just take the JAR file from latest release here and put it into your Kafka Connect plugins folder.

You can also fetch it via https://jitpack.io using Maven, Gradle, etc if desired.

Example usage

"transforms": "ExpandJson",
"transforms.ExpandJson.type": "com.github.joshuagrisham.kafka.connect.transforms.ExpandJson$Value",
"transforms.ExpandJson.fields": "someJsonTextField,anotherJsonTextField",
"transforms.ExpandJson.schema.name.prefix": "com.github.joshuagrisham.kafka.test.MyJsonRecord"

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
jitpack.yml		jitpack.yml
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Kafka Connect Expand JSON Transform (SMT)

Background

Potential improvements

Installation

Example usage

About

Uh oh!

Releases 1

Packages

Uh oh!

Languages

License

joshuagrisham/kafka-connect-expand-json-transform

Folders and files

Latest commit

History

Repository files navigation

Kafka Connect Expand JSON Transform (SMT)

Background

Potential improvements

Installation

Example usage

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Languages

Packages