@OneCricketeer commented Aug 24, 2018

Inspired by https://jobs.zalando.com/tech/blog/backing-up-kafka-zookeeper/
Code: https://github.com/imduffy15/kafka-connect-s3

Reasoning: Rather than writing a wholly separate Connector implementation from Confluent's, we wanted a way to back up the holistic Kafka (Avro-encoded) payload to S3 for querying in downstream systems, and this transform seemed useful for that. We then started getting requests to read that S3 data back into Kafka, and were conflicted about how to take an Avro file and convert it back into the Schema Registry wire format. I think issues arose around how to strip the connect.meta string information, for example, since its presence in the schema would cause a different ID in the registry. Basically, we didn't want to push old schemas over any new ones in the registry; in other words, no new IDs should get pushed onto a subject.
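
To illustrate the ID problem above: Avro's `Schema.equals()` compares custom properties, so a schema that differs only by Connect-added metadata is a different schema as far as the registry is concerned and would get a new ID under the subject. A minimal sketch of that (the exact property name `connect.name` is my assumption here, standing in for whatever metadata the AvroConverter injects):

```java
import org.apache.avro.Schema;

/**
 * Sketch: two logically identical record schemas, where the second
 * carries a Connect-style metadata property. They are unequal to Avro,
 * so re-registering the annotated form would mint a new registry ID.
 */
public class ConnectMetaExample {
    public static void main(String[] args) {
        Schema plain = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\","
            + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");
        // "connect.name" is illustrative; the real converter may add
        // several connect.* properties to the schema JSON.
        Schema annotated = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\","
            + "\"connect.name\":\"com.example.User\","
            + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");
        System.out.println(plain.equals(annotated)); // prints: false
    }
}
```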

So, we want to preserve the Avro schema ID that's already within the bytes, and avoid the overhead of the AvroConverter translating formats back and forth and doing lookups against the registry.
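
For reference, the wire format in question is a magic byte `0x0`, followed by the four-byte schema ID as a big-endian int, followed by the raw Avro binary. A minimal sketch (class and method names are mine, not from this PR) of pulling the ID straight out of the bytes without any registry lookup:

```java
import java.nio.ByteBuffer;

/**
 * Sketch of reading the registry-assigned schema ID out of the
 * Confluent Schema Registry wire format without deserializing the
 * Avro payload or contacting the registry.
 */
public class WireFormatSketch {
    private static final byte MAGIC_BYTE = 0x0;

    /** Returns the schema ID embedded in the record's value bytes. */
    public static int schemaId(byte[] value) {
        ByteBuffer buffer = ByteBuffer.wrap(value);
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Not Schema Registry wire format");
        }
        return buffer.getInt(); // ByteBuffer defaults to big-endian
    }
}
```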

Unless I am mistaken, the original transform code here + ByteArrayConverter doesn't quite do what we want (thinking back, I don't know if we actually tried that 🤔).

Therefore, I am contributing this code and looking for feedback, @jcustenborder. Thanks!

Note: I wrote this code before schemaless support was added, so that feature is missing.
