We have several use cases where we need to write events out as Avro- or Parquet-encoded data (e.g. to S3 and Kafka) so that downstream systems in our data platform can consume them directly.

Right now, the Avro support in Vector appears to be limited to using `to_avro_datum`. From the … Because …

We’d like to improve Vector’s Avro compatibility so that it can:
Our goal is to make Vector’s Avro output “just work” with the broader Avro ecosystem (Java Avro, avro-tools, Spark, etc.), not only with custom consumers that know about the current `to_avro_datum` usage.

Proposed approach

If you’re open to this, we’d like to contribute this work as a series of small, incremental PRs that preserve existing behavior by default. Roughly:
We’d really appreciate feedback on:
(rewrote this to be more explicit)
Replies: 2 comments 1 reply
Because the batching/buffering for these sinks is done, using
Thanks for the detailed proposal, @jlambatl! We're cautiously supportive of this enhancement. You're correct that the current implementation uses `to_avro_datum`. Adding OCF support would indeed improve interoperability with Spark, Flink, and other tools in the Avro ecosystem.

This is a great approach.
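For anyone following the thread, here is a rough sketch of why a bare datum and an Object Container File (OCF) behave so differently for downstream tools. It is not Vector's code and does not use the `apache_avro` API: it hand-rolls the Avro binary encoding with only the standard library, as I understand it from the Avro specification. The `Event` record schema and the helper names (`write_long`, `write_bytes`, `encode_datum`, `encode_ocf`) are made up for illustration, and the all-zero sync marker is only there to keep the output deterministic (real writers use a random one per file).

```rust
// Illustrative sketch only: hypothetical helpers, standard library only.

/// Avro `int`/`long`: zigzag encoding followed by a base-128 varint.
fn write_long(out: &mut Vec<u8>, n: i64) {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    while z >= 0x80 {
        out.push(((z & 0x7f) as u8) | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
}

/// Avro `bytes`/`string`: a long length prefix followed by the raw bytes.
fn write_bytes(out: &mut Vec<u8>, b: &[u8]) {
    write_long(out, b.len() as i64);
    out.extend_from_slice(b);
}

/// A bare datum for the assumed record {message: string, count: long}.
/// Nothing in these bytes identifies the schema, so a reader must already know it.
fn encode_datum(message: &str, count: i64) -> Vec<u8> {
    let mut out = Vec::new();
    write_bytes(&mut out, message.as_bytes());
    write_long(&mut out, count);
    out
}

/// A minimal OCF: magic, metadata map (schema + codec), sync marker, one data block.
fn encode_ocf(schema_json: &str, datums: &[Vec<u8>], sync: [u8; 16]) -> Vec<u8> {
    let mut out = b"Obj\x01".to_vec();
    // Metadata map: one block holding two entries, then a zero count to end the map.
    write_long(&mut out, 2);
    write_bytes(&mut out, b"avro.schema");
    write_bytes(&mut out, schema_json.as_bytes());
    write_bytes(&mut out, b"avro.codec");
    write_bytes(&mut out, b"null");
    write_long(&mut out, 0);
    out.extend_from_slice(&sync);
    // Data block: object count, byte size, concatenated datums, sync marker.
    if !datums.is_empty() {
        let body: Vec<u8> = datums.concat();
        write_long(&mut out, datums.len() as i64);
        write_long(&mut out, body.len() as i64);
        out.extend_from_slice(&body);
        out.extend_from_slice(&sync);
    }
    out
}

fn main() {
    let schema = r#"{"type":"record","name":"Event","fields":[{"name":"message","type":"string"},{"name":"count","type":"long"}]}"#;
    let datum = encode_datum("hi", 3);
    let file = encode_ocf(schema, &[datum.clone()], [0u8; 16]);
    println!("bare datum bytes: {:?}", datum);
    println!("OCF length: {} bytes, has magic: {}", file.len(), file.starts_with(b"Obj\x01"));
}
```

The bare datum is only the field values back to back, which is why a generic reader such as avro-tools or Spark cannot open it without being told the schema out of band. The OCF wraps the same bytes with the schema and block framing, so it is self-describing.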