[Feature] Support Avro format for reading and writing Paimon data #31

@dalingmeng

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Modern lakehouse formats such as Apache Iceberg and Paimon primarily use columnar formats (e.g., Parquet, ORC) for data files to optimize analytical workloads. Avro, a row-based storage format, also plays a critical role because of its compactness and efficiency. For example, in the Java implementation of Paimon, Avro is the default format for manifest files.

Solution

We propose adding full Avro read and write support to the Paimon C++ SDK, compatible with the Avro files produced by Paimon Java.

Core Components
Avro Reader

  • Parse .avro files.
  • Convert Avro schemas to Arrow types.
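
To make the schema conversion step concrete, here is a minimal sketch of how Avro primitive type names could be paired with Arrow types. It uses only the standard library, so the Arrow side is represented by the name of the Arrow C++ factory function (e.g. `arrow::int64()`) rather than a real `arrow::DataType`. The helper name `ArrowFactoryForAvroPrimitive` and the table are hypothetical, and the mapping is an assumption based on the natural width-for-width correspondence; the mapping Paimon Java actually uses (in particular for logical types, records, unions, and nullability) would need to be checked before relying on it.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical pairing of an Avro primitive type name with the name of the
// Arrow C++ factory function that yields a matching Arrow type.
struct AvroArrowPair {
  std::string_view avro_type;
  std::string_view arrow_factory;
};

// The eight Avro primitive types. The right-hand column is an assumed,
// width-for-width mapping, not one confirmed against Paimon Java.
constexpr std::array<AvroArrowPair, 8> kPrimitiveMap = {{
    {"null", "arrow::null()"},
    {"boolean", "arrow::boolean()"},
    {"int", "arrow::int32()"},
    {"long", "arrow::int64()"},
    {"float", "arrow::float32()"},
    {"double", "arrow::float64()"},
    {"bytes", "arrow::binary()"},
    {"string", "arrow::utf8()"},
}};

// Returns the Arrow factory name for an Avro primitive, or std::nullopt for
// anything else (records, arrays, maps, unions, logical types, ...), which a
// real converter would have to handle recursively.
std::optional<std::string_view> ArrowFactoryForAvroPrimitive(
    std::string_view avro_type) {
  for (const auto& pair : kPrimitiveMap) {
    if (pair.avro_type == avro_type) {
      return pair.arrow_factory;
    }
  }
  return std::nullopt;
}
```

Returning `std::nullopt` for non-primitive names keeps the unsupported cases explicit, so a caller can fail loudly instead of silently guessing a type.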

Avro Writer

  • Serialize Arrow arrays to Avro format with the proper schema.
  • Ensure compatibility with Paimon Java’s Avro output.
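
As an illustration of what byte-level compatibility involves on the write path, the sketch below encodes and decodes a single Avro `long` the way the Avro specification describes it: zig-zag mapping followed by a variable-length encoding of 7 bits per byte. It is standard-library only and not tied to any Avro C++ API. The function names `EncodeAvroLong` and `DecodeAvroLong` are hypothetical, and a real writer would emit into a buffer or output stream rather than return a vector per value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Encodes a signed 64-bit value as an Avro `long`: zig-zag first, so small
// magnitudes of either sign become small unsigned numbers, then a base-128
// varint with the high bit of each byte marking "more bytes follow".
std::vector<std::uint8_t> EncodeAvroLong(std::int64_t value) {
  const std::uint64_t sign_mask = value < 0 ? ~std::uint64_t{0} : 0;
  std::uint64_t zz = (static_cast<std::uint64_t>(value) << 1) ^ sign_mask;
  std::vector<std::uint8_t> out;
  while (zz >= 0x80) {
    out.push_back(static_cast<std::uint8_t>((zz & 0x7F) | 0x80));
    zz >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(zz));
  return out;
}

// Decodes one Avro `long` from the start of `bytes`. Returns std::nullopt if
// the input ends before the terminating byte or runs past 10 bytes.
std::optional<std::int64_t> DecodeAvroLong(
    const std::vector<std::uint8_t>& bytes) {
  std::uint64_t zz = 0;
  int shift = 0;
  for (std::uint8_t b : bytes) {
    if (shift >= 64) {
      return std::nullopt;  // more than 10 bytes: not a valid 64-bit varint
    }
    zz |= static_cast<std::uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) {
      // Undo zig-zag: the low bit carries the sign.
      const std::uint64_t sign_mask = std::uint64_t{0} - (zz & 1);
      return static_cast<std::int64_t>((zz >> 1) ^ sign_mask);
    }
    shift += 7;
  }
  return std::nullopt;  // truncated input
}
```

Round-tripping values through both functions, including negative numbers and the 64-bit extremes, is a cheap first check before comparing whole files against output from Paimon Java.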

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Assignees

No one assigned

Labels

enhancement: New feature or request
