[Feature Request]: Iceberg Input&Output as standard component #5524

@kstna23

Description

What would you like to happen?

Feature Request: Support for Apache Iceberg Format in ETL Tool

Overview

We're requesting support for the Apache Iceberg table format within the ETL tool. Specifically, this would include:

  • Input components to read from Iceberg tables

  • Output components to write to Iceberg tables

  • Support for connecting to a catalog service (e.g. Hive Metastore, AWS Glue, REST Catalog) for table discovery and schema access

This would enable teams to build modern, scalable data pipelines using Iceberg, all within a low-code interface.
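To make the catalog-connection piece concrete: Iceberg engines typically take the catalog as a handful of configuration properties. As one illustration (not a proposed spec for this tool), Spark's Iceberg integration registers catalogs like this; the property names follow the Iceberg documentation, while `my_catalog`, `my_rest`, and the URIs are placeholders:

```properties
# Register an Iceberg catalog backed by a Hive Metastore (placeholder names/URIs)
spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.my_catalog.type=hive
spark.sql.catalog.my_catalog.uri=thrift://metastore-host:9083

# Alternatively, a REST catalog
spark.sql.catalog.my_rest=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.my_rest.type=rest
spark.sql.catalog.my_rest.uri=https://rest-catalog-host/api
```

A "meta connector" component would essentially expose these same properties through the low-code UI.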


Why Iceberg?

Apache Iceberg is gaining traction as a core component in modern data lakehouse architectures. It brings many of the benefits of data warehouses—like ACID transactions and schema management—to cloud storage.

Key reasons to consider Iceberg:

  • Schema and Partition Evolution: Modify table structures without rewriting entire datasets.

  • Time Travel: Easily query previous versions of your data.

  • ACID Transactions: Safe concurrent reads/writes without corrupting data.

  • Performance: Works well with large-scale datasets thanks to partition pruning, metadata filtering, and columnar reads.

  • Open and Engine-Agnostic: Supported by Spark, Flink, Trino, Presto, and others.
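To show what the time-travel bullet means in practice: Iceberg never mutates data files; every commit produces a new immutable snapshot, and reads can target the current snapshot or any historical one by id. A toy model of that idea in plain Python (illustrative only, not the Iceberg API; `ToyTable` is a made-up name):

```python
from dataclasses import dataclass, field

# Toy model of snapshot-based time travel, NOT the Iceberg API.
# Each append produces a new immutable snapshot; scans can read
# the latest snapshot or any historical one by snapshot id.

@dataclass
class ToyTable:
    snapshots: list = field(default_factory=list)  # list of (snapshot_id, rows)

    def append(self, rows):
        prev = self.snapshots[-1][1] if self.snapshots else []
        snapshot_id = len(self.snapshots) + 1
        self.snapshots.append((snapshot_id, prev + list(rows)))
        return snapshot_id

    def scan(self, snapshot_id=None):
        if not self.snapshots:
            return []
        if snapshot_id is None:
            return self.snapshots[-1][1]       # current table state
        for sid, rows in self.snapshots:
            if sid == snapshot_id:
                return rows                    # "time travel" read
        raise KeyError(snapshot_id)

t = ToyTable()
s1 = t.append([{"id": 1}])
t.append([{"id": 2}])
print(len(t.scan()))    # 2 rows in the current snapshot
print(len(t.scan(s1)))  # 1 row as of the first snapshot
```

Because old snapshots stay readable, an Iceberg Input component could expose "read as of snapshot/timestamp" as a simple option.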


Why It Matters for a Low-Code ETL Tool

Adding native Iceberg support would:

  • Enable Non-Engineers: Business and data analysts could work with Iceberg tables without writing Spark jobs or scripts.

  • Streamline Workflows: Build end-to-end data flows from source to Iceberg without leaving the ETL tool.

  • Reduce Duplication: Avoid needing to copy Iceberg data into other formats or systems just to make it usable in the pipeline.

  • Stay Relevant: Many organizations are shifting away from traditional data lakes to Iceberg-based lakehouses. Native support ensures the ETL tool stays compatible with current trends.


Suggested Components

Component | Purpose
-- | --
Iceberg Input | Read data from existing Iceberg tables
Iceberg Output | Write or upsert data into Iceberg tables
Iceberg Meta Connector | Connect to catalog services (e.g. Hive, Glue) for table discovery and schema management
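For the Iceberg Output's upsert mode, the expected behavior would mirror SQL `MERGE INTO`: match incoming rows to existing ones on a key, replace matches, append the rest. A plain-Python sketch of that semantics (illustrative only; `upsert` is a hypothetical helper, not an Iceberg or tool API):

```python
def upsert(existing, incoming, key):
    """Merge incoming rows into existing rows on `key`:
    matched rows are replaced, unmatched rows are appended.
    Toy model of MERGE INTO semantics, NOT the Iceberg API."""
    merged = {row[key]: row for row in existing}  # index current rows by key
    for row in incoming:
        merged[row[key]] = row                    # update-or-insert
    return list(merged.values())

current = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
updates = [{"id": 2, "v": "b2"}, {"id": 3, "v": "c"}]
result = upsert(current, updates, key="id")
# id=2 is updated, id=3 inserted, id=1 left untouched
```

In a low-code component this would reduce to two settings: the target table and the merge key column(s).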

Final Thoughts

Iceberg is becoming a critical part of the modern data stack. Adding it to the ETL tool—especially in a low-code way—would open up new use cases and attract teams working on modern, scalable architectures.

Let us know if you'd like help drafting a more detailed spec or use case doc.

Issue Priority

Priority: 3

Issue Component

Component: Metadata
