Description
What would you like to happen?
Feature Request: Support for Apache Iceberg Format in ETL Tool
Overview
We're requesting support for the Apache Iceberg table format within the ETL tool. Specifically, this would include:
- Input components to read from Iceberg tables
- Output components to write to Iceberg tables
- Support for connecting to a metadata catalog (e.g. Hive Metastore, AWS Glue, REST Catalog)
This would enable teams to build modern, scalable data pipelines using Iceberg, all within a low-code interface.
Why Iceberg?
Apache Iceberg is gaining traction as a core component in modern data lakehouse architectures. It brings many of the benefits of data warehouses, such as ACID transactions and schema management, to data held in cloud object storage.
Key reasons to consider Iceberg:
- Schema and Partition Evolution: Modify table structures without rewriting entire datasets.
- Time Travel: Easily query previous versions of your data.
- ACID Transactions: Safe concurrent reads/writes without corrupting data.
- Performance: Works well with large-scale datasets thanks to partition pruning, metadata filtering, and columnar reads.
- Open and Engine-Agnostic: Supported by Spark, Flink, Trino, Presto, and others.
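To illustrate the time travel point for anyone less familiar with the format, here is a rough conceptual sketch in Python. It is a toy model only, not Iceberg's actual API or metadata layout: `ToyTable`, `Snapshot`, `commit_append`, and `files_as_of` are hypothetical names made up for this example. The idea it tries to show is that each commit produces a new immutable snapshot listing the data files visible at that point, so reading "as of" a timestamp just means picking the latest snapshot at or before it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable table version (hypothetical, simplified)."""
    snapshot_id: int
    timestamp_ms: int
    data_files: Tuple[str, ...]  # files visible in this version


@dataclass
class ToyTable:
    """Toy stand-in for a table that keeps every committed snapshot."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit_append(self, new_files: List[str], timestamp_ms: int) -> Snapshot:
        # A commit never mutates an earlier snapshot; it adds a new one
        # that carries forward the previous file list plus the new files.
        current = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(
            snapshot_id=len(self.snapshots) + 1,
            timestamp_ms=timestamp_ms,
            data_files=current + tuple(new_files),
        )
        self.snapshots.append(snap)
        return snap

    def files_as_of(self, timestamp_ms: int) -> Tuple[str, ...]:
        # Time travel: choose the newest snapshot committed at or before
        # the requested time and read only the files it references.
        visible = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        if not visible:
            raise ValueError("no snapshot at or before the requested time")
        return max(visible, key=lambda s: s.timestamp_ms).data_files


table = ToyTable()
table.commit_append(["a.parquet"], timestamp_ms=1000)
table.commit_append(["b.parquet"], timestamp_ms=2000)

print(table.files_as_of(1500))  # ('a.parquet',)
print(table.files_as_of(2500))  # ('a.parquet', 'b.parquet')
```

An input component in the ETL tool could presumably expose the same idea as an optional "as of timestamp" or "snapshot ID" setting, though the exact options would be up to the implementers.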
Why It Matters for a Low-Code ETL Tool
Adding native Iceberg support would:
- Enable Non-Engineers: Business and data analysts could work with Iceberg tables without writing Spark jobs or scripts.
- Streamline Workflows: Build end-to-end data flows from source to Iceberg without leaving the ETL tool.
- Reduce Duplication: Avoid needing to copy Iceberg data into other formats or systems just to make it usable in the pipeline.
- Stay Relevant: Many organizations are shifting away from traditional data lakes to Iceberg-based lakehouses. Native support ensures the ETL tool stays compatible with current trends.
Suggested Components
Final Thoughts
Iceberg is becoming a critical part of the modern data stack. Adding it to the ETL tool, especially in a low-code way, would open up new use cases and attract teams working on modern, scalable architectures.
Let us know if you'd like help drafting a more detailed spec or use case doc.
Issue Priority
Priority: 3
Issue Component
Component: Metadata