@yjhjstz, it is a really cool feature! A couple of questions:
PS: it seems that the links in the References paragraph have been lost.
@yjhjstz, it is a really cool feature! But I have some questions.
Proposers
@yjhjstz @my-ship-it
Proposal Status
Under Discussion
Abstract
Cloudberry Multi-Catalog System Design Proposal
Building on prior analysis and on research into Apache Doris's Iceberg Catalog implementation, this is a detailed proposal for Cloudberry's multi-catalog system design. The key idea is that no CREATE FOREIGN TABLE is required.
Executive Summary
Apache Cloudberry, an open-source MPP database derived from PostgreSQL[1][2], can extend its current two-tier naming (database.table; schema.table in PostgreSQL terms) to a three-tier hierarchy (catalog.database.table)[3] by adding a multi-catalog system. This proposal outlines a plugin-based catalog framework that enables federated queries across multiple data sources while maintaining PostgreSQL compatibility and leveraging Cloudberry's MPP architecture.
Motivation
Background and Motivation
Apache Cloudberry currently supports external data access through Foreign Data Wrappers (FDW)[4][5] and Platform Extension Framework (PXF)[6], but lacks the unified multi-catalog approach that modern data lakehouse architectures demand. Apache Doris has successfully implemented multi-catalog functionality[3][7] that enables seamless integration with external data sources like Iceberg, Hive, and JDBC databases. This capability is crucial for organizations building modern data platforms that need to query across heterogeneous data sources.
The proposed multi-catalog system addresses several key challenges:
Implementation
Core Architecture Design
1. Three-Tier Naming Hierarchy
The system extends Cloudberry's current naming convention from database.table to catalog.database.table, enabling:
2. Plugin-Based Catalog Framework
The architecture follows a layered approach similar to Apache Doris's multi-catalog implementation[3]:
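The layering can be illustrated with a minimal Rust sketch of the plugin contract. The trait CatalogPlugin, its methods, ColumnDef and the toy InMemoryCatalog are hypothetical names chosen for illustration, not an existing Cloudberry or pgrx API; a real plugin would be backed by an external metadata service (for example an Iceberg catalog) and exposed to the server through pgrx.

```rust
use std::collections::HashMap;

/// Hypothetical column description returned by a catalog plugin.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnDef {
    pub name: String,
    pub pg_type: String, // PostgreSQL type name, e.g. "bigint"
}

/// Hypothetical interface each catalog plugin (Iceberg, Hive, JDBC, ...) would implement.
pub trait CatalogPlugin {
    /// Plugin kind, e.g. "iceberg".
    fn kind(&self) -> &str;
    /// Databases (namespaces) visible in this catalog.
    fn list_databases(&self) -> Vec<String>;
    /// Tables inside one database.
    fn list_tables(&self, database: &str) -> Vec<String>;
    /// Column definitions for a table, or None if it does not exist.
    fn table_schema(&self, database: &str, table: &str) -> Option<Vec<ColumnDef>>;
}

/// Toy in-memory plugin standing in for a real external catalog.
pub struct InMemoryCatalog {
    kind: String,
    // database -> table -> columns
    tables: HashMap<String, HashMap<String, Vec<ColumnDef>>>,
}

impl InMemoryCatalog {
    pub fn new(kind: &str) -> Self {
        Self { kind: kind.to_string(), tables: HashMap::new() }
    }

    pub fn add_table(&mut self, database: &str, table: &str, columns: Vec<ColumnDef>) {
        self.tables
            .entry(database.to_string())
            .or_default()
            .insert(table.to_string(), columns);
    }
}

impl CatalogPlugin for InMemoryCatalog {
    fn kind(&self) -> &str {
        &self.kind
    }

    fn list_databases(&self) -> Vec<String> {
        let mut dbs: Vec<String> = self.tables.keys().cloned().collect();
        dbs.sort(); // HashMap order is unspecified; sort for stable output
        dbs
    }

    fn list_tables(&self, database: &str) -> Vec<String> {
        let mut names: Vec<String> = self
            .tables
            .get(database)
            .map(|t| t.keys().cloned().collect::<Vec<String>>())
            .unwrap_or_default();
        names.sort();
        names
    }

    fn table_schema(&self, database: &str, table: &str) -> Option<Vec<ColumnDef>> {
        self.tables.get(database)?.get(table).cloned()
    }
}
```

Keeping the contract to a few object-safe methods lets the catalog layer hold every plugin as a Box<dyn CatalogPlugin> without knowing which data source sits behind it.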
3. Metadata Virtualization Subsystem (MVS)
The MVS serves as the core component that manages catalog registration, metadata caching, and dynamic schema resolution[8]. Key components include:
Catalog Registry
Name Resolution Engine
Virtual Object Factory
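A minimal sketch of the Catalog Registry and Name Resolution Engine, assuming one- and two-part names keep today's meaning and a three-part name is read as catalog.database.table. CatalogRegistry, resolve and the built-in catalog name "internal" are hypothetical; quoted identifiers and search_path lookup are ignored. A real implementation would also have to reconcile this with PostgreSQL's existing database.schema.table syntax, where the first part must name the current database.

```rust
use std::collections::HashMap;

/// Hypothetical name of the built-in catalog holding Cloudberry's own tables.
pub const INTERNAL_CATALOG: &str = "internal";

/// A fully qualified catalog.database.table reference.
#[derive(Debug, Clone, PartialEq)]
pub struct QualifiedName {
    pub catalog: String,
    pub database: String,
    pub table: String,
}

/// Hypothetical registry mapping catalog names to the plugin kind that serves them.
pub struct CatalogRegistry {
    catalogs: HashMap<String, String>, // catalog name -> plugin kind
}

impl CatalogRegistry {
    pub fn new() -> Self {
        let mut catalogs = HashMap::new();
        catalogs.insert(INTERNAL_CATALOG.to_string(), "native".to_string());
        Self { catalogs }
    }

    /// Register an external catalog; names must be unique.
    pub fn register(&mut self, name: &str, plugin_kind: &str) -> Result<(), String> {
        if self.catalogs.contains_key(name) {
            return Err(format!("catalog \"{name}\" already exists"));
        }
        self.catalogs.insert(name.to_string(), plugin_kind.to_string());
        Ok(())
    }

    pub fn plugin_kind(&self, catalog: &str) -> Option<&str> {
        self.catalogs.get(catalog).map(String::as_str)
    }

    /// 1 part: table in the current catalog and database.
    /// 2 parts: database.table in the current catalog (today's behaviour).
    /// 3 parts: catalog.database.table; the catalog must be registered.
    pub fn resolve(
        &self,
        name: &str,
        current_catalog: &str,
        current_db: &str,
    ) -> Result<QualifiedName, String> {
        let parts: Vec<&str> = name.split('.').collect();
        if parts.iter().any(|p| p.is_empty()) {
            return Err(format!("invalid name: \"{name}\""));
        }
        let (catalog, database, table) = match parts.as_slice() {
            [t] => (current_catalog, current_db, *t),
            [d, t] => (current_catalog, *d, *t),
            [c, d, t] => (*c, *d, *t),
            _ => return Err(format!("too many dotted parts in \"{name}\"")),
        };
        if !self.catalogs.contains_key(catalog) {
            return Err(format!("catalog \"{catalog}\" does not exist"));
        }
        Ok(QualifiedName {
            catalog: catalog.to_string(),
            database: database.to_string(),
            table: table.to_string(),
        })
    }
}
```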
Rust+pgrx Iceberg Plugin Implementation
Technical Architecture
The Iceberg plugin leverages modern Rust capabilities through the pgrx framework[9][10], which provides safe and efficient PostgreSQL extension development:
Key Advantages
Iceberg Integration Features
Based on Apache Doris's comprehensive Iceberg support[13][14], the plugin will provide:
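One piece of Iceberg integration that any plugin has to handle is mapping Iceberg column types to PostgreSQL types. The sketch below shows one plausible mapping for Iceberg primitive types; it is an assumption of this example, not the mapping Doris or the proposed plugin actually uses, and nested types (struct, list, map) are left out. IcebergType and to_pg_type are hypothetical names.

```rust
/// Subset of Iceberg primitive types, following the Iceberg table spec.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IcebergType {
    Boolean,
    Int,    // 32-bit signed
    Long,   // 64-bit signed
    Float,  // 32-bit IEEE 754
    Double, // 64-bit IEEE 754
    Decimal { precision: u32, scale: u32 },
    Date,
    Time,        // time of day, microsecond precision, no date or zone
    Timestamp,   // microsecond precision, no time zone
    Timestamptz, // microsecond precision, stored adjusted to UTC
    String,
    Uuid,
    Fixed(u32), // fixed-length byte array
    Binary,
}

/// Map an Iceberg primitive type to a PostgreSQL type name.
pub fn to_pg_type(t: IcebergType) -> String {
    match t {
        IcebergType::Boolean => "boolean".to_string(),
        IcebergType::Int => "integer".to_string(),
        IcebergType::Long => "bigint".to_string(),
        IcebergType::Float => "real".to_string(),
        IcebergType::Double => "double precision".to_string(),
        IcebergType::Decimal { precision, scale } => format!("numeric({precision},{scale})"),
        IcebergType::Date => "date".to_string(),
        IcebergType::Time => "time".to_string(),
        IcebergType::Timestamp => "timestamp".to_string(),
        IcebergType::Timestamptz => "timestamptz".to_string(),
        IcebergType::String => "text".to_string(),
        IcebergType::Uuid => "uuid".to_string(),
        // PostgreSQL has no fixed-length binary type, so the length constraint is dropped.
        IcebergType::Fixed(_) | IcebergType::Binary => "bytea".to_string(),
    }
}
```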
FDW Integration Strategy
The multi-catalog system integrates seamlessly with Cloudberry's existing Foreign Data Wrapper infrastructure[4][5]:
Query Execution Flow
MPP-Aware Design
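For an MPP-aware scan, the data files of an Iceberg table have to be spread over Cloudberry's segments. As one possible approach (an assumption; the proposal does not specify the policy), the sketch below uses the greedy longest-processing-time-first heuristic: sort files by size and hand each one to the segment with the fewest bytes assigned so far. DataFile and assign_files are hypothetical names.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical description of one Iceberg data file to be scanned.
#[derive(Debug, Clone)]
pub struct DataFile {
    pub path: String,
    pub size_bytes: u64,
}

/// Assign files to `num_segments` segments, balancing bytes scanned per segment.
/// Returns one list of file paths per segment, indexed by segment id.
pub fn assign_files(files: &[DataFile], num_segments: usize) -> Vec<Vec<String>> {
    assert!(num_segments > 0, "need at least one segment");
    let mut sorted: Vec<&DataFile> = files.iter().collect();
    // Largest files first, so small files fill the gaps at the end.
    sorted.sort_by(|a, b| b.size_bytes.cmp(&a.size_bytes));

    // Min-heap keyed by (bytes assigned so far, segment id).
    let mut heap: BinaryHeap<Reverse<(u64, usize)>> =
        (0..num_segments).map(|seg| Reverse((0, seg))).collect();
    let mut plan = vec![Vec::new(); num_segments];

    for file in sorted {
        // Pop the least-loaded segment, give it this file, and push it back.
        let Reverse((load, seg)) = heap.pop().expect("heap holds one entry per segment");
        plan[seg].push(file.path.clone());
        heap.push(Reverse((load + file.size_bytes, seg)));
    }
    plan
}
```

Balancing by bytes rather than by file count keeps one segment from receiving all the large files; a production planner might also weigh row counts, delete files, or data locality.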
SQL DDL Usage
Performance Optimization
Multi-Level Caching Strategy
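A minimal sketch of one cache level, assuming a time-to-live policy with explicit invalidation; the loader closure stands in for the next level (for example a shared cache or the remote catalog). MetadataCache, get_or_load and the loads counter are hypothetical, and the current time is passed in so that expiry is deterministic and testable.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical first-level metadata cache in front of a slower loader.
pub struct MetadataCache<V: Clone> {
    ttl: Duration,
    entries: HashMap<String, (V, Instant)>, // key -> (value, time it was loaded)
    pub loads: usize,                       // how often the next level was consulted
}

impl<V: Clone> MetadataCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new(), loads: 0 }
    }

    /// Return the cached value if it is younger than the TTL at `now`;
    /// otherwise call `load`, store the result, and return it.
    pub fn get_or_load<F: FnOnce(&str) -> V>(&mut self, key: &str, now: Instant, load: F) -> V {
        if let Some((value, loaded_at)) = self.entries.get(key) {
            if now.duration_since(*loaded_at) < self.ttl {
                return value.clone();
            }
        }
        let value = load(key);
        self.loads += 1;
        self.entries.insert(key.to_string(), (value.clone(), now));
        value
    }

    /// Drop an entry, e.g. after DDL or when a new Iceberg snapshot is committed.
    pub fn invalidate(&mut self, key: &str) {
        self.entries.remove(key);
    }
}
```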
Query Optimization
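A common optimization for Iceberg scans is file pruning: Iceberg manifests record per-column lower and upper bounds for every data file, so a file whose range cannot satisfy a predicate can be skipped without being opened. The sketch below assumes a single integer column and a simple range predicate; FileStats, RangePredicate and prune_files are hypothetical, and a real implementation must keep any file whose bounds are missing.

```rust
/// Hypothetical per-file statistics for one integer column, mirroring the
/// lower/upper bounds an Iceberg manifest records for each data file.
#[derive(Debug, Clone)]
pub struct FileStats {
    pub path: String,
    pub min: i64,
    pub max: i64,
}

/// Range predicate lo <= col <= hi; None means that side is unbounded.
#[derive(Debug, Clone, Copy)]
pub struct RangePredicate {
    pub lo: Option<i64>,
    pub hi: Option<i64>,
}

impl RangePredicate {
    /// True if some value in [min, max] could satisfy the predicate.
    fn may_match(&self, min: i64, max: i64) -> bool {
        let above_lo = self.lo.map_or(true, |lo| max >= lo);
        let below_hi = self.hi.map_or(true, |hi| min <= hi);
        above_lo && below_hi
    }
}

/// Keep only files whose bounds overlap the predicate; the rest are never opened.
pub fn prune_files<'a>(files: &'a [FileStats], pred: &RangePredicate) -> Vec<&'a str> {
    files
        .iter()
        .filter(|f| pred.may_match(f.min, f.max))
        .map(|f| f.path.as_str())
        .collect()
}
```

For an equality predicate such as col = 250, set lo and hi to the same value.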
Implementation Roadmap
Phase 1: Core Infrastructure
Phase 2: Rust Iceberg Plugin
Phase 3: FDW Integration
Phase 4: Advanced Features
Phase 5: Production Hardening
Benefits and Impact
For Users
For Ecosystem
Technical Advantages
Conclusion
This multi-catalog system design positions Apache Cloudberry as a leading open-source data analytics platform capable of competing with modern data lakehouse solutions. By leveraging proven architectural patterns from Apache Doris[13][3] and implementing them with modern technologies like Rust and pgrx[9][10], Cloudberry can provide a robust, performant, and maintainable solution for federated data analytics.
The proposed system is designed to maintain full backward compatibility while enabling new capabilities for modern data architectures. The plugin-based approach keeps the system extensible, and the Rust implementation adds compile-time memory-safety guarantees that traditional C extensions do not provide.
This proposal represents a significant step forward for Apache Cloudberry in the competitive data analytics landscape, providing users with the tools they need to build modern, scalable data platforms.
References
[1] Announcing Cloudberry Database Enters the Apache Incubator https://www.postgresql.org/about/news/announcing-cloudberry-database-enters-the-apache-incubator-2976/
[2] Apache Cloudberry (Incubating) - The Apache Software Foundation https://cloudberry.apache.org/
[3] Build a federated query solution with Apache Doris, Apache Flink ... https://dev.to/apachedoris/build-a-federated-query-solution-with-apache-doris-apache-flink-and-apache-hudi-40io
[4] ALTER FOREIGN DATA WRAPPER - Apache Cloudberry (Incubating) https://cloudberry.apache.org/docs/1.x/sql-stmts/alter-foreign-data-wrapper/
[5] CREATE FOREIGN DATA WRAPPER https://cloudberry.apache.org/docs/sql-stmts/create-foreign-data-wrapper/
[6] apache/cloudberry-pxf: Platform Extension Framework ... - GitHub https://github.com/apache/cloudberry-pxf
[7] Apache Doris speeds up data reporting, tagging, and data lake ... https://www.velodb.io/blog/152
[8] Metadata and Data Virtualization Explained - Altoros https://www.altoros.com/blog/metadata-and-data-virtualization/
[9] Develop Database Extensions Using PGRX https://cloudberry.apache.org/docs/developer/develop-extensions-using-rust/
[10] pgcentralfoundation/pgrx: Build Postgres Extensions with Rust! https://github.com/pgcentralfoundation/pgrx
[11] iceberg - Rust - Docs.rs https://docs.rs/iceberg
[12] apache/iceberg-rust - GitHub https://github.com/apache/iceberg-rust
[13] When Doris Meets Iceberg: A Data Engineer's Redemption - DZone https://dzone.com/articles/when-doris-meets-iceberg
[14] Using Doris and Iceberg - Apache Doris https://doris.apache.org/docs/dev/lakehouse/best-practices/doris-iceberg/
Rollout/Adoption Plan
No response
Are you willing to submit a PR?