Interesting idea! So branching (in the RFC) is essentially a deep copy of the metadata? I am thinking about an alternative approach to branching. Instead of implementing it in DuckLake, could we branch the PostgreSQL catalog directly using Neon branching, which gives a full catalog fork (instant and copy-on-write)? This way, no modifications (other than GC, I guess) would be required on the DuckLake side.
[RFC] Git-style Branching for DuckLake
Hi DuckLake team,
First, thank you for building DuckLake. The architecture—Parquet files + pluggable metadata backends + time travel—is exactly what data lakes need. I've been using it for financial data workflows and it's been a joy to work with.
I've implemented Git-style branching on top of DuckLake and wanted to share the design to see if it aligns with your roadmap. Happy to make changes, split into smaller PRs, or adjust the approach based on your feedback.
Problem Statement
Teams working on shared data need isolated workspaces to:
Current workarounds require full data copies—expensive and slow.
Potential Solution
Zero-copy branching at the metadata layer:
AT (BRANCH => 'name') syntax
Schema Changes
New Tables (10)
Core (2):
ducklake_branch(branch_id, branch_name, parent_branch_id, fork_snapshot_id, head_snapshot_id, status, created_at)
ducklake_branch_lineage(branch_id, ancestor_branch_id, max_visible_snapshot)
Deletion tracking (8): Same pattern for files, tables, schemas, views, columns, macros, partitions, delete_files:
ducklake_branch_*_deletion(branch_id, ancestor_branch_id, object_id, deleted_at_snapshot)
Modified Tables (24)
All existing metadata tables gain branch_id BIGINT NOT NULL DEFAULT 0:
Open Questions
I'd appreciate your feedback on:
Does this align with your roadmap? Is branching something you've considered for DuckLake?
Merge support: Should we support merging branches?
Garbage collection: When can shared files be deleted?
Migration path: For existing catalogs, should we:
Naming conventions: Are function names clear, or should they match existing DuckLake patterns more closely?
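To make the schema and the garbage-collection question a bit more concrete, here is a rough sketch of how the lineage and deletion tables could drive visibility, and one possible GC rule built on top of it. This is not the actual implementation. It runs on SQLite purely for illustration, and it assumes that a branch sees an ancestor's files up to max_visible_snapshot minus its own deletion records, that the lineage table includes a self row per branch, and that status takes the values 'active' and 'dropped'. The data_file and branch_file_deletion tables and every function name are hypothetical stand-ins.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ducklake_branch (
    branch_id INTEGER PRIMARY KEY, branch_name TEXT, parent_branch_id INTEGER,
    fork_snapshot_id INTEGER, head_snapshot_id INTEGER, status TEXT, created_at TEXT);
CREATE TABLE ducklake_branch_lineage (
    branch_id INTEGER, ancestor_branch_id INTEGER, max_visible_snapshot INTEGER);
-- Hypothetical stand-ins for a branch-aware file table and its deletion table.
CREATE TABLE data_file (
    file_id INTEGER PRIMARY KEY, branch_id INTEGER NOT NULL DEFAULT 0,
    begin_snapshot INTEGER);
CREATE TABLE branch_file_deletion (
    branch_id INTEGER, ancestor_branch_id INTEGER, object_id INTEGER,
    deleted_at_snapshot INTEGER);
"""

# Assumed rule: a file is visible if it was written on a branch in the reader's
# lineage at or before max_visible_snapshot, and the reader has not deleted it.
VISIBLE_SQL = """
SELECT f.file_id
FROM data_file f
JOIN ducklake_branch_lineage l ON l.ancestor_branch_id = f.branch_id
WHERE l.branch_id = ?
  AND f.begin_snapshot <= l.max_visible_snapshot
  AND NOT EXISTS (
      SELECT 1 FROM branch_file_deletion d
      WHERE d.branch_id = l.branch_id
        AND d.ancestor_branch_id = f.branch_id
        AND d.object_id = f.file_id)
ORDER BY f.file_id
"""


def visible_files(conn, branch_id):
    """Return the file ids the given branch can see under the assumed rule."""
    return [row[0] for row in conn.execute(VISIBLE_SQL, (branch_id,))]


def gc_candidates(conn):
    """One possible GC rule: a file is collectable once no active branch sees it."""
    active = conn.execute(
        "SELECT branch_id FROM ducklake_branch WHERE status = 'active'").fetchall()
    live = set()
    for (branch_id,) in active:
        live.update(visible_files(conn, branch_id))
    all_files = conn.execute("SELECT file_id FROM data_file ORDER BY file_id")
    return [row[0] for row in all_files if row[0] not in live]


def build_demo():
    """main (id 0) has files 1-3; dev (id 1) forks at snapshot 2, adds 4, deletes 2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ducklake_branch VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(0, "main", None, None, 5, "active", "2024-01-01"),
         (1, "dev", 0, 2, 7, "active", "2024-01-02")])
    conn.executemany(
        "INSERT INTO ducklake_branch_lineage VALUES (?, ?, ?)",
        [(0, 0, 5),   # main sees its own history up to its head
         (1, 0, 2),   # dev sees main only up to the fork snapshot
         (1, 1, 7)])  # dev sees its own writes
    conn.executemany(
        "INSERT INTO data_file VALUES (?, ?, ?)",
        [(1, 0, 1), (2, 0, 2), (3, 0, 5), (4, 1, 6)])
    # dev hides file 2 (inherited from main) without touching main's metadata.
    conn.execute("INSERT INTO branch_file_deletion VALUES (1, 0, 2, 7)")
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print("main sees:", visible_files(conn, 0))
    print("dev sees:", visible_files(conn, 1))
    print("GC candidates:", gc_candidates(conn))
```

In this toy setup, file 2 stays protected because main still sees it even though dev deleted it, while file 4 only becomes a GC candidate once dev is no longer active. Whether that is the right rule, especially with time travel in the picture, is exactly what I'd like feedback on.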
Summary
This implementation adds branching with:
I'm grateful for DuckLake and excited about the possibility of contributing. Happy to:
Looking forward to your thoughts!