-
Notifications
You must be signed in to change notification settings - Fork 7
Description
Summary
Introduce first-class database branches that allow users to create isolated, lightweight copies of a database (or a schema/table subset), make changes safely, and merge them back using LatticeDB’s existing CRDT/merge policies and bitemporal lineage. This unlocks dev/test workflows, data science sandboxes, blue/green changes, and collaborative modeling—without duplicating terabytes or leaving the engine.
Motivation
- Safe experimentation: Run migrations, vector-index builds, DP policy changes, or query tuning without risking prod.
- Reproducible ML/analytics: Branch, transform, train, and later reconcile results.
- Ops velocity: Blue/green data changes and hotfixes with clear review & rollback paths.
- Leverage strengths: LatticeDB already has CRDT merge policies, MVCC, and bitemporal history—perfect building blocks for branching.
Proposed Design
1) Core Concepts
- Branch: A named, immutable base pointer plus a mutable HEAD for writes (copy-on-write at page/extent granularity).
- Base: Any commit-like snapshot:
main@TX=…or timepoint (AS OF '2025-09-22T12:00Z'). - Merge: 3-way merge:
base,target,source. Conflicts resolved via per-column merge policies (e.g.,lww,sum_bounded,gset, or custom WASM resolvers). - Scope: Whole DB, schema, or table-level branches.
2) SQL / CLI Surface
SQL DDL
-- Create a branch from current main
CREATE BRANCH feature_price_optim FROM main;
-- From a time-travel snapshot
CREATE BRANCH audit_fix FROM main FOR SYSTEM_TIME AS OF TX 12345;
-- Switch session to a branch
SET CURRENT_BRANCH = 'feature_price_optim';
-- Merge feature branch back into main, policy-aware
MERGE BRANCH feature_price_optim INTO main
WITH (dry_run = true, strategy = 'policy'); -- dry-run shows plan & conflicts
-- Promote after review
MERGE BRANCH feature_price_optim INTO main
WITH (commit_message = 'Introduce dynamic pricing v2');
-- Show differences
SHOW DIFF BETWEEN feature_price_optim AND main
WHERE schema = 'sales' AND table = 'orders';
-- Delete/GC
DROP BRANCH feature_price_optim;CLI
latticedb branch create feature_price_optim --from main
latticedb branch list
latticedb branch diff feature_price_optim main --tables sales.orders
latticedb branch merge feature_price_optim main --dry-run
latticedb branch merge feature_price_optim main --commit-message "pricing v2"
latticedb branch drop feature_price_optim
3) GUI/UX
-
Branch switcher in header (like git branch picker).
-
Diff viewer:
- DDL diffs (added/removed indexes, policy changes).
- DML diffs (row counts, sampled row deltas, histogram deltas).
- Vector index deltas (HNSW/IVF params & metadata).
-
Merge preview with conflict counts per table/column + the policy applied.
-
One-click dry-run → PR-style review → Merge workflow.
-
Badges in query history indicating current branch and base snapshot.
4) Storage/Engine Considerations
- Copy-on-Write (CoW): Page/extent-level CoW in the Unified Log-Structured Storage to keep branches cheap.
- WAL tagging: Tag records with
{branch_id, tx_id}; reuse existing MVCC to isolate branch snapshots. - Checkpointing: Branch HEADs become named checkpoints; merging creates a new checkpoint on target.
- GC/Retention: Reference counting of pages across branches; policy-driven TTL for stale branches;
VACUUM BRANCH. - Indexes: Lazy materialization for secondary/B+Tree and ANN indexes; rebuild on first touch or during
MERGE ... WITH (prebuild_indexes=true).
5) Security & Governance
- RLS/CLS inheritance: Policies copy from base; diffs flagged on merge.
- Auditability: Create/merge/drop events emitted to audit log with actor, rationale, diff summary.
- Role controls: Grants to create/merge branches, and scoped branches (table/schema).
6) Interactions with Temporal & Lineage
- Branches record their base transaction/time. On merge, produce a lineage edge capturing 3-way sources and policies applied.
lineage_explain()extended to show branch merges as first-class events.
Alternatives Considered
- Full physical clones (expensive in space/time).
- External CDC/ETL sandboxes (breaks ACID and lineage continuity).
- Single-table shadow copies (awkward for cross-table constraints & queries).
Risks & Mitigations
- Storage bloat: Mitigate with CoW, page ref-counts, and branch TTLs.
- Merge surprises: Require dry-run by default in GUI; per-table acceptance gates.
- Index skew: Offer
prebuild_indexesoption and background index build in branch.
Milestones
-
MVP (engine + CLI)
- CREATE/SET/DROP BRANCH
- Read/write isolation using MVCC + branch_id
- CoW for data pages
- Basic MERGE (policy =
lwwonly) SHOW DIFFsummaries (row counts, DDL change list)
-
Policy-Aware Merge
- Respect existing column merge policies (lww, gset, sum_bounded)
- WASM resolver hook
- Detailed conflict reporting
-
GUI
- Branch picker, diff viewer, merge preview
- Dry-run with PR-style review & approvals
-
Performance & Ops
- Background index (re)builds on branch
VACUUM BRANCH, retention policies, telemetry- Audit trail & lineage integration
Acceptance Criteria
-
Branch creation < 1s for 100GB base (no data copy).
-
Branch reads/writes isolated;
SELECTonmainunaffected by branch mutations. -
MERGE ... --dry-runoutputs:- DDL diffs
- Per-table DML summary (inserts/updates/deletes)
- Conflict counts per column with policy to apply
-
Successful merge applies CRDT/WASM resolvers correctly (tests included).
-
GUI shows end-to-end flow with clear conflict visualization.
Out of Scope (for now)
- Cross-cluster branch replication.
- Multi-target fan-in merges.
- Binary artifact branching for UDFs beyond current WASM hooks.
Open Questions
- Should branches be nestable/hierarchical (e.g.,
feature/a/b) with inherited retention? - Need
READ FROM branch X JOIN branch Yin a single query for comparisons? - Default retention/TTL for inactive branches?
Testing
- Unit: MVCC with branch_id, CoW semantics, WAL tags.
- Integration: DDL/DML diffs, policy-aware merges (including WASM).
- Performance: Space amplification, merge latencies, index rebuild times.
- Security: RLS/CLS parity across branches, audit event emission.
Labels
enhancement engine gui sql mvcc merge-policies good-first-issue (for CLI/GUI sub-tasks)
Checklist
- Engine: branch metadata, CoW, WAL tagging
- SQL: CREATE/SET/DROP/MERGE/SHOW DIFF
- Policy integration + WASM hook
- CLI plumbing & docs
- GUI branch switcher + diff + merge preview
- GC/retention & VACUUM BRANCH
- Tests & benchmarks
- Docs (README + DESIGN/ARCHITECTURE updates)
Metadata
Metadata
Assignees
Labels
Projects
Status