Skip to content

Feature: Bitemporal Queries: Correctness, Pruning, and Snapshot Reproducibility #5

@hoangsonww

Description

@hoangsonww

Summary

Tighten semantics and performance for time-travel and valid-time queries. Specifically:

  • Make FOR SYSTEM_TIME AS OF … and VALID PERIOD [from,to) provably correct against row version bounds.
  • Add temporal index pruning and predicate pushdown so scans avoid non-overlapping versions.
  • Introduce snapshot tokens to guarantee reproducible results for “as of TX/Timestamp” queries.
  • Harden tuple (record) parsing around open/closed interval edges and null/±∞ bounds.

Goals

  • Correct evaluation of overlap between (tx_from, tx_to] and (valid_from, valid_to] vs. query cut(s).
  • Optimizer can prune segments/rowgroups that cannot match temporal predicates.
  • Stable SnapshotHandle (logical tx id + wallclock + epoch) that pins MVCC horizon.
  • Clear rules for inclusivity: FOR SYSTEM_TIME AS OF t uses tx_from ≤ t ≤ tx_to (or tx_to=∞); VALID PERIOD [a,b) uses half-open.
  • Observable speed-ups on temporal filters in tests/ microbenchmarks.

Non-Goals

  • Full lineage GUI—CLI/SQL lineage_explain() only.
  • Multi-shard causal snapshots (future work).

Design

1) Canonical Temporal Overlap

Implement helpers (header-only temporal.h):

struct Interval { Timestamp lo; Timestamp hi; enum Bound {Open, Closed}; Bound lo_b, hi_b; };
inline bool overlaps(const Interval& row, const Interval& q);
inline bool contains_point(const Interval& row, Timestamp t);
  • Represent “infinite” as kMaxTs and kMinTs.
  • Centralize inclusivity so executor and optimizer share logic.

2) Storage: Temporal Min/Max on Chunks

Augment row-page/columnar metadata with:

chunk.tx_min, chunk.tx_max
chunk.valid_min, chunk.valid_max
  • Filled during append/compaction.
  • Exposed to optimizer as TemporalStats.

3) Optimizer: Temporal Pruning & Pushdown

  • Translate FOR SYSTEM_TIME AS OF t to contains_point(tx_interval,t).
  • Translate VALID PERIOD [a,b) to overlaps(valid_interval,[a,b)).
  • Add a pruning rule: if overlaps(chunk.interval, query.interval)==false → skip.
  • Push comparisons down to scan so the executor short-circuits early.

4) Snapshot Reproducibility

  • Add CREATE SNAPSHOT (internal) generating SnapshotHandle { lsn, clock_ts, epoch }.
  • SET SNAPSHOT '<handle>' pins MVCC visibility + disables “fuzzy” reads.
  • FOR SYSTEM_TIME AS OF TX <id> resolved via handle→LSN table when available.

5) Edge-Case Parsing Hardening

  • Normalize tuples with missing *_to to .
  • Guarantee from ≤ to invariant; flip or error with diagnostics if violated.
  • Unit tests for boundary conditions: exact equality on lo/hi, open vs closed.

6) Observability

  • Tracing spans: temporal.prune, temporal.overlap_check, snapshot.pin.
  • Metrics: temporal_pruned_chunks_total, temporal_scan_rows_filtered_total.

Acceptance Criteria

  • FOR SYSTEM_TIME AS OF and VALID PERIOD pass new spec tests for boundaries, null/∞, and equal endpoints.
  • ✅ Temporal pruning reduces scanned chunks by ≥80% on synthetic skewed history (added benchmark).
  • SET SNAPSHOT guarantees identical results across repeated runs while active.
  • ✅ No regressions on non-temporal queries (CI).

Tasks

Engine / Executor

  • Implement Interval + overlap/contain helpers.
  • Wire helpers into scan node temporal predicate evaluation.

Storage

  • Persist tx_min/max, valid_min/max in chunk/segment metadata.
  • Populate during append/compaction; backfill on load if absent.

Optimizer

  • Rule: derive temporal constraints from SQL → logical predicates.
  • Rule: chunk pruning using TemporalStats.
  • Pushdown: attach fast path to scan node.

Snapshotting

  • SnapshotHandle struct + serialization.
  • SET SNAPSHOT / RESET SNAPSHOT plumbing to MVCC visibility.
  • Map AS OF TX to handle when present; fall back to current mapping.

Tests

  • Unit: interval math (open/closed, ∞, equality).
  • Integration: FOR SYSTEM_TIME AS OF & VALID PERIOD edge cases.
  • Benchmark: history depth vs. pruned chunks and wall time.
  • Fuzz: random intervals, ensure no mismatches vs. reference evaluator.

Docs

  • Update README.md (“Bitemporal Time Travel & Lineage”) with inclusivity table.
  • Add docs/temporal-semantic-tests.md and optimizer note on pruning.

Risks & Mitigations

  • Incorrect inclusivity semantics → lock them in docs + golden tests.
  • Stats staleness after compaction → rebuild on compaction; keep conservative.
  • Snapshot leaks → RAII guard that auto-resets at transaction end.

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingdocumentationImprovements or additions to documentationenhancementNew feature or requestgood first issueGood for newcomershelp wantedExtra attention is neededquestionFurther information is requested

Projects

Status

Backlog

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions