@IvanPartsunev
Contributor

Summary

  • Add stream_timestamp property to DeltaLocator and Delta models for tracking delta commit times in Unix milliseconds
  • Add list_partition_deltas_by_timestamp method to list deltas sorted by commit time (newest first by default)
  • Add get_latest_delta_by_timestamp method to fetch the most recently committed delta

Motivation

For unordered (ADD) deltas, the existing stream_position ordering may not reflect the actual commit sequence. This feature enables:

  • Retrieving deltas in order of when they were actually committed
  • Supporting use cases where temporal ordering matters for incremental processing

Changes

  • deltacat/storage/model/delta.py: Added stream_timestamp property with Unix milliseconds validation (1_000_000_000_000 to 9_999_999_999_999)
  • deltacat/storage/main/impl.py:
    • Set stream_timestamp automatically when committing deltas
    • Added list_partition_deltas_by_timestamp with 1-based position slicing
    • Added get_latest_delta_by_timestamp convenience method
  • deltacat/storage/interface.py: Added function signatures for new methods
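A rough sketch of the behaviour described in the list above, not the code in this PR. `SketchDelta`, `validate_stream_timestamp`, `list_by_timestamp`, and `latest_by_timestamp` are hypothetical names, and the reading of "1-based position slicing" as inclusive positions into the sorted list is an assumption:

```python
from dataclasses import dataclass
from typing import List, Optional

# Bounds quoted in the PR description for Unix-millisecond timestamps.
MIN_TIMESTAMP_MS = 1_000_000_000_000
MAX_TIMESTAMP_MS = 9_999_999_999_999


def validate_stream_timestamp(value: int) -> int:
    """Reject values that cannot be a Unix timestamp in milliseconds."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"stream_timestamp must be an int, got {value!r}")
    if not MIN_TIMESTAMP_MS <= value <= MAX_TIMESTAMP_MS:
        raise ValueError(f"stream_timestamp out of range: {value}")
    return value


@dataclass(frozen=True)
class SketchDelta:
    """Hypothetical stand-in for a delta; not the deltacat Delta model."""
    stream_position: int
    stream_timestamp: int


def list_by_timestamp(
    deltas: List[SketchDelta],
    ascending: bool = False,
    first_position: Optional[int] = None,
    last_position: Optional[int] = None,
) -> List[SketchDelta]:
    """Sort by commit time (newest first by default), then slice.

    Assumption: positions are 1-based and inclusive on both ends.
    """
    ordered = sorted(
        deltas, key=lambda d: d.stream_timestamp, reverse=not ascending
    )
    start = 0 if first_position is None else first_position - 1
    return ordered[start:last_position]


def latest_by_timestamp(deltas: List[SketchDelta]) -> Optional[SketchDelta]:
    """Return the most recently committed delta, or None if there is none."""
    listed = list_by_timestamp(deltas)
    return listed[0] if listed else None
```

Because the sort key is the commit timestamp rather than the stream position, a delta with a lower stream position can still come back first if it was committed later.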

Test plan

  • Unit tests for stream_timestamp property validation on DeltaLocator
  • Unit tests for stream_timestamp delegation on Delta model
  • Tests for list_partition_deltas_by_timestamp with various scenarios (empty partition, ascending/descending, position filtering)
  • Tests for get_latest_delta_by_timestamp with empty/non-empty partitions
  • All existing tests pass
  • Lint checks pass

streamTimestamp is a system property set at commit time, so it should
be excluded from equivalence comparisons (like IDs and other locators).
This fixes test_recursive_cross_catalog_copy failing because copied
deltas have different timestamps than originals.
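
One way to picture the exclusion described in this commit message, as a sketch only. The helper names and the dict layout are hypothetical and do not reflect how deltacat implements equivalence; only the `streamTimestamp` key comes from the message above:

```python
from typing import Any, FrozenSet

# Keys treated as system-assigned, so ignored when comparing content.
SYSTEM_KEYS: FrozenSet[str] = frozenset({"streamTimestamp"})


def strip_system_keys(value: Any, ignored: FrozenSet[str] = SYSTEM_KEYS) -> Any:
    """Return a copy of a nested dict/list structure without the ignored keys."""
    if isinstance(value, dict):
        return {
            key: strip_system_keys(item, ignored)
            for key, item in value.items()
            if key not in ignored
        }
    if isinstance(value, list):
        return [strip_system_keys(item, ignored) for item in value]
    return value


def equivalent(left: Any, right: Any) -> bool:
    """Compare two structures while ignoring system-assigned keys."""
    return strip_system_keys(left) == strip_system_keys(right)


original = {"type": "add", "locator": {"streamTimestamp": 1_700_000_000_000}}
copied = {"type": "add", "locator": {"streamTimestamp": 1_700_000_360_000}}
print(original == copied)            # plain equality sees the timestamps
print(equivalent(original, copied))  # equivalence ignores them
```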
@IvanPartsunev IvanPartsunev reopened this Jan 26, 2026
@pdames pdames self-requested a review January 28, 2026 07:21
@pdames
Member

pdames commented Feb 1, 2026

Nice - thanks for taking this on @IvanPartsunev! I like the idea of using a tuple for the stream position to ensure that no two timestamped deltas conflict - it's clever. However, it does add complexity, since delta model parsers would now always need to check whether they're dealing with a legacy 1-part or new 2-part stream position, so I've been wondering if we might be able to achieve the same end result using the existing 64-bit int delta stream position.

One option may be to introduce a new CHRONO write mode and corresponding delta type. Although I'm not a huge fan of continuing to proliferate new write modes, I think this would help keep delta stream position parsing consistent across the codebase, and make the intent more clear both at delta write and read time.

Right now, we set aside the stream position range 0 to UINT32_MAX for strictly ordered deltas, and everything from UINT32_MAX + 1 to UINT64_MAX for unordered ADD deltas. I think we can safely carve another partition out of the unordered delta range, from UINT32_MAX + 1 to UINT48_MAX, for CHRONO deltas that use millisecond-precision epoch timestamps, and narrow the stream position range reserved for unordered ADD deltas to UINT48_MAX + 1 to UINT64_MAX.
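
To make the proposed split concrete, here is a small sketch. `classify_stream_position` and its return labels are hypothetical, not an existing deltacat function; the boundaries are the ones proposed in the paragraph above:

```python
UINT32_MAX = 2**32 - 1
UINT48_MAX = 2**48 - 1
UINT64_MAX = 2**64 - 1


def classify_stream_position(stream_position: int) -> str:
    """Map a stream position onto the proposed range partition.

    0 .. UINT32_MAX               -> strictly ordered deltas
    UINT32_MAX + 1 .. UINT48_MAX  -> CHRONO deltas (epoch milliseconds)
    UINT48_MAX + 1 .. UINT64_MAX  -> unordered ADD deltas
    """
    if not 0 <= stream_position <= UINT64_MAX:
        raise ValueError(f"not an unsigned 64-bit value: {stream_position}")
    if stream_position <= UINT32_MAX:
        return "ordered"
    if stream_position <= UINT48_MAX:
        return "chrono"
    return "add"
```

Any present-day millisecond epoch timestamp is far above UINT32_MAX (about 4.29 billion), so it lands in the CHRONO range without any offset or encoding step.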

A millisecond-precision timestamp today already requires 41 bits, and 48 bits is enough to avoid an overflow until the year 10889. It also leaves a small 1 in 65536 chance that a random 64-bit ADD delta stream position falls within the reserved UINT48 range (which should be immediately resolved via retry at https://github.com/ray-project/deltacat/blob/2.0/deltacat/storage/main/impl.py#L2806-L2807).
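
The figures above can be checked with a few lines of arithmetic. This is only a verification sketch: the reference date is the date of this comment, and the average Gregorian year length (365.2425 days) is an assumption used for the overflow estimate:

```python
from datetime import datetime, timezone
from fractions import Fraction

# Milliseconds since the Unix epoch on the date of this comment.
now_ms = int(datetime(2026, 2, 1, tzinfo=timezone.utc).timestamp() * 1000)
bits_needed = now_ms.bit_length()

# Year in which a 48-bit millisecond counter starting at 1970 would overflow.
SECONDS_PER_YEAR = 31_556_952  # 365.2425 days
overflow_year = int(1970 + (2**48 / 1000) / SECONDS_PER_YEAR)

# Chance that a uniformly random 64-bit position falls at or below UINT48_MAX.
collision_chance = Fraction(2**48, 2**64)

print(bits_needed, overflow_year, collision_chance)
```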

Also, since CHRONO should behave the same as ADD in terms of (1) raising an error when trying to write to a table with merge keys and (2) being treated the same during compaction (i.e., concatenated in the order presented), any existing tables that already have ADD deltas within this range should continue to behave the same.

WDYT?

be greater than that of any prior delta in the partition.
Creates a partition delta locator.

Stream position, if provided, should be greater than that of any prior delta
Member


Looking back on it, this comment has been incorrect since ADD deltas were introduced:

Suggested change
Stream position, if provided, should be greater than that of any prior delta
For APPEND, UPSERT, and DELETE deltas the stream position should be greater than that of any prior delta
