feat: Add stream_timestamp support and timestamp-based delta listing #595
base: 2.0
…ing by commit time
…stamp tests for readability
streamTimestamp is a system property set at commit time, so it should be excluded from equivalence comparisons (like IDs and other locators). This fixes test_recursive_cross_catalog_copy failing because copied deltas have different timestamps than originals.
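As a minimal sketch of the idea in this commit message, the snippet below compares two metadata dictionaries while ignoring system-assigned keys. The helper names and the `SYSTEM_KEYS` set are hypothetical and are not taken from the deltacat codebase; only the `streamTimestamp` key name comes from the message above.

```python
from typing import Any

# Hypothetical set of keys assigned by the system at commit time.
# Only "streamTimestamp" is named in the commit message; real code may
# exclude other locator fields as well.
SYSTEM_KEYS = frozenset({"streamTimestamp"})


def strip_system_keys(value: Any) -> Any:
    """Return a copy of `value` with system-assigned keys removed at every level."""
    if isinstance(value, dict):
        return {
            key: strip_system_keys(item)
            for key, item in value.items()
            if key not in SYSTEM_KEYS
        }
    if isinstance(value, list):
        return [strip_system_keys(item) for item in value]
    return value


def is_equivalent(left: Any, right: Any) -> bool:
    """Compare two metadata objects while ignoring system-assigned keys."""
    return strip_system_keys(left) == strip_system_keys(right)


original = {"type": "add", "locator": {"streamTimestamp": 1_700_000_000_000}}
copied = {"type": "add", "locator": {"streamTimestamp": 1_700_000_005_000}}
changed = {"type": "delete", "locator": {"streamTimestamp": 1_700_000_000_000}}
```

Here `is_equivalent(original, copied)` holds even though the two timestamps differ, while `is_equivalent(original, changed)` does not, because a user-visible field differs.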
Nice - thanks for taking this on @IvanPartsunev! I like the idea of using a tuple for the stream position to ensure that no two timestamped deltas conflict - it's clever. However, since it introduces the complexity of delta model parsers always needing to check whether they're dealing with a legacy 1-part or new 2-part stream position, I've been wondering if we might be able to achieve the same end result using the existing 64-bit int delta stream position.

One option may be to introduce a new CHRONO delta type. Right now, we set aside the 0 to UINT32_MAX stream position range for strictly ordered deltas, and everything from UINT32_MAX + 1 to UINT64_MAX for unordered ADD deltas. I think we can safely carve another partition out of the unordered delta range, from UINT32_MAX + 1 to UINT48_MAX, for CHRONO deltas.

A millisecond-precision timestamp today already requires 41 bits, and 48 bits is enough to avoid an overflow until the year 10889. It also leaves a small 1 in 65536 chance that a random 64-bit ADD delta stream position falls within the reserved UINT48 range (which should be immediately resolved via retry at https://github.com/ray-project/deltacat/blob/2.0/deltacat/storage/main/impl.py#L2806-L2807).

Also, since CHRONO should behave the same as ADD in terms of (1) raising an error when trying to write to a table with merge keys and (2) being treated the same during compaction (i.e., concatenated in the order presented), any existing tables that already have ADD deltas within this range should continue to behave the same. WDYT?
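To make the proposed ranges concrete, here is a rough sketch that classifies a 64-bit stream position and rechecks the arithmetic in the comment. The function name and the range labels are hypothetical, the sample timestamp is an arbitrary fixed value, and the collision estimate assumes random positions are drawn uniformly from the full 64-bit space.

```python
UINT32_MAX = 2**32 - 1
UINT48_MAX = 2**48 - 1
UINT64_MAX = 2**64 - 1


def classify_stream_position(position: int) -> str:
    """Return a label for the range a stream position falls into.

    The labels are illustrative only:
      0 .. UINT32_MAX              -> strictly ordered deltas
      UINT32_MAX + 1 .. UINT48_MAX -> proposed timestamp (CHRONO) range
      UINT48_MAX + 1 .. UINT64_MAX -> unordered ADD deltas
    """
    if not 0 <= position <= UINT64_MAX:
        raise ValueError(f"stream position outside 64-bit range: {position}")
    if position <= UINT32_MAX:
        return "ordered"
    if position <= UINT48_MAX:
        return "chrono"
    return "add"


# A fixed sample Unix timestamp in milliseconds (November 2023).
sample_ms = 1_700_000_000_000
bits_needed = sample_ms.bit_length()

# Year in which a 48-bit millisecond counter overflows, using the mean
# Gregorian year of 365.2425 days.
seconds_per_year = 365.2425 * 86_400
overflow_year = int(1970 + (2**48 / 1000) / seconds_per_year)

# Under the uniform assumption, one in this many random 64-bit values
# lands at or below UINT48_MAX.
one_in = 2**64 // 2**48

print(classify_stream_position(sample_ms), bits_needed, overflow_year, one_in)
# prints: chrono 41 10889 65536
```

Because a millisecond timestamp is already far above UINT32_MAX, it can be used directly as the stream position without colliding with the strictly ordered range.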
be greater than that of any prior delta in the partition.
Creates a partition delta locator.
Stream position, if provided, should be greater than that of any prior delta
Looking back on it, this comment has been incorrect since ADD deltas were introduced:
Suggested change:
- Stream position, if provided, should be greater than that of any prior delta
+ For APPEND, UPSERT, and DELETE deltas the stream position should be greater than that of any prior delta
Summary
- Add `stream_timestamp` property to `DeltaLocator` and `Delta` models for tracking delta commit times in Unix milliseconds
- Add `list_partition_deltas_by_timestamp` method to list deltas sorted by commit time (newest first by default)
- Add `get_latest_delta_by_timestamp` method to fetch the most recently committed delta

Motivation
For unordered (ADD) deltas, the existing `stream_position` ordering may not reflect the actual commit sequence. This feature enables listing deltas by commit time and fetching the most recently committed delta.

Changes
- Add `stream_timestamp` property with Unix milliseconds validation (1_000_000_000_000 to 9_999_999_999_999)
- Set `stream_timestamp` automatically when committing deltas
- Add `list_partition_deltas_by_timestamp` with 1-based position slicing
- Add `get_latest_delta_by_timestamp` convenience method

Test plan
- `stream_timestamp` property validation on `DeltaLocator`
- `stream_timestamp` delegation on `Delta` model
- `list_partition_deltas_by_timestamp` with various scenarios (empty partition, ascending/descending, position filtering)
- `get_latest_delta_by_timestamp` with empty/non-empty partitions
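
The behavior described above can be approximated with the following sketch. It is not the PR's implementation: the function names, the tuple stand-in for a delta, and the `first_position`/`last_position` parameters are hypothetical, and the 1-based slicing is assumed to be inclusive on both ends and applied after sorting. Only the validation bounds are taken from the description.

```python
from typing import List, Optional, Tuple

# Bounds quoted in the PR description for a Unix timestamp in milliseconds.
MIN_TIMESTAMP_MS = 1_000_000_000_000
MAX_TIMESTAMP_MS = 9_999_999_999_999

# Hypothetical stand-in for a delta: (stream_position, stream_timestamp).
DeltaStub = Tuple[int, int]


def validate_stream_timestamp(timestamp_ms: int) -> int:
    """Reject values that do not look like Unix milliseconds."""
    if isinstance(timestamp_ms, bool) or not isinstance(timestamp_ms, int):
        raise TypeError("stream_timestamp must be an int")
    if not MIN_TIMESTAMP_MS <= timestamp_ms <= MAX_TIMESTAMP_MS:
        raise ValueError(f"stream_timestamp out of range: {timestamp_ms}")
    return timestamp_ms


def list_deltas_by_timestamp(
    deltas: List[DeltaStub],
    ascending: bool = False,
    first_position: Optional[int] = None,
    last_position: Optional[int] = None,
) -> List[DeltaStub]:
    """Sort by commit time (newest first by default), then slice by
    1-based inclusive positions in the sorted result (an assumption)."""
    ordered = sorted(deltas, key=lambda d: d[1], reverse=not ascending)
    start = (first_position or 1) - 1  # convert 1-based to 0-based
    return ordered[start:last_position]


def latest_delta_by_timestamp(deltas: List[DeltaStub]) -> Optional[DeltaStub]:
    """Return the most recently committed delta, or None if there are none."""
    return max(deltas, key=lambda d: d[1], default=None)


# Stream positions deliberately disagree with commit order here.
sample = [
    (10, 1_700_000_000_300),
    (99, 1_700_000_000_100),
    (42, 1_700_000_000_200),
]
```

With `sample`, the default listing returns the deltas with stream positions 10, 42, 99 in that order, which differs from sorting by stream position; this is the case the Motivation section describes for ADD deltas.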