Commit 7263d40

adam-richardson-openai, erichwang, and fredliu-data committed
amend! Read metadata and protocol information from Delta checksum files
Read metadata and protocol information from Delta checksum files

Compliant Delta writers may emit optional checksum files alongside commits containing metadata and protocol information. Instead of loading the latest checkpoint and replaying intervening commits (which can be expensive, especially for large v1 checkpoints), Trino can read the latest commit’s checksum file to obtain this information with a single listing and small JSON read.

Ref. https://github.com/delta-io/delta/blob/master/PROTOCOL.md#version-checksum-file

If the checksum file is missing or does not contain both metadata and protocol, we fall back to the existing Delta log scanning approach.

Behavior is gated by session property load_metadata_from_checksum_file (defaulting to config delta.load_metadata_from_checksum_file, which defaults to false).

Internal testing reduced analysis time for large v1-checkpoint tables from ~10s to <500ms.

Co-authored-by: Eric Hwang <eh@openai.com>
Co-authored-by: Fred Liu <fredliu@openai.com>
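
The read-then-fall-back flow described above can be sketched roughly as follows. This is an illustrative Python outline, not the Trino (Java) implementation: the function names, the `scan_log` callback, and the `use_checksum` flag (standing in for the session property) are all hypothetical. It assumes the checksum file is named after the 20-digit zero-padded commit version with a `.crc` suffix and holds a single JSON object with `metadata` and `protocol` keys, as the linked protocol section describes. Treating malformed JSON as "missing" is an additional assumption of this sketch.

```python
import json
from pathlib import Path
from typing import Callable, Optional, Tuple

MetadataAndProtocol = Tuple[dict, dict]


def checksum_path(log_dir: Path, version: int) -> Path:
    # Assumed naming: <20-digit zero-padded version>.crc inside _delta_log.
    return log_dir / f"{version:020d}.crc"


def read_from_checksum(log_dir: Path, version: int) -> Optional[MetadataAndProtocol]:
    """Return (metadata, protocol) from the checksum file, or None if unusable."""
    try:
        payload = json.loads(checksum_path(log_dir, version).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing file, or (sketch assumption) unparseable JSON: caller falls back.
        return None
    if not isinstance(payload, dict):
        return None
    metadata = payload.get("metadata")
    protocol = payload.get("protocol")
    # Both entries are required; a partial checksum file is not used.
    if metadata is None or protocol is None:
        return None
    return metadata, protocol


def load_metadata_and_protocol(
    log_dir: Path,
    version: int,
    scan_log: Callable[[Path, int], MetadataAndProtocol],
    use_checksum: bool = False,
) -> MetadataAndProtocol:
    """Try the checksum file when enabled; otherwise, or on failure, scan the log."""
    if use_checksum:
        result = read_from_checksum(log_dir, version)
        if result is not None:
            return result
    # Hypothetical stand-in for checkpoint loading plus commit replay.
    return scan_log(log_dir, version)
```

The flag defaults to off here to mirror the stated default of false, so existing behavior is unchanged unless the caller opts in.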
1 parent 6965882 commit 7263d40

File tree

0 file changed

+0
-0
lines changed