Avoid slow stats conversion fallback for iceberg clone (delta-io#4366)
## Description
This PR proposes to:
* Avoid the slow stats conversion fallback for Iceberg clone by default
* Allow partial stats conversion for Iceberg clone by default

More specifically:
* When stats conversion from Iceberg is off, the fallback to slow stats
conversion is enabled.
* When stats conversion from Iceberg is on, the fallback to slow stats
conversion does not happen if partial stats conversion is enabled. It
happens only if partial stats conversion is disabled and the Iceberg source
has partial stats, that is, either minValues or maxValues is missing.
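
The decision rules above can be sketched as a small truth function. This is an illustrative model only, not the code in this PR: the function name and its three boolean parameters are hypothetical stand-ins for the actual configuration flags and for the check on the Iceberg source's stats.

```python
def should_fall_back_to_slow_stats(
    iceberg_stats_conversion_enabled: bool,
    partial_stats_conversion_enabled: bool,
    source_has_partial_stats: bool,
) -> bool:
    """Return True when the slow stats conversion path would be used.

    All names here are hypothetical; they mirror the rules in the
    description, not real Delta configuration keys.
    """
    # Stats conversion from Iceberg is off: the slow fallback stays enabled.
    if not iceberg_stats_conversion_enabled:
        return True
    # Partial stats conversion is on: convert what is available, never fall back.
    if partial_stats_conversion_enabled:
        return False
    # Partial stats conversion is off: fall back only if the source stats are partial.
    return source_has_partial_stats
```

Under this sketch, with both conversions on (the new default described above), the function returns `False` whether or not the source stats are partial.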
## How was this patch tested?
Unit tests (UTs).
## Does this PR introduce _any_ user-facing changes?
**Current**: Delta tables cloned from an Iceberg source with only partial
stats collect stats from Parquet footers. Here, partial stats means that
any of (maxValues, minValues, nullCounts) is missing.

**Future**: Delta tables cloned from an Iceberg source with only partial
stats convert all available stats from the Iceberg source and do not fall
back to collecting stats from Parquet footers. Partial stats has the same
meaning as in the current behavior.