[Spark] Fix MinorCompaction not including RemoveFile (delta-io#4894)
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)
## Description
Currently, if a `RemoveFile` in a table lacks the `deletionTimestamp`
field for any reason, that action is not included in the compaction
file, so state reconstructed from the compaction file still includes
the removed files. This happens because a missing `deletionTimestamp`
defaults to 0, and the log replay only keeps a `RemoveFile` whose
`deletionTimestamp` is larger than 0.
This PR fixes the issue by changing the type of
`minFileRetentionTimestamp` in `InMemoryLogReplay` to `Option[Long]`,
so that we can pass `None` to include all `RemoveFile` actions,
regardless of whether they have a `deletionTimestamp`.
The main call site change is in `MinorCompactionHook`. All other call
sites can safely change from 0 to `None` because they do not care about
tombstones.
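The difference between a threshold of 0 and `None` can be illustrated with a minimal, self-contained sketch. It is not the actual `InMemoryLogReplay` code: `Tombstone` and `TombstoneFilter` are hypothetical stand-ins introduced here, and the only behavior assumed is what is described above (a missing `deletionTimestamp` is read as 0, and a numeric threshold keeps only strictly larger timestamps).

```scala
// Hypothetical, simplified stand-in for a RemoveFile action (not Delta's class).
final case class Tombstone(path: String, deletionTimestamp: Option[Long]) {
  // Assumption taken from the description: a missing timestamp is read as 0.
  def delTimestamp: Long = deletionTimestamp.getOrElse(0L)
}

// Hypothetical helper showing the two filtering modes.
object TombstoneFilter {
  // None keeps every tombstone; Some(ts) keeps only those deleted strictly after ts.
  def retained(
      tombstones: Seq[Tombstone],
      minFileRetentionTimestamp: Option[Long]): Seq[Tombstone] =
    minFileRetentionTimestamp match {
      case None => tombstones
      case Some(ts) => tombstones.filter(_.delTimestamp > ts)
    }
}

object TombstoneFilterExample {
  val tombstones: Seq[Tombstone] = Seq(
    Tombstone("a.parquet", Some(1000L)),
    Tombstone("b.parquet", None) // no deletionTimestamp recorded
  )

  // Old behavior: a threshold of 0 drops the tombstone without a timestamp.
  val withZero: Seq[String] =
    TombstoneFilter.retained(tombstones, Some(0L)).map(_.path)

  // New behavior: None keeps all tombstones.
  val withNone: Seq[String] =
    TombstoneFilter.retained(tombstones, None).map(_.path)
}
```

In this sketch, `withZero` contains only `a.parquet`, while `withNone` contains both paths, which is the behavior minor compaction needs in order to carry every `RemoveFile` forward.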
## How was this patch tested?
New unit test
## Does this PR introduce _any_ user-facing changes?
No