Helper function to get recently updated partitions 

**Is your feature request related to a problem? Please describe.**

Similar to https://github.com/MrPowers/mack/issues/130, but for non-Spark projects.

Streaming systems (or batch systems that run at high frequency) that write data into Delta tables commonly end up with lots of small files. In many cases auto compaction is not a practical fix, for example:
* Auto compaction is not available in Delta Lake before 3.1.0
* Auto compaction might not be well supported outside Spark

One way to solve this is to have a separate process that regularly optimizes these Delta tables. However, optimizing the entire table on every run, without any constraint, is not a good idea. A few example reasons:
* While in theory optimize is a no-op for partitions that weren't updated, it still incurs some overhead per partition to determine that it's a no-op. This can add up significantly when the table has lots of partitions.
* If the optimize operation includes z-order, subsequent z-order operations won't be a no-op even if the partitions weren't updated

**Describe the solution you'd like**
A helper function to find out which partitions were updated within a given time period, for example:

```python
def get_updated_partitions(delta_table: DeltaTable, start_time: datetime.datetime, end_time: datetime.datetime, exclude_optimize_operations: bool) -> list[dict[str, str]]: ...
```

The `exclude_optimize_operations` flag is necessary because optimize operations themselves also rewrite files in the partitions they touch. If they are not excluded, they can cause a feedback loop: every optimize run makes its partitions show up as updated again in the next run's output.

All the information needed for this feature should be available in the transaction log.
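
To illustrate the idea, here is a minimal sketch that reads the JSON commit files under `_delta_log` directly, using only the standard library. It is not a proposed final implementation: the function name `get_updated_partitions_from_log` and the `table_path` parameter are hypothetical, it only works on a local filesystem, it ignores checkpoints and cleaned-up log entries, and it assumes each commit carries a `commitInfo` action with a millisecond `timestamp` and an `operation` field (reported as `"OPTIMIZE"` for optimize runs).

```python
import datetime
import json
from pathlib import Path


def get_updated_partitions_from_log(
    table_path: str,
    start_time: datetime.datetime,
    end_time: datetime.datetime,
    exclude_optimize_operations: bool = True,
) -> list[dict[str, str]]:
    """Return the distinct partition values touched by commits in [start_time, end_time]."""
    start_ms = int(start_time.timestamp() * 1000)
    end_ms = int(end_time.timestamp() * 1000)
    seen = set()
    result = []

    for commit_file in sorted(Path(table_path, "_delta_log").glob("*.json")):
        # Plain commits are named <zero-padded version>.json; skip anything else.
        if not commit_file.stem.isdigit():
            continue
        actions = [
            json.loads(line)
            for line in commit_file.read_text().splitlines()
            if line.strip()
        ]
        # Assumption: the writer records commitInfo with "timestamp" and "operation".
        commit_info = next((a["commitInfo"] for a in actions if "commitInfo" in a), None)
        if commit_info is None or "timestamp" not in commit_info:
            continue  # the commit cannot be placed in the time window
        if not start_ms <= commit_info["timestamp"] <= end_ms:
            continue
        if exclude_optimize_operations and commit_info.get("operation") == "OPTIMIZE":
            continue

        for action in actions:
            file_action = action.get("add") or action.get("remove")
            if not file_action:
                continue
            values = file_action.get("partitionValues")
            if values is None:
                continue  # "remove" actions may omit partitionValues
            # Partition values may be null in the log; an unpartitioned table yields {}.
            key = tuple(sorted(values.items()))
            if key not in seen:
                seen.add(key)
                result.append(dict(values))
    return result
```

A scheduled job could call this with the time of its previous run as `start_time` and then optimize only the returned partitions. A real implementation would more likely take a `DeltaTable`, as in the signature above, and go through the library's own log access so that remote storage and checkpoints are handled.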

**Describe alternatives you've considered**
Optimizing the entire table and accepting the overhead.

I'm not sure what a good alternative would be once z-order is used, however.

**Additional context**

N/A

**Willingness to contribute**

Would you be willing to contribute an implementation of this feature?

- [x] Yes. I can contribute this feature independently.
- [ ] Yes. I would be willing to contribute this feature with guidance from the mack community.
- [ ] No. I cannot contribute this feature at this time.

