
Commit a8a23e6

fix(delta): transform query properly to make unsafe work (#1644)
* fix(delta): transform query properly to make unsafe work
* review, cleanup, fix a few issues, add more tests
* fix delta disabled logic + tests
* change unsafe delta semantics to independent per source processing
* take delta spec into hash consideration
* update delta unsafe docs
1 parent aaa97d3 commit a8a23e6


9 files changed (+1100, -249 lines)


docs/guide/delta.md

Lines changed: 11 additions & 4 deletions
@@ -87,16 +87,23 @@ By default, delta updates cannot be combined with the following methods:
 
 1. `merge`
 2. `union`
-3. `distinct`
-4. `agg`
-5. `group_by`
+3. `subtract`
+4. `diff`
+5. `file_diff`
+6. `distinct`
+7. `agg`
+8. `group_by`
 
 These methods are restricted because they may produce **unexpected results** when used with delta processing. Delta runs the chain only on a subset of rows (new and changed records), while methods like `distinct`, `agg`, or `group_by` are designed to operate on the entire dataset.
 
-Similarly, combining delta with methods like `merge` or `union` may result in duplicated rows when merging with a static dataset.
+Similarly, combining delta with methods like `merge`, `union`, `subtract`, `diff`, or `file_diff` may produce inconsistent results because the operation is being applied to replayed delta inputs rather than to the full logical datasets.
 
 If you still need to use these methods together with delta, you can override this restriction by setting the additional flag:
 
 ```python
 delta_unsafe=True
 ```
+
+If more than one delta-enabled source participates in the same composed query, set `delta_unsafe=True` on every participating delta source.
+
+`delta_unsafe=True` is an advanced option. Use it only when you know the participating delta sources are updated in a consistent way and the result of replaying only the changed rows will still match the result of recomputing the full query.
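The restriction on `distinct`, `agg`, and `group_by` can be illustrated without DataChain at all. The plain-Python sketch below is a simulation, not DataChain's API: the helpers `label_upper`, `count_by_label`, and `delta_run` are hypothetical, and it assumes for simplicity that a delta run processes only the new rows and appends its output to the previously stored result.

```python
from collections import Counter

# Plain-Python simulation of the idea behind the restriction; this is NOT
# DataChain's API. A "dataset" here is just a list of (file, label) rows.
old_rows = [("a.jpg", "cat"), ("b.jpg", "dog")]
new_rows = [("c.jpg", "cat")]  # rows that appeared since the last run
all_rows = old_rows + new_rows


def label_upper(rows):
    """Row-wise step: each output row depends only on its own input row."""
    return [(name, label.upper()) for name, label in rows]


def count_by_label(rows):
    """Aggregation step (like group_by/agg): output depends on all rows."""
    return sorted(Counter(label for _, label in rows).items())


def delta_run(step, previous_result, changed_rows):
    """Hypothetical delta run: apply `step` to changed rows, append to old result."""
    return previous_result + step(changed_rows)


for step in (label_upper, count_by_label):
    full = step(all_rows)  # recompute over the whole dataset
    delta = delta_run(step, step(old_rows), new_rows)  # replay only new rows
    print(f"{step.__name__}: match={sorted(full) == sorted(delta)}")
```

For the row-wise step both runs agree. For the per-label count, the delta run stores `("cat", 1)` twice instead of a single `("cat", 2)`, which is the kind of unexpected result the restriction is meant to prevent.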
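A join shows why replaying only the changed rows is subtle when more than one source changes. The plain-Python sketch below is not DataChain's API or its actual delta implementation; `inner_join` and the row lists are hypothetical. It compares a full recomputation, a naive replay that joins only the new rows of each side, and the standard incremental-join identity Δ(L ⋈ R) = (ΔL ⋈ R_new) ∪ (L_old ⋈ ΔR).

```python
# Plain-Python simulation; NOT DataChain's API or its delta implementation.
# Rows are (key, value) tuples; "old" rows were seen by the previous run.
old_left = [(1, "a"), (2, "b")]
new_left = [(3, "c")]
old_right = [(3, "x")]  # key 3 was already on the right side
new_right = [(1, "y")]  # key 1 shows up on the right side only now


def inner_join(left, right):
    """Inner-join (key, value) rows on key -> sorted (key, left, right) rows."""
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)
    return sorted(
        (key, lv, rv) for key, lv in left for rv in right_by_key.get(key, [])
    )


full = inner_join(old_left + new_left, old_right + new_right)
previous = inner_join(old_left, old_right)  # stored result of the last run

# Naive replay: join only the new rows of each side, append to the old result.
naive_delta = sorted(previous + inner_join(new_left, new_right))

# Incremental-join identity: new left rows against the whole current right
# side, plus old left rows against the new right rows.
correct_delta = sorted(
    previous
    + inner_join(new_left, old_right + new_right)
    + inner_join(old_left, new_right)
)

print("full:         ", full)
print("naive delta:  ", naive_delta)
print("correct delta:", correct_delta)
```

Here the naive replay finds no matches, while the full recomputation finds two. Reproducing the full result requires joining each side's changes against the other side's data, which is why the guide asks you to confirm that replaying the changed rows matches the full query before enabling `delta_unsafe=True`.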
