
Commit 6ec9d65

KamilPiechowiak authored and zxqfd555 committed
Docs and better error for exactly once join (#9374)
Co-authored-by: Sergey <[email protected]>
GitOrigin-RevId: 4278479143415f75ceddb0e20dcde6eeae6fd0b0
1 parent a076478 commit 6ec9d65

2 files changed: +32 -3 lines changed

python/pathway/internals/joins.py

Lines changed: 20 additions & 0 deletions
@@ -157,9 +157,11 @@ def join(
             left_exactly_once: if you can guarantee that each row on the left side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.
             right_exactly_once: if you can guarantee that each row on the right side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.

         Returns:
             JoinResult: an object on which `.select()` may be called to extract relevant
@@ -224,9 +226,11 @@ def join_inner(
             left_exactly_once: if you can guarantee that each row on the left side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.
             right_exactly_once: if you can guarantee that each row on the right side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.

         Returns:
             JoinResult: an object on which `.select()` may be called to extract relevant
@@ -291,9 +295,11 @@ def join_left(
             left_exactly_once: if you can guarantee that each row on the left side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.
             right_exactly_once: if you can guarantee that each row on the right side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.

         Remarks:
             args cannot contain id column from either of tables, \
@@ -378,9 +384,11 @@ def join_right(
             left_exactly_once: if you can guarantee that each row on the left side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.
             right_exactly_once: if you can guarantee that each row on the right side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.

         Remarks: args cannot contain id column from either of tables, \
         as the result table has id column with auto-generated ids; \
@@ -466,9 +474,11 @@ def join_outer(
             left_exactly_once: if you can guarantee that each row on the left side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.
             right_exactly_once: if you can guarantee that each row on the right side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.

         Remarks: args cannot contain id column from either of tables, \
         as the result table has id column with auto-generated ids; \
@@ -1184,9 +1194,11 @@ def join(
             left_exactly_once: if you can guarantee that each row on the left side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.
             right_exactly_once: if you can guarantee that each row on the right side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.

         Returns:
             JoinResult: an object on which `.select()` may be called to extract relevant
@@ -1249,9 +1261,11 @@ def join_inner(
             left_exactly_once: if you can guarantee that each row on the left side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.
             right_exactly_once: if you can guarantee that each row on the right side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.

         Returns:
             JoinResult: an object on which `.select()` may be called to extract relevant
@@ -1313,9 +1327,11 @@ def join_left(
             left_exactly_once: if you can guarantee that each row on the left side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.
             right_exactly_once: if you can guarantee that each row on the right side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.

         Remarks:
             args cannot contain id column from either of tables, \
@@ -1397,9 +1413,11 @@ def join_right(
             left_exactly_once: if you can guarantee that each row on the left side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.
             right_exactly_once: if you can guarantee that each row on the right side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.

         Remarks: args cannot contain id column from either of tables, \
         as the result table has id column with auto-generated ids; \
@@ -1483,9 +1501,11 @@ def join_outer(
             left_exactly_once: if you can guarantee that each row on the left side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.
             right_exactly_once: if you can guarantee that each row on the right side of the join will be
                 joined at most once, then you can set this parameter to ``True``. Then each row after
                 getting a match is removed from the join state. As a result, less memory is needed.
+                Works only for append-only tables.

         Remarks: args cannot contain id column from either of tables, \
         as the result table has id column with auto-generated ids; \
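
The docstring lines above say that, with `left_exactly_once` or `right_exactly_once` set, a row is removed from the join state once it has matched. The following is a minimal pure-Python sketch of that idea, not Pathway's engine code: `ToyJoinState` and its `add` and `size` methods are hypothetical names introduced only for illustration, and the sketch assumes append-only inputs (no retractions), as the new docstring line requires.

```python
from collections import defaultdict


class ToyJoinState:
    """Toy append-only inner join state (hypothetical; not Pathway's implementation)."""

    def __init__(self, left_exactly_once=False, right_exactly_once=False):
        self.once = {"left": left_exactly_once, "right": right_exactly_once}
        # For each side: join key -> rows still buffered for future matches.
        self.rows = {"left": defaultdict(list), "right": defaultdict(list)}

    def add(self, side, key, row):
        """Insert a row on `side` and return the (left, right) pairs it produces."""
        other = "right" if side == "left" else "left"
        matches = list(self.rows[other].get(key, []))
        if matches and self.once[other]:
            # Buffered rows on the other side have had their single match: forget them.
            del self.rows[other][key]
        if not (matches and self.once[side]):
            # Keep the new row unless it already matched and is marked exactly-once.
            self.rows[side][key].append(row)
        if side == "left":
            return [(row, m) for m in matches]
        return [(m, row) for m in matches]

    def size(self):
        """Number of rows currently held in the join state."""
        return sum(len(v) for per_side in self.rows.values() for v in per_side.values())


plain = ToyJoinState()
once = ToyJoinState(left_exactly_once=True, right_exactly_once=True)
outputs = []
for state in (plain, once):
    state.add("left", 1, "order-1")
    outputs.append(state.add("right", 1, "payment-1"))
```

Both toy joins emit the same pair, but `plain` keeps both rows buffered while `once` ends with an empty state, which is the memory saving the docstring refers to. If a row later had to be retracted, the forgotten state could no longer produce the matching retraction, which is one plausible reading of why the option is restricted to append-only tables.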

src/engine/dataflow.rs

Lines changed: 12 additions & 3 deletions
@@ -2820,9 +2820,18 @@ impl<S: MaybeTotalScope> DataflowGraphInner<S> {
         let without_retractions = join_left_right_without_persisted
             .filter_out_forgetting()
             .consolidate();
-        without_retractions
-            .inner
-            .inspect(|(_data, _time, diff)| assert!(*diff > 0));
+        let error_logger = self.create_error_logger()?;
+        let trace = table_properties.trace();
+        without_retractions.inner.inspect(
+            move |((join_key, _left, _right), _time, diff)| {
+                if *diff < 0 {
+                    error_logger.log_error_with_trace(
+                        DataError::ExpectedAppendOnly(*join_key).into(),
+                        &trace,
+                    );
+                }
+            },
+        );

         if let Some(left_retractions) = left_retractions {
             let left_side = without_retractions.map_named(
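
The hunk above replaces a hard `assert!` with a logged `ExpectedAppendOnly` error when an update carries a negative diff. The same pattern can be sketched in plain Python, purely as an illustration: `ExpectedAppendOnly`, `check_append_only` and the `toy_engine` logger name are hypothetical stand-ins introduced here, not part of Pathway's Python or Rust API.

```python
import logging

logger = logging.getLogger("toy_engine")  # hypothetical logger name


class ExpectedAppendOnly(Exception):
    """Hypothetical stand-in for the DataError::ExpectedAppendOnly variant."""

    def __init__(self, join_key):
        super().__init__(f"expected append-only input for join key {join_key!r}")
        self.join_key = join_key


def check_append_only(updates, errors):
    """Pass every update through unchanged; record an error for each retraction."""
    for (join_key, left, right), time, diff in updates:
        if diff < 0:
            # Report the offending key instead of aborting the whole computation.
            err = ExpectedAppendOnly(join_key)
            errors.append(err)
            logger.error("%s", err)
        yield (join_key, left, right), time, diff


updates = [((1, "a", "b"), 0, 1), ((2, "c", "d"), 1, -1)]
errors = []
passed = list(check_append_only(updates, errors))
```

Like `inspect` in the Rust code, the generator only observes the stream: every update is forwarded, and the retraction for key `2` is recorded in `errors` rather than stopping the run.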
