-
Notifications
You must be signed in to change notification settings - Fork 486
[flink] Delta Join additional IT tests and docs improvement #2268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[flink] Delta Join additional IT tests and docs improvement #2268
Conversation
|
@xuyangzhong @wuchong |
ac23b6b to
6035422
Compare
|
Hi, @fresh-borzoni I'm a bit busy these days, I'll try my best to take a look after next Wednesday |
xuyangzhong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for driving this! I have added some comments. Furthermore, some methods for table creation could be reused.
| import static org.assertj.core.api.Assertions.assertThatThrownBy; | ||
|
|
||
| /** IT case for {@link FlinkTableSource} in Flink 2.2. */ | ||
| public class Flink22TableSourceITCase extends FlinkTableSourceITCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m thinking it might be better to use a separate Flink22DeltaJoinITCase to hold these Delta Join–related tests, since there are now more delta join related tests, and there’s some common logic we could factor out, for example, setting the FORCE strategy, setting the default parallelism to 2, and creating the source and sink tables. WDYT?
| + " 'table.delete.behavior' = 'IGNORE' " | ||
| + ")", | ||
| leftTableName)); | ||
| List<InternalRow> rows1 = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could we prepare some data with the same primary key to test it?
| "INSERT INTO %s SELECT a1, c1, a2 FROM %s LEFT JOIN %s ON c1 = c2 AND d1 = d2", | ||
| sinkTableName, leftTableName, rightTableName); | ||
|
|
||
| assertThatThrownBy(() -> tEnv.explainSql(sql)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Merge testDeltaJoinFailsWithLeftJoin, testDeltaJoinFailsWithRightJoin and testDeltaJoinFailsWithFullOuterJoin to testDeltaJoinFailsWithOuterJoin and assert the thrown exception here three times.
|
|
||
| assertThatThrownBy(() -> tEnv.explainSql(sql)) | ||
| .isInstanceOf(ValidationException.class) | ||
| .hasMessageContaining("doesn't support to do delta join optimization"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a more test here.
// Non-equiv-cond on e1 > e2, where e1 and e2 are NOT part of the upsert key
String sql2 =
String.format(
"INSERT INTO %s SELECT * FROM %s INNER JOIN %s ON c1 = c2 AND d1 = d2 AND e1 > e2",
sinkTableName, leftTableName, rightTableName);
assertThatThrownBy(() -> tEnv.explainSql(sql2))
.isInstanceOf(ValidationException.class)
.hasMessageContaining("doesn't support to do delta join optimization");
| } | ||
|
|
||
| @Test | ||
| void testDeltaJoinFailsWithCascadeJoin() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding more tests from the doc like:
- the join key includes more fields than the primary key.
- sink materializer(the sink's primary key is different from the primary keys of the two upstream sources.)
- filters or projections contain non-deterministic functions like
rand.
Purpose
Linked issue: close #2231
Add comprehensive test coverage for Delta Join feature in Flink 2.2 and improve documentation.
Brief change log
table.delete.behavior=IGNORE(not justfirst_rowmerge engine)Tests
testDeltaJoinWithPrimaryKeyTableNoDeletes- normal PK table withdelete.behavior=IGNOREtestDeltaJoinOnBucketKey- join on bucket key onlytestDeltaJoinFailsWhenFilterOnNonUpsertKeys- filter on non-upsert-key columns failstestDeltaJoinOnBucketKey- join on bucket key only (not full PK)testDeltaJoinFailsWhenSourceHasDelete- source with DELETE records failstestDeltaJoinFailsWhenJoinKeyNotContainIndex- join key not containing index failtestDeltaJoinFailsWithLeftJoin- LEFT JOIN wouldn't be converted to DeltaJointestDeltaJoinFailsWithRightJoin- RIGHT JOIN wouldn't be converted to DeltaJointestDeltaJoinFailsWithFullOuterJoin- FULL OUTER JOIN wouldn't be converted to DeltaJointestDeltaJoinFailsWithCascadeJoin- cascade join wouldn't be converted to DeltaJoinAPI and Format
No
Documentation
Yes - updated
docs/engine-flink/delta-joins.mdin Flink 2.2 part.