[flink] Support specifying what partitions to scan in Flink #5090

Merged
JingsongLi merged 2 commits into apache:master from tsreaper:scan-partitions
Feb 17, 2025

@tsreaper
Contributor

Purpose

Lookup joins in streaming SQL are the counterpart of normal joins in batch SQL. However, the two differ in how users specify which partitions to scan. In lookup joins, users can currently pass max_pt() through SQL hints to read the latest partition instead of a fixed partition, but batch joins do not support such a hint.

Paimon is a streaming-batch unified lake format. To extend this unification to SQL, this PR introduces a new option, scan.partitions, which accepts both max_pt() (in lookup joins) and fixed partitions (in all joins). To read different partitions in streaming and batch jobs, users only need to change the value of this option; the SQL itself stays the same.
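To make the option concrete, here is a minimal Java sketch of how a scan.partitions value could be interpreted. It is not the code from this PR: the class ScanPartitionsSketch and its methods are hypothetical, and the syntax for fixed partitions (key=value pairs joined by ',', multiple partitions separated by ';') is an assumption for illustration. Only the special values max_pt() and max_two_pt() come from the diff in this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical interpretation of a scan.partitions value (not Paimon's code).
 * Assumed syntax: "max_pt()", "max_two_pt()", or fixed partitions such as
 * "dt=20250101,hr=10;dt=20250102,hr=11" (pairs joined by ',', partitions by ';').
 */
final class ScanPartitionsSketch {

    private static final String MAX_PT = "max_pt()";
    private static final String MAX_TWO_PT = "max_two_pt()";

    // Number of latest partitions requested; 0 when fixed partitions are given.
    private final int maxPartitionCount;
    // Fixed partition specs; empty when max_pt() or max_two_pt() is used.
    private final List<Map<String, String>> fixedPartitions;

    private ScanPartitionsSketch(int maxPartitionCount, List<Map<String, String>> fixedPartitions) {
        this.maxPartitionCount = maxPartitionCount;
        this.fixedPartitions = fixedPartitions;
    }

    static ScanPartitionsSketch parse(String value) {
        String trimmed = value.trim();
        if (trimmed.equals(MAX_PT)) {
            return new ScanPartitionsSketch(1, Collections.emptyList());
        }
        if (trimmed.equals(MAX_TWO_PT)) {
            return new ScanPartitionsSketch(2, Collections.emptyList());
        }
        List<Map<String, String>> specs = new ArrayList<>();
        for (String partition : trimmed.split(";")) {
            if (partition.isBlank()) {
                continue; // tolerate a trailing ';'
            }
            Map<String, String> spec = new LinkedHashMap<>(); // keeps partition-key order
            for (String pair : partition.split(",")) {
                String[] kv = pair.split("=", 2);
                if (kv.length != 2 || kv[0].isBlank()) {
                    throw new IllegalArgumentException("Invalid partition spec: " + partition);
                }
                spec.put(kv[0].trim(), kv[1].trim());
            }
            specs.add(Collections.unmodifiableMap(spec));
        }
        return new ScanPartitionsSketch(0, Collections.unmodifiableList(specs));
    }

    boolean isMaxPartition() {
        return maxPartitionCount > 0;
    }

    int maxPartitionCount() {
        return maxPartitionCount;
    }

    List<Map<String, String>> fixedPartitions() {
        return fixedPartitions;
    }

    public static void main(String[] args) {
        System.out.println(parse("max_two_pt()").maxPartitionCount()); // 2
        System.out.println(parse("dt=20250101,hr=10;dt=20250102,hr=11").fixedPartitions());
        // [{dt=20250101, hr=10}, {dt=20250102, hr=11}]
    }
}
```

Per the description above, max_pt() only applies to lookup joins while fixed partitions apply to all joins; the sketch parses both forms and leaves that restriction to whichever source consumes the option.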

Tests

Unit tests and IT cases.

API and Format

No format changes.

Documentation

Documentation is also added.

private static final String MAX_PT = "max_pt()";
private static final String MAX_TWO_PT = "max_two_pt()";

protected final FileStoreTable table;
Contributor

Why use FileStoreTable instead of Table?

Contributor Author

We need methods that are only available on the subtype, such as schema().
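The diff hunk above declares the special values max_pt() and max_two_pt(). A minimal sketch of resolving them against the partitions that currently exist, assuming each partition is represented as its list of values in partition-key order and that partitions compare value by value as strings. MaxPartitionSketch and its methods are hypothetical; the actual Paimon implementation may compare typed partition values instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MaxPartitionSketch {

    // Assumed ordering: compare partition values one by one, as strings,
    // in partition-key order; a shorter prefix sorts first.
    static final Comparator<List<String>> PARTITION_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // Returns the `count` largest partitions, largest first:
    // count = 1 for max_pt(), count = 2 for max_two_pt().
    static List<List<String>> maxPartitions(List<List<String>> partitions, int count) {
        List<List<String>> sorted = new ArrayList<>(partitions);
        sorted.sort(PARTITION_ORDER.reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("20250215"), List.of("20250217"), List.of("20250216"));
        System.out.println(maxPartitions(partitions, 1)); // [[20250217]]
        System.out.println(maxPartitions(partitions, 2)); // [[20250217], [20250216]]
    }
}
```

String comparison orders fixed-width values such as yyyyMMdd dates correctly, but it would rank "9" above "10" for unpadded integers, which is why comparing typed values is the safer design.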


@Override
public boolean checkRefresh() {
if (partitions.isEmpty()) {
Contributor

Just return false here: StaticPartitionLoader should never refresh its partitions.

You can add the partitions in open() instead.
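The review suggestion can be sketched as follows: a loader for fixed partitions fills its partition list once in open() and returns false from checkRefresh() unconditionally. The PartitionLoader interface, the StaticPartitionLoader class, and their method signatures here are hypothetical stand-ins modeled on the names in this thread, not Paimon's actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class PartitionLoaderDemo {

    /** Hypothetical loader contract; not Paimon's actual interface. */
    interface PartitionLoader {
        /** Called once before the lookup table starts serving. */
        void open();

        /** Partitions the lookup table should currently read. */
        List<Map<String, String>> partitions();

        /** True if the partition set changed and the lookup table must reload. */
        boolean checkRefresh();
    }

    /** Fixed partitions from scan.partitions: resolved once, never refreshed. */
    static final class StaticPartitionLoader implements PartitionLoader {
        private final List<Map<String, String>> configured;
        private final List<Map<String, String>> partitions = new ArrayList<>();

        StaticPartitionLoader(List<Map<String, String>> configured) {
            this.configured = configured;
        }

        @Override
        public void open() {
            // As suggested in review: add the partitions here, once.
            partitions.clear();
            partitions.addAll(configured);
        }

        @Override
        public List<Map<String, String>> partitions() {
            return Collections.unmodifiableList(partitions);
        }

        @Override
        public boolean checkRefresh() {
            // Fixed partitions never change, so there is nothing to refresh.
            return false;
        }
    }

    public static void main(String[] args) {
        PartitionLoader loader = new StaticPartitionLoader(List.of(Map.of("dt", "20250217")));
        loader.open();
        System.out.println(loader.partitions()); // [{dt=20250217}]
        System.out.println(loader.checkRefresh()); // false
    }
}
```

A loader for max_pt() is the case where refreshing matters: it would presumably return true from checkRefresh() once a newer partition appears.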

Contributor

@JingsongLi left a comment

+1

@JingsongLi merged commit 4277e0f into apache:master on Feb 17, 2025
13 checks passed

3 participants