[core][flink] Introduce postpone bucket tables. #5095
JingsongLi merged 12 commits into apache:master from tsreaper:postpone
Conversation
… postpone bucket table
docs/content/flink/procedures.md
Outdated
| CALL sys.compact_postpone_bucket(`table` => 'identifier', `default_bucket_num` => bucket_num, `parallelism` => parallelism) |
| </td> |
| <td> |
| Compact postpone bucket tables, which distributes records in bucket = -2 directory into real bucket directories. Arguments: |
Maybe we can give this directory a meaningful name. Maybe bucket-postpone?
| @Override |
| public void processElement(StreamRecord<InternalRow> element) throws Exception { |
|     write.write(element.getValue(), BucketMode.POSTPONE_BUCKET); |
Not only buckets, but we also want to avoid scanning old files in this mode. Currently, in order to obtain the latest sequenceNumbers, the primary key table still needs to scan old files. We need to consider removing this logic.
docs/content/flink/procedures.md
Outdated
| <tr> |
| <td>compact_postpone_bucket</td> |
| <td> |
| CALL sys.compact_postpone_bucket(`table` => 'identifier', `default_bucket_num` => bucket_num, `parallelism` => parallelism) |
We can merge this into the compact procedure.
docs/content/flink/procedures.md
Outdated
| <tr> |
| <td>compact_postpone_bucket</td> |
| <td> |
| CALL sys.compact_postpone_bucket(`table` => 'identifier', `default_bucket_num` => bucket_num, `parallelism` => parallelism) |
default_bucket_num can be a separate table option.
docs/content/flink/procedures.md
Outdated
| <tr> |
| <td>rescale_postpone_bucket</td> |
| <td> |
| CALL sys.rescale_postpone_bucket(`table` => 'identifier', `bucket_num` => bucket_num, `partition` => 'partition') |
We can have a unified rescale procedure.
| But please note that this may also cause data duplication. |
| ## Postpone Bucket |
What about also supporting append queue tables?
Currently there is no plan for append queue tables.
| import org.apache.paimon.options.description.DescribedEnum; |
| import org.apache.paimon.options.description.Description; |
| import org.apache.paimon.options.description.InlineElement; |
| import org.apache.paimon.table.BucketMode; |
You should modify the description of the config BUCKET in this class.
| checkArgument( |
|     options.changelogProducer() == ChangelogProducer.NONE |
|         || options.changelogProducer() == ChangelogProducer.LOOKUP, |
|     "Currently, postpone bucket tables (bucket = -2) only supports none or lookup changelog producer"); |
| sink.doCommit(written.union(sourcePair.getRight()), commitUser); |
| } |
| List<String> ret = new ArrayList<>(); |
List<String> ret = new ArrayList<>(partitions.size());
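The suggestion above is the usual capacity-hint pattern: when the final size is known up front, passing it to the ArrayList constructor avoids intermediate array growth. A minimal standalone sketch (class and method names are illustrative, not Paimon's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class CapacityHint {
    // Builds exactly one result entry per input partition, so the final
    // size is known; pre-sizing the list avoids the internal array
    // re-allocations that the default constructor would incur.
    static List<String> partitionPaths(List<String> partitions) {
        List<String> ret = new ArrayList<>(partitions.size());
        for (String p : partitions) {
            ret.add("bucket-postpone/" + p); // illustrative path prefix
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(partitionPaths(List.of("dt=20250217", "dt=20250218")));
    }
}
```

The behavior is identical either way; the hint only matters for allocation churn on large partition lists.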
| public class RescalePostponeBucketAction extends TableActionBase { |
| private final int bucketNum; |
| private Map<String, String> partition = new HashMap<>(); |
Why not support multiple partitions?
If this feature is needed, others can implement it later.
| /** |
|  * Action to compact postpone bucket tables, which distributes records in {@code bucket = -2} |
|  * directory into real bucket directories. |
I think the directory should not be named bucket = -2; it should have a meaningful name.
docs/content/flink/procedures.md
Outdated
| <li>partition: What partition to rescale. For partitioned table this argument cannot be empty.</li> |
| </td> |
| <td> |
| CALL sys.rescale_postpone_bucket(`table` => 'default.T', `bucket_num` => 16, `partition` => 'dt=20250217,hh=08') |
| public boolean writeOnly() { |
| return options.get(WRITE_ONLY); |
| return options.get(WRITE_ONLY) || options.get(BUCKET) == BucketMode.POSTPONE_BUCKET; |
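The diff above makes postpone bucket tables implicitly write-only: writers never trigger compaction, even if the write-only option was not set. A minimal sketch of that pattern (field and class names are hypothetical, not Paimon's real option classes):

```java
public class WriteOnlySketch {
    static final int POSTPONE_BUCKET = -2; // sentinel mirroring 'bucket' = '-2'

    final boolean writeOnlyOption;
    final int bucket;

    WriteOnlySketch(boolean writeOnlyOption, int bucket) {
        this.writeOnlyOption = writeOnlyOption;
        this.bucket = bucket;
    }

    // Postpone bucket tables defer all compaction to a dedicated job,
    // so writing is treated as write-only regardless of the option.
    boolean writeOnly() {
        return writeOnlyOption || bucket == POSTPONE_BUCKET;
    }
}
```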
| this.onlyReadRealBuckets = onlyReadRealBuckets; |
| if (onlyReadRealBuckets) { |
|     super.withBucketFilter(bucket -> bucket >= 0); |
This will overwrite the old bucket filter, or be overwritten by a later bucket filter. I see you handle withBucketFilter too, but it does not look so safe... maybe we should add a dedicated method to skip negative buckets...
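The concern above is that a setter-style filter replaces whatever was set before, while the intent is for both conditions to hold. A small sketch of the difference, using plain `java.util.function.IntPredicate` rather than Paimon's actual filter types:

```java
import java.util.function.IntPredicate;

public class FilterCompose {
    private IntPredicate bucketFilter = b -> true;

    // Replacing: a later call silently drops the earlier filter,
    // so a "skip negative buckets" predicate can be lost.
    void setBucketFilter(IntPredicate f) {
        this.bucketFilter = f;
    }

    // Composing: conjoins with the existing filter, so the
    // negative-bucket exclusion survives later filter calls.
    void addBucketFilter(IntPredicate f) {
        this.bucketFilter = this.bucketFilter.and(f);
    }

    boolean accepts(int bucket) {
        return bucketFilter.test(bucket);
    }
}
```

A dedicated "skip negative buckets" method would simply call the composing variant with `b -> b >= 0`, making the exclusion impossible to overwrite by accident.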
Purpose
In this PR, we're introducing the postpone bucket mode.
Postpone bucket mode is configured by `'bucket' = '-2'`. This mode aims to solve the difficulty of determining a fixed bucket number in advance, and it supports different bucket numbers for different partitions. Currently, only Flink supports this mode.
When writing records into the table, all records will first be stored in the `bucket-postpone` directory of each partition and are not available to readers. To move the records into the correct buckets and make them readable, you need to run a compaction job.
Finally, when you feel that the bucket number of some partition is too small, you can also run a rescale job.
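The write-then-compact flow described above can be sketched as a toy model. This is illustrative only and does not reflect Paimon's real file layout or bucket assignment: records first land under a sentinel "postpone" bucket invisible to readers, and a later compaction step distributes them into real buckets by key hash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PostponeSketch {
    static final int POSTPONE_BUCKET = -2; // sentinel mirroring 'bucket' = '-2'

    final Map<Integer, List<String>> buckets = new HashMap<>();

    // Writing: every record goes to the sentinel bucket; readers ignore it.
    void write(String key) {
        buckets.computeIfAbsent(POSTPONE_BUCKET, b -> new ArrayList<>()).add(key);
    }

    // Compaction: move postponed records into real buckets chosen by key
    // hash. The bucket count is decided per compaction run, which is how
    // different partitions can end up with different bucket numbers.
    void compact(int bucketNum) {
        List<String> postponed = buckets.remove(POSTPONE_BUCKET);
        if (postponed == null) {
            return;
        }
        for (String key : postponed) {
            int bucket = Math.floorMod(key.hashCode(), bucketNum);
            buckets.computeIfAbsent(bucket, b -> new ArrayList<>()).add(key);
        }
    }

    // Readers only see non-negative (real) buckets.
    List<String> readAll() {
        List<String> out = new ArrayList<>();
        buckets.forEach((b, recs) -> { if (b >= 0) out.addAll(recs); });
        return out;
    }
}
```

In this model a rescale corresponds to re-hashing a partition's records into a different bucket count, which matches the PR's note that rescaling is a separate job run after the fact.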
Tests
org.apache.paimon.flink.PostponeBucketTableITCase
API and Format
Introduce a new storage format.
Documentation
Document is also added.