
[core][flink] Introduce postpone bucket tables. #5095

Merged

JingsongLi merged 12 commits into apache:master from tsreaper:postpone on Feb 20, 2025
Conversation

@tsreaper (Contributor) commented Feb 17, 2025

Purpose

In this PR, we introduce the postpone bucket mode.

Postpone bucket mode is configured by 'bucket' = '-2'. This mode aims to solve the difficulty of determining a fixed number of buckets, and supports different bucket numbers for different partitions.

Currently, only Flink supports this mode.

When writing records into the table, all records will first be stored in the bucket-postpone directory of each partition and are not available to readers.

To move the records into the correct bucket and make them readable, you need to run a compaction job.

Finally, if the bucket number of some partition becomes too small, you can also run a rescale job.
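The workflow above can be sketched in Flink SQL. The table, columns, and argument values below are hypothetical; only the 'bucket' = '-2' option and the two procedures come from this PR, and the exact argument requirements may differ:

```sql
-- Hypothetical table; the only PR-specific piece is 'bucket' = '-2'.
CREATE TABLE t (
    k INT,
    v STRING,
    dt STRING,
    PRIMARY KEY (k, dt) NOT ENFORCED
) PARTITIONED BY (dt) WITH (
    'bucket' = '-2'  -- postpone bucket mode
);

-- Writes land in the postpone directory of each partition
-- and are not yet visible to readers.
INSERT INTO t VALUES (1, 'a', '20250217');

-- Distribute the postponed records into real buckets, making them readable.
CALL sys.compact_postpone_bucket(`table` => 'default.t', `default_bucket_num` => 4);

-- Later, grow the bucket number of a partition that became too small.
CALL sys.rescale_postpone_bucket(`table` => 'default.t', `bucket_num` => 16, `partition` => 'dt=20250217');
```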

Tests

  • org.apache.paimon.flink.PostponeBucketTableITCase

API and Format

Introduce a new storage format.

Documentation

Documentation is also added.

CALL sys.compact_postpone_bucket(`table` => 'identifier', `default_bucket_num` => bucket_num, `parallelism` => parallelism)
</td>
<td>
Compact postpone bucket tables, which distributes records in bucket--2 directory into real bucket directories. Arguments:
Contributor:

Maybe we can give this directory a meaningful name, e.g. bucket-postpone?


@Override
public void processElement(StreamRecord<InternalRow> element) throws Exception {
    write.write(element.getValue(), BucketMode.POSTPONE_BUCKET);
Contributor:

Not only buckets, but we also want to avoid scanning old files in this mode. Currently, in order to obtain the latest sequenceNumbers, the primary key table still needs to scan old files. We need to consider removing this logic.

<tr>
<td>compact_postpone_bucket</td>
<td>
CALL sys.compact_postpone_bucket(`table` => 'identifier', `default_bucket_num` => bucket_num, `parallelism` => parallelism)
Contributor:

We can merge this into the compact procedure.

<tr>
<td>compact_postpone_bucket</td>
<td>
CALL sys.compact_postpone_bucket(`table` => 'identifier', `default_bucket_num` => bucket_num, `parallelism` => parallelism)
Contributor:

default_bucket_num can be a separate table option.
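A sketch of that suggestion; the option key below is invented for illustration and is not part of this PR:

```sql
-- Hypothetical option key; in this PR, default_bucket_num is actually
-- passed as a procedure argument instead.
ALTER TABLE t SET ('postpone.default-bucket-num' = '8');

-- The procedure call could then omit the argument:
CALL sys.compact_postpone_bucket(`table` => 'default.t');
```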

<tr>
<td>rescale_postpone_bucket</td>
<td>
CALL sys.rescale_postpone_bucket(`table` => 'identifier', `bucket_num` => bucket_num, `partition` => 'partition')
Contributor:

We can have a unified rescale procedure.


But please note that this may also cause data duplication.

## Postpone Bucket
Contributor:

What about also supporting append queue tables?

Contributor (author):

Currently there is no plan for append queue tables.

import org.apache.paimon.options.description.DescribedEnum;
import org.apache.paimon.options.description.Description;
import org.apache.paimon.options.description.InlineElement;
import org.apache.paimon.table.BucketMode;
Contributor:

You should modify the description of the config BUCKET in this class.

checkArgument(
        options.changelogProducer() == ChangelogProducer.NONE
                || options.changelogProducer() == ChangelogProducer.LOOKUP,
        "Currently, postpone bucket tables (bucket = -2) only supports none or lookup changelog producer");
Contributor:

Why?

sink.doCommit(written.union(sourcePair.getRight()), commitUser);
}

List<String> ret = new ArrayList<>();
Contributor:

List<String> ret = new ArrayList<>(partitions.size());

public class RescalePostponeBucketAction extends TableActionBase {

    private final int bucketNum;
    private Map<String, String> partition = new HashMap<>();
Contributor:

Why not support multiple partitions?

Contributor (author):

If this feature is needed, others can implement it later.


/**
* Action to compact postpone bucket tables, which distributes records in {@code bucket = -2}
* directory into real bucket directories.
Contributor:

I think the directory should not be named bucket = -2; it should have a meaningful name.

<li>partition: What partition to rescale. For partitioned tables this argument cannot be empty.</li>
</td>
<td>
CALL sys.rescale_postpone_bucket(`table` => 'default.T', `bucket_num` => 16, `partition` => 'dt=20250217,hh=08')
Contributor:

rescale


public boolean writeOnly() {
-    return options.get(WRITE_ONLY);
+    return options.get(WRITE_ONLY) || options.get(BUCKET) == BucketMode.POSTPONE_BUCKET;
Contributor:

revert this

this.onlyReadRealBuckets = onlyReadRealBuckets;

if (onlyReadRealBuckets) {
    super.withBucketFilter(bucket -> bucket >= 0);
@JingsongLi (Contributor) commented Feb 19, 2025:

This will overwrite the old bucket filter, or be overwritten by a new bucket filter. I see you handle withBucketFilter too, but it looks not so safe. Maybe we should add a method to skip negative buckets.

@JingsongLi (Contributor) left a comment:

+1

@JingsongLi JingsongLi merged commit 3c0fc7d into apache:master Feb 20, 2025
26 checks passed