
[Core] Support to order data by columns in append only writer#7886

Open
FangYongs wants to merge 1 commit into apache:master from FangYongs:local-sort-for-append-table


FangYongs (Contributor) commented May 18, 2026

Purpose

Order data by the specified columns within each file written by the append-only writer.
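A minimal sketch of the idea in the Purpose line, assuming rows are modeled as `long[]` instead of Paimon's row type and that a "file" is simply the list of rows flushed together. `SortedBufferingWriter` and `byColumns` are hypothetical names, not the classes changed by this PR: the writer buffers rows and sorts the buffer by the chosen column indices when it rolls a file, so every emitted file is internally ordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Paimon code: rows are long[] and each "file" is an in-memory list.
final class SortedBufferingWriter {
    private final Comparator<long[]> order;   // lexicographic order over the sort columns
    private final int rowsPerFile;            // buffer size at which a file is rolled
    private final List<long[]> buffer = new ArrayList<>();
    private final List<List<long[]>> files = new ArrayList<>();

    SortedBufferingWriter(int rowsPerFile, int... sortColumns) {
        if (rowsPerFile <= 0) {
            throw new IllegalArgumentException("rowsPerFile must be positive");
        }
        this.rowsPerFile = rowsPerFile;
        this.order = byColumns(sortColumns);
    }

    // Compare rows column by column, in the order the columns were given.
    static Comparator<long[]> byColumns(int... columns) {
        Comparator<long[]> cmp = (a, b) -> 0;
        for (int column : columns) {
            cmp = cmp.thenComparingLong(row -> row[column]);
        }
        return cmp;
    }

    void write(long[] row) {
        buffer.add(row);
        if (buffer.size() >= rowsPerFile) {
            flush();
        }
    }

    // Sort only the buffered rows, then emit them as one file; earlier files are untouched.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        buffer.sort(order);
        files.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    List<List<long[]>> files() {
        return files;
    }
}
```

Sorting at each flush keeps memory bounded by `rowsPerFile`; the trade-off is that the key ranges of different files may overlap, since each file is sorted on its own.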

Tests

AppendOnlyWriterTest#testSortedBufferedSinkWriter

Close #7885

FangYongs (Contributor, Author) commented:

@Aitozi @shidayang Please take a look when you're free.

FangYongs changed the title [#7885] Support to order data by columns in append only writer [Core] Support to order data by columns in append only writer May 18, 2026
coreOptions.clusteringIncrementalEnabled()
&& coreOptions.clusteringIncrementalOptimizeWrite()
&& coreOptions.clusteringIncrementalMode()
== CoreOptions.ClusteringIncrementalMode.LOCAL_SORT
A contributor commented:

LOCAL_SORT is described as task-level sorting, but what this PR actually implements is file-level sorting.

Do we need a separate mode, something like "file_local", to represent this file-level granularity of sorting?

/**
 * Sort rows only within each compaction task (no global shuffle). Every output file is
 * internally ordered by the clustering columns, which is sufficient for per-file Parquet
 * lookup optimizations.
 */
LOCAL_SORT(
        "local-sort",
        "Sort rows only within each compaction task without global shuffle. Every output file is internally ordered.");
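To make the task-level versus file-level distinction concrete, here is an illustrative sketch under simplifying assumptions: a row is reduced to its `long` sort key, a file is a fixed-size chunk, and `SortGranularity` is a hypothetical helper. Task-level sorting orders everything a task received before rolling files, so files cover non-overlapping key ranges; file-level sorting rolls files in arrival order and sorts each one independently, so every file is ordered but their ranges can overlap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical contrast between the two granularities; not taken from the PR.
final class SortGranularity {

    // Task-level: sort all keys the task received, then roll files.
    static List<long[]> taskLevel(long[] keys, int rowsPerFile) {
        long[] sorted = keys.clone();
        Arrays.sort(sorted);
        return chunk(sorted, rowsPerFile);
    }

    // File-level: roll files in arrival order, then sort each file on its own.
    static List<long[]> fileLevel(long[] keys, int rowsPerFile) {
        List<long[]> files = chunk(keys, rowsPerFile);
        for (long[] file : files) {
            Arrays.sort(file);
        }
        return files;
    }

    // Split keys into consecutive copies of at most rowsPerFile elements.
    private static List<long[]> chunk(long[] keys, int rowsPerFile) {
        List<long[]> files = new ArrayList<>();
        for (int start = 0; start < keys.length; start += rowsPerFile) {
            int end = Math.min(start + rowsPerFile, keys.length);
            files.add(Arrays.copyOfRange(keys, start, end));
        }
        return files;
    }
}
```

With keys `5, 1, 4, 2, 6, 3` and three rows per file, the task-level variant yields files `[1, 2, 3]` and `[4, 5, 6]`, while the file-level variant yields `[1, 4, 5]` and `[2, 3, 6]`. Both satisfy "every output file is internally ordered", which is why that sentence alone does not pin down the granularity.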

FangYongs (Contributor, Author) replied:

@JingsongLi What do you think about LOCAL_SORT? In our previous discussion, it was meant for local sorting within a single file. As things currently stand, however, it is used for sorting data at the task level.
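One way to resolve the naming question is the separate mode floated in the review. This sketch is purely hypothetical: `ClusteringSortMode`, `FILE_LOCAL_SORT`, the `"file-local-sort"` value and `AppendWriterSortGate` are illustrative names, not existing Paimon options. It mirrors the `(value, description)` shape of the quoted `LOCAL_SORT` constant and the boolean shape of the condition under review, so the append-only writer would sort per file only when the file-level mode is selected.

```java
// Hypothetical, not Paimon's CoreOptions: a mode enum that names the two granularities separately.
enum ClusteringSortMode {
    LOCAL_SORT("local-sort", "Sort rows within each compaction task without global shuffle."),
    FILE_LOCAL_SORT("file-local-sort", "Sort rows within each written data file independently.");

    private final String value;
    private final String description;

    ClusteringSortMode(String value, String description) {
        this.value = value;
        this.description = description;
    }

    String value() {
        return value;
    }

    String description() {
        return description;
    }

    // Resolve a configured string such as "file-local-sort" to its mode.
    static ClusteringSortMode fromValue(String value) {
        for (ClusteringSortMode mode : values()) {
            if (mode.value.equals(value)) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Unknown clustering sort mode: " + value);
    }
}

// Hypothetical gate with the same shape as the reviewed condition, keyed on the file-level mode.
final class AppendWriterSortGate {
    static boolean sortEachFile(
            boolean incrementalEnabled, boolean optimizeWrite, ClusteringSortMode mode) {
        return incrementalEnabled && optimizeWrite && mode == ClusteringSortMode.FILE_LOCAL_SORT;
    }
}
```

Keeping the two granularities as distinct values lets LOCAL_SORT retain its documented task-level meaning while the writer-side behavior gets its own, unambiguous name.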


Development

Successfully merging this pull request may close these issues.

[Feature] Support to order data by keys in append only writer

2 participants