chore(query): Tuning sort spill memory usage #18520

forsaken628 · 2025-08-12T05:38:26Z

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Adjusting the trigger for the sort spill collect step
Adjusting the merge parameters for the sort spill restore step
Changing the default value for sort_spilling_batch_bytes
Fixing span loss by upgrading fastrace

Tests

Unit Test
Logic Test
Benchmark Test
No Test - Explain why

Type of change

Bug Fix (non-breaking change which fixes an issue)
New Feature (non-breaking change which adds functionality)
Breaking Change (fix or feature that could cause existing functionality not to work as expected)
Documentation Update
Refactoring
Performance Improvement
Other (please describe):

github-actions · 2025-08-12T06:14:22Z

🤖 Smart Auto-retry Analysis (Retry 1)

Workflow: 16902470099

📊 Summary

Total Jobs: 81
Failed Jobs: 1
Retryable: 0
Code Issues: 1

❌ NO RETRY NEEDED

All failures appear to be code/test issues requiring manual fixes.

🔍 Job Details

❌ linux / test_logs: Not retryable (Code/Test)

🤖 About

Automated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed).

forsaken628 · 2025-08-12T07:45:52Z

Benchmark:

Settings

max_query_memory_usage 2147483648
max_threads 4
query_out_of_memory_behavior spilling
sort_spilling_batch_bytes 1048576

Dataset: TPC-H SF100

SQL

select * from orders order by o_totalprice desc ignore_result;
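
The setup above can be written as a single session script. This is only a sketch: it assumes the settings are applied per session with `SET`, and that the TPC-H SF100 `orders` table is already loaded in the current database. The values are the ones listed under Settings.

```sql
-- Query memory limit: 2147483648 bytes (2 GiB).
SET max_query_memory_usage = 2147483648;
SET max_threads = 4;
-- Spill when the query memory limit is reached.
SET query_out_of_memory_behavior = 'spilling';
-- Spill batch size: 1048576 bytes (1 MiB), the value used in this benchmark.
SET sort_spilling_batch_bytes = 1048576;

select * from orders order by o_totalprice desc ignore_result;
```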

Analysis:
In this test, only two operators use a significant amount of memory: sort and scan.
The sort collect step consumes all the memory available within the limit, while scan has no memory-limit logic. If scan allocates a lot of memory at that point (for example, when there are many columns and the columns are relatively large), the limit is exceeded.

Currently, the sort restore step does not read the global memory usage. Instead, it statically determines the available memory from the memory usage of the collect step, which is an inelastic strategy. By the time the restore step runs, scanning has finished, so no problem shows up there.

Bug fixed:
If a block coming from upstream is large, it can be spilled directly without being sorted, which causes a large deviation in the restore step's memory estimate. A smaller sort_spilling_batch_bytes makes this bug more obvious.
With the current tuning, only the first spill in the collect step should exceed the limit.

A larger sort_spilling_batch_bytes is recommended: the smaller sort_spilling_batch_bytes is, the larger num_merge becomes and the more fragmented the blocks are, and a larger num_merge introduces more destabilizing factors. (More testing is needed to identify what these destabilizing factors are.)

forsaken628 added 7 commits August 6, 2025 16:33

refine

94d057a

update

10dbb13

update

c6efb16

update

41a85ad

update

4bf7bfe

dump fastrace

94c203a

update

79d7cb4

github-actions bot added the pr-chore this PR only has small changes that no need to record, like coding styles. label Aug 12, 2025

forsaken628 added 2 commits August 12, 2025 14:17

update

f9173e3

fix

208c308

forsaken628 requested review from zhang2014 and sundy-li August 12, 2025 07:47

forsaken628 marked this pull request as ready for review August 12, 2025 07:48

zhang2014 approved these changes Aug 12, 2025
