[opt](scanner) set number of file scanner to max_scanners_concurrency #59622
run buildall
BE UT Coverage Report: increment line coverage, increment coverage report
BE Regression && UT Coverage Report: increment line coverage, increment coverage report
PR approved by anyone and no changes requested.
PR approved by at least one committer and no changes requested.
BE Regression && UT Coverage Report: increment line coverage, increment coverage report
2062e17 to a0a0ffe
run buildall
TPC-H: Total hot run time: 32006 ms
TPC-DS: Total hot run time: 171890 ms
BE UT Coverage Report: increment line coverage, increment coverage report
run check_coverage
BE Regression && UT Coverage Report: increment line coverage, increment coverage report
kaka11chen left a comment
LGTM
What problem does this PR solve?
Problem Summary:
For external tables, a scanner is not bound to a specific split. Instead, each time a scanner is scheduled,
it dynamically fetches the next scan range from a unified split source and scans it.
Therefore, the number of scanners only needs to equal max_scanners_concurrency to ensure full-speed execution.
This PR also fixes a profile issue.
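The pull model described above can be illustrated with a minimal, single-threaded sketch. It is not the Doris implementation: `Split`, `SplitSource`, and `run_scanners` are hypothetical names invented for this example, and scheduling is simulated with a simple round-robin loop rather than real scanner threads. The point it shows is that a fixed pool of scanners drains every split, however many splits there are, because each scanner asks the shared source for more work whenever it is scheduled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <queue>
#include <vector>

// Hypothetical stand-in for a scan range (file split); not a Doris type.
struct Split {
    int id;
};

// Hypothetical unified split source shared by all scanners.
class SplitSource {
public:
    explicit SplitSource(int num_splits) {
        for (int i = 0; i < num_splits; ++i) splits_.push(Split{i});
    }

    // Hands out the next unclaimed split, or std::nullopt when exhausted.
    std::optional<Split> next() {
        if (splits_.empty()) return std::nullopt;
        Split s = splits_.front();
        splits_.pop();
        return s;
    }

private:
    std::queue<Split> splits_;
};

// Simulates `max_scanners_concurrency` scanners scheduled round-robin.
// Each time a scanner is scheduled it pulls one split from the shared source.
// Returns how many splits each scanner ended up scanning.
std::vector<int> run_scanners(int num_splits, int max_scanners_concurrency) {
    if (max_scanners_concurrency <= 0) return {};
    SplitSource source(num_splits);
    std::vector<int> scanned(static_cast<std::size_t>(max_scanners_concurrency), 0);
    bool exhausted = false;
    while (!exhausted) {
        for (int i = 0; i < max_scanners_concurrency; ++i) {
            std::optional<Split> split = source.next();
            if (!split) {  // no work left: every scanner can finish
                exhausted = true;
                break;
            }
            ++scanned[static_cast<std::size_t>(i)];  // placeholder for reading the split
        }
    }
    return scanned;
}
```

With 10 splits and 4 scanners, `run_scanners(10, 4)` assigns 3, 3, 2, and 2 splits to the four scanners, so no scanner needs to be created per split.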
Before:
The time is not evenly distributed.
After:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)