[feature](vparquet-reader) Implements parquet file page cache. #59307
run buildall |
TPC-H: Total hot run time: 35539 ms |
TPC-DS: Total hot run time: 179978 ms |
ClickBench: Total hot run time: 27.34 s |
run buildall |
TPC-H: Total hot run time: 35717 ms |
TPC-DS: Total hot run time: 179518 ms |
ClickBench: Total hot run time: 27.98 s |
run buildall |
TPC-H: Total hot run time: 36141 ms |
TPC-DS: Total hot run time: 179957 ms |
ClickBench: Total hot run time: 27.16 s |
run buildall |
TPC-H: Total hot run time: 35311 ms |
TPC-DS: Total hot run time: 179267 ms |
ClickBench: Total hot run time: 27.13 s |
run buildall |
TPC-H: Total hot run time: 31511 ms |
TPC-DS: Total hot run time: 173409 ms |
ClickBench: Total hot run time: 28.43 s |
run buildall |
TPC-H: Total hot run time: 31504 ms |
TPC-DS: Total hot run time: 174421 ms |
ClickBench: Total hot run time: 27.91 s |
What problem does this PR solve?
Problem Summary:
Release note
[Feature] Implementation of Parquet File Page Cache and Integration with Unified Page Cache Framework
Solution Overview
This PR implements a page-level caching mechanism for Parquet files and integrates it with Apache Doris's existing unified page cache framework. It aims to improve query performance on repeated reads by keeping decompressed (or compressed) data pages in memory, so that a cached page does not have to be read from the file again.
Key Features
• Leverages Existing Framework: Directly integrates with Doris's StoragePageCache infrastructure used for internal tables
• Shared Resource Management: Parquet cache shares memory pool and eviction policies with internal table caches
• Consistent Monitoring: Reuses existing cache statistics and RuntimeProfile for unified performance monitoring
• Cache Type Identification: Uses segment_v2::DATA_PAGE as cache page type, consistent with internal table data page caching
• Compression Ratio Awareness: Automatically chooses between caching compressed or decompressed data based on parquet_page_cache_decompress_threshold (default: 1.5)
• Flexible Storage: Caches decompressed data when uncompressed_size/compressed_size ≤ threshold, otherwise caches compressed data if enable_parquet_cache_compressed_pages=true
• Cache Key Design: Uses file_path::mtime::offset as key to ensure cache consistency across file modifications
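The compression-ratio rule and the cache key layout described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `CacheForm`, `PageCachePolicy`, `choose_cache_form`, and `make_page_cache_key` are hypothetical names, and only the two config names, the 1.5 default, and the `file_path::mtime::offset` layout come from the description. The sketch also assumes that a page is not cached at all when its ratio exceeds the threshold and `enable_parquet_cache_compressed_pages` is off, and that a page with a zero compressed size is skipped.

```cpp
#include <cstdint>
#include <string>

// Which form of a page ends up in the cache (hypothetical enum).
enum class CacheForm { Decompressed, Compressed, None };

// Hypothetical holder for the two settings named in the PR description.
struct PageCachePolicy {
    double decompress_threshold;  // parquet_page_cache_decompress_threshold (PR default: 1.5)
    bool cache_compressed_pages;  // enable_parquet_cache_compressed_pages
};

// Decide what to cache for one page based on its compression ratio.
CacheForm choose_cache_form(uint64_t uncompressed_size, uint64_t compressed_size,
                            const PageCachePolicy& policy) {
    // Assumption: without a compressed size there is no meaningful ratio, so skip caching.
    if (compressed_size == 0) {
        return CacheForm::None;
    }
    const double ratio = static_cast<double>(uncompressed_size) /
                         static_cast<double>(compressed_size);
    // Low ratio: the decompressed page costs little extra memory, so cache it directly.
    if (ratio <= policy.decompress_threshold) {
        return CacheForm::Decompressed;
    }
    // High ratio: keep the smaller compressed bytes, but only if that is enabled.
    // Assumption: when it is disabled, the page is not cached.
    return policy.cache_compressed_pages ? CacheForm::Compressed : CacheForm::None;
}

// Build a key in the file_path::mtime::offset layout. Because mtime is part
// of the key, a rewritten file produces different keys and stale pages are
// never looked up again.
std::string make_page_cache_key(const std::string& file_path, int64_t mtime,
                                int64_t offset) {
    return file_path + "::" + std::to_string(mtime) + "::" + std::to_string(offset);
}
```

With the default threshold of 1.5, a page that expands from 1000 to 1500 bytes would be cached decompressed, while a page that expands from 1000 to 4000 bytes would be cached compressed (or skipped, under the assumption above, if compressed caching is disabled).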
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)