[feature](cache) support file cache admission control #59065
run buildall
TPC-H: Total hot run time: 35116 ms
TPC-DS: Total hot run time: 178276 ms
ClickBench: Total hot run time: 27.17 s
run buildall
TPC-H: Total hot run time: 36571 ms
TPC-DS: Total hot run time: 178175 ms
ClickBench: Total hot run time: 27.96 s
TPC-H: Total hot run time: 32095 ms
TPC-DS: Total hot run time: 172999 ms
ClickBench: Total hot run time: 27.9 s
run buildall
run buildall
TPC-H: Total hot run time: 31710 ms
TPC-DS: Total hot run time: 172862 ms
ClickBench: Total hot run time: 27.96 s
run buildall
TPC-H: Total hot run time: 31538 ms
TPC-DS: Total hot run time: 173278 ms
ClickBench: Total hot run time: 27.79 s
run buildall
TPC-H: Total hot run time: 31764 ms
TPC-DS: Total hot run time: 172434 ms
ClickBench: Total hot run time: 27.88 s
For a full explanation of the implementation, see the following document (written in Chinese): https://www.notion.so/V3-1-2c31293e1081807ca476dd5c87efb28e
1. PR Function Overview
This PR implements File Cache Admission Control.
2. Implementation Scheme Analysis
The implementation consists of the following key components:
2.1. FE Side: Admission Decision
The primary logic is located in the `createScanRangeLocations` method of `FileQueryScanNode.java`.
- The feature is gated by the `Config.enable_file_cache_admission_control` switch.
- `FileCacheAdmissionManager` (a singleton) executes the specific admission judgment logic.
- `FileQueryScanNode` retrieves the current user identity (`userIdentity`), catalog, database, and table information.
- It then calls `FileCacheAdmissionManager.getInstance().isAllowed(...)` to obtain a boolean result, `fileCacheAdmission`.
2.2. FE Side: Decision Propagation
The decision result needs to be propagated from the `FileQueryScanNode` down to the underlying split assignment logic.
SplitAssignment Modification:
- The `SplitAssignment` class (located in `org.apache.doris.datasource`) is modified to accept a new `boolean fileCacheAdmission` parameter.
SplitToScanRange Modification:
- The `splitToScanRange` method (or its corresponding lambda expression) is updated to receive the `fileCacheAdmission` parameter.
2.3. Communication Protocol: Thrift Definition Update
To pass the FE's decision to the BE, the Thrift definition (likely `TFileRangeDesc` or `TFileScanRangeParams` in `PlanNodes.thrift`) requires a new field.
- A new field, `optional bool file_cache_admission`, is added to the `TFileRangeDesc` struct.
2.4. BE Side: Enforcement
Although the analysis focuses on the FE, the complete loop requires enforcement on the BE side:
- The `FileReader` (e.g., `HdfsFileReader` or `S3FileReader`) checks the `file_cache_admission` flag in the incoming `TFileRangeDesc` during initialization or reading.
- If `file_cache_admission` is true (default): it uses the standard `FileCachePolicy`, where data not found in the cache is written to the Block File Cache after reading.
- If `file_cache_admission` is false: it sets the `FileCachePolicy` to `NO_CACHE` and skips the cache-writing step, reading directly from remote storage. This protects the existing cache from being polluted.
3. Summary
This PR introduces an Admission Control Manager during the FE query planning phase and transparently passes this control signal through the Split Assignment and Scan Range Generation phases. This ultimately guides the BE side's file readers to selectively use the file cache.
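The flow summarized above can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not the PR's code: `FileCacheAdmissionSketch`, `RangeDesc`, the `user|catalog.database.table` deny-rule format, and the `FILE_BLOCK_CACHE` constant name are all hypothetical stand-ins, and the BE side (which is C++ in Doris) is modelled in Java purely for readability. The only behavior taken from the description is that the FE computes one boolean, stamps it on each scan range, and the BE treats an unset or true flag as "use the cache" and false as `NO_CACHE`.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical end-to-end model of the flow; none of these types are the PR's real classes. */
final class FileCacheAdmissionSketch {

    /** BE cache policies, modelled in Java for illustration only. */
    enum CachePolicy { FILE_BLOCK_CACHE, NO_CACHE }

    /** Stand-in for TFileRangeDesc; a null flag models an unset optional Thrift field. */
    static final class RangeDesc {
        final String path;
        final Boolean fileCacheAdmission;

        RangeDesc(String path, Boolean fileCacheAdmission) {
            this.path = path;
            this.fileCacheAdmission = fileCacheAdmission;
        }
    }

    /** Assumed rule format: "user|catalog.database.table" entries that must not populate the cache. */
    private final Set<String> denied;

    FileCacheAdmissionSketch(Set<String> denied) {
        this.denied = new HashSet<>(denied);
    }

    /** FE side: one decision per scan node, using the identifiers named in section 2.1. */
    boolean isAllowed(String userIdentity, String catalog, String database, String table) {
        return !denied.contains(userIdentity + "|" + catalog + "." + database + "." + table);
    }

    /** FE side: carry the decision onto every scan range built from a split. */
    static RangeDesc splitToScanRange(String path, boolean fileCacheAdmission) {
        return new RangeDesc(path, fileCacheAdmission);
    }

    /** BE side: an unset flag defaults to admission, false disables cache writes. */
    static CachePolicy policyFor(RangeDesc desc) {
        boolean admitted = desc.fileCacheAdmission == null || desc.fileCacheAdmission;
        return admitted ? CachePolicy.FILE_BLOCK_CACHE : CachePolicy.NO_CACHE;
    }

    public static void main(String[] args) {
        Set<String> rules = new HashSet<>();
        rules.add("etl_user|hive.logs.raw_events");
        FileCacheAdmissionSketch manager = new FileCacheAdmissionSketch(rules);

        boolean admission = manager.isAllowed("etl_user", "hive", "logs", "raw_events");
        RangeDesc desc = splitToScanRange("s3://bucket/part-0.parquet", admission);
        System.out.println(policyFor(desc)); // prints NO_CACHE
    }
}
```

Treating an unset flag as admitted mirrors the "true (default)" behavior described in section 2.4, so a BE receiving a range without the field would keep caching as before.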