perf(mito2): speed up parquet scan via minmax caches #7708
base: main
Conversation
Signed-off-by: Ruihang Xia <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9019f53482
```rust
self.pending
    .push_back(mapper.convert(&batch, &self.cache_strategy)?);
```
Clear pending queue on series conversion failure
If mapper.convert(...) fails after some sub-batches have already been pushed into pending, this function returns an error but leaves those queued batches intact; subsequent poll_next calls will emit stale data from the failed series before reading new input. This makes error handling non-atomic and can surface partial results after an error for callers that continue polling the stream.
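One way to make the conversion step all-or-nothing is to remember how much was queued before the call and roll back to that point on error. A minimal sketch with hypothetical stand-in types for the stream's internals (`Batch`, `RecordBatch`, `Error` are placeholders, not the PR's definitions):

```rust
use std::collections::VecDeque;

// Hypothetical stand-ins for the PR's actual types, for illustration only.
struct Batch;
struct RecordBatch;
struct Error;

/// Converts all sub-batches, keeping `pending` all-or-nothing: on the first
/// failure, drop whatever this call queued and return the error.
fn convert_series(
    pending: &mut VecDeque<RecordBatch>,
    batches: &[Batch],
    mut convert: impl FnMut(&Batch) -> Result<RecordBatch, Error>,
) -> Result<(), Error> {
    let queued_before = pending.len();
    for batch in batches {
        match convert(batch) {
            Ok(rb) => pending.push_back(rb),
            Err(e) => {
                // Roll back to the pre-call length so batches queued by
                // earlier, successful series are untouched.
                pending.truncate(queued_before);
                return Err(e);
            }
        }
    }
    Ok(())
}
```

Truncating to the pre-call length, rather than clearing the whole queue, preserves batches from earlier series that already converted successfully.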
Summary of Changes

Hello @waynexia, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the performance of Parquet scans within Mito2 by implementing several caching and optimization strategies. It introduces a mechanism to cache min-max pruning results, preventing repeated computations for common query predicates. Additionally, it improves initial query latency by preloading Parquet file metadata into the SST meta cache upon region opening. Further refinements include optimizing the internal batch processing stream and providing a more efficient way to construct row group selections after pruning, collectively contributing to a reported 4x latency improvement for basic filter queries.
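To make the first of these mechanisms concrete, here is a minimal sketch of the general shape of a pruning-result cache, using only std types; `PruneKey`, `RowGroupSelection`, and the `(key, file id)` layout are illustrative, not the PR's actual definitions:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Illustrative cache key: the normalized predicate plus the schema version,
/// so entries are invalidated when the schema changes.
#[derive(Hash, PartialEq, Eq)]
struct PruneKey {
    exprs: Vec<String>,
    schema_version: u64,
}

/// Indices of the row groups that survive min-max pruning.
#[derive(Clone)]
struct RowGroupSelection(Vec<usize>);

struct MinMaxCache {
    // Keyed by (predicate, file id): the same predicate prunes each file once.
    inner: Mutex<HashMap<(PruneKey, u64), RowGroupSelection>>,
}

impl MinMaxCache {
    /// Returns the cached selection, or computes and caches it.
    fn get_or_compute(
        &self,
        key: PruneKey,
        file_id: u64,
        compute: impl FnOnce() -> RowGroupSelection,
    ) -> RowGroupSelection {
        let mut map = self.inner.lock().unwrap();
        map.entry((key, file_id)).or_insert_with(compute).clone()
    }
}
```

The actual cache is presumably memory-bounded (note the `mem_usage` accounting discussed further down), unlike this unbounded map.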
Code Review
This pull request introduces significant performance improvements for Parquet scans. The core changes include caching min-max pruning results to accelerate subsequent queries, and preloading Parquet metadata on region open to reduce first-query latency. The implementation is robust, correctly handling cache key generation and avoiding caching for dynamic filters. Additionally, the refactoring of ConvertBatchStream to stream record batches instead of concatenating them is a solid performance enhancement that reduces memory usage and improves throughput. The new functionality is well-tested. Overall, this is a high-quality contribution that I'm happy to approve.
evenyag left a comment
A few questions about the preload strategy.
```rust
impl MinMaxKey {
    pub fn new(exprs: Arc<Vec<String>>, schema_version: u64, skip_fields: bool) -> Self {
        let mem_usage =
            exprs.iter().map(|s| s.len()).sum::<usize>() + size_of::<u64>() + size_of::<bool>();
```
non-blocking: Should we add size_of::<Self>() + size_of::<Vec<String>>() instead of u64 + bool?
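A sketch of the suggested accounting, charging the struct itself (which already covers the u64 and bool fields plus the Arc pointer) and the Vec header behind the Arc; this assumes the key stores its computed `mem_usage`, which the diff suggests but does not show:

```rust
use std::mem::size_of;
use std::sync::Arc;

struct MinMaxKey {
    exprs: Arc<Vec<String>>,
    schema_version: u64,
    skip_fields: bool,
    mem_usage: usize,
}

impl MinMaxKey {
    pub fn new(exprs: Arc<Vec<String>>, schema_version: u64, skip_fields: bool) -> Self {
        // size_of::<Self>() covers schema_version, skip_fields, mem_usage, and
        // the Arc pointer; size_of::<Vec<String>>() covers the Vec header that
        // lives behind the Arc; the string bytes are counted explicitly.
        let mem_usage = size_of::<Self>()
            + size_of::<Vec<String>>()
            + exprs.iter().map(|s| s.len()).sum::<usize>();
        Self {
            exprs,
            schema_version,
            skip_fields,
            mem_usage,
        }
    }
}
```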
```rust
tokio::spawn(async move {
    let region_id = region.region_id;
    let table_dir = region.access_layer.table_dir().to_string();
    let path_type = region.access_layer.path_type();
    let object_store = region.access_layer.object_store().clone();
```
Can we add a way to limit the parallelism of this task? Maybe also provide a way to disable preloading without disabling the metadata cache.
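A sketch of both suggestions, bounding concurrency with a tokio::sync::Semaphore shared across regions and gating preloading behind a config flag; `preload_enabled`, `MAX_PRELOAD_TASKS`, and `spawn_preload` are hypothetical names:

```rust
use std::future::Future;
use std::sync::Arc;
use tokio::sync::Semaphore;

const MAX_PRELOAD_TASKS: usize = 4; // hypothetical process-wide cap

fn spawn_preload<F>(semaphore: Arc<Semaphore>, preload_enabled: bool, preload: F)
where
    F: Future<Output = ()> + Send + 'static,
{
    if !preload_enabled {
        // The metadata cache still works on demand; we only skip warming it.
        return;
    }
    tokio::spawn(async move {
        // Acquiring inside the task keeps region open fast while capping how
        // many preloads run concurrently across the whole process.
        let _permit = semaphore.acquire_owned().await.expect("semaphore closed");
        preload.await;
    });
}
```

The semaphore would be created once at engine start, e.g. `Arc::new(Semaphore::new(MAX_PRELOAD_TASKS))`, and cloned into each region's open path.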
```rust
// Load older files first so the most recent files remain hot in the LRU cache.
files.sort_by(|a, b| a.meta_ref().time_range.1.cmp(&b.meta_ref().time_range.1));
```
Loading all files' metadata may be costly (on S3). In some environments, most regions are rarely queried.
Maybe we can adjust the strategy:
- Limit the total number of files to preload (see the sketch after this list)?
- Only preload metadata from the file cache?
- Trigger a task to load metadata on a file cache miss, e.g. via something like the file cache's DownloadTask, and preload N files with adjacent time ranges.
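A sketch of the first option, capping the preload to the newest files (which the sort above keeps at the tail of the list); `MAX_PRELOAD_FILES` and the generic `F` are placeholders:

```rust
const MAX_PRELOAD_FILES: usize = 16; // hypothetical per-region cap

/// Returns the newest `max` entries of a list sorted by end time ascending,
/// matching the sort order in the diff above. `F` stands in for the file
/// handle type.
fn files_to_preload<F>(files: &[F], max: usize) -> &[F] {
    let start = files.len().saturating_sub(max);
    &files[start..]
}
```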
```rust
if loaded > 0 {
    info!(
        "Preloaded parquet metadata for region {}, loaded_files: {}",
        region_id, loaded
    );
}
```
We can also log the load time.
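For example, by wrapping the preload loop with std::time::Instant and extending the log line above (a sketch mirroring the diff, not the PR's code; the loop itself is elided, and `region_id` and `loaded` come from the surrounding code):

```rust
use std::time::Instant;

let start = Instant::now();
// ... the preload loop from the PR runs here, incrementing `loaded` ...
if loaded > 0 {
    info!(
        "Preloaded parquet metadata for region {}, loaded_files: {}, elapsed: {:?}",
        region_id,
        loaded,
        start.elapsed()
    );
}
```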
```rust
&& let Some(result) = index_result_cache.get(predicate_key, file_id)
{
    let num_row_groups = parquet_meta.num_row_groups();
    metrics.rg_minmax_filtered += num_row_groups.saturating_sub(result.row_group_count());
```
Also add metrics for minmax cache hit/miss?
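A sketch of such counters, assuming the prometheus and lazy_static crates (the metric names are hypothetical):

```rust
use lazy_static::lazy_static;
use prometheus::{register_int_counter, IntCounter};

lazy_static! {
    static ref MINMAX_CACHE_HIT: IntCounter =
        register_int_counter!("mito_minmax_cache_hit", "min-max result cache hits").unwrap();
    static ref MINMAX_CACHE_MISS: IntCounter =
        register_int_counter!("mito_minmax_cache_miss", "min-max result cache misses").unwrap();
}

/// Records one lookup outcome; would be called at the cache-lookup site shown
/// in the diff above (hit on `Some(result)`, miss otherwise).
fn record_minmax_lookup(hit: bool) {
    if hit {
        MINMAX_CACHE_HIT.inc();
    } else {
        MINMAX_CACHE_MISS.inc();
    }
}
```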
```rust
let mut exprs = predicate
    .exprs()
    .iter()
    .map(|expr| format!("{expr:?}"))
```
non-blocking: Since expr also implements Display, which is better for the cache key?
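For comparison, a small sketch of the two formats on a DataFusion expression; Display output is shorter and closer to SQL, while Debug prints the full AST, so it is less likely to collide across distinct expressions but produces longer keys:

```rust
use datafusion_expr::{col, lit};

fn main() {
    let expr = col("host").eq(lit("a"));
    println!("{expr}");   // Display: roughly `host = Utf8("a")`
    println!("{expr:?}"); // Debug: the expression's full enum structure
}
```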
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
Implement minmax filter cache and preload file metadata on region open.
For the basic filter case (a query with only a few primary-key equality filters and no external index) over ~30 files, it achieves a 4x cold-run latency improvement (the first query of a standalone instance) in my environment, from 0.8s to 0.2s.
PR Checklist
Please convert it to a draft if some of the following conditions are not met.