Conversation


@waynexia waynexia commented Feb 12, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Implement minmax filter cache and preload file metadata on region open.

For the basic filter case (a query with only a few primary-key equality filters and no external index) over ~30 files, it achieves a 4x latency improvement for a cold run (the first query of a standalone instance) in my environment (from 0.8s to 0.2s).

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@waynexia waynexia requested a review from discord9 February 12, 2026 17:34
@github-actions github-actions bot added the size/M and docs-not-required (this change does not impact docs) labels Feb 12, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9019f53482


Comment on lines +93 to +94:

```rust
self.pending
    .push_back(mapper.convert(&batch, &self.cache_strategy)?);
```


P2: Clear pending queue on series conversion failure

If mapper.convert(...) fails after some sub-batches have already been pushed into pending, this function returns an error but leaves those queued batches intact; subsequent poll_next calls will emit stale data from the failed series before reading new input. This makes error handling non-atomic and can surface partial results after an error for callers that continue polling the stream.


@gemini-code-assist
Contributor

Summary of Changes

Hello @waynexia, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the performance of Parquet scans within Mito2 by implementing several caching and optimization strategies. It introduces a mechanism to cache min-max pruning results, preventing repeated computations for common query predicates. Additionally, it improves initial query latency by preloading Parquet file metadata into the SST meta cache upon region opening. Further refinements include optimizing the internal batch processing stream and providing a more efficient way to construct row group selections after pruning, collectively contributing to a reported 4x latency improvement for basic filter queries.

Highlights

  • Min-Max Pruning Result Caching: Introduced a new MinMaxKey variant to PredicateKey and implemented caching for min-max pruning results in the Parquet reader. This avoids redundant computations for identical predicates across queries, significantly speeding up scan operations.
  • Parquet Metadata Preloading: Added functionality to preload Parquet metadata into the in-memory SST meta cache when a region is opened. This proactive caching reduces the latency of initial queries by preventing large on-demand metadata reads.
  • Optimized Batch Stream Processing: Refactored the ConvertBatchStream to utilize a VecDeque for pending record batches, improving the efficiency of batch handling and reducing memory reallocations during stream processing.
  • Efficient Row Group Selection: Implemented a new constructor from_full_row_group_ids for RowGroupSelection to efficiently build selections after coarse pruning, avoiding the overhead of creating a full selection and then removing unneeded row groups.
Changelog
  • src/mito2/src/cache/index/result_cache.rs
    • Added MinMax variant to PredicateKey enum to support min-max pruning caching.
    • Implemented new_minmax constructor for PredicateKey to create min-max specific keys.
    • Updated mem_usage calculation for PredicateKey to include MinMaxKey.
    • Defined MinMaxKey struct with exprs, schema_version, skip_fields, and mem_usage fields.
    • Added a test case to verify MinMaxKey correctly distinguishes based on schema version and skip fields.
  • src/mito2/src/read/stream.rs
    • Replaced buffer: Vec<DfRecordBatch> with pending: VecDeque<RecordBatch> for more efficient batch management.
    • Removed unused imports ArrowComputeSnafu and compute.
    • Modified poll_next to prioritize processing batches from the pending queue before fetching new ones from the inner stream.
  • src/mito2/src/region/opener.rs
    • Imported necessary types FileHandle, MetadataLoader, and MetadataCacheMetrics.
    • Added a call to maybe_preload_parquet_meta_cache during region opening to proactively load Parquet metadata.
    • Implemented preload_parquet_meta_cache_for_files to load metadata for a list of files, prioritizing older files.
    • Implemented maybe_preload_parquet_meta_cache to spawn an asynchronous task for preloading metadata if caching is enabled.
    • Added test cases to verify Parquet metadata preloading, including scenarios where metadata is retrieved from file cache or statted from object storage.
  • src/mito2/src/sst/parquet/reader.rs
    • Refactored prune_row_groups_by_minmax into row_groups_by_minmax which now returns a RowGroupSelection directly.
    • Integrated caching logic for min-max pruning results, using PredicateKey::MinMax to store and retrieve cached selections.
    • Modified the row group pruning process to construct RowGroupSelection more efficiently from filtered row group IDs.
    • Added a test to ensure that MinMaxKey is not constructed when the index result cache is disabled, preventing unnecessary debug formatting.
  • src/mito2/src/sst/parquet/row_selection.rs
    • Added from_full_row_group_ids constructor to RowGroupSelection for creating selections from a list of row group IDs, optimizing for coarse pruning.
    • Added a test case for from_full_row_group_ids to ensure duplicate row group IDs are handled correctly.
Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant performance improvements for Parquet scans. The core changes include caching min-max pruning results to accelerate subsequent queries, and preloading Parquet metadata on region open to reduce first-query latency. The implementation is robust, correctly handling cache key generation and avoiding caching for dynamic filters. Additionally, the refactoring of ConvertBatchStream to stream record batches instead of concatenating them is a solid performance enhancement that reduces memory usage and improves throughput. The new functionality is well-tested. Overall, this is a high-quality contribution that I'm happy to approve.


@evenyag evenyag left a comment


A few questions about the preload strategy.

```rust
impl MinMaxKey {
    pub fn new(exprs: Arc<Vec<String>>, schema_version: u64, skip_fields: bool) -> Self {
        let mem_usage =
            exprs.iter().map(|s| s.len()).sum::<usize>() + size_of::<u64>() + size_of::<bool>();
```


non-blocking: Should we add size_of::<Self>() + size_of::<Vec<String>>() instead of u64 + bool?

Comment on lines +1018 to +1022:

```rust
tokio::spawn(async move {
    let region_id = region.region_id;
    let table_dir = region.access_layer.table_dir().to_string();
    let path_type = region.access_layer.path_type();
    let object_store = region.access_layer.object_store().clone();
```


Can we add a way to limit the parallelism of this task? Maybe also provide a way to disable preloading without disabling the metadata cache.

Comment on lines +961 to +962:

```rust
// Load older files first so the most recent files remain hot in the LRU cache.
files.sort_by(|a, b| a.meta_ref().time_range.1.cmp(&b.meta_ref().time_range.1));
```


Loading all files' metadata may be costly (on S3), and in some environments most regions are rarely queried.

Maybe we can adjust the strategy:

  • Limit the total number of files to preload?
  • Only preload metadata from the file cache?
  • Trigger a load task on a cache miss of the file cache, e.g. via something like the file cache's DownloadTask, and preload N files in adjacent time ranges.

Comment on lines +1044 to +1049:

```rust
if loaded > 0 {
    info!(
        "Preloaded parquet metadata for region {}, loaded_files: {}",
        region_id, loaded
    );
}
```


We can also log the load time.

```rust
&& let Some(result) = index_result_cache.get(predicate_key, file_id)
{
    let num_row_groups = parquet_meta.num_row_groups();
    metrics.rg_minmax_filtered += num_row_groups.saturating_sub(result.row_group_count());
```


Also add metrics for minmax cache hit/miss?

```rust
let mut exprs = predicate
    .exprs()
    .iter()
    .map(|expr| format!("{expr:?}"))
```

@evenyag evenyag Feb 13, 2026


non-blocking: Since expr also implements Display, which is better for the cache key?
