Description
Is your feature request related to a problem? Please describe
As of now, Lucene's DocValues fields maintain a multi-level skip list structure (DocValuesSkipper) that stores min/max value statistics for fixed-size blocks of documents (default 4,096 docs per block, grouped into 4 levels with 8x fanout). Range queries use this skip list to classify entire blocks as YES (all docs match), NO (no docs match), or MAYBE (need per-doc evaluation), avoiding expensive per-document value reads for YES and NO blocks.
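The YES / NO / MAYBE classification above can be illustrated with a minimal sketch. This is not Lucene's API: `Match`, `BlockClassifier`, and `classify` are hypothetical names, and the sketch assumes every document in the block has a value for the field (a sparse block could not be classified as YES from min/max alone).

```java
// Hypothetical illustration only; names are not taken from Lucene.
enum Match { YES, NO, MAYBE }

final class BlockClassifier {
    /**
     * Classifies one block against the inclusive range [lower, upper],
     * using only the block's min/max statistics.
     * Assumes every doc in the block has a value for the field.
     */
    static Match classify(long blockMin, long blockMax, long lower, long upper) {
        // The block's value interval is disjoint from the query range.
        if (blockMax < lower || blockMin > upper) {
            return Match.NO;
        }
        // The block's value interval lies entirely inside the query range.
        if (blockMin >= lower && blockMax <= upper) {
            return Match.YES;
        }
        // Partial overlap: individual doc values must be read.
        return Match.MAYBE;
    }
}
```

For example, with a query range of [10, 50], a block whose values span [20, 30] would be YES, one spanning [60, 90] would be NO, and one spanning [40, 70] would be MAYBE.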
Today, when a query ANDs multiple range filters on different fields (e.g., price:[10,50] AND rating:[4,5] AND date:[2024-01-01, 2024-12-31]), each field's skip list is evaluated independently. If the price skip list determines that a block has no matching documents, the rating and date skip lists still read and evaluate their metadata for that same block, only for the conjunction (ConjunctionDISI) to discard the block at the document level later anyway.
Describe the solution you'd like
This proposal coordinates the skip list evaluation across fields. All fields' skip metadata is checked together at the block level, short-circuiting on the first field that says NO. When one field eliminates a block, the other fields skip it entirely: no skip reads and no per-doc evaluation.
This only applies when a query has two or more numeric range filters on different fields.
One way would be to rewrite the query so that it wraps the multiple range queries and coordinates their skip evaluation via a MultiFieldDocValuesRangeIterator. This iterator wraps the per-field skip iterators and advances them together, i.e. when the lead field skips past a block, all other fields jump to the same position without reading their skip data for the skipped blocks.
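A rough sketch of the block-level short-circuit, written in plain Java without any Lucene types. Everything here is hypothetical (`CoordinatedBlockFilter`, `FieldBlocks`, `candidateBlocks`), and it models only a single skip level with blocks aligned across fields. The `reads` counter is there only to show how many skip-metadata lookups each field performs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the proposed MultiFieldDocValuesRangeIterator.
final class CoordinatedBlockFilter {

    /** Per-field block statistics plus that field's inclusive query range. */
    static final class FieldBlocks {
        final long[] min;
        final long[] max;
        final long lower;
        final long upper;
        int reads = 0; // number of skip-metadata lookups performed on this field

        FieldBlocks(long[] min, long[] max, long lower, long upper) {
            this.min = min;
            this.max = max;
            this.lower = lower;
            this.upper = upper;
        }

        /** Returns true if no document in the block can match this field's range. */
        boolean excludes(int block) {
            reads++;
            return max[block] < lower || min[block] > upper;
        }
    }

    /**
     * Returns the indices of blocks that no field rules out.
     * Fields are consulted in list order; the first NO ends the check for that block,
     * so later fields never read their metadata for it.
     */
    static List<Integer> candidateBlocks(List<FieldBlocks> fields, int numBlocks) {
        List<Integer> candidates = new ArrayList<>();
        for (int block = 0; block < numBlocks; block++) {
            boolean rejected = false;
            for (FieldBlocks field : fields) {
                if (field.excludes(block)) {
                    rejected = true;
                    break; // short-circuit: remaining fields skip this block
                }
            }
            if (!rejected) {
                candidates.add(block);
            }
        }
        return candidates;
    }
}
```

In this sketch the order of the field list decides which field acts as the lead. Putting the most selective field first would presumably save the most reads, but how the lead is chosen is left open here.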
Related component
Search:Performance
Describe alternatives you've considered
None at this point
Additional context
No response