feat: Support stripe level batched index read #480
Closed
xiaoxmeng wants to merge 1 commit into facebookincubator:main from
@xiaoxmeng has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92848948.
xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request on Feb 12, 2026

Summary:
X-link: facebookincubator/nimble#480

This diff implements stripe-level batched index read support for Nimble files in Velox. The key motivation is to improve index lookup performance by processing multiple lookup requests in batches at the stripe level, rather than processing each request individually.

**New `SelectiveNimbleIndexReader` class**: A new format-specific index reader that handles:
- Encoding index bounds into Nimble-specific encoded keys
- Looking up stripes and row ranges using the tablet index
- Managing stripe iteration and data reading with batched processing
- Returning results in request order via an iterator pattern (`startLookup`/`hasNext`/`next`)

**Batched stripe processing**: Instead of loading stripes per-request, the reader:
- Maps all lookup requests to their matching stripes upfront
- Merges overlapping row ranges within stripes for efficient reading
- Tracks output references with ref-counting to share read data across requests

**Optimized row range handling**:
- Without filters: Merges overlapping row ranges and each request extracts its portion
- With filters: Splits overlapping ranges into non-overlapping segments to preserve filter semantics

**HiveIndexReader refactoring**: Simplified to focus on index bounds creation and result assembly, delegating control logic to format-specific readers.

**KeyEncoder enhancements**: Added support for encoding index bounds with constant values for more efficient range queries.

**New runtime stats**: Added metrics for tracking index lookup performance:
- `kNumIndexLookupRequests`: Total lookup requests
- `kNumIndexLookupStripes`: Number of stripes accessed
- `kNumIndexLookupReadSegments`: Number of read segments processed

Differential Revision: D92848948
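To make the no-filter case concrete, here is a minimal C++ sketch of merging overlapping row ranges within a stripe and locating one request's rows inside the merged read. It is an illustration only: `RowRange`, `mergeRanges`, and `offsetInMergedRead` are hypothetical names, half-open ranges are an assumption, and none of this is taken from the Velox or Nimble code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open row range [begin, end) within a single stripe.
// Hypothetical type for illustration; not the type used by the PR.
struct RowRange {
  int64_t begin;
  int64_t end;
};

// Merges overlapping or adjacent ranges so that each row is read at most once.
std::vector<RowRange> mergeRanges(std::vector<RowRange> ranges) {
  std::sort(
      ranges.begin(), ranges.end(), [](const RowRange& a, const RowRange& b) {
        return a.begin < b.begin;
      });
  std::vector<RowRange> merged;
  for (const auto& range : ranges) {
    assert(range.begin <= range.end);
    if (range.begin == range.end) {
      continue; // Nothing to read for an empty range.
    }
    if (!merged.empty() && range.begin <= merged.back().end) {
      // Overlaps or touches the previous range: extend it.
      merged.back().end = std::max(merged.back().end, range.end);
    } else {
      merged.push_back(range);
    }
  }
  return merged;
}

// Returns the position of 'request.begin' in the output produced by reading
// the merged ranges back to back, or -1 if no merged range covers the request.
int64_t offsetInMergedRead(
    const std::vector<RowRange>& merged,
    const RowRange& request) {
  int64_t offset = 0;
  for (const auto& range : merged) {
    if (request.begin >= range.begin && request.end <= range.end) {
      return offset + (request.begin - range.begin);
    }
    offset += range.end - range.begin;
  }
  return -1;
}
```

For the requests [0, 10), [5, 20), and [30, 40), this sketch reads [0, 20) and [30, 40) once each, and the request [5, 20) finds its rows starting at offset 5 of the merged output.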
aa9186c to 65f19fe
xiaoxmeng added a commit to xiaoxmeng/nimble that referenced this pull request on Feb 12, 2026
X-link: facebookincubator/velox#16360
Differential Revision: D92848948
65f19fe to 78d972c
xiaoxmeng added a commit to xiaoxmeng/nimble that referenced this pull request on Feb 12, 2026
X-link: facebookincubator/velox#16360
Reviewed By: HuamengJiang, tanjialiang
Differential Revision: D92848948
xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request on Feb 12, 2026
X-link: facebookincubator/nimble#480
Reviewed By: HuamengJiang, tanjialiang
Differential Revision: D92848948
xiaoxmeng added a commit to xiaoxmeng/nimble that referenced this pull request on Feb 13, 2026

Summary:
X-link: facebookincubator/velox#16360

This diff implements stripe-level batched index read support for Nimble files in Velox. The key motivation is to improve index lookup performance by processing multiple lookup requests in batches at the stripe level, rather than processing each request individually.

**New `SelectiveNimbleIndexReader` class**: A format-specific index reader that handles:
- Encoding index bounds into Nimble-specific encoded keys
- Looking up stripes and row ranges using the tablet index
- Managing stripe iteration and data reading with batched processing
- Returning results in request order via an iterator pattern (`startLookup`/`hasNext`/`next`)

**Batched stripe processing**: Instead of loading stripes per-request, the reader:
- Maps all lookup requests to their matching stripes upfront
- Merges overlapping row ranges within stripes for efficient reading (without filters)
- Splits overlapping ranges into non-overlapping segments to preserve filter semantics (with filters)
- Tracks output references with ref-counting to share read data across requests

**HiveIndexReader refactoring**: Simplified to focus on index bounds creation and result assembly, delegating format-specific control logic to `SelectiveNimbleIndexReader`.

**KeyEncoder enhancements**: Added support for encoding index bounds with constant values for efficient multi-row range queries, and extended test coverage for edge cases.

**New runtime stats**: Added metrics for tracking index lookup performance:
- `kNumIndexLookupRequests`: Total lookup requests processed
- `kNumIndexLookupStripes`: Number of stripes accessed
- `kNumIndexLookupReadSegments`: Number of read segments processed

Reviewed By: HuamengJiang, tanjialiang
Differential Revision: D92848948
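The filter case can be sketched in the same spirit. With a filter, the number of surviving rows in a merged range is not known ahead of time, so a request can no longer find its rows by offset alone; that is one plausible reading of "preserve filter semantics", not something the summary spells out. The C++ below cuts overlapping ranges at every boundary and records which requests each segment serves. `RowRange`, `Segment`, and `splitIntoSegments` are hypothetical names, and the quadratic loop is chosen for clarity rather than speed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Half-open row range [begin, end) requested by one lookup (hypothetical).
struct RowRange {
  int64_t begin;
  int64_t end;
};

// A non-overlapping piece of the stripe plus the requests that cover it.
struct Segment {
  int64_t begin;
  int64_t end;
  std::vector<size_t> requests; // Indices into the input vector.
};

std::vector<Segment> splitIntoSegments(const std::vector<RowRange>& requests) {
  // Every range boundary is a potential cut point.
  std::vector<int64_t> cuts;
  for (const auto& request : requests) {
    if (request.begin < request.end) {
      cuts.push_back(request.begin);
      cuts.push_back(request.end);
    }
  }
  std::sort(cuts.begin(), cuts.end());
  cuts.erase(std::unique(cuts.begin(), cuts.end()), cuts.end());

  std::vector<Segment> segments;
  for (size_t i = 0; i + 1 < cuts.size(); ++i) {
    Segment segment{cuts[i], cuts[i + 1], {}};
    for (size_t j = 0; j < requests.size(); ++j) {
      // A segment lies either fully inside or fully outside each request,
      // because every request boundary is a cut point.
      if (requests[j].begin <= segment.begin &&
          segment.end <= requests[j].end) {
        segment.requests.push_back(j);
      }
    }
    if (!segment.requests.empty()) {
      segments.push_back(std::move(segment)); // Skip gaps nobody asked for.
    }
  }
  return segments;
}
```

For the requests [0, 10), [5, 20), and [30, 40), the sketch yields four segments: [0, 5) for request 0, [5, 10) shared by requests 0 and 1, [10, 20) for request 1, and [30, 40) for request 2. Each segment is read and filtered once, and its output is handed to every request listed for it.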
78d972c to 6bee7e2
xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request on Feb 13, 2026
X-link: facebookincubator/nimble#480
Reviewed By: HuamengJiang, tanjialiang
Differential Revision: D92848948
6bee7e2 to 4809bfe
meta-codesync bot pushed a commit to facebookincubator/velox that referenced this pull request on Feb 13, 2026
Pull Request resolved: #16360
X-link: facebookincubator/nimble#480
Reviewed By: HuamengJiang, tanjialiang
Differential Revision: D92848948
fbshipit-source-id: d6f5c78ef23dca1d4d8faa2d27acf85f77612805
This pull request has been merged in c039c5f.
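Summary:

This diff implements stripe-level batched index read support for Nimble files in Velox. The key motivation is to improve index lookup performance by processing multiple lookup requests in batches at the stripe level, rather than processing each request individually.

**New `SelectiveNimbleIndexReader` class**: A new format-specific index reader that handles:
- Encoding index bounds into Nimble-specific encoded keys
- Looking up stripes and row ranges using the tablet index
- Managing stripe iteration and data reading with batched processing
- Returning results in request order via an iterator pattern (`startLookup`/`hasNext`/`next`)

**Batched stripe processing**: Instead of loading stripes per-request, the reader:
- Maps all lookup requests to their matching stripes upfront
- Merges overlapping row ranges within stripes for efficient reading
- Tracks output references with ref-counting to share read data across requests

**Optimized row range handling**:
- Without filters: Merges overlapping row ranges and each request extracts its portion
- With filters: Splits overlapping ranges into non-overlapping segments to preserve filter semantics

**HiveIndexReader refactoring**: Simplified to focus on index bounds creation and result assembly, delegating control logic to format-specific readers.

**KeyEncoder enhancements**: Added support for encoding index bounds with constant values for more efficient range queries.

**New runtime stats**: Added metrics for tracking index lookup performance:
- `kNumIndexLookupRequests`: Total lookup requests
- `kNumIndexLookupStripes`: Number of stripes accessed
- `kNumIndexLookupReadSegments`: Number of read segments processed

Differential Revision: D92848948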
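As a rough illustration of the iterator pattern and of grouping requests by stripe, the toy C++ class below visits each stripe once and still returns results in request order. Only the method names `startLookup`, `hasNext`, and `next` come from the description; their signatures, the `LookupRequest` and `LookupResult` types, the eager processing inside `startLookup`, and the `stripesLoaded` counter are all assumptions made for this sketch and do not reflect the actual `SelectiveNimbleIndexReader` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical request: a row range [begin, end) inside one stripe.
struct LookupRequest {
  size_t stripe;
  int64_t begin;
  int64_t end;
};

// Hypothetical result: which request it answers and how many rows it holds.
struct LookupResult {
  size_t requestIndex;
  int64_t numRows;
};

class ToyBatchedIndexReader {
 public:
  // Maps all requests to stripes upfront, then "reads" each stripe once.
  void startLookup(std::vector<LookupRequest> requests) {
    requests_ = std::move(requests);
    numRows_.assign(requests_.size(), 0);
    nextRequest_ = 0;
    stripesLoaded_ = 0;

    std::map<size_t, std::vector<size_t>> requestsByStripe;
    for (size_t i = 0; i < requests_.size(); ++i) {
      requestsByStripe[requests_[i].stripe].push_back(i);
    }
    for (const auto& entry : requestsByStripe) {
      ++stripesLoaded_; // Stands in for one stripe load.
      for (size_t index : entry.second) {
        numRows_[index] = requests_[index].end - requests_[index].begin;
      }
    }
  }

  bool hasNext() const {
    return nextRequest_ < requests_.size();
  }

  // Returns results in the order the requests were submitted.
  LookupResult next() {
    assert(hasNext());
    LookupResult result{nextRequest_, numRows_[nextRequest_]};
    ++nextRequest_;
    return result;
  }

  size_t stripesLoaded() const {
    return stripesLoaded_;
  }

 private:
  std::vector<LookupRequest> requests_;
  std::vector<int64_t> numRows_;
  size_t nextRequest_ = 0;
  size_t stripesLoaded_ = 0;
};
```

A caller would invoke `startLookup` once with the whole batch and then loop on `hasNext`/`next`. With three requests that touch stripes 1, 0, and 1, the toy reader loads two stripes instead of three and still yields results for requests 0, 1, and 2 in that order.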