
feat: Support stripe level batched index read #480

Closed
xiaoxmeng wants to merge 1 commit into facebookincubator:main from xiaoxmeng:export-D92848948


@xiaoxmeng
Contributor

Summary:
This diff implements stripe-level batched index read support for Nimble files in Velox. The key motivation is to improve index lookup performance by processing multiple lookup requests in batches at the stripe level, rather than processing each request individually.

New SelectiveNimbleIndexReader class: A format-specific index reader that handles:

  • Encoding index bounds into Nimble-specific encoded keys
  • Looking up stripes and row ranges using the tablet index
  • Managing stripe iteration and data reading with batched processing
  • Returning results in request order via an iterator pattern (startLookup/hasNext/next)
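The iterator pattern in the last bullet can be sketched roughly as follows. This is a toy illustration, not the actual `SelectiveNimbleIndexReader` interface: `ToyIndexReader`, `LookupResult`, and the use of strings as rows are all hypothetical, and the real signatures of `startLookup`/`hasNext`/`next` are not shown in this description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one lookup result. The real reader returns
// Velox vectors, which this sketch does not model.
struct LookupResult {
  size_t requestIndex;
  std::vector<std::string> rows;
};

// Toy reader exposing a startLookup/hasNext/next protocol.
class ToyIndexReader {
 public:
  // Begins a new batch. 'matches[i]' holds the rows matching request i.
  void startLookup(std::vector<std::vector<std::string>> matches) {
    matches_ = std::move(matches);
    nextRequest_ = 0;
  }

  // True while at least one request has not been returned yet.
  bool hasNext() const {
    return nextRequest_ < matches_.size();
  }

  // Returns the result for the next request, always in request order.
  LookupResult next() {
    assert(hasNext());
    LookupResult result{nextRequest_, std::move(matches_[nextRequest_])};
    ++nextRequest_;
    return result;
  }

 private:
  std::vector<std::vector<std::string>> matches_;
  size_t nextRequest_{0};
};
```

A caller would call `startLookup` once per batch and then drain results with `while (reader.hasNext()) { auto result = reader.next(); }`. Requests with no matching rows still yield an (empty) result in this sketch, which keeps the output aligned with the request order.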

Batched stripe processing: Instead of loading stripes per-request, the reader:

  • Maps all lookup requests to their matching stripes upfront
  • Merges overlapping row ranges within stripes for efficient reading
  • Tracks output references with ref-counting to share read data across requests
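As a rough sketch of the first bullet, the request-to-stripe mapping can be inverted into a stripe-to-requests mapping so that each stripe is visited once. The types and the function name below (`RowRange`, `StripeHit`, `RequestRange`, `groupByStripe`) are hypothetical and only illustrate the idea; range merging and ref-counting are not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Half-open row range [begin, end) within a stripe.
struct RowRange {
  uint64_t begin;
  uint64_t end;
};

// One stripe matched by a request, with the rows of interest in it.
struct StripeHit {
  size_t stripe;
  RowRange rows;
};

// A row range tagged with the request that asked for it.
struct RequestRange {
  size_t request;
  RowRange rows;
};

// Inverts request -> stripes into stripe -> requests, so that each stripe
// can be loaded once and serve every request that touches it.
std::map<size_t, std::vector<RequestRange>> groupByStripe(
    const std::vector<std::vector<StripeHit>>& hitsPerRequest) {
  std::map<size_t, std::vector<RequestRange>> byStripe;
  for (size_t request = 0; request < hitsPerRequest.size(); ++request) {
    for (const StripeHit& hit : hitsPerRequest[request]) {
      byStripe[hit.stripe].push_back(RequestRange{request, hit.rows});
    }
  }
  return byStripe;
}
```

Because `std::map` is ordered by key, iterating the result visits stripes in ascending order, and the size of each per-stripe vector is the number of requests that reference that stripe.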

Optimized row range handling:

  • Without filters: Merges overlapping row ranges into a single read; each request then extracts its own portion
  • With filters: Splits overlapping ranges into non-overlapping segments to preserve filter semantics
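The two strategies can be illustrated with a minimal sketch over half-open row ranges. The names `RowRange`, `mergeRanges`, and `splitRanges` are hypothetical and not taken from the diff; the sketch also assumes that touching ranges are merged along with overlapping ones, which the description does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open row range [begin, end) within a stripe.
struct RowRange {
  uint64_t begin;
  uint64_t end;
};

// No-filter case: collapse overlapping (or touching) ranges so each row is
// read once. Requests later slice their portion out of the merged read.
std::vector<RowRange> mergeRanges(std::vector<RowRange> ranges) {
  std::sort(
      ranges.begin(), ranges.end(), [](const RowRange& a, const RowRange& b) {
        return a.begin < b.begin;
      });
  std::vector<RowRange> merged;
  for (const RowRange& range : ranges) {
    if (!merged.empty() && range.begin <= merged.back().end) {
      merged.back().end = std::max(merged.back().end, range.end);
    } else {
      merged.push_back(range);
    }
  }
  return merged;
}

// Filter case: cut at every range boundary so that no two segments overlap
// and every segment lies entirely inside each request range it touches.
std::vector<RowRange> splitRanges(const std::vector<RowRange>& ranges) {
  std::vector<uint64_t> cuts;
  for (const RowRange& range : ranges) {
    cuts.push_back(range.begin);
    cuts.push_back(range.end);
  }
  std::sort(cuts.begin(), cuts.end());
  cuts.erase(std::unique(cuts.begin(), cuts.end()), cuts.end());

  std::vector<RowRange> segments;
  for (size_t i = 0; i + 1 < cuts.size(); ++i) {
    const RowRange segment{cuts[i], cuts[i + 1]};
    // Keep the segment only if at least one request range covers it.
    const bool covered = std::any_of(
        ranges.begin(), ranges.end(), [&](const RowRange& range) {
          return range.begin <= segment.begin && segment.end <= range.end;
        });
    if (covered) {
      segments.push_back(segment);
    }
  }
  return segments;
}
```

For the ranges [0, 10), [5, 15), and [20, 30), merging gives two reads, [0, 15) and [20, 30), while splitting gives four segments: [0, 5), [5, 10), [10, 15), and [20, 30). With split segments, each request's output is a concatenation of whole segments, so filtered results do not need to be sliced by row offset.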

HiveIndexReader refactoring: Simplified to focus on index bounds creation and result assembly, delegating control logic to format-specific readers.

KeyEncoder enhancements: Added support for encoding index bounds with constant values for more efficient range queries.

New runtime stats: Added metrics for tracking index lookup performance:

  • kNumIndexLookupRequests: Total lookup requests
  • kNumIndexLookupStripes: Number of stripes accessed
  • kNumIndexLookupReadSegments: Number of read segments processed

Differential Revision: D92848948

@meta-cla meta-cla bot added the CLA Signed label Feb 12, 2026

meta-codesync bot commented Feb 12, 2026

@xiaoxmeng has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92848948.

xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Feb 12, 2026
Summary:
X-link: facebookincubator/nimble#480

xiaoxmeng added a commit to xiaoxmeng/nimble that referenced this pull request Feb 12, 2026
Summary:
X-link: facebookincubator/velox#16360


xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Feb 12, 2026
Summary:

X-link: facebookincubator/nimble#480

xiaoxmeng added a commit to xiaoxmeng/nimble that referenced this pull request Feb 12, 2026
Summary:
X-link: facebookincubator/velox#16360


xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Feb 12, 2026
Summary:

X-link: facebookincubator/nimble#480

xiaoxmeng added a commit to xiaoxmeng/nimble that referenced this pull request Feb 13, 2026
Summary:
X-link: facebookincubator/velox#16360


This diff implements stripe-level batched index read support for Nimble files in Velox. The key motivation is to improve index lookup performance by processing multiple lookup requests in batches at the stripe level, rather than processing each request individually.

**New `SelectiveNimbleIndexReader` class**: A format-specific index reader that handles:
   - Encoding index bounds into Nimble-specific encoded keys
   - Looking up stripes and row ranges using the tablet index
   - Managing stripe iteration and data reading with batched processing
   - Returning results in request order via an iterator pattern (`startLookup`/`hasNext`/`next`)

**Batched stripe processing**: Instead of loading stripes per-request, the reader:
   - Maps all lookup requests to their matching stripes upfront
   - Merges overlapping row ranges within stripes for efficient reading (without filters)
   - Splits overlapping ranges into non-overlapping segments to preserve filter semantics (with filters)
   - Tracks output references with ref-counting to share read data across requests

**HiveIndexReader refactoring**: Simplified to focus on index bounds creation and result assembly, delegating format-specific control logic to `SelectiveNimbleIndexReader`.

**KeyEncoder enhancements**: Added support for encoding index bounds with constant values for efficient multi-row range queries, and extended test coverage for edge cases.

**New runtime stats**: Added metrics for tracking index lookup performance:
   - `kNumIndexLookupRequests`: Total lookup requests processed
   - `kNumIndexLookupStripes`: Number of stripes accessed
   - `kNumIndexLookupReadSegments`: Number of read segments processed

Reviewed By: HuamengJiang, tanjialiang

Differential Revision: D92848948
xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Feb 13, 2026
Summary:

X-link: facebookincubator/nimble#480

xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Feb 13, 2026
Summary:

X-link: facebookincubator/nimble#480

meta-codesync bot pushed a commit to facebookincubator/velox that referenced this pull request Feb 13, 2026
Summary:
Pull Request resolved: #16360

X-link: facebookincubator/nimble#480


Reviewed By: HuamengJiang, tanjialiang

Differential Revision: D92848948

fbshipit-source-id: d6f5c78ef23dca1d4d8faa2d27acf85f77612805
@meta-codesync meta-codesync bot closed this in c039c5f Feb 13, 2026

meta-codesync bot commented Feb 13, 2026

This pull request has been merged in c039c5f.


Labels

CLA Signed, fb-exported, Merged, meta-exported


2 participants