Commit 44e10f4

xiaoxmeng authored and meta-codesync[bot] committed
feat: Support stripe level batched index read (#16360)
Summary:
Pull Request resolved: #16360

X-link: facebookincubator/nimble#480

This diff implements stripe-level batched index read support for Nimble files in Velox. The key motivation is to improve index lookup performance by processing multiple lookup requests in batches at the stripe level, rather than processing each request individually.

**New `SelectiveNimbleIndexReader` class**: A format-specific index reader that handles:
- Encoding index bounds into Nimble-specific encoded keys
- Looking up stripes and row ranges using the tablet index
- Managing stripe iteration and data reading with batched processing
- Returning results in request order via an iterator pattern (`startLookup`/`hasNext`/`next`)

**Batched stripe processing**: Instead of loading stripes per-request, the reader:
- Maps all lookup requests to their matching stripes upfront
- Merges overlapping row ranges within stripes for efficient reading (without filters)
- Splits overlapping ranges into non-overlapping segments to preserve filter semantics (with filters)
- Tracks output references with ref-counting to share read data across requests

**HiveIndexReader refactoring**: Simplified to focus on index bounds creation and result assembly, delegating format-specific control logic to `SelectiveNimbleIndexReader`.

**KeyEncoder enhancements**: Added support for encoding index bounds with constant values for efficient multi-row range queries, and extended test coverage for edge cases.

**New runtime stats**: Added metrics for tracking index lookup performance:
- `kNumIndexLookupRequests`: Total lookup requests processed
- `kNumIndexLookupStripes`: Number of stripes accessed
- `kNumIndexLookupReadSegments`: Number of read segments processed

Reviewed By: HuamengJiang, tanjialiang

Differential Revision: D92848948

fbshipit-source-id: d6f5c78ef23dca1d4d8faa2d27acf85f77612805
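
The two range-handling strategies listed under "Batched stripe processing" can be illustrated with a minimal, self-contained sketch. This is not the code from `SelectiveNimbleIndexReader`: the `RowRange` type and the `mergeRanges` and `splitRanges` helpers are hypothetical names, ranges are assumed to be half-open, and treating adjacent ranges as mergeable is an assumption the summary does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical half-open row range [begin, end) within one stripe.
struct RowRange {
  int64_t begin;
  int64_t end;
  bool operator==(const RowRange& other) const {
    return begin == other.begin && end == other.end;
  }
};

// No-filter case: coalesce overlapping (and, by assumption, adjacent) ranges
// so that every row is read at most once.
std::vector<RowRange> mergeRanges(std::vector<RowRange> ranges) {
  std::sort(
      ranges.begin(), ranges.end(), [](const RowRange& a, const RowRange& b) {
        return a.begin < b.begin;
      });
  std::vector<RowRange> merged;
  for (const auto& range : ranges) {
    assert(range.begin <= range.end);
    if (range.begin == range.end) {
      continue; // Skip empty ranges.
    }
    if (!merged.empty() && range.begin <= merged.back().end) {
      merged.back().end = std::max(merged.back().end, range.end);
    } else {
      merged.push_back(range);
    }
  }
  return merged;
}

// Filter case: cut at every range boundary so the resulting segments never
// overlap, and keep only segments covered by at least one requested range.
std::vector<RowRange> splitRanges(const std::vector<RowRange>& ranges) {
  std::vector<int64_t> cuts;
  for (const auto& range : ranges) {
    if (range.begin < range.end) {
      cuts.push_back(range.begin);
      cuts.push_back(range.end);
    }
  }
  std::sort(cuts.begin(), cuts.end());
  cuts.erase(std::unique(cuts.begin(), cuts.end()), cuts.end());

  std::vector<RowRange> segments;
  for (std::size_t i = 0; i + 1 < cuts.size(); ++i) {
    const RowRange segment{cuts[i], cuts[i + 1]};
    const bool covered = std::any_of(
        ranges.begin(), ranges.end(), [&](const RowRange& range) {
          return range.begin <= segment.begin && segment.end <= range.end;
        });
    if (covered) {
      segments.push_back(segment);
    }
  }
  return segments;
}
```

For requests covering rows [0, 10), [5, 15) and [20, 30), `mergeRanges` yields [0, 15) and [20, 30), while `splitRanges` yields [0, 5), [5, 10), [10, 15) and [20, 30). Each split segment belongs wholly to every request that touches it, which is one plausible way to attribute filtered rows back to individual requests.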
1 parent 54f4662 commit 44e10f4

11 files changed: +1609 -541 lines changed

velox/connectors/hive/HiveConnectorUtil.cpp

Lines changed: 24 additions & 0 deletions
@@ -371,6 +371,30 @@ void checkColumnHandleConsistent(
       y.hiveType()->toString());
 }

+std::shared_ptr<common::ScanSpec> makeScanSpec(
+    const RowTypePtr& rowType,
+    const folly::F14FastMap<std::string, std::vector<const common::Subfield*>>&
+        outputSubfields,
+    const common::SubfieldFilters& subfieldFilters,
+    const RowTypePtr& dataColumns,
+    const std::unordered_map<std::string, HiveColumnHandlePtr>& partitionKeys,
+    const std::unordered_map<std::string, HiveColumnHandlePtr>& infoColumns,
+    const SpecialColumnNames& specialColumns,
+    bool disableStatsBasedFilterReorder,
+    memory::MemoryPool* pool) {
+  return makeScanSpec(
+      rowType,
+      outputSubfields,
+      subfieldFilters,
+      /*indexColumns=*/{},
+      dataColumns,
+      partitionKeys,
+      infoColumns,
+      specialColumns,
+      disableStatsBasedFilterReorder,
+      pool);
+}
+
 std::shared_ptr<common::ScanSpec> makeScanSpec(
     const RowTypePtr& rowType,
     const folly::F14FastMap<std::string, std::vector<const common::Subfield*>>&
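
The added overload above is a thin forwarding wrapper: it takes the same arguments minus `indexColumns` and passes an empty list to the existing overload. A reduced sketch of that pattern, using hypothetical stand-in names (`ScanSpecSketch`, `makeScanSpecSketch`) rather than the real Velox types:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for common::ScanSpec; it records only what the sketch
// needs to show which overload did the work.
struct ScanSpecSketch {
  std::vector<std::string> columns;
  std::vector<std::string> indexColumns;
};

// Stand-in for the existing overload that accepts indexColumns.
ScanSpecSketch makeScanSpecSketch(
    const std::vector<std::string>& columns,
    const std::vector<std::string>& indexColumns) {
  return ScanSpecSketch{columns, indexColumns};
}

// Stand-in for the added overload: same inputs without indexColumns,
// forwarding an empty list so both entry points share one implementation.
ScanSpecSketch makeScanSpecSketch(const std::vector<std::string>& columns) {
  return makeScanSpecSketch(columns, /*indexColumns=*/{});
}

// Usage check: both entry points agree when no index columns are requested.
bool overloadsAgree(const std::vector<std::string>& columns) {
  const auto viaShort = makeScanSpecSketch(columns);
  const auto viaFull = makeScanSpecSketch(columns, {});
  return viaShort.columns == viaFull.columns &&
      viaShort.indexColumns.empty() && viaFull.indexColumns.empty();
}
```

Keeping the logic in one overload means callers can migrate to the shorter signature without any change in behavior.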

velox/connectors/hive/HiveConnectorUtil.h

Lines changed: 19 additions & 0 deletions
@@ -87,6 +87,25 @@ void checkColumnHandleConsistent(
 /// filters based on statistics.
 /// @param pool Memory pool for allocations during scan spec construction.
 /// @return A ScanSpec that can be used to configure a reader.
+std::shared_ptr<common::ScanSpec> makeScanSpec(
+    const RowTypePtr& rowType,
+    const folly::F14FastMap<std::string, std::vector<const common::Subfield*>>&
+        outputSubfields,
+    const common::SubfieldFilters& subfieldFilters,
+    const RowTypePtr& dataColumns,
+    const std::unordered_map<
+        std::string,
+        std::shared_ptr<const HiveColumnHandle>>& partitionKeys,
+    const std::unordered_map<
+        std::string,
+        std::shared_ptr<const HiveColumnHandle>>& infoColumns,
+    const SpecialColumnNames& specialColumns,
+    bool disableStatsBasedFilterReorder,
+    memory::MemoryPool* pool);
+
+/// @deprecated Use the overload without indexColumns parameter instead.
+/// This overload is kept for backward compatibility and will be removed in a
+/// future release.
 std::shared_ptr<common::ScanSpec> makeScanSpec(
     const RowTypePtr& rowType,
     const folly::F14FastMap<std::string, std::vector<const common::Subfield*>>&
