Commit 6bee7e2

xiaoxmeng authored and facebook-github-bot committed
feat: Support stripe level batched index read (facebookincubator#480)
Summary:

X-link: facebookincubator/velox#16360

This diff implements stripe-level batched index read support for Nimble files in Velox. The key motivation is to improve index lookup performance by processing multiple lookup requests in batches at the stripe level, rather than processing each request individually.

**New `SelectiveNimbleIndexReader` class**: A format-specific index reader that handles:
- Encoding index bounds into Nimble-specific encoded keys
- Looking up stripes and row ranges using the tablet index
- Managing stripe iteration and data reading with batched processing
- Returning results in request order via an iterator pattern (`startLookup`/`hasNext`/`next`)

**Batched stripe processing**: Instead of loading stripes per-request, the reader:
- Maps all lookup requests to their matching stripes upfront
- Merges overlapping row ranges within stripes for efficient reading (without filters)
- Splits overlapping ranges into non-overlapping segments to preserve filter semantics (with filters)
- Tracks output references with ref-counting to share read data across requests

**HiveIndexReader refactoring**: Simplified to focus on index bounds creation and result assembly, delegating format-specific control logic to `SelectiveNimbleIndexReader`.

**KeyEncoder enhancements**: Added support for encoding index bounds with constant values for efficient multi-row range queries, and extended test coverage for edge cases.

**New runtime stats**: Added metrics for tracking index lookup performance:
- `kNumIndexLookupRequests`: Total lookup requests processed
- `kNumIndexLookupStripes`: Number of stripes accessed
- `kNumIndexLookupReadSegments`: Number of read segments processed

Reviewed By: HuamengJiang, tanjialiang

Differential Revision: D92848948
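The two range-handling strategies named under "Batched stripe processing" can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `RowRange`, `mergeRanges`, and `splitRanges` are hypothetical names, and ranges are assumed to be half-open `[begin, end)` row intervals within a single stripe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical half-open row range [begin, end) within one stripe.
struct RowRange {
  uint64_t begin;
  uint64_t end;

  bool operator==(const RowRange& other) const {
    return begin == other.begin && end == other.end;
  }
};

// No-filter case: coalesce overlapping or adjacent ranges so that every row
// is read at most once.
std::vector<RowRange> mergeRanges(std::vector<RowRange> ranges) {
  std::sort(
      ranges.begin(), ranges.end(), [](const RowRange& a, const RowRange& b) {
        return a.begin < b.begin;
      });
  std::vector<RowRange> merged;
  for (const auto& range : ranges) {
    assert(range.begin <= range.end);
    if (range.begin == range.end) {
      continue; // Skip empty ranges.
    }
    if (!merged.empty() && range.begin <= merged.back().end) {
      merged.back().end = std::max(merged.back().end, range.end);
    } else {
      merged.push_back(range);
    }
  }
  return merged;
}

// Filter case: cut at every range boundary so that each resulting segment is
// covered by a fixed set of requests and no two segments overlap.
std::vector<RowRange> splitRanges(const std::vector<RowRange>& ranges) {
  std::vector<uint64_t> cuts;
  for (const auto& range : ranges) {
    assert(range.begin <= range.end);
    cuts.push_back(range.begin);
    cuts.push_back(range.end);
  }
  std::sort(cuts.begin(), cuts.end());
  cuts.erase(std::unique(cuts.begin(), cuts.end()), cuts.end());

  std::vector<RowRange> segments;
  for (std::size_t i = 0; i + 1 < cuts.size(); ++i) {
    const RowRange segment{cuts[i], cuts[i + 1]};
    // Keep the segment only if at least one request covers it.
    const bool covered = std::any_of(
        ranges.begin(), ranges.end(), [&](const RowRange& range) {
          return range.begin <= segment.begin && segment.end <= range.end;
        });
    if (covered) {
      segments.push_back(segment);
    }
  }
  return segments;
}
```

For the requests `[0, 5)`, `[3, 8)`, and `[10, 12)`, `mergeRanges` yields `[0, 8)` and `[10, 12)`, while `splitRanges` yields `[0, 3)`, `[3, 5)`, `[5, 8)`, and `[10, 12)`. Splitting keeps each segment attributable to the exact set of requests that cover it, which is one plausible reading of "preserve filter semantics"; the commit summary does not spell out the reason.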
1 parent 36e5b55 commit 6bee7e2

6 files changed: +1103 -114 lines changed

dwio/nimble/index/TabletIndex.cpp

Lines changed: 2 additions & 1 deletion

@@ -20,7 +20,9 @@
 #include "dwio/nimble/common/Exceptions.h"
 #include "dwio/nimble/tablet/IndexGenerated.h"
 #include "dwio/nimble/tablet/MetadataBuffer.h"
+#include "folly/String.h"
 #include "folly/json/json.h"
+#include "folly/logging/xlog.h"
 
 namespace facebook::nimble::index {
 
@@ -122,7 +124,6 @@ std::optional<StripeLocation> TabletIndex::lookup(
 
   // Calculate which stripe contains the key
   const uint32_t targetStripe = (it - stripeKeys_.begin()) - 1;
-
   // Check if the key is beyond all stripes
   if (targetStripe >= numStripes_) {
     return std::nullopt;
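
The hunk above shows only the tail of `TabletIndex::lookup`, so the following is a hedged sketch of how a stripe could be located from sorted boundary keys, not the actual implementation. It assumes a hypothetical layout in which `boundaries[0]` is the smallest key in the file and `boundaries[i + 1]` is the largest key of stripe `i`; `findStripe` is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Returns the index of the stripe whose key range contains `key`, or
// std::nullopt if the key falls outside all stripes.
// Assumed layout (hypothetical): boundaries.size() == numStripes + 1.
std::optional<uint32_t> findStripe(
    const std::vector<std::string>& boundaries,
    std::string_view key) {
  assert(std::is_sorted(boundaries.begin(), boundaries.end()));
  // No stripes, or the key sorts before the first stripe.
  if (boundaries.size() < 2 || key < boundaries.front()) {
    return std::nullopt;
  }
  // Find the first stripe whose largest key is >= the lookup key.
  const auto it =
      std::lower_bound(boundaries.begin() + 1, boundaries.end(), key);
  // The key is beyond all stripes.
  if (it == boundaries.end()) {
    return std::nullopt;
  }
  return static_cast<uint32_t>(it - boundaries.begin()) - 1;
}
```

With boundaries `{"a", "f", "m", "z"}` (three stripes), the key `"c"` maps to stripe 0, `"g"` to stripe 1, and `"zz"` to no stripe.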

dwio/nimble/index/tests/IndexTestUtils.cpp

Lines changed: 5 additions & 1 deletion

@@ -60,12 +60,16 @@ void writeFile(
     const std::string& filePath,
     const std::vector<velox::RowVectorPtr>& data,
     IndexConfig indexConfig,
-    velox::memory::MemoryPool& pool) {
+    velox::memory::MemoryPool& pool,
+    std::function<std::unique_ptr<FlushPolicy>()> flushPolicyFactory) {
   NIMBLE_CHECK(!data.empty(), "Data must not be empty");
 
   VeloxWriterOptions options;
   options.enableChunking = true;
   options.indexConfig = std::move(indexConfig);
+  if (flushPolicyFactory) {
+    options.flushPolicyFactory = std::move(flushPolicyFactory);
+  }
 
   auto fs = velox::filesystems::getFileSystem(filePath, {});
   auto writeFile = fs->openFileForWrite(

dwio/nimble/index/tests/IndexTestUtils.h

Lines changed: 5 additions & 1 deletion

@@ -65,11 +65,15 @@ std::vector<velox::RowVectorPtr> generateData(
 /// @param data Vector of RowVectorPtr to write.
 /// @param indexConfig Index configuration specifying columns, sort orders, etc.
 /// @param pool Memory pool for writing.
+/// @param flushPolicyFactory Optional factory for custom flush policy to
+/// control
+/// stripe sizes. If not provided, uses default flush policy.
 void writeFile(
     const std::string& filePath,
     const std::vector<velox::RowVectorPtr>& data,
     IndexConfig indexConfig,
-    velox::memory::MemoryPool& pool);
+    velox::memory::MemoryPool& pool,
+    std::function<std::unique_ptr<FlushPolicy>()> flushPolicyFactory = nullptr);
 
 /// Simple StreamLoader implementation for testing that holds data in memory.
 class TestStreamLoader : public StreamLoader {
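
The optional `flushPolicyFactory` parameter added to `writeFile` follows a common pattern: a `std::function` defaulting to `nullptr` that overrides the writer's default only when set. The sketch below illustrates that pattern with stand-in types. The `FlushPolicy` interface, `RowCountFlushPolicy`, `WriterOptions`, and `writeBatches` here are all hypothetical and do not reflect the real Nimble classes, whose methods are not shown in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Stand-in interface (hypothetical): decides when buffered rows become a stripe.
class FlushPolicy {
 public:
  virtual ~FlushPolicy() = default;
  virtual bool shouldFlush(uint64_t bufferedRows) = 0;
};

// Hypothetical policy that flushes once a row-count threshold is reached.
class RowCountFlushPolicy : public FlushPolicy {
 public:
  explicit RowCountFlushPolicy(uint64_t maxRows) : maxRows_(maxRows) {}

  bool shouldFlush(uint64_t bufferedRows) override {
    return bufferedRows >= maxRows_;
  }

 private:
  const uint64_t maxRows_;
};

// Stand-in for writer options with a built-in default policy.
struct WriterOptions {
  std::function<std::unique_ptr<FlushPolicy>()> flushPolicyFactory = [] {
    return std::make_unique<RowCountFlushPolicy>(1'000'000);
  };
};

// Simulates writing batches and returns the number of stripes produced.
// The factory is optional: an empty std::function keeps the default policy.
std::size_t writeBatches(
    const std::vector<uint64_t>& batchRows,
    std::function<std::unique_ptr<FlushPolicy>()> flushPolicyFactory = nullptr) {
  WriterOptions options;
  if (flushPolicyFactory) {
    options.flushPolicyFactory = std::move(flushPolicyFactory);
  }
  const auto policy = options.flushPolicyFactory();

  std::size_t numStripes = 0;
  uint64_t bufferedRows = 0;
  for (const auto rows : batchRows) {
    bufferedRows += rows;
    if (policy->shouldFlush(bufferedRows)) {
      ++numStripes;
      bufferedRows = 0;
    }
  }
  // Remaining rows form a final stripe.
  if (bufferedRows > 0) {
    ++numStripes;
  }
  return numStripes;
}
```

In this sketch, three 100-row batches produce one stripe under the default policy and three stripes when the caller passes a factory returning `RowCountFlushPolicy(100)`. Forcing many small stripes in this way is what lets a test exercise lookups that span several stripes.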
