Commit 18a98c5
feat: Add cluster index support in velox selective reader (facebookincubator#406)
Summary:
Pull Request resolved: facebookincubator#406
This diff integrates the cluster index into the Nimble selective reader to enable filter-based row pruning at read time:
**NimbleRowReaderOptions**: Added format-specific options to control index-based filtering via `setIndexEnabled()`.
**ReaderBase refactoring**:
- Changed constructor to static `create()` factory method for proper initialization ordering
- Renamed `memoryPool_` to `pool_` for consistency
- Added `StripeStreams::streamIndex()` to get stream index for decoders
- Added `StripeStreams::enqueueKeyStream()` to load key stream data for IO coalescing with data streams
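As an illustration of why a static `create()` factory can help with initialization ordering, here is a minimal sketch. `ReaderSketch`, `loadFooter()`, `loadIndex()`, and `initSteps()` are hypothetical names for this example only and are not taken from the Nimble sources; the actual `ReaderBase::create()` signature may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader; none of these names come from the Nimble sources.
class ReaderSketch {
 public:
  // Factory: constructs the object first, then runs the initialization steps
  // in a fixed order. Steps that need a fully constructed object are safe
  // here, while they might not be inside the constructor.
  static std::shared_ptr<ReaderSketch> create(std::string path) {
    std::shared_ptr<ReaderSketch> reader(new ReaderSketch(std::move(path)));
    reader->loadFooter();
    reader->loadIndex();
    return reader;
  }

  // Exposes the order in which the initialization steps ran.
  const std::vector<std::string>& initSteps() const {
    return initSteps_;
  }

 private:
  // Private constructor: callers must go through create().
  explicit ReaderSketch(std::string path) : path_(std::move(path)) {}

  void loadFooter() {
    initSteps_.push_back("footer:" + path_);
  }

  void loadIndex() {
    // In this sketch the index step depends on the footer being loaded.
    assert(!initSteps_.empty());
    initSteps_.push_back("index:" + path_);
  }

  std::string path_;
  std::vector<std::string> initSteps_;
};
```

Because the constructor is private, every instance is guaranteed to have gone through both steps in the same order.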
**SelectiveNimbleRowReader index integration**:
- Added `convertIndexColumnsToFileSchema()` to convert index column names from nimble schema to file schema (user-facing names)
- `initIndexBounds()`: Converts index columns to file schema names, then converts ScanSpec filters to encoded key bounds
- `updateStartStripeFromLowerIndexBound()` / `updateEndStripeFromUpperIndexBound()`: Uses TabletIndex lookup to skip entire stripes outside the filter range
- `buildIndexReader()`: Creates IndexReader for first/last stripes only (they can be the same) to find exact row positions
- `setStripeRowRange()`: Adjusts row range within stripes based on key bounds, then calls `columnReader_->seekTo()` to skip to the starting position
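The stripe-skipping step described above can be sketched as a binary search over per-stripe key metadata. This is an assumption-laden illustration, not the actual TabletIndex API: `StripeKeyIndex`, `lastKeyPerStripe`, and `pruneStripes()` are hypothetical, and the sketch assumes encoded keys are sorted across the file and compare bytewise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tablet-level index: the largest encoded key
// stored in each stripe, in stripe order.
struct StripeKeyIndex {
  std::vector<std::string> lastKeyPerStripe;
};

// Returns the half-open stripe range [begin, end) that may contain keys in
// the inclusive range [lower, upper]. An unset bound is unbounded.
inline std::pair<uint32_t, uint32_t> pruneStripes(
    const StripeKeyIndex& index,
    const std::optional<std::string>& lower,
    const std::optional<std::string>& upper) {
  const auto& keys = index.lastKeyPerStripe;
  assert(std::is_sorted(keys.begin(), keys.end()));
  const auto numStripes = static_cast<uint32_t>(keys.size());
  uint32_t begin = 0;
  uint32_t end = numStripes;
  if (lower.has_value()) {
    // Stripes whose last key is below the lower bound hold no matches.
    begin = static_cast<uint32_t>(
        std::lower_bound(keys.begin(), keys.end(), *lower) - keys.begin());
  }
  if (upper.has_value()) {
    // Count the stripes whose last key is <= upper, then keep one more
    // stripe: it may still start with keys that are <= upper.
    const auto covered = static_cast<uint32_t>(
        std::upper_bound(keys.begin(), keys.end(), *upper) - keys.begin());
    end = std::min(covered + 1, numStripes);
  }
  // Contradictory bounds (lower > upper) yield an empty range.
  end = std::max(end, begin);
  return {begin, end};
}
```

The upper-bound side is deliberately conservative: it may keep one stripe that turns out to contain no matching rows, which a per-stripe row-range lookup would then narrow down.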
**NimbleData updates**:
- Changed `streams_` and `memoryPool_` from references to pointers for consistency
- Updated decoder creation to pass stream index for index-aware decoding
- Changed `VELOX_UNREACHABLE()` to `NIMBLE_UNSUPPORTED()` for unsupported operations
**Random Skip Handling with Cluster Index**: When random sampling (random skip) is enabled together with cluster index filtering, the random skip tracker must also account for rows that are skipped due to index bounds, not just rows that are actually read. This ensures that random sampling produces consistent results whether or not index filtering is enabled. The implementation handles three scenarios, plus a deferred update for trailing rows:
- Leading stripes filtered out: When `updateStartStripeFromLowerIndexBound()` skips entire stripes at the beginning due to the lower bound, `maybeUpdateRandomSkip()` is called immediately with the total row count of those filtered stripes.
- Trailing stripes filtered out: When `updateEndStripeFromUpperIndexBound()` skips entire stripes at the end due to the upper bound, the row counts are accumulated into `trailingSkippedRows_`.
- Partial stripe rows filtered: When `setStripeRowRange()` adjusts the row range within stripes (rows skipped at the start of the first stripe due to the lower bound, or rows skipped at the end of the last stripe due to the upper bound), leading rows trigger an immediate `maybeUpdateRandomSkip()` call, while trailing rows are accumulated into `trailingSkippedRows_`.
- Deferred update for trailing rows: All trailing skipped rows (from both entire stripes and the partial last stripe) are applied together via `maybeUpdateRandomSkip(trailingSkippedRows_)` in `nextRowNumber()` when reaching `kAtEnd`.
Added the `randomSkipWithIndex` test case and integrated random skip testing into the `randomSchemaAndFilters` fuzzer test.
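The accounting described above can be sketched with a deterministic stand-in for the random skip tracker. Everything here is an assumption made for illustration: `EveryNthTracker` samples every n-th row position instead of drawing random gaps, and `sampleWithPruning()` is a hypothetical helper, not the reader's actual code path.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Deterministic stand-in for a random skip tracker (hypothetical): it
// samples every n-th row position instead of drawing random gaps.
class EveryNthTracker {
 public:
  explicit EveryNthTracker(uint64_t n) : n_(n) {
    assert(n > 0);
  }

  // Advances past `count` rows without sampling any of them.
  void skip(uint64_t count) {
    position_ += count;
  }

  // Consumes one row and reports whether it is sampled.
  bool next() {
    return (position_++ % n_) == 0;
  }

  uint64_t position() const {
    return position_;
  }

 private:
  uint64_t n_;
  uint64_t position_{0};
};

// Reads `rowsToRead` rows that survive index pruning and returns the
// absolute row numbers that were sampled.
inline std::vector<uint64_t> sampleWithPruning(
    EveryNthTracker& tracker,
    uint64_t leadingPruned,
    uint64_t rowsToRead,
    uint64_t trailingPruned) {
  std::vector<uint64_t> sampled;
  // Leading pruned rows are reported immediately, before any row is read.
  tracker.skip(leadingPruned);
  // Trailing pruned rows are only accumulated at this point.
  const uint64_t trailingSkippedRows = trailingPruned;
  for (uint64_t i = 0; i < rowsToRead; ++i) {
    const uint64_t row = tracker.position();
    if (tracker.next()) {
      sampled.push_back(row);
    }
  }
  // At the end of the scan the deferred trailing rows are applied together.
  tracker.skip(trailingSkippedRows);
  return sampled;
}
```

In this sketch, a scan that prunes rows through the index samples the same absolute row numbers inside the read range as a scan that visits every row, and the tracker ends at the same position in both cases.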
**E2EIndexTest**: Comprehensive test suite covering:
- Single-column keys: bigint, double, float, timestamp, boolean, varchar
- Multi-column composite keys with various filter combinations (point + point, point + range, range + point, etc.)
- Fuzzer testing with randomly generated table and index schemas, and randomly generated index column filters (for key bound conversion) combined with non-index column filters
- Verification that index-enabled and index-disabled reads produce identical results
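The last check in the list above, that index-enabled and index-disabled reads agree, can be illustrated on a sorted key column. This is a simplified sketch under stated assumptions: `scanWithFilter()` and `scanWithIndex()` are hypothetical helpers, and a single `int64_t` key stands in for the encoded composite keys used by the real tests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Reference path: visit every row and apply the filter lower <= key <= upper.
inline std::vector<int64_t> scanWithFilter(
    const std::vector<int64_t>& keys,
    int64_t lower,
    int64_t upper) {
  std::vector<int64_t> out;
  for (const int64_t key : keys) {
    if (key >= lower && key <= upper) {
      out.push_back(key);
    }
  }
  return out;
}

// Index path: the keys are sorted, so the matching rows form one contiguous
// range that binary search can locate without visiting the other rows.
inline std::vector<int64_t> scanWithIndex(
    const std::vector<int64_t>& sortedKeys,
    int64_t lower,
    int64_t upper) {
  assert(std::is_sorted(sortedKeys.begin(), sortedKeys.end()));
  if (lower > upper) {
    return {};
  }
  const auto first =
      std::lower_bound(sortedKeys.begin(), sortedKeys.end(), lower);
  const auto last = std::upper_bound(first, sortedKeys.end(), upper);
  return std::vector<int64_t>(first, last);
}
```

A fuzzer in this style would generate random sorted inputs and bounds and assert that both functions return the same rows.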
Reviewed By: Yuhta
Differential Revision: D90074133
fbshipit-source-id: b64323dc51303e3c5ab968e11d73f166e0a2f592
1 parent b3b68ad commit 18a98c5
File tree
8 files changed, +3554 -70 lines changed
- dwio/nimble/velox/selective
- tests