tp: add IndexedFilterIn bytecode for In on indexed columns#5158

Open: LalitMaganti wants to merge 22 commits into main from dev/lalitm/indexed-filter-in

LalitMaganti commented Mar 17, 2026

Summary

  • Add IndexedFilterIn bytecode that uses binary search on index permutation vectors for In filters
  • When a column has an index and the query uses In, the planner now emits IndexedFilterIn instead of the generic In bytecode
  • For each value in the list, binary-searches the index permutation vector (O(log N) per value) and concatenates matching ranges
  • Reduces In filter cost from O(N) to O(k log N + matches), where k is the number of values in the list and N is the table size
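
The lookup described in the summary can be sketched in standalone C++ as follows. This is a minimal illustration, not the bytecode from this PR. The function name `IndexedFilterInSketch` and the plain `std::vector` representation of the column and the index permutation vector are assumptions made for the example. It assumes `perm` lists row numbers ordered by column value, so each value in the list maps to one contiguous run of `perm`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an index: `perm` holds row numbers ordered by the
// value stored in `column`, so column[perm[i]] is non-decreasing in i.
std::vector<uint32_t> IndexedFilterInSketch(
    const std::vector<uint32_t>& column,
    const std::vector<uint32_t>& perm,
    const std::vector<uint32_t>& values) {
  assert(perm.size() == column.size());
  std::vector<uint32_t> out;
  for (uint32_t v : values) {
    // O(log N): first permutation entry whose column value is >= v.
    auto lo = std::lower_bound(
        perm.begin(), perm.end(), v,
        [&](uint32_t row, uint32_t val) { return column[row] < val; });
    // O(log N): first permutation entry whose column value is > v.
    auto hi = std::upper_bound(
        lo, perm.end(), v,
        [&](uint32_t val, uint32_t row) { return val < column[row]; });
    // O(matches): append the matching run of row numbers.
    out.insert(out.end(), lo, hi);
  }
  return out;
}
```

In this sketch the output follows the order of the value list rather than row order, and a value repeated in the list would produce repeated rows. How the real bytecode handles either case is not stated here.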

Stack

  1. #5154 - tp: add In filter support to TypedCursor and optimize In bytecode
  2. #5158 - tp: add IndexedFilterIn bytecode for In on indexed columns (this PR)
  3. #5155 - tp: migrate experimental_slice_layout to use In filter on track_id

Test plan

  • 4 new bytecode interpreter tests (IndexedFilterIn_Uint32_NonNull_MultipleValues, _NoMatch, _SingleValue, _String_SparseNull_MultipleValues)
  • 1 new query planner test (PlanQuery_SingleColIndex_InFilter_NonNullInt)
  • 1 new end-to-end TypedCursor test (TypedCursorInFilterWithIndex)
  • All existing indexed filter tests updated and passing

LalitMaganti requested a review from a team as a code owner March 17, 2026 03:17
LalitMaganti changed the title from "dev/lalitm/indexed filter in" to "tp: add IndexedFilterIn bytecode for In on indexed columns" Mar 17, 2026
LalitMaganti changed the base branch from main to dev/lalitm/in March 17, 2026 03:17
LalitMaganti force-pushed the dev/lalitm/indexed-filter-in branch from f19f7dd to ad716eb March 17, 2026 03:51
Add SetFilterValueListUnchecked to TypedCursor allowing callers to
pass a pointer+size array of FilterValue for In filters without
allocation. Plumb this through the codegen'd ConstCursor/Cursor.

Optimize the In bytecode by pre-building lookup structures during
CastFilterValueList instead of rebuilding on every Execute():
- For dense Id/Uint32: BitVector (built once, not per-call)
- For large sparse integer/string lists: FlatHashMapV2 for O(1) lookups
- For small lists (<=16): linear scan (cache-friendly)

The lookup is stored as a variant in CastFilterValueListResult,
replacing the separate value_list field.
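
A rough sketch of the "build the lookup once, store it as a variant" idea, using only the standard library. The names `InLookup`, `BuildInLookup` and `Contains` are hypothetical, `std::vector<bool>` and `std::unordered_set` stand in for BitVector and FlatHashMapV2, and the density heuristic (`kDenseFactor`) and the order in which the three cases are checked are assumptions, since the commit message only gives the <=16 threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <variant>
#include <vector>

using LinearList = std::vector<uint32_t>;      // small lists: linear scan
struct DenseBits { std::vector<bool> bits; };  // stand-in for BitVector
using HashSet = std::unordered_set<uint32_t>;  // stand-in for FlatHashMapV2
using InLookup = std::variant<LinearList, DenseBits, HashSet>;

constexpr std::size_t kLinearScanMax = 16;  // threshold from the commit message
constexpr uint32_t kDenseFactor = 64;       // assumption: illustrative only

// Built once when the value list is cast, then reused by every execution.
InLookup BuildInLookup(const std::vector<uint32_t>& values) {
  if (values.size() <= kLinearScanMax) return LinearList(values);
  uint32_t max_value = *std::max_element(values.begin(), values.end());
  if (max_value / kDenseFactor <= values.size()) {
    DenseBits dense;
    dense.bits.assign(static_cast<std::size_t>(max_value) + 1, false);
    for (uint32_t v : values) dense.bits[v] = true;
    return dense;
  }
  return HashSet(values.begin(), values.end());
}

bool Contains(const InLookup& lookup, uint32_t v) {
  if (const auto* list = std::get_if<LinearList>(&lookup))
    return std::find(list->begin(), list->end(), v) != list->end();
  if (const auto* dense = std::get_if<DenseBits>(&lookup))
    return v < dense->bits.size() && dense->bits[v];
  return std::get<HashSet>(lookup).count(v) != 0;
}
```

The point of the variant is that the choice of structure is made once per value list, so the per-row check in the filter loop is a membership test on an already-built structure.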

Migrate experimental_slice_layout to use In filter on track_id.

When a column has an index and the query uses an In filter, the
planner now emits IndexedFilterIn instead of the generic In
bytecode. For each value in the list, IndexedFilterIn binary-
searches the index permutation vector (O(log N) per value) and
concatenates the matching ranges.

This reduces In filter cost from O(N) to O(k log N + matches)
where k is the number of values and N is the table size.
LalitMaganti force-pushed the dev/lalitm/indexed-filter-in branch from ad716eb to c4641da March 17, 2026 04:03
Base automatically changed from dev/lalitm/in to main March 18, 2026 13:04
PrefixPopcount was emitted after IndexedFilterEq/In because
alloc_popcount() was called inside the AddOpcode block. For
SparseNull columns, this meant the popcount register was
uninitialized when the filter executed, causing a SIGSEGV on
LEFT JOINs over indexed columns.

Move alloc_popcount() before AddOpcode so PrefixPopcount is
emitted first.

IndexedFilterIn wrote results back into the source register's memory
via memcpy, but the source register points directly to the persistent
index permutation vector. This corrupted the index for all subsequent
queries on the same table.

Fix: allocate a separate slab+span pair (via AllocateIndices) for the
dest register, following the existing EnsureIndicesAreInSlab pattern.
IndexedFilterIn now writes into this pre-allocated buffer instead of
the source. The dest_register is changed to RwHandle since we need to
both read the pre-allocated span and write back the adjusted boundaries.

Added a regression test that verifies an Eq query on the same index
returns correct results after an IN query has executed.
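
The aliasing problem and its fix can be illustrated with a small standalone sketch. The `Span` struct and `FilterInInto` function are hypothetical names for this example and do not reproduce the interpreter's register types. `source` views the persistent index and is only read, while `dest` views a separately allocated buffer whose end is adjusted after writing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view over a run of row indices; it owns no memory.
struct Span {
  uint32_t* begin;
  uint32_t* end;
};

// Reads matching runs from `source` (a view of the persistent index) and
// copies them into the buffer behind `dest`. `source` is never written.
// Assumes `values` holds distinct entries, so output fits in source's size.
void FilterInInto(const std::vector<uint32_t>& column, Span source,
                  const std::vector<uint32_t>& values, Span& dest) {
  uint32_t* out = dest.begin;
  uint32_t* const capacity_end = dest.end;
  for (uint32_t v : values) {
    uint32_t* lo = std::lower_bound(
        source.begin, source.end, v,
        [&](uint32_t row, uint32_t val) { return column[row] < val; });
    uint32_t* hi = std::upper_bound(
        lo, source.end, v,
        [&](uint32_t val, uint32_t row) { return val < column[row]; });
    const std::size_t n = static_cast<std::size_t>(hi - lo);
    assert(n <= static_cast<std::size_t>(capacity_end - out));
    if (n != 0) std::memcpy(out, lo, n * sizeof(uint32_t));
    out += n;
  }
  // Write back the adjusted boundary of the pre-allocated span.
  dest.end = out;
}
```

Had `dest` pointed at the same memory as `source`, the `memcpy` would have overwritten index entries that later queries still rely on. With a separate buffer, a later single-value lookup over the same index sees it unchanged, which is the property the regression test checks.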
LalitMaganti enabled auto-merge (squash) March 19, 2026 15:33