@CascadingRadium CascadingRadium commented Dec 28, 2025

  • Use a bitset to track eligible documents instead of a slice of N uint64s, reducing memory usage from 8N bytes to N/8 bytes per segment (up to 64× reduction) and improving cache locality.
  • Pass an iterator over eligible documents that walks the bitset directly, so the storage layer can translate it straight into a bitset of eligible vector IDs without an intermediate slice.
  • Fix garbage creation in the UnadornedPostingsIterator, which previously allocated a temporary struct on every Next() call to wrap a doc number and satisfy the Postings interface; the iterator now returns a single reusable struct (one-time allocation), consistent with how the PostingsIterator in the storage layer works.
  • Avoid unnecessary BytesRead statistics computation when executing searches in no-scoring mode, removing redundant work as a micro-optimization.
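
To make the memory arithmetic in the first two bullets concrete, here is a minimal sketch of a per-segment bitset with an iterator over its set bits. It is not the code from this PR: `docBitset`, `eligibleIterator`, and their methods are hypothetical names, and only the Go standard library is used.

```go
package main

import (
	"fmt"
	"math/bits"
)

// docBitset is a hypothetical fixed-size bitset: one bit per document in a segment.
type docBitset struct {
	words []uint64
}

// newDocBitset allocates ceil(numDocs/64) words, i.e. roughly numDocs/8 bytes.
func newDocBitset(numDocs uint64) *docBitset {
	return &docBitset{words: make([]uint64, (numDocs+63)/64)}
}

// set marks a document number as eligible.
func (b *docBitset) set(doc uint64) { b.words[doc/64] |= 1 << (doc % 64) }

// sizeBytes reports the memory used by the bit storage itself.
func (b *docBitset) sizeBytes() int { return len(b.words) * 8 }

// eligibleIterator walks the set bits in ascending order without
// materializing a []uint64 of document numbers.
type eligibleIterator struct {
	b    *docBitset
	word int    // index of the word currently being scanned
	cur  uint64 // bits of that word not yet returned
}

func (b *docBitset) iterator() *eligibleIterator {
	it := &eligibleIterator{b: b}
	if len(b.words) > 0 {
		it.cur = b.words[0]
	}
	return it
}

// next returns the next eligible document number, or false when exhausted.
func (it *eligibleIterator) next() (uint64, bool) {
	for it.cur == 0 {
		it.word++
		if it.word >= len(it.b.words) {
			return 0, false
		}
		it.cur = it.b.words[it.word]
	}
	tz := bits.TrailingZeros64(it.cur)
	it.cur &= it.cur - 1 // clear the lowest set bit
	return uint64(it.word*64 + tz), true
}

func main() {
	const numDocs = 1_000_000
	b := newDocBitset(numDocs)
	for _, d := range []uint64{3, 64, 130} {
		b.set(d)
	}
	// A slice holding all N doc numbers would need 8*N bytes.
	fmt.Println("slice bytes (all eligible):", 8*numDocs)
	fmt.Println("bitset bytes:", b.sizeBytes())
	for it := b.iterator(); ; {
		d, ok := it.next()
		if !ok {
			break
		}
		fmt.Println("eligible doc:", d)
	}
}
```

The 64× figure is the best case, reached when every document in the segment is eligible; a slice holding only a few eligible documents can be smaller than a bitset sized for the whole segment.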

Copilot AI left a comment

Pull request overview

This PR re-architects vector search to improve memory efficiency and reduce garbage collection pressure. The changes replace slice-based eligible document tracking with bitsets, achieving up to 64× memory reduction per segment, and optimize the iterator pattern to eliminate per-call allocations in the unadorned postings iterator.

Key changes:

  • Replaced slice-based eligible document tracking ([]uint64) with bitsets, reducing memory from 8N bytes to N/8 bytes per segment
  • Introduced iterator-based API for eligible documents that directly translates to bitset iteration at the storage layer
  • Fixed garbage creation in UnadornedPostingsIterator by reusing a single struct instance instead of allocating per Next() call
  • Optimized bytes read tracking to skip computation in no-scoring mode
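
The struct-reuse fix in the third bullet can be illustrated with a small sketch. It assumes a simplified, hypothetical `posting` type and `postingsIterator`; the real `UnadornedPostingsIterator` and the `Postings` interface have more methods than shown here.

```go
package main

import "fmt"

// posting is a hypothetical stand-in for a posting that only carries a doc number.
type posting struct {
	docNum uint64
}

// Number uses a pointer receiver so the iterator can hand out a pointer
// to its own embedded struct rather than a fresh copy.
func (p *posting) Number() uint64 { return p.docNum }

// postingsIterator owns one posting value and overwrites it on each call.
type postingsIterator struct {
	docNums []uint64
	pos     int
	current posting // reused across Next calls: allocated once with the iterator
}

// Next returns a pointer to the reused posting, or nil when exhausted.
// The returned value is only valid until the following call to Next.
func (it *postingsIterator) Next() *posting {
	if it.pos >= len(it.docNums) {
		return nil
	}
	it.current.docNum = it.docNums[it.pos]
	it.pos++
	return &it.current
}

func main() {
	it := &postingsIterator{docNums: []uint64{7, 9, 42}}
	for p := it.Next(); p != nil; p = it.Next() {
		fmt.Println(p.Number())
	}
}
```

The trade-off is aliasing: callers that need to retain a posting past the next call must copy the fields they care about, since every returned pointer refers to the same struct.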

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| index/scorch/snapshot_vector_index.go | Introduces bitset-based eligible document storage and iterator API, replacing the previous slice-based approach |
| index/scorch/unadorned.go | Changes UnadornedPosting from uint64 to struct with pointer receivers and adds reusable struct fields to iterators to eliminate per-call allocations |
| index/scorch/snapshot_index_tfr.go | Adds conditional bytes read tracking via updateBytesRead flag to skip computation in no-scoring mode |
| index/scorch/snapshot_index.go | Initializes updateBytesRead flag based on scoring requirements |
| index/scorch/optimize_knn.go | Removes requiresFiltering flag and updates to use new SegmentEligibleDocuments API |
| index/scorch/optimize.go | Sets updateBytesRead to false for unadorned term field readers |
| index/scorch/snapshot_index_vr.go | Updates InterpretVectorIndex call to remove filtering parameter |
| index_test.go | Updates expected bytes read values to reflect the optimization that skips unnecessary computation |
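
As a rough illustration of the conditional bytes-read tracking described for snapshot_index_tfr.go, the sketch below gates the accounting on a boolean. Only the flag name `updateBytesRead` comes from the summary above; `termReader`, its other fields, and `drain` are hypothetical and do not reflect the actual reader.

```go
package main

import "fmt"

// termReader is a hypothetical stand-in for a term field reader.
type termReader struct {
	updateBytesRead bool     // false in no-scoring mode, so the accounting is skipped
	bytesRead       uint64   // running statistic, only maintained when the flag is set
	postings        [][]byte // assumed raw posting payloads, for illustration only
	pos             int
}

// next returns the next payload and updates the statistic only when asked to.
func (r *termReader) next() ([]byte, bool) {
	if r.pos >= len(r.postings) {
		return nil, false
	}
	p := r.postings[r.pos]
	r.pos++
	if r.updateBytesRead {
		r.bytesRead += uint64(len(p))
	}
	return p, true
}

// drain consumes the reader and reports the bytes it accounted for.
func drain(r *termReader) uint64 {
	for {
		if _, ok := r.next(); !ok {
			break
		}
	}
	return r.bytesRead
}

func main() {
	data := [][]byte{[]byte("abc"), []byte("de")}
	scoring := &termReader{updateBytesRead: true, postings: data}
	noScoring := &termReader{updateBytesRead: false, postings: data}
	fmt.Println(drain(scoring), drain(noScoring)) // prints "5 0"
}
```

This also suggests why the expected bytes-read values in index_test.go changed: readers created with the flag off no longer contribute to the total.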

@abhinavdangeti abhinavdangeti changed the title MB-69881: Re-architect vector search MB-69881: Improved APIs for vector search Jan 8, 2026
@abhinavdangeti abhinavdangeti added this to the v2.6.0 milestone Jan 8, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.


@CascadingRadium CascadingRadium moved this from Done to In Progress in Vector Search v2 Jan 10, 2026
abhinavdangeti previously approved these changes Jan 12, 2026
@abhinavdangeti abhinavdangeti changed the title MB-69881: Improved APIs for vector search MB-69881: Improved APIs and perf optimizations for vector search Jan 12, 2026
@coveralls
Coverage Status

coverage: 53.677% (-0.05%) from 53.722%
when pulling abac255 on perf
into 9808f42 on master.

@CascadingRadium CascadingRadium merged commit 32d9882 into master Jan 13, 2026
10 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Vector Search v2 Jan 13, 2026
@CascadingRadium CascadingRadium deleted the perf branch January 13, 2026 13:33
CascadingRadium added a commit that referenced this pull request Jan 20, 2026
- Use a `bitset` to track eligible documents instead of a slice of `N
uint64s`, reducing memory usage from `8N bytes` to `N/8 bytes` per
segment (up to `64×` reduction) and improving cache locality.
- Pass an iterator over eligible documents that walks the bitset
directly, so the storage layer can translate it straight into a bitset
of eligible vector IDs without an intermediate slice.
- Fix garbage creation in the `UnadornedPostingsIterator`, which
previously allocated a temporary struct on every `Next()` call to wrap a
doc number and satisfy the `Postings` interface; the iterator now
returns a single reusable struct (one-time allocation), consistent with
how the `PostingsIterator` in the storage layer works.
- Avoid unnecessary `BytesRead` statistics computation when executing
searches in no-scoring mode, removing redundant work as a
micro-optimization.

---------

Co-authored-by: Abhinav Dangeti <[email protected]>
CascadingRadium added a commit that referenced this pull request Jan 21, 2026
- Backport the following commits into `v8.0.x-couchbase` via cherry-pick
    - #2270
    - #2224
    - #2272

---------

Co-authored-by: Abhinav Dangeti <[email protected]>
Co-authored-by: Copilot <[email protected]>